Safe use of Unix signals with the multiprocessing module in Python
[ Dear reader, since you are interested in this blog post, I am assuming you are familiar with the signal module, which lets you catch Unix signals from within your Python script. I am also assuming you are an enthusiastic parallel processing fan like me and try to make the best use of Python's parallel processing framework, multiprocessing, in day to day programming. :) ]
The conflict
signal is a great module! multiprocessing is too. But together they work against each other if you are not careful about their interaction. If you have ever tried to use both of them in a single script, you know what I mean.
In a long running multiprocessing application, a safe and clean exit of the child processes is a must. There are many ways to achieve this. One common way is to pass an Event object to the child process. The child occasionally checks whether that exit event is set and exits if necessary. Here's a demo script implementing this pattern, where the parent process feeds some data to a child worker process -
#!/usr/bin/env python3
import time
import queue
import signal
from multiprocessing import Process, Event, Queue

# The worker is intentionally very lazy!
def lazy_ass_worker(exit_event, work_queue):
    while not exit_event.is_set():
        try:
            work = work_queue.get(timeout=1.0)
        except queue.Empty:
            continue
        print("I did job {} already! :)".format(work))
        print("A small nap won't hurt anyone!")
        time.sleep(1.0)
    print("Doing cleanup before leaving ...")

exit_event = Event()
work_queue = Queue()

# Spawn the worker process.
cp = Process(target=lazy_ass_worker, args=(exit_event, work_queue))
cp.start()

# Send some integers to the worker process.
for x in range(100):
    work_queue.put(x)

# We wait for CTRL+C from the user.
try:
    signal.pause()
except KeyboardInterrupt:
    # Since our worker is too delicate, we should notify it with the
    # exit event and then wait for its safe arrival / joining.
    exit_event.set()
    cp.join()
At the end of the script, I naively tried to catch the KeyboardInterrupt exception (which is raised by default in response to the SIGINT signal) in the parent process, and then tried to notify the child about the exit condition so that it could do all the cleanup before exiting. But if you actually run the above script and press CTRL+C during the execution, the line Doing cleanup before leaving ... is never printed. Here's what happens on my computer -
oscar@notebook ~ % python3 demo.py
I did job 0 already! :)
A small nap won't hurt anyone!
I did job 1 already! :)
A small nap won't hurt anyone!
^CProcess Process-1:
Traceback (most recent call last):
File "/usr/lib/python3.4/multiprocessing/process.py", line 254, in _bootstrap
self.run()
File "/usr/lib/python3.4/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "demo.py", line 16, in lazy_ass_worker
time.sleep(1.0)
KeyboardInterrupt
oscar@notebook ~ %
What just happened?
Signals are delivered to the whole foreground process group - that is what has happened! When you press CTRL+C, the terminal sends SIGINT to every process in the foreground process group, not just the parent. So even if you catch the signal in the parent process, the child processes still receive and handle it - and the default handling of SIGINT in the child raises KeyboardInterrupt right in the middle of time.sleep(), as the traceback above shows. This conflicts with the kind of pattern we used in the above demo, where synchronization primitives from the multiprocessing module are used for safe cleanup.
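You can verify the delivery directly with a tiny experiment. Below is a minimal sketch (the worker name curious_child and its flag are my own, not part of the demo above) that installs a reporting SIGINT handler in the child; press CTRL+C and both processes react -

#!/usr/bin/env python3
import os
import signal
import time
from multiprocessing import Process

def curious_child():
    got_sigint = []
    # Report the signal instead of letting it raise KeyboardInterrupt.
    signal.signal(signal.SIGINT,
                  lambda signum, frame: got_sigint.append(signum))
    while not got_sigint:
        time.sleep(0.2)
    print("Child {} received SIGINT too!".format(os.getpid()))

cp = Process(target=curious_child)
cp.start()
try:
    signal.pause()
except KeyboardInterrupt:
    print("Parent {} caught SIGINT.".format(os.getpid()))
cp.join()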
The workaround
Every child process spawned by the multiprocessing module inherits signal handlers from the parent process. If we set the handlers of our target signals to SIG_IGN before spawning new processes, the child processes will ignore those signals. With this strategy, our demo needs some minor modifications -
# Save a reference to the original signal handler for SIGINT.
default_handler = signal.getsignal(signal.SIGINT)

# Set signal handling of SIGINT to ignore mode.
signal.signal(signal.SIGINT, signal.SIG_IGN)

exit_event = Event()
work_queue = Queue()

# Spawn the worker process.
cp = Process(target=lazy_ass_worker, args=(exit_event, work_queue))
cp.start()

# Since we spawned all the necessary processes already,
# restore default signal handling for the parent process.
signal.signal(signal.SIGINT, default_handler)
In the above code, the default signal handler is restored after all the necessary processes have been spawned. If you use custom signal handlers, they should be installed at this stage. Note that some facilities of the multiprocessing module, such as Manager, implicitly spawn additional processes (a Queue, by contrast, only starts a background feeder thread in the current process); those implicit processes should be taken care of in the same manner, as shown below.
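If you spawn processes in several places, the save/ignore/restore dance gets repetitive. Here is a minimal sketch of a helper context manager - my own construction, not part of the multiprocessing API - that wraps it; anything started inside the block, including the server process behind a Manager, begins life with the given signals ignored -

import signal
import time
from contextlib import contextmanager
from multiprocessing import Manager, Process

@contextmanager
def signals_ignored(*signums):
    # Save the current handlers, ignore the signals while the block
    # runs, and restore the handlers afterwards - even on error.
    saved = {s: signal.getsignal(s) for s in signums}
    for s in signums:
        signal.signal(s, signal.SIG_IGN)
    try:
        yield
    finally:
        for s, handler in saved.items():
            signal.signal(s, handler)

def tiny_worker():
    time.sleep(5.0)

# Everything spawned inside the block - the explicit worker and the
# server process behind Manager - inherits SIGINT in ignored state.
with signals_ignored(signal.SIGINT):
    manager = Manager()
    cp = Process(target=tiny_worker)
    cp.start()

cp.join()
manager.shutdown()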
Caveats
Ignoring important termination signals like SIGINT, SIGTERM, etc. in the child processes is problematic in the early stages of development. Programming or runtime errors in the code can litter your development system with stray worker processes that refuse to die. In that case, just kill them with a SIGKILL, as it cannot be ignored.
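For example, assuming the script is saved as demo.py, as in the transcript above, a match-by-name kill from another terminal sweeps the strays up -

oscar@notebook ~ % pkill -KILL -f demo.py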