I have run into a very weird issue that I can't explain when printing to a file from multiple processes (started with the subprocess module). Some of my output is slightly truncated and some of it is missing entirely. I am using a slightly modified version of Alex Martelli's solution for thread-safe printing, found here: How do I get a thread-safe print in Python 2.6? The main difference is in the write method. To guarantee that output is not interleaved between the multiple processes writing to the same file, I buffer the output and only write when I see a newline.
    import sys
    import threading

    tls = threading.local()

    class ThreadSafeFile(object):
        """
        @author: Alex Martelli
        @see: https://stackoverflow.com/questions/3029816/how-do-i-get-a-thread-safe-print-in-python-2-6
        @summary: Allows for safe printing of output of multi-threaded programs to stdout.
        """
        def __init__(self, f):
            self.f = f
            self.lock = threading.RLock()
            self.nesting = 0
            self.dataBuffer = ""

        def _getlock(self):
            self.lock.acquire()
            self.nesting += 1

        def _droplock(self):
            nesting = self.nesting
            self.nesting = 0
            for i in range(nesting):
                self.lock.release()

        def __getattr__(self, name):
            if name == 'softspace':
                return tls.softspace
            else:
                raise AttributeError(name)

        def __setattr__(self, name, value):
            if name == 'softspace':
                tls.softspace = value
            else:
                return object.__setattr__(self, name, value)

        def write(self, data):
            self._getlock()
            self.dataBuffer += data
            if data == '\n':
                self.f.write(self.dataBuffer)
                self.f.flush()
                self.dataBuffer = ""
                self._droplock()

        def flush(self):
            self.f.flush()
Note that getting this to misbehave requires either a lot of time or a machine with multiple processors or cores. I ran the offending program in my test suite ~7000 times on a single-processor machine before it reported a failure. The program I've created to demonstrate the issue also seems to work on a single-processor machine, but when you run it on a multicore or multiprocessor machine it will certainly fail.
The following program shows the issue. It is somewhat more involved than I wanted it to be, but I wanted to preserve as much of the behavior of my programs as possible.
The code for process 1, main.py:
    import subprocess, sys, socket, time, random
    from threadSafeFile import ThreadSafeFile

    sys.stdout = ThreadSafeFile(sys.__stdout__)
    usage = "python main.py nprocs niters"
    workerFilename = "/path/to/worker.py"

    def startMaster(n, iters):
        host = socket.gethostname()
        for i in xrange(n):
            #set up ~synchronization between master and worker
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            sock.bind((host,0))
            sock.listen(1)
            socketPort = sock.getsockname()[1]
            cmd = 'ssh %s python %s %s %d %d %d' % \
                (host, workerFilename, host, socketPort, i, iters)
            proc = subprocess.Popen(cmd.split(), shell=False, stdout=None, stderr=None)
            conn, addr = sock.accept()
            #wait for worker process to start
            conn.recv(1024)
            for j in xrange(iters):
                #do very bursty i/o
                for k in xrange(iters):
                    print "master: %d iter: %d message: %d" % (n,i, j)
                #sleep for some amount of time between .01s and .5s
                time.sleep(1 * (random.randint(1,50) / float(100)))
            #wait for worker to finish
            conn.recv(1024)
            sock.close()
            proc.kill()

    def main(nprocs, niters):
        startMaster(nprocs, niters)

    if __name__ == "__main__":
        if len(sys.argv) != 3:
            print usage
            sys.exit(1)
        nprocs = int(sys.argv[1])
        niters = int(sys.argv[2])
        main(nprocs, niters)
The code for process 2, worker.py:
    import sys, socket, time, random
    from threadSafeFile import ThreadSafeFile

    usage = "python host port id iters"
    sys.stdout = ThreadSafeFile(sys.__stdout__)

    def main(host, port, n, iters):
        #tell master to start
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.connect((host, port))
        sock.send("begin")
        for i in xrange(iters):
            #do bursty i/o
            for j in xrange(iters):
                print "worker: %d iter: %d message: %d" % (n,i, j)
            #sleep for some amount of time between .01s and .5s
            time.sleep(1 * (random.randint(1,50) / float(100)))
        #tell master we are done
        sock.send("done")
        sock.close()

    if __name__ == "__main__":
        if len(sys.argv) != 5:
            print usage
            sys.exit(1)
        host = sys.argv[1]
        port = int(sys.argv[2])
        n = int(sys.argv[3])
        iters = int(sys.argv[4])
        main(host,port,n,iters)
When testing I ran main.py as follows:
python main.py 1 75 > main.out
The resulting file should be of length 75*75*2 = 11250 lines of the format:
(master|worker): %d iter: %d message: %d
Most of the time it is short by 20-30 lines, but on occasion the program does produce the expected number of lines. On closer inspection of those rare successes, some of the lines are truncated, leaving something like:
ter: %d message: %d
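Counting lines alone misses the truncated ones, so it can help to check every line against the expected format as well. The following is a minimal sketch, written for Python 3 rather than the Python 2 used in the programs above; the function name `check_output` is hypothetical and the regular expression is derived only from the format string shown in this question.

```python
import re

# Expected shape of every line, taken from the format string in the question.
LINE_RE = re.compile(r"^(master|worker): \d+ iter: \d+ message: \d+$")

def check_output(lines, expected_count):
    """Return (missing, malformed).

    missing   -- expected_count minus the number of lines actually seen
    malformed -- list of (line_number, text) for lines that do not match
    """
    malformed = []
    total = 0
    for lineno, line in enumerate(lines, start=1):
        total += 1
        text = line.rstrip("\n")
        if not LINE_RE.match(text):
            malformed.append((lineno, text))
    return expected_count - total, malformed
```

Passing an open file object for main.out together with the expected count of 11250 would report both the shortfall and every truncated line with its position.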
Another interesting aspect is that when I start the ssh process using multiprocessing instead of subprocess, this program behaves as intended. Some may ask why I bother with subprocess when multiprocessing works fine. It is the academic in me that really wants to know why this is behaving abnormally. Any thoughts or insights would be much appreciated. Thanks.
***edit*** Ben, I understand that threadSafeFile uses different locks per process, but I need it in my larger project for two reasons.
1) Each process may have multiple threads that will be writing to stdout even though this example does not. So I need to guarantee both safety at the thread level and at the process level.
2) If I don't make sure that there is a '\n' at the end of the buffer whenever stdout gets flushed, then there will be some execution trace where process 1 writes its buffer to the file without a trailing '\n' and then process 2 comes in and writes its buffer. Now we have lines interleaving, and that's not what I want.
I also understand that this mechanism makes things a bit restrictive for what can be printed. Right now, in my stage of development of this project, restrictiveness is ok. When I can guarantee correctness I can start to relax the restrictions.
Your comment about locking inside of the conditional check if data == '\n' is incorrect. If the lock goes inside the conditional check then threadSafeFile is no longer thread safe in the general case. If any thread can add to the data buffer then there will be a race condition at dataBuffer += data as this is not an atomic operation. Perhaps your comment is simply related to this example in which we only have 1 thread per process, but if that's the case then we don't even need a lock at all.
In regards to OS level locks, my understanding was that multiple programs were able to safely write to the same file on a unix platform iff the number of bytes being written was smaller than the size of the internal buffer. Shouldn't the OS take care of all of the necessary locking for me in this case?
asked Aug 27 '11 at 22:08
In each process you create a ThreadSafeFile for sys.stdout, each of which has a lock, but they're different locks; there's nothing connecting the locks used in all the different processes. So you're getting the same effect as if you used no locks at all; no process is ever going to be blocked by a lock held in another process, since they all have their own.
The only reason this works when run on a single-processor machine is the buffering you do to queue up writes until a newline is encountered. This means that each line of output is written all in one go. On a uniprocessor, it's not unlikely that the OS will decide to switch processes in the middle of a bunch of successive calls to write, which would trash your data. But if the output is all written in chunks of a single line and you don't care about the order in which lines end up in the file, then it's very unlikely for a context switch to happen in the middle of an operation you care about. It's not theoretically impossible though, so I wouldn't call this code correct even for a uniprocessor.
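To make "written all in one go" concrete, here is a minimal Python 3 sketch that hands a whole line to the operating system in a single write call on a file opened in append mode. The helper name `append_line` is hypothetical. The sketch assumes a POSIX system, where O_APPEND makes the kernel position each write at the current end of the file; it is an illustration of the idea, not a claim that it explains or fixes the truncation described in the question.

```python
import os

def append_line(path, text):
    """Append one line to `path` using a single write() system call.

    Returns True if the whole line was accepted by the kernel.
    """
    data = (text.rstrip("\n") + "\n").encode("utf-8")
    # O_APPEND: the kernel moves to end-of-file as part of each write,
    # so writers in different processes do not reuse a stale offset.
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        written = os.write(fd, data)
        # A short write is possible in principle; report it, don't hide it.
        return written == len(data)
    finally:
        os.close(fd)
```

Note that a shell redirection with `>` does not open the file in append mode, whereas `>>` does, so the two can behave differently when several processes write to the same file.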
ThreadSafeFile is very specifically only thread safe. It relies on the fact that the program only has a single ThreadSafeFile object for each file it's writing to. So any writes to that file go through that single object, synchronizing on its lock.
When you have multiple processes, you don't have the shared global memory that threads in a single process do. So each process necessarily has its own separate ThreadSafeFile(sys.stdout) object. This is exactly the same mistake as if you had used threads and spawned N threads, each of which created its own ThreadSafeFile object.
I have no idea how this works when you use multiprocessing, because you haven't posted the code you used to do that. But my understanding is that this would still fail, for all the same reasons, if you used multiprocessing in such a way that each process created its own fresh ThreadSafeFile. Maybe you're not doing that in the version that uses multiprocessing?
What you need to do is arrange for the synchronization object (the lock) to be connected somehow. The multiprocessing module can do this for you. Note in the example here how the lock is created once and then passed in to each new process as it is created. (This still results in 10 different lock objects in 10 different processes of course, but what Python must be doing behind the scenes is creating an OS-level lock and then making each of the copied Python-level lock objects refer to the single OS-level lock.)
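The pattern of creating the lock once and handing it to every process can be sketched roughly as follows. This is a Python 3 illustration rather than the Python 2 of the question; the names `worker` and `run_workers` and the output format are hypothetical, and it assumes a Unix-like system where the "fork" start method is available.

```python
import multiprocessing

def worker(lock, path, worker_id, nlines):
    """Append nlines lines to path, holding the shared lock for each one."""
    for i in range(nlines):
        line = "worker: %d iter: %d message: %d\n" % (worker_id, i, i)
        with lock:
            # Open, write and close while holding the lock, so the whole
            # line reaches the file before another process can write.
            with open(path, "a") as f:
                f.write(line)

def run_workers(path, nprocs, nlines):
    # Assumption: a Unix-like system where the "fork" start method exists.
    ctx = multiprocessing.get_context("fork")
    lock = ctx.Lock()  # created once, in the parent
    procs = [ctx.Process(target=worker, args=(lock, path, n, nlines))
             for n in range(nprocs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

Every child receives the same underlying lock because it is passed as an argument at process creation, which is exactly what independently started scripts cannot do.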
If you want to do this with subprocess, where you're just starting totally independent worker commands from separate scripts, then you'll need some way to get them all talking to a single OS-level lock. I don't know of anything in the standard library that helps you do that. I would just use multiprocessing.
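One candidate that this answer does not mention is an advisory file lock from the standard library's fcntl module, which is available on Unix only. The sketch below (Python 3) is a hedged illustration of that idea, not something tested against the programs in the question: the helper name `locked_append` and the separate lock file are assumptions, and the lock only helps if every writer goes through the same helper with the same lock path on the same machine.

```python
import fcntl

def locked_append(path, lock_path, text):
    """Append text to path while holding an exclusive lock on lock_path."""
    with open(lock_path, "w") as lock_file:
        # Blocks until no other process holds the lock on this file.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            with open(path, "a") as out:
                out.write(text)
            # `out` is closed (and therefore flushed) before the unlock.
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

Because the lock lives in the kernel and is tied to a file on disk, unrelated scripts started through subprocess or ssh on the same host can all contend for it without sharing any Python objects.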
As another thought, your buffering and locking code looks a little suspicious too. What happens if something calls sys.stdout.write("foo\n")? Since that string is not equal to '\n', the buffer would never be flushed. I'm not certain, but at a guess this is only working because the implementation of print happens to call sys.stdout.write on whatever you're printing, then call it again with a single newline. There is absolutely no reason it has to do this! It could just as easily assemble a single string of output in memory and then only call sys.stdout.write once. Plus, what happens if you need to print a block of multiple lines that need to go together in the output?
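A buffering rule that does not depend on how print happens to split its calls could look at the accumulated text rather than at the latest argument. The helper below is a minimal sketch with a hypothetical name, `split_complete_lines`; it separates everything up to and including the last newline from any unfinished remainder.

```python
def split_complete_lines(pending, data):
    """Return (ready, rest).

    ready -- text ending in a newline that can be written now ('' if none)
    rest  -- unfinished trailing text to keep buffering
    """
    combined = pending + data
    cut = combined.rfind("\n") + 1  # 0 when there is no newline at all
    return combined[:cut], combined[cut:]
```

With this rule, write("foo\n") is flushed immediately, and a multi-line string is passed to the underlying file as one block up to its last newline.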
Another problem is that you acquire the lock the first time a process writes to the buffer, continue to hold it as the buffer is filled, then write the line, and finally release the lock. If your lock actually worked and a process took a long time between starting a line and finishing it, it would block all other processes from even buffering up their writes! Maybe that's sort of what you want, if the intention is that when a process starts writing something it gets a guarantee that its output will hit the file next. But in that case, you don't even need the buffering at all. I think you should be acquiring the lock just after if data == '\n':, and then you wouldn't need all that code tracking the nesting level either.
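One way to take the lock only around the actual write, while still avoiding the race on a shared buffer that the question's edit worries about, is to give each thread its own buffer. The class below is a Python 3 sketch under that assumption; `LineLockedFile` is a hypothetical name, and like ThreadSafeFile it only synchronizes threads inside one process, so a lock shared across processes is still needed for the multi-process case.

```python
import threading

class LineLockedFile(object):
    """Per-thread buffering; the lock is held only while writing."""

    def __init__(self, f):
        self.f = f
        self.lock = threading.Lock()
        self.local = threading.local()  # each thread gets its own buffer

    def write(self, data):
        # No lock needed here: the buffer is private to the calling thread.
        pending = getattr(self.local, "buffer", "") + data
        cut = pending.rfind("\n") + 1  # 0 when no complete line yet
        ready, self.local.buffer = pending[:cut], pending[cut:]
        if ready:
            with self.lock:
                self.f.write(ready)
                self.f.flush()

    def flush(self):
        with self.lock:
            self.f.flush()
```

Since the lock is never held across calls to write, a plain Lock is enough and no nesting counter is required.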