Friday, October 12, 2007

Broken locks

In one of my recent projects I came across this scenario:
On one *nix machine there are two processes: an external, non-Java process and a Java application. The external process writes several text files to "well-known" locations (typically each file goes into its own directory). The file names are also known in advance to a certain degree (a name pattern can be specified instead of a complete name).
The files are written on a regular basis (daily or hourly), but the exact timing is not known.
The Java application has to:
  • continually poll these directories and look for any new file(s) matching the provided name or name pattern;
  • as soon as it detects a new file, read and "process" it (the exact nature of the processing is unimportant);
  • after the processing is done, move the file to another directory, mainly for archival purposes.
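
The polling step could be sketched roughly as follows. This is only an illustration, assuming Java 7 or later (for java.nio.file); the class name InputScanner and the method findCandidates are made up for this post and are not taken from the project.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: lists the regular files in a directory whose names match a glob pattern. */
final class InputScanner {

    private InputScanner() {
    }

    /**
     * Returns the regular files in dir whose names match the glob (for example "report-*.txt").
     * Sub-directories that happen to match the pattern are skipped.
     */
    static List<Path> findCandidates(Path dir, String glob) throws IOException {
        List<Path> matches = new ArrayList<>();
        // The directory stream must be closed, hence try-with-resources.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path entry : stream) {
                if (Files.isRegularFile(entry)) {
                    matches.add(entry);
                }
            }
        }
        return matches;
    }
}
```

A poller would call findCandidates for each watched directory, sleep for a while, and repeat. Whether a match really is a new file, and whether it is safe to read, are separate questions.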

A number of questions arose:

  1. What exactly does "new file [version]" mean?
  2. How do we know when the external process has finished writing the input file? Suppose the external process starts writing the file. Should the Java process wait for the writer to finish before it starts reading? If yes, for how long? [Perhaps one should ask: is it required that the reader wait for the writer to finish?]
  3. How do we prevent the external process from starting a new write of the input file while our Java process is in the middle of reading it?
  4. How do we make sure the Java process won't move the file (once it's done with it) while the writer process is writing a new version of the file?

Question 1 is a matter of comparing file dates: it should be enough to compare the file's creation/modification date with the previously recorded value. If the date is later, we have a new file.
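
A minimal sketch of that comparison, for illustration only. Note that java.io.File exposes just the last-modified time, so that is what the sketch uses; the class name NewVersionDetector is invented here, and the sketch tracks a single file.

```java
import java.io.File;

/** Hypothetical helper: remembers the last modification time seen for one file. */
final class NewVersionDetector {

    // Modification time (ms since the epoch) of the last version we reported, or -1 if none yet.
    private long lastSeenModified = -1L;

    /**
     * Returns true if the file's modification time is later than the one recorded
     * by the previous successful call, and records the new value.
     */
    boolean isNewVersion(File file) {
        // File.lastModified() returns 0L if the file does not exist or an I/O error occurs.
        long modified = file.lastModified();
        if (modified == 0L || modified <= lastSeenModified) {
            return false;
        }
        lastSeenModified = modified;
        return true;
    }
}
```

One caveat: file systems store timestamps with limited granularity, so two writes in quick succession may be indistinguishable. With files that arrive daily or hourly this should not matter much.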

Question 2 was answered by comparing two successive readings of the file length: if they match, one can assume that writing has ended. The time interval between the two measurements was established empirically by running the program on various input files. If the two successive length values still differ, the Java process (thread) sleeps and then tries again.
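
Roughly, the check looks like the sketch below. The names WriteCompletionCheck, looksComplete and waitUntilStable are made up for this illustration, and the interval and the number of attempts are parameters precisely because the project had to tune them by experiment.

```java
import java.io.File;

/** Hypothetical helper: guesses whether a writer has finished by watching the file length. */
final class WriteCompletionCheck {

    private WriteCompletionCheck() {
    }

    /** Takes two length readings intervalMillis apart and reports whether they match. */
    static boolean looksComplete(File file, long intervalMillis) throws InterruptedException {
        long first = file.length();
        Thread.sleep(intervalMillis);
        long second = file.length();
        // File.length() returns 0 for a missing file, so also require that the file exists.
        return file.exists() && first == second;
    }

    /** Repeats the check up to maxAttempts times; returns false if the length never settled. */
    static boolean waitUntilStable(File file, long intervalMillis, int maxAttempts)
            throws InterruptedException {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (looksComplete(file, intervalMillis)) {
                return true;
            }
        }
        return false;
    }
}
```

This is only a heuristic: a writer that pauses for longer than the interval looks exactly like a writer that has finished.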

To answer questions 3 and 4 successfully, some sort of access protocol must be established between the parties involved, because the file-moving requirement makes the Java process a "writer" too. The processes must therefore synchronize on the shared resource. This, I think, is a typical problem that IPC mechanisms (pipes/FIFOs, shared memory, sockets) solve.

Unfortunately, none of the above IPC mechanisms was adopted (possibly for lack of time or knowledge). Instead, the team chose file locks because, they thought, locks would prevent the input file(s) from being clobbered by multiple processes. Moreover, because nobody knew at the time what the writer process(es) would be, only the Java program was considered. So the Java program had to do the following:

  1. Try to obtain exclusive access (an exclusive file lock) to the input file;
  2. IF step 1 succeeds, read the file and do something with the data (unimportant for this discussion) ELSE try again later (sleep for a certain amount of time);
  3. AFTER step 2 completes, try to move the input file to a back-up directory;
  4. Relinquish the exclusive lock on the file, allowing other process(es) to write to it.
[IMPORTANT ASSUMPTION: no other process can write to the file while the Java process is holding an exclusive lock]
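
The four steps above can be sketched as follows. This is not the project's code, just an illustration assuming Java 8 or later; LockedConsumer, Processor and consume are hypothetical names, and the "processing" is left to the caller.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch of the lock / process / move / release scheme. */
final class LockedConsumer {

    /** Placeholder for whatever "processing" means in a given application. */
    interface Processor {
        void process(FileChannel channel) throws IOException;
    }

    private LockedConsumer() {
    }

    /** Returns true if the file was processed and archived, false if the lock was not available. */
    static boolean consume(Path input, Path archiveDir, Processor processor) throws IOException {
        // An exclusive lock requires a channel that is open for writing.
        try (FileChannel channel = FileChannel.open(input,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            FileLock lock = channel.tryLock();           // step 1: exclusive lock on the whole file
            if (lock == null) {
                return false;                            // step 2, ELSE branch: caller retries later
            }
            try {
                processor.process(channel);              // step 2: read and process
                // step 3: archive; moving a file that is still open may fail on some platforms
                Files.move(input, archiveDir.resolve(input.getFileName()),
                        StandardCopyOption.REPLACE_EXISTING);
            } finally {
                lock.release();                          // step 4: give up the lock
            }
            return true;
        }
    }
}
```

A caller would invoke something like consume(input, archiveDir, ch -> parse(ch)) in its polling loop, with parse standing in for the real processing, and sleep before retrying whenever the method returns false. Everything here rests on the assumption stated above.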

The above scheme was implemented and tested on a Windows machine. The tests were successful; everything seemed to work well. BUT as soon as the program was deployed on the *nix machine, weird things started to happen. The "writer" process happily wrote to the file while the "reader" was reading it, even though the file lock was held by the Java process!

Reality check: The assumption does not hold on *nix machines!

Here is what the FileLock javadoc has to say:
"This file-locking API is intended to map directly to the native locking facility of the underlying operating system. Thus the locks held on a file should be visible to all programs that have access to the file, regardless of the language in which those programs are written.

Whether or not a lock actually prevents another program from accessing the content of the locked region is system-dependent and therefore unspecified. The native file-locking facilities of some systems are merely advisory, meaning that programs must cooperatively observe a known locking protocol in order to guarantee data integrity [my emphasis; this is the case on (some?) Unix platforms]. On other systems native file locks are mandatory, meaning that if one program locks a region of a file then other programs are actually prevented from accessing that region in a way that would violate the lock. On yet other systems, whether native file locks are advisory or mandatory is configurable on a per-file basis. To ensure consistent and correct behavior across platforms, it is strongly recommended that the locks provided by this API be used as if they were advisory locks.

On some systems, acquiring a mandatory lock on a region of a file prevents that region from being mapped into memory, and vice versa. Programs that combine locking and mapping should be prepared for this combination to fail.

On some systems, closing a channel releases all locks held by the Java virtual machine on the underlying file regardless of whether the locks were acquired via that channel or via another channel open on the same file. It is strongly recommended that, within a program, a unique channel be used to acquire all locks on any given file."
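
To make "cooperatively observe a known locking protocol" concrete, here is a sketch of what a well-behaved writer could look like if it were also a Java program. That was not the case in the project described here, so this is purely hypothetical; CooperativeWriter and writeVersion are made-up names, and Java 7 or later is assumed.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical writer that takes the same lock as the reader before touching the file. */
final class CooperativeWriter {

    private CooperativeWriter() {
    }

    /** Replaces the content of target, but only while holding an exclusive lock on it. */
    static void writeVersion(Path target, String content) throws IOException {
        // Deliberately no TRUNCATE_EXISTING: that would wipe the file before the lock is held.
        try (FileChannel channel = FileChannel.open(target,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileLock lock = channel.lock()) {           // blocks until the lock is granted
            channel.truncate(0);                          // now it is safe to discard the old version
            ByteBuffer buffer = ByteBuffer.wrap(content.getBytes(StandardCharsets.UTF_8));
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
        }                                                 // the lock is released, then the channel is closed
    }
}
```

With advisory locks, a writer that skips the lock() call is never stopped by anything. A non-Java writer would have to take the equivalent native lock for the protocol to hold.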

I wrote a small Java program that uses file locks and tested it on an AIX machine. The program reads a text file in 1 KB chunks and echoes each chunk to the console. To make things more interesting (read: to give the tester time to alter the file using an editor), I added a delay of 1.5 seconds after every read. Here it is:



import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.RandomAccessFile;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

/**
 * "Slow reads" a file (whose path is given as the argument) and echoes it to the console.
 * Demonstrates that on a UNIX system an exclusive file lock does not keep
 * other processes from writing to the file.
 * Use it with a medium-size file (~8000 chars).
 * Run it on a UNIX system with the command
 *     java -cp . SlowReader testinput.txt
 *
 * While the program runs, open an editor and alter the file (towards the end).
 * (You should be able to save the changes made in the editor.)
 * You will see that the program's output changes accordingly.
 * @author Val
 */
public class SlowReader {

    /** Delay in milliseconds between reading a chunk and echoing it. */
    private static final int TIME_TO_WAIT_BEFORE_WRITE = 1500;
    private static final int BUFFER_SIZE = 1024;

    /**
     * @param args args[0] is the path of the file to read
     * @throws IOException if the file cannot be opened or locked
     */
    public static void main(String[] args) throws IOException {

        if (args.length == 0 || args[0] == null) {
            System.err.println("Usage: java -cp . SlowReader <file>");
            return;
        }

        File f = new File(args[0]);
        if (!f.exists() || !f.canRead()) {
            System.err.println("Cannot read file " + f);
            return;
        }

        // "rw" because an exclusive lock needs a channel that is open for writing
        RandomAccessFile raf = new RandomAccessFile(f, "rw");
        FileChannel fileChannel = raf.getChannel();
        // shared == false requests an exclusive lock on the whole file
        FileLock lock = fileChannel.tryLock(0L, Long.MAX_VALUE, false);

        if (lock == null) {
            // couldn't lock the file :( Try later
            System.err.println("Cannot obtain exclusive access to file");
            raf.close();
            return;
        }

        InputStream in = Channels.newInputStream(fileChannel);
        OutputStream out = System.out;
        try {
            // Transfer bytes from in to out, pausing after every read
            byte[] buf = new byte[BUFFER_SIZE];
            int len;
            while ((len = in.read(buf)) > 0) {
                Thread.sleep(TIME_TO_WAIT_BEFORE_WRITE);
                out.write(buf, 0, len);
            }
            out.flush();
        } catch (IOException e) {
            System.err.println(e);
        } catch (InterruptedException e) {
            // stop reading, but keep the interrupt status for the caller
            Thread.currentThread().interrupt();
        } finally {
            try {
                lock.release();
                in.close(); // this will close the channel too
                raf.close();
            } catch (IOException ioex) {
                System.err.println(ioex);
            }
        }
    }
}

On the AIX machine where I tested the above program, the input file is easily overwritten by another process. On Windows platforms, by contrast, the file cannot be written once the Java process has got the lock.

Conclusion

The FileLock is deceiving! The "write once, run anywhere" slogan has once again proved to be just marketing bullshit.

P.S. I wonder what happens on Linux platforms. One of these days I will test it on my Ubuntu box.