[Ecls-list] open :supersede and rename-file advise

Thu Nov 29 20:47:34 UTC 2007

On Nov 29, 2007 1:47 PM, Richard M Kreuter <kreuter at progn.net> wrote:
> "Geo Carncross" writes:
> > On Nov 29, 2007 2:25 AM, Richard M Kreuter <kreuter at progn.net> wrote:
> > > In my previous message, I forgot to mention an important issue in
> > > implementing "lazy superseding" on Unix, and perhaps elsewhere.  On
> > > Unix, open files can be shared among multiple processes.
> >
> > This is true, but without the FFI, ECL has no way of doing this, so
> > this is assuming we implement a POSIX:FORK routine.
>
> Well, users already have access to the FFI, and so may be expected to be
> forking already.

They don't just need to fork, but also to keep working with ECL's
streams afterwards. That means they're implementing a fork that goes
back into the calling process. stdio on many platforms misbehaves if its
buffers are not flushed before a fork(), so I don't think expectations
are terribly high even for the users that are forking.

> If ECL's current stream implementation does the
> obvious things on Unix, programs can open files and fork now, but some
> such programs will break if certain lazy-superseding strategies are
> introduced.

I still think this class of program is very rare. They'd have two real
Unix processes cooperating to see a partially-written file. And they
would have to be written in Lisp.

>  (ISTM that this argues for offering an :IF-EXISTS
> :TRUNCATE, so that even if :NEW-VERSION, :SUPERSEDE, :RENAME, or
> :RENAME-AND-DELETE are changed to give nicer behavior, it will be
> possible to modify programs to explicitly request the current semantics,
> even if those semantics yield programs that are not robust in various
> ways.)

Adding :TRUNCATE sounds fine to me.

> > > (2) The implementation could try to do the superseding only when the
> > >     last process closes the file.  AFAICT, doing this in general
> > >     requires subsuming the functionality of lsof, which is probably far
> > >     more work than is practical.
> >
> > Actually, all processes could
> > flock((tmp=dup(fd)),LOCK_SH),close(fd),fd=tmp; at POSIX:FORK time.
> >
> > To test to see if you're the last process, simply check to see if you
> > can upgrade the lock. If flock(fd,LOCK_EX|LOCK_NB); succeeds, then
> > you're the last lock holder. If it fails, it's someone else's job.
>
> I think there are some details that need to be looked at for this
> approach:
>
> (1) flock isn't in POSIX/SUSv3, and so it is not required to be present
>     on all Unices.  (It may be present on all Unixes that ECL runs on; I
>     don't know.)

This is a good point. Does anyone know the answer to this?
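
A rough sketch of the lock-upgrade test quoted above, for systems that do
have flock(). The helper names (holder_open, holder_is_last) are made up
for illustration and are not ECL functions. One assumption differs from
the dup() one-liner: on Linux and the BSDs a flock() lock belongs to the
open file description, so descriptors related by dup() or fork() share a
single lock. The sketch therefore assumes each process gets its own
description by calling open() on the path again.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper: open `path` with a private open file description
 * and register this holder by taking a shared lock.
 * Returns the descriptor, or -1 on failure. */
static int holder_open(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd == -1)
        return -1;
    if (flock(fd, LOCK_SH) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper, meant to be called once, right before close():
 * returns 1 if the shared lock could be upgraded to an exclusive one
 * (nobody else holds a lock, so we are the last holder), 0 otherwise. */
static int holder_is_last(int fd)
{
    assert(fd >= 0);
    return flock(fd, LOCK_EX | LOCK_NB) == 0;
}
```

Converting a flock() lock is not guaranteed to be atomic, so a failed
upgrade may leave the caller without its shared lock. That is harmless
if the test is only done immediately before closing the descriptor.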

If you associate a fifo with each stream, you can close the read end
and try to write to it (nonblocking of course!). If the write()
generates SIGPIPE/EPIPE then there are no other readers (processes)
interested. That'd give equivalent information but waste a pair of fds
per stream on those systems.
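
A minimal sketch of that check, using an anonymous pipe() created before
any fork so that every process sharing the stream inherits both ends.
The struct and function names are hypothetical, and ignoring SIGPIPE
process-wide is a simplification (a real implementation would more
likely block or restore the signal around the write).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical per-stream bookkeeping: the two ends of a pipe. */
struct holder_pipe {
    int rfd;
    int wfd;
};

/* Create the pipe and make the write end nonblocking.
 * Returns 0 on success, -1 on failure. */
static int holder_pipe_init(struct holder_pipe *hp)
{
    int p[2];
    if (pipe(p) != 0)
        return -1;
    int fl = fcntl(p[1], F_GETFL);
    if (fl == -1 || fcntl(p[1], F_SETFL, fl | O_NONBLOCK) == -1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    hp->rfd = p[0];
    hp->wfd = p[1];
    return 0;
}

/* Call at close time. Closes both ends and returns 1 if nobody else
 * still holds the read end, 0 if someone does, -1 on unexpected error. */
static int holder_pipe_release(struct holder_pipe *hp)
{
    assert(hp->rfd >= 0 && hp->wfd >= 0);
    signal(SIGPIPE, SIG_IGN);   /* we want EPIPE, not a fatal signal */
    close(hp->rfd);             /* drop our own reader reference */

    char byte = 0;
    ssize_t n = write(hp->wfd, &byte, 1);
    int saved = errno;
    close(hp->wfd);
    hp->rfd = hp->wfd = -1;

    if (n == 1)
        return 0;               /* a reader still exists */
    if (n == -1 && saved == EPIPE)
        return 1;               /* no readers left: we are last */
    if (n == -1 && (saved == EAGAIN || saved == EWOULDBLOCK))
        return 0;               /* pipe full, so readers still exist */
    return -1;
}
```

Each process would call holder_pipe_release() from its close path; only
the last one to do so sees the EPIPE result.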

> (2) In order to do this operation at fork-time, it will be necessary for
>     the implementation to maintain a global data structure containing
>     all open streams that need special fork-time treatment.  I don't
>     know whether ECL already has such a structure; if it does not,
>     introducing this sort of thing may require some hair to ensure that
>     the structure is maintained correctly in the face of interrupts,
>     threads, etc., so that file descriptors don't leak.

Well, it doesn't actually need to be done *at* fork time. The
SIGPIPE/EPIPE trick can be performed at close-time. That'd keep
POSIX:FORK cheap, but waste a few fds.

What's more important: a faster fork when using :supersede, or saving
two fds per concurrent :supersede?

>     (Also, note that some users seem to expect garbage collection to
>     close unreferenced streams.  Introducing such a global data
>     structure implies that all streams always have references, so the
>     data structure must either use weak references, or else invalidate
>     the naive user's expectation that streams will be closed
>     automatically.)

ECL's standard finalizer already calls cl_close().

> (3) Given flock and sufficient stream bookkeeping, doing this work at
>     fork-time will make forking more expensive than not doing this work
>     at fork-time, at least when the program has been opening
>     lazy-superseding files.  Some users might expect forking to be about
>     as expensive for Lisp as it is for C.

See my response to (2), but more on that: I'm not sure many users
fork() from Lisp with files open for :supersede that have already been
written to, and then use the stream differently on each side. I don't
know if it's necessary to optimize for this.

> I don't say that these mean your approach isn't good; only that it's
> worthwhile to be mindful of the tradeoffs involved.

No, this is helpful. I'm not taking offense at any of this, and in
fact I'm glad you're making me think about this more. There are
obviously a lot of questions here, which is why I started this by
asking for advice instead of sending a quick patch :)