extending uiop/run-program

Mon Aug 15 12:22:47 UTC 2016

> On 13 Aug 2016, at 02:59, Faré <fahree at gmail.com> wrote:
> 
> Dear Elias,

Dear Faré,

> first, I'd like to reiterate how positively impressed I am by the
> quality of your work on improving uiop:run-program.

thank you!

>>  https://gitlab.common-lisp.net/epipping/asdf/commits/run-program-messy-with-rebasing
> 
> 
>> I’ve also opened a merge request at
>> 
>>  https://gitlab.common-lisp.net/asdf/asdf/merge_requests/3
>> 
> 
> Remaining issues I see:
> * missing docstrings for file-stream-p & file-or-synonym-stream-p (and
> any other exported function you introduce).
> * to avoid code duplication in
> 5cd51f5cd26cd29d14e5a57dfaed2b0b5602a685 I would use nest or something
> to separate the lispworks 7 specific code without duplicating the
> common code.

I’ve now arrived at

  https://gitlab.common-lisp.net/epipping/asdf/commit/ffd8d8564bdc6551f2b682e7683bf16d6d459161

(to be squashed into an earlier commit) that I’m rather happy with.

> * Also, is #+lispworks7 future-proof? Is there a feature for 7-or-later?

I don’t think so, no. It seems that each version has a feature corresponding to its own version but not others. We could maybe provide our own, though. We cannot know what versioning scheme they’ll use in the future but it shouldn’t be too difficult to find out what features lispworks has defined in the past. So a

  (lispworks7-or-greater-p)

could probably be implemented as

  (not (or #+(or lispworks4 lispworks5 lispworks6) t))

or something along those lines.

> * I was going to suggest adding a not-implemented error. I see you
> did. Congrats! Could you put it in utility.lisp, though? It's a
> general-purpose error to have. Same for parameter-error.

Sure, I’ll move it.

There are unfortunately quite a few more issues that I’ve become aware of and I’m not quite sure how to handle them. So I’ll describe them here for now. At the very least, they’re then archived somewhere.

(A) The first is with the :overwrite default for :if-output-exists. If :output is a file, it will be opened, and open will be passed :overwrite for :if-exists. That answers the question of what one should expect to be able to specify for :if-output-exists, too.

What the user has no direct control over through the run-program interface is what happens if the file does not exist. There is :if-output-does-not-exist even though open has :if-does-not-exist. Such a flag is not strictly necessary, however, because if left unspecified, :if-does-not-exist takes a value that depends on :if-exists. If :if-exists is :overwrite, :if-does-not-exist will default to :error. There’s also the :supersede argument to :if-output-exists that is identical to :overwrite except that the default for :if-does-not-exist will not be :error but :create.

I would assume that these terms are well established in the CL community so that anyone who reads “the default for :if-output-exists is :overwrite” will know (1) he/she needs to make sure the file exists, that (2) he/she can expect an error to occur if it does not and (3) that :overwrite can be replaced with :supersede if (1) and (2) are undesirable.

Unfortunately, that’s not always what happens with run-program. Clozure CL and SBCL behave exactly this way. Others do not. In particular, CLISP, e.g. does not accept :supersede. Instead, its :overwrite behaves like :supersede. In general, things are a bit murky. CLISP was the only platform I tested on where :supersede was not accepted. But other platforms simply do not appear to distinguish between :supersede and :overwrite, so that no error will occur with either, if the file already exists (this affects at least Allegro CL, CMU CL, ECL, LispWorks).

So since all of them appear to support the behaviour of :supersede (even though not necessarily by that name) and not all of them support :overwrite, it might make sense to change the default from :overwrite to :supersede and then (like other normaliser functions) have a translator that turns :supersede into :overwrite for CLISP. This would unfortunately also affect the (exported) run-program function. On the plus side, it would only change behaviour that previously could not be relied upon anyway.

(B) Process termination

Even though it’s possible to kill a process on nearly ever platform somehow (not on CLISP, except for processes spawned through ext::launch) there are multiple tiers of support for that: From native to external: If you have a process in MKCL, you can terminate it through its process object, on unix or windows. That’s the ideal situation.
The worst-possible situation that can still be handled if needed is when all you have is the PID: you end up calling `taskkill` on windows, through cmd.exe, or `kill` on unix, through the shell. I only recently started thinking about potential issues that this might involve other than performance, not only in this extreme case but also in cases that lie half-way between the MKCL situation and the worst-possible scenario, e.g. SBCL’s.

If you obtain the PID of an arbitrary external process and then try to kill it through that PID at a later point in time, the process may no longer be running. It may not even exist anymore. Worse yet, its PID may have been recycled by the system and assigned to a newly spawned process that you end up killing by accident!

Fortunately, I am not trying to solve this general problem but the simpler problem of killing a process spawned through run-program, which will necessarily be a child process. Such a process will signal its parent with SIGCHLD when it terminates and stick around as a zombie until it is waited for.

So while the (interesting but not critical) transition from the running state to the zombie state lies beyond our control, the transition from the zombie state (in which we can safely send signals to the process without any effect) to no state (where the PID can be recycled by the OS whenever it deems it appropriate and sending signals could be dangerous) only occurs when requested by the user.

The fact that processes become zombies after termination and require cleanup is made explicit in Allegro CL and MKCL. In both, you need to call sys:reap-os-subprocess or mkcl:join-process, respectively. If you do not, you will eventually run out of PIDs, even if only a single process is ever running at the same time. But at least you do not send signals where they don’t belong.

((B') I’m not sure what the best way to handle this requirement of reaping is. %wait-process-result currently calls both functions, so it would be possible to require that asynchronously spawned processes are waited on at some point, mirroring to some extent the requirement that every C program that calls fork() should call wait() at some point. So the name would be rather appropriate even.)

On other platforms, e.g. CMU CL, ECL, and SBCL, the handler for the SIGCHLD signal calls wait() so that the user has no control over the zombie->gone transition. Both in ECL and SBCL, there are locks around the list of active processes that the handler acquires(*). SBCL’s process-kill does not currently acquire this lock, however, and a process termination facility is only at the planning stage for ECL at this point. CMU CL does not even have such a lock. So even even though CMU CL and SBCL come with a process-kill function, you cannot currently be sure if you’re sending your signal to the process that you meant to send it to (it might hit the process in running or zombie state, no process at all, or a new process!).
As unlikely as it may be that such a problem will occur (an OS should not be too eager to hand out the same PID right away after all), since there is no upper limit on how much damage this could cause, I’m thinking we might want to have another function (process-termination-safe-p) for users who don’t just want to run UIOP on their NetBSD toaster.

(*) I would like to thank Stas Boukarev (stassats on irc/github) and Daniel Kochmanski (jackdaniel/dkochmanski on irc/github) publicly here for helping me understand this issue in #sbcl and #ecl, respectively.

(C) We’ll also need a process-close function, I think, to close any stream requested through :stream arguments to :input/:output/:error-output. I’m adding one right now.

>> As for CLASP: Getting it to compile is currently rather difficult and as promising and
>> interesting as it sounds as a project: I would currently describe it as experimental.
>> Even though I’ve now (after multiple attempts on multiple different linux distributions)
>> managed to compile it, it currently has issues that keep me from even loading my
>> test script.
>> 
>> I’ve spoken to Nicolas Hafner aka Shinmera (on irc and github) about this and he
>> allowed me to quote him as having said:
>> 
>> [..] I wouldn't bother with clasp right now. If you break UIOP in any substantial way
>> we'll notice it. If not, then, well, you can check back once things are running more
>> smoothly again. [..] ASDF is being tested somewhat automatically. If you break
>> anything seriously, we'll notice [..]
>> 
> Excellent! It's pleasing to see both you and they have your shit together.
> 
>> The function has been exported since at least January of 2008: that’s how far back the history of the
>> file that contains such exports goes here:
>> 
>>  https://github.com/Clozure/ccl/commits/master/lib/ccl-export-syms.lisp
>> 
> Wow, I was going to say I'm impressed they migrated from their
> badly-done svn to git, but I see it's a mirror and the original is
> still svn. Meh.
> 
>> I’ve gone with a class now (please let me know if you prefer a struct).
>> That gets rid of nconc and (Robert had already suggested it earlier for
>> this reason) allows one to check if an object is a process). (Actually,
>> I’ve called it process-info because it’s more than just a process but
>> that’s all up for debate of course).
>> 
> Perfect, especially with the semi-reflective use of slot-value. Much cleaner.
> Don't kid yourself into believing that it's either faster or less
> (source or binary) code, though.
> 
>>> Yes, the names are acceptable. If you are going to support them as
>>> stable interfaces, you can remove the % and export the symbols. Be
>>> sure to have adequate docstrings, then.
>> 
>> These are two outstanding issues: I have not exported anything yet
>> and nothing’s documented. On my TODO list.
>> 
> Thanks a lot!
> 
>> I’ve now moved the test to uiop/test. Is that also acceptable? (I didn’t mean to deviate
>> from what you said, I just hadn’t read it carefully enough)
>> 
> Well, then you have to modify the test framework to find those tests
> where they are.
> (And if you're courageous, you can also update the alternate test
> framework I wrote.)
> 
>> Up until now, I only used the tests to check for myself if the code I’d written works
>> as expected. I’ve now given the test suite a workover so that it’ll handle unexpected
>> errors, too, and provide a summary. The issue is, though: A test suite that anyone
>> further downstream (be it end users or package maintainers for a distribution) would
>> want to run should have a certain character. You wouldn’t want it to sometimes skip 80%
>> of the subtests and you wouldn’t want it to fail any tests (that upstream already knows
>> about). Yet that’s precisely what happens right now.
>> 
> The ASDF test framework has some mechanism for having known-failures.
> 
>> CLISP’s ext:run-program only supports a tiny subset of the %run-program interface.(*)
>> CMU CL and MKCL both have multiple bugs (I’ve filed issues for those) that currently make
>> quite a few tests fail, hang, or even lead to a crash of the interpreter. I’ve thus had to
>> explicitly skip the hanging and crashing ones.
>> 
> Welcome to the world of CL portability hacking :-(
> 
>> (*) There’s also the private ext::launch which does many things ext:run-program doesn’t
>> but then also doesn’t sport an actual superset of features (e.g. you cannot have it
>> read from a file. unless you do it manually by opening a file stream and passing that
>> stream back to the process. but then you’re in charge of closing the stream, also
>> with :wait nil…)
>> While we’re on that topic: ABCL has a sys:run-program function which is not currently
>> put to use by uiop/run-program. I mean to look into that, too.
>> 
> I hadn't looked at CLISP's ext::launch (wasn't aware of it).
> I believe I gave a cursory look at ABCL's sys:run-program before I
> decided that (at least at the time) it didn't have enough features to
> do what I want and I might as well fall back to using system.

That’s probably because it does not have :input, :output, :error specifiers (or :if-*-exists/does-not-exist): It behaves as if all three were set to :stream. The three streams are consequently always returned. It does not fit very well with what the other implementations do, indeed.

>> I’ve tried to download the free version of Scieneer Lisp by the way but that apparently
>> requires approval by someone in the company and they’ve never contacted me. So
>> I hope to be able to add clasp to that list at some point and also abcl, but scl probably
>> won’t make it.
>> 
> Last I knew, Douglas Crosher seems to have stopped hacking Scieneer
> and to be angry at me and/or free software hackers in general.
> 
>>> PS: Are you interested in becoming new maintainer for UIOP?
>> 
>> I’d certainly like to help and invest time. I’m not sure if you’re asking “a maintainer”
>> or “the maintainer”. Because there are certainly large parts of UIOP that I know nothing
>> about and issues such as the re-initialisation of argv after dumping an image of cmucl
>> sound intimidating, requiring a lot more understanding of common lisp, cmucl, or uiop
>> (I don’t even know which one of those three I know too little about, potentially all three)
>> than I currently have. So I think my answer should be: “a maintainer”: yes, but
>> (pardon if you that’s not what you asked) “the maintainer”: not at this point in time.
>> 
> I was thinking about "the maintainer", but you could start being "a
> maintainer", i.e. have commit rights. I still think you should get
> your code reviewed in a branch, though, especially since you're doing
> such a good job at it!

Feature branches are perfect for me.

Elias

PS: I posted some of my early thoughts on the race condition that code such as (when (process-alive-p p) (terminate-process p)) would be affected by as a comment to the merge request. But I would assume no notification emails were sent out. (I’ve now removed the comment)

> You probably know more about the issue with argv on CMUCL than I do at
> this point: it's mostly fallen off my cache, and any knowledge I had
> was reified in uiop + cl-launch.
> 
> Robert, are you OK with granting Elias the commit bit on ASDF?
> 
> —♯ƒ • François-René ÐVB Rideau •Reflection&Cybernethics• http://fare.tunes.org
> Be yourself, everybody else is taken. — Oscar Wilde