extending uiop/run-program

Thu Aug 18 10:00:46 UTC 2016

Dear Faré, dear Robert, dear list,

> On 15 Aug 2016, at 15:25, Elias Pipping <pipping.elias at icloud.com> wrote:
> 
>>>> * Also, is #+lispworks7 future-proof? Is there a feature for 7-or-later?
>>> 
>>> I don’t think so, no. It seems that each version has a feature corresponding to its own version but not others. We could maybe provide our own, though. We cannot know what versioning scheme they’ll use in the future but it shouldn’t be too difficult to find out what features lispworks has defined in the past. So a
>>> 
>>> (lispworks7-or-greater-p)
>>> 
>>> could probably be implemented as
>>> 
>>> (not (or #+(or lispworks4 lispworks5 lispworks6) t))
>>> 
>>> or something along those lines.
>>> 
>> Yuck. I'd rather query SYSTEM::*MAJOR-VERSION-NUMBER* in a #..
> 
> I didn’t know about that variable. The resulting code would be nicer but it’s still a private symbol which might disappear in the future. Personally, I’d prefer to go with the uglier but more official version.

Indeed, http://www.lispworks.com/documentation/lw70/LW/html/lw-373.htm explicitly mentions that

(defvar *feature-added-in-LispWorks-7.0*
  #+(or lispworks4 lispworks5 lispworks6) nil
  #-(or lispworks4 lispworks5 lispworks6) t)

is the proper way to check for LispWorks 7.0 and above. Based on that, I’ve added

   (defmacro %if-on-lispworks7+ (action-if action-else)
     #+(and lispworks (not (or lispworks4 lispworks5 lispworks6))) action-if
     #-(and lispworks (not (or lispworks4 lispworks5 lispworks6))) action-else)

in https://gitlab.common-lisp.net/epipping/asdf/commit/cbeb4b3c75ad777f668a48d09b874ae0e4cc16ed and started using it. The name might not be ideal.
We might also want to replace this with a more general version checking macro for LispWorks in the future but I don’t yet see the need for that.
I wasn’t sure where to put it: os.lisp is currently most fitting but still wrong and I was reluctant to add a new module so I left it in run-program.lisp for now.

>>> (A) The first is with the :overwrite default for :if-output-exists. [...]
>>> So since all [implementations] appear to support the behaviour of :supersede (even though not necessarily by that name) and not all of them support :overwrite, it might make sense to change the default from :overwrite to :supersede and then (like other normaliser functions) have a translator that turns :supersede into :overwrite for CLISP. This would unfortunately also affect the (exported) run-program function. On the plus side, it would only change behaviour that previously could not be relied upon anyway.
>>> 
>> Yes, that would be the Right Thing(tm).
>> The entire purpose of UIOP is to provide some portable abstractions
>> with stable semantics abstracting over implementation specifics,
>> rather than functions the meaning of which is actually not portable.
>> See my ASDF3 essay on this topic.
>> 
>>> (B) Process termination
>>> 
>>> Even though it’s possible to kill a process on nearly ever platform somehow (not on CLISP, except for processes spawned through ext::launch) there are multiple tiers of support for that: From native to external: If you have a process in MKCL, you can terminate it through its process object, on unix or windows. That’s the ideal situation.
>>> 
>> If it helps with CLISP, maybe ext::launch should be the default on CLISP.
>> I admit I don't care too much about CLISP, that hasn't been actively
>> maintained for years.
>> (It still ships with ASDF 2.33, for John McCarthy's sake.)
> 
> It appears there are some attempts to breathe new life into CLISP, though. See e.g.
> 
>  https://sourceforge.net/u/musteresel/clisp/
> 
>>> The worst-possible situation that can still be handled if needed is when all you have is the PID: you end up calling `taskkill` on windows, through cmd.exe, or `kill` on unix, through the shell. I only recently started thinking about potential issues that this might involve other than performance, not only in this extreme case but also in cases that lie half-way between the MKCL situation and the worst-possible scenario, e.g. SBCL’s.
>>> 
>>> If you obtain the PID of an arbitrary external process and then try to kill it through that PID at a later point in time, the process may no longer be running. It may not even exist anymore. Worse yet, its PID may have been recycled by the system and assigned to a newly spawned process that you end up killing by accident!
>>> 
>> Note that on Unix, the race condition between killing a process
>> and the process dying then its PID being recycled is inherent,
>> whatever abstraction MKCL may try to provide that SBCL doesn’t.
> 
> Yes, but we’re not dealing with this general problem as I tried to outline in the paragraph after that (quoting myself):
> 
>  Fortunately, I am not trying to solve this general problem but the simpler problem of killing a process spawned through run-program, which will necessarily be a child process. Such a process will signal its parent with SIGCHLD when it terminates and stick around as a zombie until it is waited for.
> 
> So the issue is solvable here, no?

I still think that killing a child process can be safe if the SIGCHLD handler (which would call wait()) acquires the same lock as the process killing function. But maybe I’m missing something.

>> The only mitigating behavior would be for the kernel to insert a pause
>> before a PID could be used, and hope that programs never try kill a PID
>> more than that pause duration after having checked the process was alive
>> (and even then, if the process is kill -STOP'ped between the check and the kill,
>> there's nothing that can prevent the race condition).

From my understanding, the user has control over this pause: Until he or she calls wait(), the process will be in zombie state and its PID will be unavailable to the OS for recycling, no?

>> Of course, 16-bit PIDs make the situation particularly tight on Unix
>> 
>>> ((B') I’m not sure what the best way to handle this requirement of reaping is. %wait-process-result currently calls both functions, so it would be possible to require that asynchronously spawned processes are waited on at some point, mirroring to some extent the requirement that every C program that calls fork() should call wait() at some point. So the name would be rather appropriate even.)
>>> 
>> Yes, it is fair to require that asynchronous UIOP:RUN-PROGRAM users
>> should always call wait-process (or whatever you call the API
>> function) and not rely on the implementation doing it for them.

I’ve now documented this. Indeed, there’s documentation for

- the now exported function wait-process-result: https://gitlab.common-lisp.net/epipping/asdf/commit/7243fe25bb8bf9164d3bd659f42d146fcb87174b
 - the new, exported terminate-process: https://gitlab.common-lisp.net/epipping/asdf/commit/fa289e53700d92ff8f7c2da38b98b0d22832cee6
 - the new, exported process-alive-p: https://gitlab.common-lisp.net/epipping/asdf/commit/4a1339e8e8ce83408251c9264a7d8b5fe5393976
 - the new, exported process-close: https://gitlab.common-lisp.net/epipping/asdf/commit/0cc3f20e7fcc1ae9250ce241f1ac2741b0e26094

I’m not yet sure what the best way to export the asynchronous process spawning functionality is. %run-program is not supposed to replace run-program after all. So I either need to extend run-program or add another wrapper that calls %run-program in a different fashion. In terms of return values, the latter would be more consistent: run-program would return an exit code and the new function (`start`?) would return a process-info object.

I’ve also carried out miscellaneous smaller fixes(*). Parameter errors now occur early and in the same place: https://gitlab.common-lisp.net/epipping/asdf/commit/cf47cd04a509cd923f2368f9840492b26ed764aa

I’ve plugged the test into the existing test suite, so that it can be run e.g. via

  make; ./test/run-tests.sh sbcl run-program-unix-tests.script

I’ll still have to mark all the known failures as such. Come to think of it, I should probably handle those known issues not just in the test suite but also in run-program itself, albeit maybe through another error class (e.g. 'known-bug rather than 'parameter-error). Or maybe not (does it help to know that a feature is not missing but broken on your platform? it doesn’t really…).

(*) Especially after all the rebasing, I think the merge request at

  https://gitlab.common-lisp.net/asdf/asdf/merge_requests/3

continues to be the best way to inspect the changes I’ve made, be it commit-by commit or in terms of overall changes.

Elias