Multiple processes compiling the same file

Jim Newton jnewton at lrde.epita.fr
Wed Jan 31 10:43:16 UTC 2018


Hi Faré, 

Thanks for taking the time to understand my comments.  I’ve tried to respond to some
of your questions below.  Sorry if my original post didn’t explain clearly enough
what I’m trying to do.


>>>> If I run several sbcl processes on different nodes in my compute cluster, it might happen that
>>>> two different runs notice the same file needs to be recompiled (via asdf),
>>>> and they might try to compile it at the same time.  What is the best way to prevent this?
>>>> 
> You mean that these machines share the same host directory? Interesting.
> 

Yes, the cluster nodes share some disk, including the home directory.  I believe two cores
on the same physical host also share /tmp, but I’m not 100% sure about that.


>>>> 
> That's an option. It is expensive, though: it means no sharing of fasl
> files between hosts. If you have cluster of 200 machines, that means
> 200x the disk space.

Efficient reuse of fasl files is completely irrelevant in my case.  My
code takes hours (10 to 12 hours worst case) to run, but only 20 seconds (or less) to compile.  I’m very happy to completely
remove the fasl files and regenerate them before each 10 hour run.  (Note to self: I need to double check that
I do in fact delete the fasl files every time.)  Besides, my current flow allows me simply to git-check-in a change and
re-launch the code on the cluster in batch.  I don’t really want to add an error-prone manual local-build-and-deploy step
if that can be avoided, unless of course there is some great advantage to that approach.

> 
> What about instead building your application as an executable and
> delivering that to the cluster?

One difficulty with your build-then-deliver suggestion is that my local machine is running macOS, and the cluster is
running Linux.  I don’t think I can build Linux executables on my Mac.


>> 
> You can have different ASDF_OUTPUT_TRANSLATIONS or
> asdf:*output-translations-parameter*
> on each machine, or you can indeed have the user cache depend on
> uiop:hostname and more.
> 

This is what I’ve ended up doing.  And it seems to work.  Here is the code
I have inserted into all my scripts.

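(require :sb-posix)  ; provides getuid and getpid
;; Give each process its own fasl cache: /tmp<home-dir><uid>/<pid>/
(let ((home (directory-namestring (user-homedir-pathname)))
      (uid (sb-posix:getuid))
      (pid (sb-posix:getpid)))
  (setf asdf::*user-cache*
        (ensure-directories-exist (format nil "/tmp~A~D/~D/" home uid pid))))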
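
A hostname-keyed variant of the same idea, along the lines of the uiop:hostname
suggestion quoted above, might look roughly like this.  It is only a sketch under
assumptions: the /tmp/fasl-cache/ root and the variable name *per-host-cache* are
made up for illustration and are not part of my scripts.

```lisp
(require :asdf)

;; Assumed cache root; one subdirectory per host, e.g. /tmp/fasl-cache/<hostname>/
(defparameter *per-host-cache*
  (format nil "/tmp/fasl-cache/~A/" (uiop:hostname)))

;; Send all compiled output under the per-host directory, split further by
;; implementation, and ignore any inherited translation configuration.
(asdf:initialize-output-translations
 `(:output-translations
   (t (,*per-host-cache* :implementation))
   :ignore-inherited-configuration))
```

Note that processes on the same host would still share a cache this way, so by
itself it does not stop two of them from compiling the same file at once; keying
the directory on the pid does.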




> The Right Thing™ is still to build and test then deploy, rather than
> deploy then build.

Regarding your suggestion to build then deploy: this seems very dangerous and error-prone to me.
For example, what if different hosts want to run the same source code but with different optimization settings?
This is a real possibility, as some of my processes are running with profiling (debug 3) and collecting profiling results,
and others are running super optimized (speed 3) code to try to find the fastest something-or-other.

I don’t even know whether it is possible to write the .asd files so that changing an optimization declaration will trigger
recompilation of everything that depends on it.  And if I think I’ve written my .asd files that way, how would I know
whether they are really correct?
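
One way to sidestep the dependency question, assuming each job compiles everything
into its own fresh cache, is to set the global policy before loading anything.
The sketch below is illustrative only: JOB_MODE is a hypothetical environment
variable, policy-for-mode is a made-up helper, and :my-system is a placeholder
system name.  It also assumes the implementation applies a global proclaim to
the files compiled afterwards.

```lisp
(require :asdf)

(defun policy-for-mode (mode)
  "Return an OPTIMIZE declaration for MODE, a string such as \"profile\"."
  (if (equal mode "profile")
      '(optimize (debug 3) (safety 1) (speed 1))
      '(optimize (speed 3) (debug 0) (safety 1))))

;; JOB_MODE is a hypothetical variable set by whatever launches the batch job.
(proclaim (policy-for-mode (uiop:getenv "JOB_MODE")))

;; With an empty per-process cache, loading the system afterwards compiles
;; every file under the policy proclaimed above.  Placeholder system name:
;; (asdf:load-system :my-system)
```

Since nothing is reused between jobs, the .asd files never have to know that
the policy changed.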

It is not the case currently, but in the future I may very well want different jobs in the cluster to run different
git branches of my code.  That would be a nightmare to manage if I tried to share fasl files.

> Using Bazel, you might even be able to build in parallel on your cluster.

Bazel sounds interesting, but I don’t really see the advantage of building in parallel when it only
takes a few seconds to build, but half a day to execute.

> I still don't understand why your use case uses deploy-then-build
> rather than build-then-deploy.


I hope it is now clear why I can’t.  (1) My local machine runs macOS while the cluster runs Linux.
(2) Different jobs in the cluster use different optimization settings.  (3) A future enhancement
is to have different cluster nodes running different branches of the code.

Kind regards
Jim