[armedbear-devel] some questions about r12503

Mon Feb 22 20:20:42 UTC 2010

On 2/22/10 6:21 PM, Ville Voutilainen wrote:
> The separate pathname matcher for jars looks odd to me. I'd expect the
> listing to give whatever it gives, and the
> matching condition in directory.lisp to filter it. Is jar filtering so
> different that it requires a different matcher?

LIST-DIRECTORY lists the jar directory contents including directories, 
while MATCH-WILD-JAR-PATHNAME simply uses PATHNAME-MATCH-P to determine 
what to return.  Jar entries that are directories always have a 
trailing "/", which is not true for pathnames on the filesystem (#p"/tmp" 
could be a file or a directory), so the two are not always equivalent.
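
To illustrate the trailing-"/" point, here is a minimal Java sketch. It is not taken from the ABCL sources; the class and method names are made up for illustration. It relies only on java.util.zip.ZipEntry, whose isDirectory() is documented as true exactly when the entry name ends with "/".

```java
import java.io.File;
import java.util.zip.ZipEntry;

class TrailingSlashDemo {
    // For a jar/zip entry the name alone decides: a directory entry is one
    // whose name ends with '/'.
    static boolean isJarDirectory(String entryName) {
        return new ZipEntry(entryName).isDirectory();
    }

    public static void main(String[] args) {
        System.out.println(isJarDirectory("org/armedbear/"));       // true
        System.out.println(isJarDirectory("org/armedbear/a.lisp")); // false

        // On the filesystem the name is not enough; the disk has to be
        // consulted, and the answer depends on the machine.
        File f = new File("/tmp");
        System.out.println(f.getPath() + " isDirectory=" + f.isDirectory());
    }
}
```

So a matcher that works purely on names can tell files from directories inside a jar, but not on the filesystem.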

The jar pathname part of LIST-DIRECTORY is currently unused.  I 
implemented it first, tried to patch the Lisp in "directory.lisp" to use 
it, but ran into problems that weren't understandable.  I stepped back, 
and noticed that the algorithm for wildcard matching for filesystems was 
fundamentally different from jar files (see next comment), implemented 
that algorithm as MATCH-WILD-JAR-PATHNAME, saw that it worked well 
enough, and went with that for a commit.

Overall, I do suspect that the way I implemented jar pathnames is not 
totally optimal, but in the last six weeks I have not been able to 
improve on the basic design of using a list for DEVICE.  Often there were 
points in reworking 'Pathname.java' where I felt "Why am I doing this 
same sort of code again?  Surely this is a sign of a fundamental problem in 
abstraction."  Sometimes I found a better way, sometimes not, but I was 
never able to come up with a better basic assumption (to use DEVICE as a 
list of pathnames for the jar file, DIRECTORY as the relative path 
within that jar).  I have come to the conclusion that implementing jar 
pathnames the way I did pushes a lot of complexity to the associated 
primitives in Pathname.java, but ultimately makes things quite a bit easier on 
the user of this abstraction.  As evidence for this, I would argue that 
my approach *has* dramatically simplified the code in 'Load.java' (and 
'Lisp.java' and 'AutoloadedFunctionProxy.java').  A weak point is that 
code fails if it assumes that the DEVICE field is always a string, or that 
(truename (pathname-directory (truename pathname))) always yields a 
pathname if (truename pathname) succeeds.  Since a lot of PATHNAME 
behavior in ANSI is implementation dependent, we are still an ANSI CL, 
but we have very different usage of the DEVICE pathname component than 
is commonly assumed.
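
As a rough picture of the "DEVICE as a list" idea, the sketch below splits a jar-style name on the standard "!/" separator into a list of enclosing jars plus the path inside the innermost one. This is an assumption-laden illustration, not the code in Pathname.java: the class JarPathnameSketch, its fields, and parse() are all hypothetical, and the real DIRECTORY/NAME/TYPE split is left out.

```java
import java.util.ArrayList;
import java.util.List;

class JarPathnameSketch {
    final List<String> device; // enclosing jar(s), outermost first
    final String entry;        // path inside the innermost jar

    JarPathnameSketch(List<String> device, String entry) {
        this.device = device;
        this.entry = entry;
    }

    // Hypothetical parser: drop leading "jar:" prefixes, then split on "!/".
    // Everything before the last "!/" names a jar; the remainder is the entry.
    static JarPathnameSketch parse(String s) {
        String rest = s;
        while (rest.startsWith("jar:")) {
            rest = rest.substring("jar:".length());
        }
        String[] parts = rest.split("!/", -1);
        List<String> jars = new ArrayList<String>();
        for (int i = 0; i < parts.length - 1; i++) {
            jars.add(parts[i]);
        }
        return new JarPathnameSketch(jars, parts[parts.length - 1]);
    }

    public static void main(String[] args) {
        JarPathnameSketch p = parse("jar:file:/tmp/foo.jar!/a/b.lisp");
        System.out.println(p.device); // [file:/tmp/foo.jar]
        System.out.println(p.entry);  // a/b.lisp
    }
}
```

Code that expects a string in the device slot would trip over the list here, which is the weak point described above.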

An alternative might have been to subclass PATHNAME as PATHNAME-JAR, but 
when I analyzed that approach it seemed to involve a lot more (if 
(pathname-jar-p pathname) option1 option2) than I wanted.  If all the 
system code taking a PATHNAME as an argument were to be defined with 
generic functions this would be considerably more attractive (and 
easier).  But the dirty secret of CLOS is that it's a bolt-on via 
macros, which all CL implementations that I have studied bootstrap after 
the base system is in place.  CLOS isn't even present in ABCL when the 
user gets to "CL-USER>", right?

> Same question applies to the wildcard matching, jar listing seems to
> do the wildcard matching in java,
> rather than in lisp? That's also different from the way directory
> listings are handled.

DIRECTORY involves wildcards for non-trivial use (its non-wildcard use 
actually doesn't even distinguish a directory from a file!).  The 
algorithm for wildcard DIRECTORY is fundamentally different for 
the filesystem than for a jar, as follows.  For a filesystem, you have to 
branch at each wildcard in the pathname.  For a jar file, you are simply 
running down the list of all entries in the jar file contents.  One 
could probably implement the second (jar pathname directory listing) in 
terms of the first, but it wouldn't make much sense and wouldn't be 
necessary.  I couldn't do it easily, having run into problems with my 
LIST-DIRECTORY implementation, although I did give it about an hour's 
effort.
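
The contrast between the two algorithms can be sketched in Java as follows. This is only an illustration under simplifying assumptions, not ABCL's implementation: toRegex(), matchEntries(), and walk() are hypothetical names, and "*" is taken to match any run of characters other than "/", which is much cruder than PATHNAME-MATCH-P.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class WildcardListingSketch {
    // Hypothetical helper: turn a pattern with '*' into a regex in which
    // '*' matches any run of characters other than '/'.
    static Pattern toRegex(String wild) {
        String[] pieces = wild.split("\\*", -1);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < pieces.length; i++) {
            if (i > 0) sb.append("[^/]*");
            sb.append(Pattern.quote(pieces[i]));
        }
        return Pattern.compile(sb.toString());
    }

    // Jar style: one pass over the flat list of entry names.
    static List<String> matchEntries(List<String> entries, String wild) {
        Pattern p = toRegex(wild);
        List<String> out = new ArrayList<String>();
        for (String e : entries) {
            if (p.matcher(e).matches()) out.add(e);
        }
        return out;
    }

    // Filesystem style: descend one pattern component at a time, branching
    // into every child that matches the current component.
    static void walk(File dir, String[] components, int i, List<File> out) {
        if (i == components.length) { out.add(dir); return; }
        File[] children = dir.listFiles();
        if (children == null) return; // not a directory, or unreadable
        Pattern p = toRegex(components[i]);
        for (File child : children) {
            if (p.matcher(child.getName()).matches()) {
                walk(child, components, i + 1, out);
            }
        }
    }

    public static void main(String[] args) {
        List<String> entries =
            Arrays.asList("a/", "a/b.lisp", "a/c.abcl", "a/d/e.lisp");
        System.out.println(matchEntries(entries, "a/*.lisp")); // [a/b.lisp]

        // Result depends on the current directory's contents.
        List<File> hits = new ArrayList<File>();
        walk(new File("."), new String[] {"*", "*.lisp"}, 0, hits);
        System.out.println(hits);
    }
}
```

The jar case needs no recursion at all because the entry list is already flat; the filesystem case has to touch the disk at every wildcard level.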

> The list-directory primitive sorely needs to be split into two
> functions (listJar and listDirectory), it's getting long-winded.
> That's not a high-priority issue, but we need to mind function length,
> it's a huge readability issue.

I am an "if the function doesn't fit into one 80x25 Emacs buffer it 
should be split" kinda guy, but the ABCL codebase violates that maxim 
at so many points (q.v. compiler-pass2.lisp) that I don't try to 
religiously follow that principle here.  I'd be happy to do such 
splitting, but would have thought that you of all people would have 
jumped on my back about the penalty for a further push to the stack.  My 
rule of thumb is that for code refactoring like you have done with the 
string function where the codepath is used more than once, such 
splitting is worth it.  But for functions like LIST-DIRECTORY, we should 
keep it all in one method call for efficiency.  For what it's worth, I 
*did* try to figure out how to factor the common code between 
LIST-DIRECTORY and PATHNAME-MATCH-P out into something separate, but 
Pathname.wildcardMatches() was the only thing that looked plausible to 
my brain.

Hopefully I understood your questions:  push back if I haven't!

yers in cons,
Mark

-- 
"A screaming comes across the sky.  It has happened before, but there
is nothing to compare to it now."