[armedbear-cvs] r13611 - trunk/abcl/doc/design/pathnames

mevenson at common-lisp.net mevenson at common-lisp.net
Sun Oct 2 08:04:45 UTC 2011


Author: mevenson
Date: Sun Oct  2 01:04:44 2011
New Revision: 13611

Log:
Start article describing the implementation of URL-PATHNAME.

Added:
   trunk/abcl/doc/design/pathnames/notes.tex
   trunk/abcl/doc/design/pathnames/pathnames.tex

Added: trunk/abcl/doc/design/pathnames/notes.tex
==============================================================================
--- /dev/null	00:00:00 1970	(empty, because file is newly added)
+++ trunk/abcl/doc/design/pathnames/notes.tex	Sun Oct  2 01:04:44 2011	(r13611)
@@ -0,0 +1,493 @@
+\begin{verbatim}
+JARs and JAR entries in ABCL
+============================
+
+    Mark Evenson
+    Created:  09 JAN 2010
+    Modified: 21 JUN 2011
+
+Notes towards an implementation of "jar:" references to be contained
+in Common Lisp `PATHNAME`s within ABCL.
+
+Goals
+-----
+
+1.  Use Common Lisp pathnames to refer to entries in a jar file.
+    
+2.  Use the `'jar:'` scheme as documented in [`java.net.JarURLConnection`][jarURLConnection] for
+    namestring representation.
+
+    An entry in a JAR file:
+
+         #p"jar:file:baz.jar!/foo"
+    
+    A JAR file:
+
+         #p"jar:file:baz.jar!/"
+
+    A JAR file accessible via URL:
+
+         #p"jar:http://example.org/abcl.jar!/"
+
+    An entry in an ABCL FASL in a URL-accessible JAR file:
+
+         #p"jar:jar:http://example.org/abcl.jar!/foo.abcl!/foo-1.cls"
+         
+[jarUrlConnection]: http://java.sun.com/javase/6/docs/api/java/net/JarURLConnection.html
+
+3.  `MERGE-PATHNAMES` working for jar entries in the following use cases:
+
+        (merge-pathnames "foo-1.cls" "jar:jar:file:baz.jar!/foo.abcl!/foo._")
+        ==> "jar:jar:file:baz.jar!/foo.abcl!/foo-1.cls"
+
+        (merge-pathnames "foo-1.cls" "jar:file:foo.abcl!/")
+        ==> "jar:file:foo.abcl!/foo-1.cls"
+
+4.  TRUENAME and PROBE-FILE working with "jar:" pathnames, with TRUENAME
+    canonicalizing the JAR reference.
+
+5.  DIRECTORY working within JAR files (and within JAR in JAR).
+
+6.  References of the form "jar:<URL>" work for all strings <URL> that
+    java.net.URL can resolve.
+
+7.  Make jar pathnames work as a valid argument for OPEN with
+:DIRECTION :INPUT.
+
+8.  Enable the loading of ASDF systems packaged within jar files.
+
+9.  Enable the matching of jar pathnames with PATHNAME-MATCH-P
+
+        (pathname-match-p 
+          "jar:file:/a/b/some.jar!/a/system/def.asd"
+          "jar:file:/**/*.jar!/**/*.asd")      
+        ==> t
+
+Status
+------
+
+All the above goals have been implemented and tested.
+
+
+Implementation
+--------------
+
+A PATHNAME referring to a file within a JAR is known as a JAR PATHNAME.
+It can either refer to the entire JAR file or an entry within the JAR
+file.
+
+A JAR PATHNAME always has a DEVICE which is a proper list.  This
+distinguishes it from other uses of Pathname.
+
+The DEVICE of a JAR PATHNAME will be a list with either one or two
+elements.  The first element of the DEVICE list can be either a
+PATHNAME representing a JAR on the filesystem, or a URL PATHNAME.
+
+A PATHNAME occurring in the list in the DEVICE of a JAR PATHNAME is
+known as a DEVICE PATHNAME.
+
+Only the first entry in the DEVICE list may be a URL PATHNAME.
+
+Otherwise the DEVICE PATHNAME denotes the PATHNAME of the JAR file.
+
+The DEVICE PATHNAME list of enclosing JARs runs from outermost to
+innermost.  The implementation currently limits this list to have at
+most two elements.
+    
+The DIRECTORY component of a JAR PATHNAME should be a list starting
+with the :ABSOLUTE keyword.  Even though hierarchical entries in jar
+files are stored in the form "foo/bar/a.lisp" not "/foo/bar/a.lisp",
+the meaning of the DIRECTORY component is better represented as an
+absolute path.
+
+A jar Pathname has type JAR-PATHNAME, derived from PATHNAME.
+
+
+BNF
+---
+
+An incomplete BNF of the syntax of JAR PATHNAME would be:
+
+      JAR-PATHNAME ::= "jar:" URL "!/" [ ENTRY ]
+
+      URL ::= <URL parsable via java.net.URL.URL()>
+            | JAR-FILE-PATHNAME
+
+      JAR-FILE-PATHNAME ::= "jar:" "file:" JAR-NAMESTRING "!/" [ ENTRY ]
+
+      JAR-NAMESTRING  ::=  ABSOLUTE-FILE-NAMESTRING
+                         | RELATIVE-FILE-NAMESTRING
+
+      ENTRY ::= [ DIRECTORY "/"]* FILE
+
+
+### Notes
+
+1.  `ABSOLUTE-FILE-NAMESTRING` and `RELATIVE-FILE-NAMESTRING` can use
+the local filesystem conventions, meaning that on Windows this could
+contain '\' as the directory separator, which is always normalized to
+'/'.  An `ENTRY` always uses '/' to separate directories within the
+jar archive.
+
+
+Use Cases
+---------
+
+    // UC1 -- JAR
+    pathname: {
+      namestring: "jar:file:foo/baz.jar!/"
+      device: ( 
+        pathname: {  
+          device: "jar:file:"
+          directory: (:RELATIVE "foo")
+          name: "baz"
+          type: "jar"
+        }
+      )
+    }
+
+
+    // UC2 -- JAR entry 
+    pathname: {
+      namestring: "jar:file:baz.jar!/foo.abcl"
+      device: ( pathname: {
+        device: "jar:file:"
+        name: "baz"
+        type: "jar"
+      }) 
+      name: "foo"
+      type: "abcl"
+    }
+
+
+    // UC3 -- JAR file in a JAR entry
+    pathname: {
+      namestring: "jar:jar:file:baz.jar!/foo.abcl!/"
+      device: ( 
+        pathname: {
+          name: "baz"
+          type: "jar"
+        }
+        pathname: {
+          name: "foo"
+          type: "abcl"
+        } 
+      )
+    }
+
+    // UC4 -- JAR entry in a JAR entry with directories
+    pathname: {
+      namestring: "jar:jar:file:a/baz.jar!/b/c/foo.abcl!/this/that/foo-20.cls"
+      device: ( 
+        pathname: {
+          directory: (:RELATIVE "a")      
+          name: "baz"
+          type: "jar"
+        }
+        pathname: {
+          directory: (:RELATIVE "b" "c")
+          name: "foo"
+          type: "abcl"
+        }
+      )
+      directory: (:ABSOLUTE "this" "that")
+      name: "foo-20"
+      type: "cls" 
+    }
+
+    // UC5 -- JAR Entry in a JAR Entry
+    pathname: {
+      namestring: "jar:jar:file:a/foo/baz.jar!/c/d/foo.abcl!/a/b/bar-1.cls"
+      device: (
+        pathname: {
+          directory: (:RELATIVE "a" "foo")
+          name: "baz"
+          type: "jar"
+        }
+        pathname: {
+          directory: (:RELATIVE "c" "d")
+          name: "foo"
+          type: "abcl"
+        }
+      )
+      directory: (:ABSOLUTE "a" "b")
+      name: "bar-1"
+      type: "cls"
+    }
+
+    // UC6 -- JAR entry in a http: accessible JAR file
+    pathname: {
+      namestring: "jar:http://example.org/abcl.jar!/org/armedbear/lisp/Version.class"
+      device: ( 
+        pathname: {
+          namestring: "http://example.org/abcl.jar"
+        }
+      )
+      directory: (:ABSOLUTE "org" "armedbear" "lisp")
+      name: "Version"
+      type: "class"
+    }
+
+
+    // UC7 -- JAR Entry in a JAR Entry in a URL accessible JAR FILE
+    pathname: {
+       namestring: "jar:jar:http://example.org/abcl.jar!/foo.abcl!/foo-1.cls"
+       device: (
+         pathname: {
+           namestring: "http://example.org/abcl.jar"
+         }
+         pathname: { 
+           name: "foo"
+           type: "abcl"
+         }
+      )
+      name: "foo-1"
+      type: "cls"
+    }
+
+    // UC8 -- JAR in an absolute directory
+
+    pathname: {
+       namestring: "jar:file:/a/b/foo.jar!/"
+       device: (
+         pathname: {
+           directory: (:ABSOLUTE "a" "b")
+           name: "foo"
+           type: "jar"
+         }
+       )
+    }
+
+    // UC9 -- JAR in a relative directory with entry
+    pathname: {
+       namestring: "jar:file:a/b/foo.jar!/c/d/foo.lisp"
+       device: ( pathname: {
+         directory: (:RELATIVE "a" "b")
+         name: "foo"
+         type: "jar"
+       })
+       directory: (:ABSOLUTE "c" "d")
+       name: "foo"
+       type: "lisp"
+    }
+
+
+URI Encoding 
+------------
+
+As a subtype of URL-PATHNAMES, JAR-PATHNAMES follow all the rules for
+that type.  Most notably this means that all #\Space characters should
+be encoded as '%20' when dealing with jar entries.
+
+
+History
+-------
+
+Previously, ABCL did have some support for jar pathnames. This support
+used the convention that if the device field was itself a
+pathname, the device pathname contained the location of the jar.
+
+In the analysis of the desire to treat jar pathnames as valid
+locations for `LOAD`, we determined that we needed a "double" pathname
+so we could refer to the components of a packed FASL in a jar.  At first
+we thought we could support such a syntax by having the device
+pathname's device refer to the inner jar.  But with this use of
+`PATHNAME`s linked by the `DEVICE` field, we found the problem that UNC
+path support uses the `DEVICE` field, so JARs located on UNC mounts can't
+be referenced via '\\', i.e.  
+
+    jar:jar:file:\\server\share\a\b\foo.jar!/this\that!/foo.java
+
+would not have a valid representation.
+
+So instead of having `DEVICE` point to a `PATHNAME`, we decided that the
+`DEVICE` shall be a list of `PATHNAME`s, so we would have:
+
+    pathname: {
+      namestring: "jar:jar:file:\\server\share\foo.jar!/foo.abcl!/"
+      device: ( 
+                pathname: {
+                  host: "server"
+                  device: "share"
+                  name: "foo"
+                  type: "jar"
+                }
+                pathname: {
+                  name: "foo"
+                  type: "abcl"
+                }
+              )
+    }
+
+Although there is a fair amount of special logic inside `Pathname.java`
+itself in the resulting implementation, the logic in `Load.java` seems
+to have been considerably simplified.
+
+When we implemented URL Pathnames, the special syntax for URL as an
+abstract string in the first position of the device list was naturally
+replaced with a URL pathname.
+
+\end{verbatim}
+\begin{verbatim}
+
+
+
+URL Pathnames in ABCL
+=====================
+
+    Mark Evenson
+    Created:  25 MAR 2010
+    Modified: 21 JUN 2011
+
+Notes towards an implementation of URL references to be contained in
+Common Lisp `PATHNAME` objects within ABCL.
+
+
+References
+----------
+
+RFC3986   Uniform Resource Identifier (URI): Generic Syntax
+
+
+URL vs URI
+----------
+
+We use the term URL as shorthand in describing the URL Pathnames, even
+though the corresponding encoding is more akin to a URI as described
+in RFC3986.  
+
+
+Goals
+-----
+
+1.  Use Common Lisp pathnames to refer to representations referenced
+by a URL.
+
+2.  The URL schemes supported shall include at least "http", and those
+enabled by the URLStreamHandler extension mechanism.
+
+3.  Use URL schemes that are understood by the java.net.URL object.
+
+    Example of a Pathname specified by URL:
+    
+        #p"http://example.org/org/armedbear/systems/pgp.asd"
+    
+4.  MERGE-PATHNAMES 
+
+        (merge-pathnames "url.asd"
+            "http://example/org/armedbear/systems/pgp.asd")
+        ==> "http://example/org/armedbear/systems/url.asd"
+
+5.  PROBE-FILE returning the state of URL accessibility.
+
+6.  TRUENAME "aliased" to PROBE-FILE signalling an error if the URL is
+not accessible (see "Non-goal 1").
+
+7.  DIRECTORY works for non-wildcards.
+
+8.  URL pathnames work as a valid argument for OPEN with :DIRECTION :INPUT.
+
+9.  Enable the loading of ASDF2 systems referenced by a URL pathname.
+
+10.  Pathnames constructed with the "file" scheme
+(i.e. #p"file:/this/file") need to be properly URI encoded according
+to RFC3986 or otherwise will signal FILE-ERROR.  
+
+11.  The "file" scheme will continue to be represented by an
+"ordinary" Pathname.  Thus, after construction of a URL Pathname with
+the "file" scheme, the namestring of the resulting PATHNAME will no
+longer contain the "file:" prefix.
+
+12.  The "jar" scheme will continue to be represented by a jar
+Pathname.
+
+
+Non-goals 
+---------
+
+1.  We will not implement canonicalization of URL schemes (such as
+following "http" redirects).
+
+2.  DIRECTORY will not work for URL pathnames containing wildcards.
+
+
+Implementation
+--------------
+
+A PATHNAME referring to a resource referenced by a URL is known as a
+URL PATHNAME.
+
+A URL PATHNAME always has a HOST component which is a proper list.
+This list will be a property list (plist).  The property list
+values must be character strings.
+
+    :SCHEME
+        Scheme of URI ("http", "ftp", "bundle", etc.)
+    :AUTHORITY   
+        Valid authority according to the URI scheme.  For "http" this
+        could be "example.org:8080".
+    :QUERY
+        The query of the URI
+    :FRAGMENT
+        The fragment portion of the URI
+        
+The DIRECTORY, NAME and TYPE fields of the PATHNAME are used to form
+the URI `path` according to the conventions of the UNIX filesystem
+(i.e. '/' is the directory separator).  In a sense the HOST contains
+the base URL, to which the `path` is a relative URL (although this
+abstraction is violated somewhat by the storing of the QUERY and
+FRAGMENT portions of the URI in the HOST component).
+
+For the purposes of PATHNAME-MATCH-P, two URL pathnames may be said to
+match if their HOST components are EQUAL, and all other components are
+considered to match according to the existing rules for Pathnames.
+
+A URL pathname must have a DEVICE whose value is NIL.
+
+Upon creation, any ".." and "." components in the
+DIRECTORY are removed.  The DIRECTORY component, if present, is always
+absolute.
+
+The namestring of a URL pathname shall be formed by the usual
+conventions of a URL.
+
+A URL Pathname has type URL-PATHNAME, derived from PATHNAME.
+
+
+URI Encoding 
+------------
+
+For dealing with URI Encoding (also known as [Percent Encoding][]) we
+adopt the following rules:
+
+[Percent Encoding]: http://en.wikipedia.org/wiki/Percent-encoding
+
+1.  All pathname components are represented "as is" without escaping.
+
+2.  Namestrings are suitably escaped if the Pathname is a URL-PATHNAME
+    or a JAR-PATHNAME.
+
+3.  Namestrings should all "round-trip":
+
+    (when (typep p 'pathname)
+       (equal (namestring p)
+              (namestring (pathname (namestring p)))))
+
+
+Status
+------
+
+This design has been implemented.
+
+
+History
+-------
+
+26 NOV 2010 Changed implementation to use URI encodings for the "file"
+  scheme, including uses nested within the "jar" scheme,
+  e.g. "jar:file:/location/of/some.jar!/".
+
+21 JUN 2011 Fixed implementation to properly handle URI encodings
+  referring to nested jar archives.
+
+\end{verbatim}

Added: trunk/abcl/doc/design/pathnames/pathnames.tex
==============================================================================
--- /dev/null	00:00:00 1970	(empty, because file is newly added)
+++ trunk/abcl/doc/design/pathnames/pathnames.tex	Sun Oct  2 01:04:44 2011	(r13611)
@@ -0,0 +1,22 @@
+% -*- mode: latex; -*-
+% http://en.wikibooks.org/wiki/LaTeX/
+\documentclass[10pt]{article}
+% \usepackage{abcl}
+
+\begin{document}
+\title{An Implementation and Analysis of Adding IRI to Common Lisp's Pathname}
+\date{October 2011}
+\author{Mark~Evenson}
+
+\maketitle
+
+\section{Abstract}
+
+We implement the semantics for distributed resource description and
+retrieval by URL/URI/IRI Pathname in the Armed Bear Common Lisp implementation.
+
+\section{Notes}
+\include{notes}
+
+\end{document}
+
