[armedbear-cvs] r13611 - trunk/abcl/doc/design/pathnames
mevenson at common-lisp.net
Sun Oct 2 08:04:45 UTC 2011
Author: mevenson
Date: Sun Oct 2 01:04:44 2011
New Revision: 13611
Log:
Start article describing the implementation of URL-PATHNAME.
Added:
trunk/abcl/doc/design/pathnames/notes.tex
trunk/abcl/doc/design/pathnames/pathnames.tex
Added: trunk/abcl/doc/design/pathnames/notes.tex
==============================================================================
--- /dev/null 00:00:00 1970 (empty, because file is newly added)
+++ trunk/abcl/doc/design/pathnames/notes.tex Sun Oct 2 01:04:44 2011 (r13611)
@@ -0,0 +1,493 @@
+\begin{verbatim}
+JARs and JAR entries in ABCL
+============================
+
+ Mark Evenson
+ Created: 09 JAN 2010
+ Modified: 21 JUN 2011
+
+Notes towards an implementation of "jar:" references to be contained
+in Common Lisp `PATHNAME`s within ABCL.
+
+Goals
+-----
+
+1. Use Common Lisp pathnames to refer to entries in a jar file.
+
+2. Use the `'jar:'` scheme as documented in [`java.net.JarURLConnection`][jarUrlConnection] for
+ the namestring representation.
+
+ An entry in a JAR file:
+
+ #p"jar:file:baz.jar!/foo"
+
+ A JAR file:
+
+ #p"jar:file:baz.jar!/"
+
+ A JAR file accessible via URL:
+
+ #p"jar:http://example.org/abcl.jar!/"
+
+ An entry in an ABCL FASL in a URL-accessible JAR file:
+
+ #p"jar:jar:http://example.org/abcl.jar!/foo.abcl!/foo-1.cls"
+
+[jarUrlConnection]: http://java.sun.com/javase/6/docs/api/java/net/JarURLConnection.html
+
+3. `MERGE-PATHNAMES` working for jar entries in the following use cases:
+
+ (merge-pathnames "foo-1.cls" "jar:jar:file:baz.jar!/foo.abcl!/foo._")
+ ==> "jar:jar:file:baz.jar!/foo.abcl!/foo-1.cls"
+
+ (merge-pathnames "foo-1.cls" "jar:file:foo.abcl!/")
+ ==> "jar:file:foo.abcl!/foo-1.cls"
+
+4. TRUENAME and PROBE-FILE working with "jar:", with TRUENAME
+ canonicalizing the JAR reference.
+
+5. DIRECTORY working within JAR files (and within JAR in JAR).
+
+6. References of the form "jar:<URL>" work for all strings <URL> that
+ java.net.URL can resolve.
+
+7. Make jar pathnames work as a valid argument for OPEN with
+:DIRECTION :INPUT.
+
+8. Enable the loading of ASDF systems packaged within jar files.
+
+9. Enable the matching of jar pathnames with PATHNAME-MATCH-P
+
+ (pathname-match-p
+ "jar:file:/a/b/some.jar!/a/system/def.asd"
+ "jar:file:/**/*.jar!/**/*.asd")
+ ==> t
+
+Status
+------
+
+All the above goals have been implemented and tested.
+
+
+Implementation
+--------------
+
+A PATHNAME referring to a file within a JAR is known as a JAR PATHNAME.
+It can either refer to the entire JAR file or an entry within the JAR
+file.
+
+A JAR PATHNAME always has a DEVICE which is a proper list. This
+distinguishes it from other uses of Pathname.
+
+The DEVICE of a JAR PATHNAME will be a list with either one or two
+elements. The first element of this list can be either a
+PATHNAME representing a JAR on the filesystem, or a URL PATHNAME.
+
+A PATHNAME occurring in the list in the DEVICE of a JAR PATHNAME is
+known as a DEVICE PATHNAME.
+
+Only the first entry in the DEVICE list may be a URL PATHNAME.
+
+Otherwise the DEVICE PATHNAME denotes the PATHNAME of the JAR file.
+
+The DEVICE PATHNAME list of enclosing JARs runs from outermost to
+innermost. The implementation currently limits this list to at
+most two elements.
+
+The DIRECTORY component of a JAR PATHNAME should be a list starting
+with the :ABSOLUTE keyword. Even though hierarchical entries in jar
+files are stored in the form "foo/bar/a.lisp" not "/foo/bar/a.lisp",
+the meaning of the DIRECTORY component is better represented as an
+absolute path.
+
+A jar Pathname has type JAR-PATHNAME, derived from PATHNAME.
+
+
+BNF
+---
+
+An incomplete BNF of the syntax of JAR PATHNAME would be:
+
+ JAR-PATHNAME ::= "jar:" URL "!/" [ ENTRY ]
+
+ URL ::= <URL parsable via java.net.URL.URL()>
+ | JAR-FILE-PATHNAME
+
+ JAR-FILE-PATHNAME ::= "jar:" "file:" JAR-NAMESTRING "!/" [ ENTRY ]
+
+ JAR-NAMESTRING ::= ABSOLUTE-FILE-NAMESTRING
+ | RELATIVE-FILE-NAMESTRING
+
+ ENTRY ::= [ DIRECTORY "/"]* FILE
+
+
+### Notes
+
+1. `ABSOLUTE-FILE-NAMESTRING` and `RELATIVE-FILE-NAMESTRING` can use
+the local filesystem conventions, meaning that on Windows they could
+contain '\' as the directory separator, which is always normalized to
+'/'. An `ENTRY` always uses '/' to separate directories within the
+jar archive.
+
+
+Use Cases
+---------
+
+ // UC1 -- JAR
+ pathname: {
+ namestring: "jar:file:foo/baz.jar!/"
+ device: (
+ pathname: {
+ device: "jar:file:"
+ directory: (:RELATIVE "foo")
+ name: "baz"
+ type: "jar"
+ }
+ )
+ }
+
+
+ // UC2 -- JAR entry
+ pathname: {
+ namestring: "jar:file:baz.jar!/foo.abcl"
+ device: ( pathname: {
+ device: "jar:file:"
+ name: "baz"
+ type: "jar"
+ })
+ name: "foo"
+ type: "abcl"
+ }
+
+
+ // UC3 -- JAR file in a JAR entry
+ pathname: {
+ namestring: "jar:jar:file:baz.jar!/foo.abcl!/"
+ device: (
+ pathname: {
+ name: "baz"
+ type: "jar"
+ }
+ pathname: {
+ name: "foo"
+ type: "abcl"
+ }
+ )
+ }
+
+ // UC4 -- JAR entry in a JAR entry with directories
+ pathname: {
+ namestring: "jar:jar:file:a/baz.jar!/b/c/foo.abcl!/this/that/foo-20.cls"
+ device: (
+ pathname: {
+ directory: (:RELATIVE "a")
+ name: "baz"
+ type: "jar"
+ }
+ pathname: {
+ directory: (:RELATIVE "b" "c")
+ name: "foo"
+ type: "abcl"
+ }
+ )
+ directory: (:ABSOLUTE "this" "that")
+ name: "foo-20"
+ type: "cls"
+ }
+
+ // UC5 -- JAR Entry in a JAR Entry
+ pathname: {
+ namestring: "jar:jar:file:a/foo/baz.jar!/c/d/foo.abcl!/a/b/bar-1.cls"
+ device: (
+ pathname: {
+ directory: (:RELATIVE "a" "foo")
+ name: "baz"
+ type: "jar"
+ }
+ pathname: {
+ directory: (:RELATIVE "c" "d")
+ name: "foo"
+ type: "abcl"
+ }
+ )
+ directory: (:ABSOLUTE "a" "b")
+ name: "bar-1"
+ type: "cls"
+ }
+
+ // UC6 -- JAR entry in a http: accessible JAR file
+ pathname: {
+ namestring: "jar:http://example.org/abcl.jar!/org/armedbear/lisp/Version.class"
+ device: (
+ pathname: {
+ namestring: "http://example.org/abcl.jar"
+ }
+ )
+ directory: (:ABSOLUTE "org" "armedbear" "lisp")
+ name: "Version"
+ type: "class"
+ }
+
+ // UC7 -- JAR Entry in a JAR Entry in a URL accessible JAR FILE
+ pathname: {
+ namestring: "jar:jar:http://example.org/abcl.jar!/foo.abcl!/foo-1.cls"
+ device: (
+ pathname: {
+ namestring: "http://example.org/abcl.jar"
+ }
+ pathname: {
+ name: "foo"
+ type: "abcl"
+ }
+ )
+ name: "foo-1"
+ type: "cls"
+ }
+
+ // UC8 -- JAR in an absolute directory
+
+ pathname: {
+ namestring: "jar:file:/a/b/foo.jar!/"
+ device: (
+ pathname: {
+ directory: (:ABSOLUTE "a" "b")
+ name: "foo"
+ type: "jar"
+ }
+ )
+ }
+
+ // UC9 -- JAR in a relative directory with entry
+ pathname: {
+ namestring: "jar:file:a/b/foo.jar!/c/d/foo.lisp"
+ device: (
+ pathname: {
+ directory: (:RELATIVE "a" "b")
+ name: "foo"
+ type: "jar"
+ }
+ )
+ directory: (:ABSOLUTE "c" "d")
+ name: "foo"
+ type: "lisp"
+ }
+
+
+URI Encoding
+------------
+
+As a subtype of URL-PATHNAMES, JAR-PATHNAMES follow all the rules for
+that type. Most notably this means that all #\Space characters should
+be encoded as '%20' when dealing with jar entries.
+
+
+History
+-------
+
+Previously, ABCL did have some support for jar pathnames. This support
+used the convention that if the device field was itself a
+pathname, the device pathname contained the location of the jar.
+
+In analyzing the desire to treat jar pathnames as valid
+locations for `LOAD`, we determined that we needed a "double" pathname
+so we could refer to the components of a packed FASL in a jar. At first
+we thought we could support such a syntax by having the device
+pathname's device refer to the inner jar. But with this use of
+`PATHNAME`s linked by the `DEVICE` field, we found the problem that UNC
+path support uses the `DEVICE` field, so JARs located on UNC mounts can't
+be referenced via '\\', i.e.
+
+ jar:jar:file:\\server\share\a\b\foo.jar!/this\that!/foo.java
+
+would not have a valid representation.
+
+So instead of having `DEVICE` point to a `PATHNAME`, we decided that the
+`DEVICE` shall be a list of `PATHNAME`, so we would have:
+
+ pathname: {
+ namestring: "jar:jar:file:\\server\share\foo.jar!/foo.abcl!/"
+ device: (
+ pathname: {
+ host: "server"
+ device: "share"
+ name: "foo"
+ type: "jar"
+ }
+ pathname: {
+ name: "foo"
+ type: "abcl"
+ }
+ )
+ }
+
+Although there is a fair amount of special logic inside `Pathname.java`
+itself in the resulting implementation, the logic in `Load.java` seems
+to have been considerably simplified.
+
+When we implemented URL Pathnames, the special syntax for URL as an
+abstract string in the first position of the device list was naturally
+replaced with a URL pathname.
+
+\end{verbatim}
+\begin{verbatim}
+
+
+
+URL Pathnames in ABCL
+=====================
+
+ Mark Evenson
+ Created: 25 MAR 2010
+ Modified: 21 JUN 2011
+
+Notes towards an implementation of URL references to be contained in
+Common Lisp `PATHNAME` objects within ABCL.
+
+
+References
+----------
+
+RFC3986 Uniform Resource Identifier (URI): Generic Syntax
+
+
+URL vs URI
+----------
+
+We use the term URL as shorthand in describing the URL Pathnames, even
+though the corresponding encoding is more akin to a URI as described
+in RFC3986.
+
+
+Goals
+-----
+
+1. Use Common Lisp pathnames to refer to representations referenced
+by a URL.
+
+2. The URL schemes supported shall include at least "http", and those
+enabled by the URLStreamHandler extension mechanism.
+
+3. Use URL schemes that are understood by the java.net.URL object.
+
+ Example of a Pathname specified by URL:
+
+ #p"http://example.org/org/armedbear/systems/pgp.asd"
+
+4. MERGE-PATHNAMES
+
+ (merge-pathnames "url.asd"
+ "http://example/org/armedbear/systems/pgp.asd")
+ ==> "http://example/org/armedbear/systems/url.asd"
+
+5. PROBE-FILE returning the state of URL accessibility.
+
+6. TRUENAME "aliased" to PROBE-FILE, signalling an error if the URL is
+not accessible (see "Non-goal 1").
+
+7. DIRECTORY works for non-wildcards.
+
+8. URL pathnames work as a valid argument for OPEN with :DIRECTION :INPUT.
+
+9. Enable the loading of ASDF2 systems referenced by a URL pathname.
+
+10. Pathnames constructed with the "file" scheme
+(i.e. #p"file:/this/file") need to be properly URI encoded according
+to RFC3986; otherwise FILE-ERROR will be signalled.
+
+11. The "file" scheme will continue to be represented by an
+"ordinary" Pathname. Thus, after construction of a URL Pathname with
+the "file" scheme, the namestring of the resulting PATHNAME will no
+longer contain the "file:" prefix.
+
+12. The "jar" scheme will continue to be represented by a jar
+Pathname.
+
+
+Non-goals
+---------
+
+1. We will not implement canonicalization for URL schemes (such as
+following "http" redirects).
+
+2. DIRECTORY will not work for URL pathnames containing wildcards.
+
+
+Implementation
+--------------
+
+A PATHNAME referring to a resource referenced by a URL is known as a
+URL PATHNAME.
+
+A URL PATHNAME always has a HOST component which is a proper list.
+This list will be a property list (plist). The property list
+values must be character strings.
+
+ :SCHEME
+ Scheme of URI ("http", "ftp", "bundle", etc.)
+ :AUTHORITY
+ Valid authority according to the URI scheme. For "http" this
+ could be "example.org:8080".
+ :QUERY
+ The query of the URI
+ :FRAGMENT
+ The fragment portion of the URI
+
+The DIRECTORY, NAME and TYPE fields of the PATHNAME are used to form
+the URI `path` according to the conventions of the UNIX filesystem
+(i.e. '/' is the directory separator). In a sense the HOST contains
+the base URL, to which the `path` is a relative URL (although this
+abstraction is violated somewhat by the storing of the QUERY and
+FRAGMENT portions of the URI in the HOST component).
+
+For the purposes of PATHNAME-MATCH-P, two URL pathnames may be said to
+match if their HOST components are EQUAL, and all other components are
+considered to match according to the existing rules for Pathnames.
+
+A URL pathname must have a DEVICE whose value is NIL.
+
+Upon creation, any ".." and "." components in the
+DIRECTORY are removed. The DIRECTORY component, if present, is always
+absolute.
+
+The namestring of a URL pathname shall be formed by the usual
+conventions of a URL.
+
+A URL Pathname has type URL-PATHNAME, derived from PATHNAME.
+
+
+URI Encoding
+------------
+
+For dealing with URI Encoding (also known as [Percent Encoding][]) we
+adopt the following rules:
+
+[Percent Encoding]: http://en.wikipedia.org/wiki/Percent-encoding
+
+1. All pathname components are represented "as is" without escaping.
+
+2. Namestrings are suitably escaped if the Pathname is a URL-PATHNAME
+ or a JAR-PATHNAME.
+
+3. Namestrings should all "round-trip":
+
+ (when (typep p 'pathname)
+ (equal (namestring p)
+ (namestring (pathname p))))
+
+
+Status
+------
+
+This design has been implemented.
+
+
+History
+-------
+
+26 NOV 2010 Changed implementation to use URI encodings for the "file"
+ scheme, including uses nested within the "jar" scheme such as
+ "jar:file:/location/of/some.jar!/".
+
+21 JUN 2011 Fixed implementation to properly handle URI encodings
+ referring to nested jar archives.
+
+\end{verbatim}
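
Reader's note on the URL Pathnames notes above (not part of the committed file): the decomposition of a URL into the HOST property list, the removal of "." and ".." from the DIRECTORY, and the "components unescaped, namestrings escaped" rule can be sketched with the standard java.net.URI class. This is only an illustration under stated assumptions. The class name UrlPathnameSketch and its three helper methods are hypothetical and are not taken from ABCL's Pathname.java, and the example URLs are made up.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not ABCL's actual implementation.
class UrlPathnameSketch {

    // Collect the URI parts that the notes place in the HOST property list.
    static Map<String, String> hostPlist(URI uri) {
        Map<String, String> plist = new LinkedHashMap<>();
        plist.put(":SCHEME", uri.getScheme());
        plist.put(":AUTHORITY", uri.getRawAuthority());
        if (uri.getRawQuery() != null) {
            plist.put(":QUERY", uri.getRawQuery());
        }
        if (uri.getRawFragment() != null) {
            plist.put(":FRAGMENT", uri.getRawFragment());
        }
        return plist;
    }

    // Drop "." and ".." segments, then return the path without percent-escapes,
    // which is how the notes say pathname components are stored.
    static String normalizedPath(URI uri) {
        return uri.normalize().getPath();
    }

    // Build an escaped "file" namestring from an unescaped absolute path.
    static String encodeFileNamestring(String path) {
        try {
            return new URI("file", null, path, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI uri = URI.create(
            "http://example.org:8080/org/./armedbear/../armedbear/systems/pgp.asd?rev=1#top");
        System.out.println(hostPlist(uri));
        System.out.println(normalizedPath(uri));
        System.out.println(encodeFileNamestring("/this/my file.lisp"));
    }
}
```

The split mirrors the notes: scheme, authority, query and fragment go into one map (the HOST), while the path is left to be divided into DIRECTORY, NAME and TYPE. How ABCL actually performs that division is not shown here.
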
Added: trunk/abcl/doc/design/pathnames/pathnames.tex
==============================================================================
--- /dev/null 00:00:00 1970 (empty, because file is newly added)
+++ trunk/abcl/doc/design/pathnames/pathnames.tex Sun Oct 2 01:04:44 2011 (r13611)
@@ -0,0 +1,22 @@
+% -*- mode: latex; -*-
+% http://en.wikibooks.org/wiki/LaTeX/
+\documentclass[10pt]{article}
+% \usepackage{abcl}
+
+\begin{document}
+\title{An Implementation and Analysis of Adding IRI to Common Lisp's Pathname}
+\date{October 2011}
+\author{Mark~Evenson}
+
+\maketitle
+
+\section{Abstract}
+
+We implement the semantics for distributed resource description and
+retrieval by URL/URI/IRI Pathname in the Armed Bear Common Lisp implementation.
+
+\section{Notes}
+\include{notes}
+
+\end{document}
+
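
Reader's note on the JAR pathname BNF in notes.tex above (not part of the committed files): the rule that a namestring carries one "jar:" prefix per enclosing archive, with "!/" closing each level, can be sketched as plain string processing. The following is a minimal sketch, assuming the two-level limit stated in the notes. The class JarNamestringSketch and its fields are hypothetical names chosen for illustration and do not reflect how ABCL's Pathname.java is written.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not ABCL's actual parser.
class JarNamestringSketch {
    // Enclosing archives from outermost to innermost, like the DEVICE list.
    final List<String> devices = new ArrayList<>();
    // Entry inside the innermost archive; empty when the archive itself is meant.
    String entry = "";

    static JarNamestringSketch parse(String namestring) {
        // Count the leading "jar:" prefixes: one per enclosing archive.
        int depth = 0;
        String rest = namestring;
        while (rest.startsWith("jar:")) {
            depth++;
            rest = rest.substring("jar:".length());
        }
        if (depth < 1 || depth > 2) {
            throw new IllegalArgumentException(
                "expected one or two \"jar:\" prefixes: " + namestring);
        }
        JarNamestringSketch result = new JarNamestringSketch();
        // Each level is terminated by "!/".
        for (int i = 0; i < depth; i++) {
            int bang = rest.indexOf("!/");
            if (bang < 0) {
                throw new IllegalArgumentException(
                    "missing \"!/\" separator: " + namestring);
            }
            result.devices.add(rest.substring(0, bang));
            rest = rest.substring(bang + "!/".length());
        }
        result.entry = rest;
        return result;
    }

    public static void main(String[] args) {
        JarNamestringSketch p =
            parse("jar:jar:file:a/baz.jar!/b/c/foo.abcl!/this/that/foo-20.cls");
        System.out.println(p.devices);
        System.out.println(p.entry);
    }
}
```

For the UC4 namestring this yields two device strings, "file:a/baz.jar" and "b/c/foo.abcl", and the entry "this/that/foo-20.cls". Turning those strings into PATHNAME components (and an :ABSOLUTE directory for the entry) is a further step that this sketch leaves out.
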