[pro] Common Lisp replacement for noweb
William Halliburton
whalliburton at gmail.com
Mon Mar 28 17:47:21 UTC 2011
---------- Forwarded message ----------
From: daly <daly at axiom-developer.org>
Date: Mon, Mar 28, 2011 at 12:18 PM
Subject: Common Lisp replacement for noweb
To: thomas.m.hermann at odonata-research.com, pro at common-lisp.net
I have moved from using noweb to a pure Common Lisp version
of literate programming. (The source is also at:
http://literatesoftware.com/tangle.lisp )
The noweb program uses two functions,
weave -- takes a file and extracts latex
tangle - takes a file and extracts running code
The noweb syntax is:
<<thechunk>>=
your source code
@
where your code is defined in the block delimited by the
<<...>>= and the @ symbol. To use the delimited chunk
somewhere you write the name of the chunk as:
<<thechunk>>
It would be better to use a valid latex environment.
That would mean that there is no need for a "weave" function
since the original file is valid latex.
\begin{chunk}{thechunk}
your code here
\end{chunk}
\getchunk{thechunk}
All that would be left is to make Common Lisp understand
the latex environment syntax which is trivial to do. So I
wrote a tangle.lisp program. It accepts and processes both
the old noweb syntax and the new latex syntax.
The idea is simple. Read the file, hash the chunks, and
expand them when a getchunk is found.
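As a rough sketch of that idea only (this is not the attached tangle.lisp; the demo-* names are made up for illustration, and only the latex syntax is handled), the two passes might look like this:

```lisp
;; Illustrative sketch: *demo-chunks*, demo-prefix-p, demo-name, demo-hash
;; and demo-expand are hypothetical names, not taken from tangle.lisp.
(defvar *demo-chunks* (make-hash-table :test #'equal))

(defun demo-prefix-p (prefix line)
  "True when LINE starts with PREFIX."
  (let ((n (length prefix)))
    (and (>= (length line) n) (string= prefix line :end2 n))))

(defun demo-name (prefix line)
  "Return the text between PREFIX and the next closing brace."
  (subseq line (length prefix) (position #\} line :start (length prefix))))

(defun demo-hash (lines)
  "Pass 1: store the lines of each chunk environment under its name."
  (clrhash *demo-chunks*)
  (let (name body)
    (dolist (line lines)
      (cond ((demo-prefix-p "\\begin{chunk}{" line)
             (setq name (demo-name "\\begin{chunk}{" line) body nil))
            ((demo-prefix-p "\\end{chunk}" line)
             ;; reusing a name appends to the lines already stored
             (setf (gethash name *demo-chunks*)
                   (append (gethash name *demo-chunks*) (reverse body)))
             (setq name nil))
            (name (push line body))))))

(defun demo-expand (name out)
  "Pass 2: write chunk NAME to OUT, expanding getchunk references."
  (dolist (line (gethash name *demo-chunks*))
    (if (demo-prefix-p "\\getchunk{" line)
        (demo-expand (demo-name "\\getchunk{" line) out)
        (format out "~a~%" line))))

;; Usage: hash a small in-memory "file", then expand the chunk named main.
(demo-hash '("\\begin{chunk}{main}"
             "(print 1)"
             "\\getchunk{helper}"
             "\\end{chunk}"
             "\\begin{chunk}{helper}"
             "(print 2)"
             "\\end{chunk}"))
(demo-expand "main" *standard-output*)  ; writes (print 1) then (print 2)
```

The sketch assumes markers start in column 1 and does not guard against a chunk that refers to itself.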
The code is attached.
Send questions to daly at literatesoftware.com
Tim Daly
daly at axiom-developer.org
daly at literatesoftware.com
-------------- next part --------------
; 0 AUTHOR and LICENSE
; 1 ABSTRACT
; 2 THE LATEX SUPPORT CODE
; 3 GLOBALS
; 4 THE TANGLE COMMAND
; 5 THE TANGLE FUNCTION
; 6 GCL-READ-FILE (aka read-sequence)
; 7 GCL-HASHCHUNKS
; 8 GCL-EXPAND
; 9 ISCHUNK-LATEX
; 10 ISCHUNK-NOWEB
; 11 ALLCHUNKS
; 12 makeHelpFiles
; 13 makeInputFiles
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; 0 AUTHOR and LICENSE
;;; Timothy Daly (daly at axiom-developer.org)
;;; License: Public Domain
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; 1 ABSTRACT
;;; This program will extract the source code from a literate file.
;;; A literate lisp file contains a mixture of latex and lisp source code.
;;; The file is intended to be in one of two formats, either in latex
;;; format or, for legacy reasons, in noweb format.
;;; Latex format files define a new environment so that code chunks
;;; can be delimited by \begin{chunk}{name} .... \end{chunk} blocks.
;;; This is supported by the following latex code.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; 2 THE LATEX SUPPORT CODE
;;; The verbatim package quotes everything within its grasp and is used to
;;; hide and quote the source code during latex formatting. The verbatim
;;; environment is built in but the package form lets us use it in our
;;; chunk environment and it lets us change the font.
;;;
;;; \usepackage{verbatim}
;;;
;;; Make the verbatim font smaller
;;; Note that we have to temporarily change the '@' to be just a character
;;; because the \verbatim@font name uses it as a character
;;;
;;; \chardef\atcode=\catcode`\@
;;; \catcode`\@=11
;;; \renewcommand{\verbatim@font}{\ttfamily\small}
;;; \catcode`\@=\atcode
;;; This declares a new environment named ``chunk'' which has one
;;; argument that is the name of the chunk. All code needs to live
;;; between the \begin{chunk}{name} and the \end{chunk}
;;; The ``name'' is used to define the chunk.
;;; Reuse of the same chunk name later concatenates the chunks
;;; For those of you who can't read latex this says:
;;; Make a new environment named chunk with one argument
;;; The first block is the code for the \begin{chunk}{name}
;;; The second block is the code for the \end{chunk}
;;; The % is the latex comment character
;;; We have two alternate markers, a lightweight one using dashes
;;; and a heavyweight one using the \begin and \end syntax
;;; You can choose either one by changing the comment char in column 1
;;; \newenvironment{chunk}[1]{% we need the chunkname as an argument
;;; {\ }\newline\noindent% make sure we are in column 1
;;; %{\small $\backslash{}$begin\{chunk\}\{{\bf #1}\}}% alternate begin mark
;;; \hbox{\hskip 2.0cm}{\bf --- #1 ---}% mark the beginning
;;; \verbatim}% say exactly what we see
;;; {\endverbatim% process \end{chunk}
;;; \par{}% we add a newline
;;; \noindent{}% start in column 1
;;; \hbox{\hskip 2.0cm}{\bf ----------}% mark the end
;;; %$\backslash{}$end\{chunk\}% alternate end mark (commented)
;;; \par% and a newline
;;; \normalsize\noindent}% and return to the document
;;; This declares the place where we want to expand a chunk
;;; Technically we don't need this because a getchunk must always
;;; be properly nested within a chunk and will be verbatim.
;;; \providecommand{\getchunk}[1]{%
;;; \noindent%
;;; {\small $\backslash{}$begin\{chunk\}\{{\bf #1}\}}}% mark the reference
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; 3 GLOBALS
;;; The *chunkhash* variable will hold the hash table of chunks.
;;;
;;; Every time we find a \begin{chunk}{name} ... \end{chunk} we look
;;; in this hash table. If the ``name'' is not found we add it.
;;; If the name is found, we concatenate it to the existing chunk.
(defvar *chunkhash* nil "this hash table contains the chunks found")
;;; This shows critical information for debugging purposes
(defvar *chunknoise* nil "turn this on to debug internals")
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; 4 THE TANGLE COMMAND
;;;
;;; The tangle command does all of the work of extracting code.
;;; For legacy reasons we support 2 syntax forms, latex and noweb
;;;
;;; In latex form the code blocks are delimited by
;;; \begin{chunk}{name}
;;; ... (code for name)...
;;; \end{chunk}
;;;
;;; and referenced by \getchunk{name} which gets replaced by the code
;;; In noweb form the code blocks are delimited by
;;; <<name>>=
;;; ... (code for name)...
;;; @
;;;
;;; and referenced by <<name>> which gets replaced by the code
;;; There are several ways to invoke the tangle function.
;;;
;;; The first argument is always the file from which to extract code
;;;
;;; The second argument is the name of the chunk to extract
;;; If the name starts with < then we assume noweb format as in:
;;; (tangle "clweb.pamphlet" "<<name>>") <== noweb syntax
;;; Otherwise we assume latex format as in:
;;; (tangle "clweb.pamphlet" "name") <== latex syntax (default)
;;;
;;; The standard noweb chunk name is ``*'' but any name can be used.
;;;
;;; The third argument is the name of an output file:
;;; (tangle "clweb.pamphlet" "clweb.chunk" "clweb.spadfile")
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; 5 THE TANGLE FUNCTION
;;; This routine looks at the first character of the chunk name.
;;; If it is a $<$ character then we assume noweb syntax otherwise
;;; we assume latex syntax.
;;;
;;; We initialize the chunk hashtable
;;; then read the file and store each chunk
;;; then we recursively expand the ``topchunk'' to the output stream
(defun tangle (filename topchunk &optional file)
  "Extract the source code from a pamphlet file"
  (let ((noweb? (char= (schar topchunk 0) #\<)))
    (setq *chunkhash* (make-hash-table :test #'equal))
    (when *chunknoise* (format t "PASS 1~%"))
    (gcl-hashchunks (gcl-read-file filename) noweb?)
    (when *chunknoise* (format t "PASS 2~%"))
    (if (and file (stringp file))
        (with-open-file (out file :direction :output)
          (gcl-expand topchunk noweb? out))
        (gcl-expand topchunk noweb? t))))
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; 6 GCL-READ-FILE (aka read-sequence)
;;; This would be read-sequence in ansi common lisp. Here we read
;;; a line, push it onto a stack and then reverse the stack. The
;;; net effect is a list of strings, one per line of the file.
(defun gcl-read-file (streamname)
  "Implement read-sequence in GCL"
  (let (result)
    ;; with-open-file opens the file itself, so pass the name directly
    (with-open-file (stream streamname)
      (do (line eof)
          ((eq line 'done) (nreverse result))
        (multiple-value-setq (line eof) (read-line stream nil 'done))
        (unless (eq line 'done) (push line result))))))
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; 7 GCL-HASHCHUNKS
;;; gcl-hashchunks gathers the chunks and puts them in the hash table
;;;
;;; if we find the chunk syntax and it is a
;;; define ==> parse the chunkname and start gathering lines onto a stack
;;; end ==> push the completed list of lines into a stack of chunks
;;; already in the hash table
;;; otherwise ==> if we are gathering, push the line onto the stack
;;; a hash table entry is a list of lists such as
;;; (("6" "5") ("4" "3") ("2" "1"))
;;; each of the sublists is a set of lines in reverse (stack) order
;;; each sublist is a single chunk of lines.
;;; there is a new sublist for each reuse of the same chunkname
;;; If the noweb argument is non-nil we assume that we are parsing
;;; using the noweb syntax. A nil argument implies latex syntax.
(defun gcl-hashchunks (lines noweb)
  "Gather all of the chunks and put them into a hash table"
  (let (type name chunkname oldchunks chunk gather)
    (dolist (line lines)
      (if noweb
          (multiple-value-setq (type name) (ischunk-noweb line))
          (multiple-value-setq (type name) (ischunk-latex line)))
      (cond
        ((eq type 'define)
         (when *chunknoise* (format t "DEFINE name=~a~%" name))
         (setq chunkname name)
         (setq gather t))
        ((eq type 'end)
         (when *chunknoise*
           (format t "END name= ~a chunk=~s~%" chunkname (reverse chunk)))
         (setq oldchunks (gethash chunkname *chunkhash*))
         (setf (gethash chunkname *chunkhash*) (push chunk oldchunks))
         (setq gather nil)
         (setq chunk nil))
        (gather ;; collect lines into the chunk while gather is true
         (push line chunk))))))
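;;; Illustration only, not part of the original file: the hypothetical
;;; helper below hashes a chunk name that is used twice (latex syntax)
;;; and returns the entry stored for it, to show the list-of-lists
;;; layout described above. It rebinds *chunkhash* so that no global
;;; state is disturbed.
(defun demo-hashchunks-entry ()
  (let ((*chunkhash* (make-hash-table :test #'equal)))
    (gcl-hashchunks '("\\begin{chunk}{demo}" "1" "2" "\\end{chunk}"
                      "\\begin{chunk}{demo}" "3" "4" "\\end{chunk}")
                    nil)
    (gethash "demo" *chunkhash*)))
;;; Reading the code above, (demo-hashchunks-entry) should return
;;;   (("4" "3") ("2" "1"))
;;; that is, the newest sublist first and each sublist in reverse order.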
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; 8 GCL-EXPAND
;;; gcl-expand will recursively expand chunks in the hash table
;;;
;;; latex chunk names are just the chunkname itself e.g. chunkname
;;; noweb chunk names include the delimiters, e.g: <<chunkname>>
;;; a hash table entry is a list of lists such as
;;; (("6" "5") ("4" "3") ("2" "1"))
;;; so to process the chunk we reverse the main list and
;;; for each sublist we reverse the sublist and process the lines
;;; if a chunk name reference is encountered in a line we call expand
;;; recursively to expand the inner chunkname.
(defun gcl-expand (chunk noweb? file)
  "Recursively expand a chunk into the output stream"
  (let ((chunklist (gethash chunk *chunkhash*)) type name)
    (dolist (chunk (reverse chunklist))
      (dolist (line (reverse chunk))
        (if noweb?
            (multiple-value-setq (type name) (ischunk-noweb line))
            (multiple-value-setq (type name) (ischunk-latex line)))
        (if (eq type 'refer)
            (progn
              (when *chunknoise* (format t "REFER name=~a~%" name))
              (gcl-expand name noweb? file))
            (format file "~a~%" line))))))
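;;; Illustration only, not part of the original file: the hypothetical
;;; helper below hashes two latex-style chunks held in memory and expands
;;; the outer one into a string, so the recursion can be seen without
;;; touching the file system.
(defun demo-expand-to-string ()
  (let ((*chunkhash* (make-hash-table :test #'equal)))
    (gcl-hashchunks '("\\begin{chunk}{top}"
                      "(first)"
                      "\\getchunk{inner}"
                      "(last)"
                      "\\end{chunk}"
                      "\\begin{chunk}{inner}"
                      "(middle)"
                      "\\end{chunk}")
                    nil)
    (with-output-to-string (out)
      (gcl-expand "top" nil out))))
;;; Reading the code above, the result should be the three lines
;;; (first), (middle) and (last), each followed by a newline.
;;; Note that a chunk which refers to itself would recurse without end.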
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; 9 ISCHUNK-LATEX
;;; There is a built-in assumption (in the ischunk-* functions)
;;; that the chunks occur on separate lines and that the indentation
;;; of the chunk reference has no meaning.
;;;
;;; ischunk-latex recognizes chunk names in latex convention
;;;
;;; There are 3 cases to recognize:
;;; \begin{chunk}{thechunkname} ==> 'define thechunkname
;;; \end{chunk} ==> 'end nil
;;; \getchunk{thechunkname} ==> 'refer thechunkname
(defun ischunk-latex (line)
  "Find chunks delimited by latex syntax"
  (let ((mark (search "chunk" line))        ; is this a line we care about?
        (point 0)
        name
        (beginstring "\\begin{chunk}{")     ; this is the define marker string
        beginlength
        ;; the backslash must be doubled inside a lisp string, otherwise
        ;; the reader drops it and the marker loses its leading backslash
        (endstring "\\end{chunk}")          ; this is the end marker string
        (referstring "\\getchunk{")         ; this is the refer string
        referlength)
    (setq beginlength (length beginstring))
    (setq referlength (length referstring))
    (when mark
      (cond
        ((setq mark (search beginstring line))     ; recognize define
         (setq point (position #\} line :start (+ mark beginlength)))
         (cond
           ((null point) (values nil nil))
           ((= point 0) (values nil nil))
           (t
            (setq name (subseq line (+ mark beginlength) point))
            ;(print (list 'ischunk-latex 'define name))
            (values 'define name))))
        ((setq mark (search endstring line))       ; recognize end
         ;(print (list 'ischunk-latex 'end))
         (values 'end nil))
        ((setq mark (search referstring line))     ; recognize reference
         (setq point (position #\} line :start (+ mark referlength)))
         (cond
           ((null point) (values nil nil))
           ((= point 0) (values nil nil))
           (t
            (setq name (subseq line (+ mark referlength) point))
            ;(print (list 'ischunk-latex 'refer name))
            (values 'refer name))))
        (t (values nil nil))))))
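;;; Illustration only, not part of the original file: the hypothetical
;;; helper below collects what ischunk-latex reports for the three cases
;;; listed above, one result list per input line.
(defun demo-ischunk-latex ()
  (mapcar #'(lambda (line) (multiple-value-list (ischunk-latex line)))
          '("\\begin{chunk}{foo}" "\\end{chunk}" "\\getchunk{foo}")))
;;; Reading the code above, (demo-ischunk-latex) should return
;;;   ((DEFINE "foo") (END NIL) (REFER "foo"))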
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; 10 ISCHUNK-NOWEB
;;; ischunk-noweb recognizes chunk names using the noweb convention
;;;
;;; There are 3 cases to recognize:
;;; <<thechunkname>>= ==> 'define thechunkname
;;; @ ==> 'end nil
;;; <<thechunkname>> ==> 'refer thechunkname
(defun ischunk-noweb (line)
  "Find chunks delimited by noweb syntax"
  (let ((len (length line)) (mark (position #\> line)) (point 0))
    (cond
      ((and mark                                ; recognize define
            (> len (+ mark 2))
            (char= #\< (schar line 0))
            (char= #\< (schar line 1))
            (char= #\> (schar line (+ mark 1)))
            (char= #\= (schar line (+ mark 2))))
       ;(print (list 'define (subseq line 0 (+ mark 2))))
       (values 'define (subseq line 0 (+ mark 2))))
      ((and mark                                ; recognize reference
            (> len (+ mark 1))
            (char= #\> (schar line (+ mark 1))))
       (setq point (position #\< line))
       (if
        (and point
             (< point (- mark 2))
             (char= #\< (schar line (+ point 1))))
        (values 'refer (subseq line point (+ mark 2)))
        (values 'noise nil)))
      ((and (> len 0)                           ; end chunk
            (char= #\@ (schar line 0)))
       (values 'end nil))
      (t (values nil nil)))))
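;;; Illustration only, not part of the original file: the hypothetical
;;; helper below collects what ischunk-noweb reports for a definition,
;;; an indented reference, an end marker and an ordinary line of code
;;; that happens to contain a > character.
(defun demo-ischunk-noweb ()
  (mapcar #'(lambda (line) (multiple-value-list (ischunk-noweb line)))
          '("<<foo>>=" "  <<foo>>" "@" "(if (> a b) x)")))
;;; Reading the code above, (demo-ischunk-noweb) should return
;;;   ((DEFINE "<<foo>>") (REFER "<<foo>>") (END NIL) (NIL NIL))
;;; Note that the noweb name keeps its << >> delimiters and that the
;;; leading indentation of the reference is discarded.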
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; 11 allchunks
;;;
;;; allchunks will make a single pass over a book extracting any chunk
;;; that fits the PATTERN from the FROMFILE to the TODIR. The chunk
;;; format is either noweb (if true) or latex (if false).
;;;
;;; allchunks takes 4 arguments,
;;; the PATTERN (a string like ".help>>")
;;; the FROMFILE (a string like "/axiom/books/bookvol5.pamphlet")
;;; the TODIR (a string like "/axiom/mnt/ubuntu/doc/spadhelp")
;;; and a boolean NOWEB? (true is noweb format chunks, false is latex style)
;;;
;;; a chunk name is expected to be of the form:
;;; <<FROMFILE.PATTERN>>=
;;; which means that a chunk matching the pattern (e.g. ".input>>")
;;; will be extracted to the file TODIR/FROMFILE.PATTERN
;;;
;;; This is used for <<foo.help>> and <<foo.input>> file extraction.
;;; allchunks is used to extract help files and input files in a single
;;; pass over the books. Since there are hundreds of input files and
;;; help files this is a significant speedup.
(defun allchunks (pattern fromfile todir noweb?)
  (setq *chunkhash* (make-hash-table :test #'equal))
  (when *chunknoise* (format t "PASS 1~%"))
  (gcl-hashchunks (gcl-read-file fromfile) noweb?)
  (when *chunknoise* (format t "PASS 2~%"))
  (maphash #'(lambda (key value)
               (if (search pattern key)
                   (let ((filename key) helpfile)
                     (when noweb? (setq filename (subseq key 2 (- (length key) 2))))
                     (setq helpfile (concatenate 'string todir "/" filename))
                     (with-open-file (out helpfile :direction :output)
                       (format t "extracting ~a~%" helpfile)
                       (gcl-expand key noweb? out)))))
           *chunkhash*))
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; 12 makeHelpFiles
;;;
;;; The makeHelpFiles function creates all of the help files in a single
;;; pass over the file. The usual method of extracting each individual
;;; help file requires hundreds of passes over the file.
;;;
;;; An example call is:
;;;
;;; (makeHelpFiles)
;;;
;;; This will find all of the .help chunks in books of interest
;;; and write each chunk to the target directory in its own filename.
;;; So if a chunk name is <<somedomain.help>> the above call will create
;;; the file "$AXIOM/doc/spadhelp/somedomain.help" containing the chunk value.
;;; Help documentation for algebra
;;; The help documentation for algebra files lives within the algebra
;;; pamphlet. The help chunk contains the name of the domain, thus:
;;; <<thisdomain.help>>=
;;; ====================================================================
;;; thisdomain examples
;;; ====================================================================
;;;
;;; (documentation for this domain)
;;;
;;; examplefunction foo
;;; output
;;; Type: thetype
;;;
;;; See Also:
;;; o )show thisdomain
;;; o $AXIOM/bin/src/doc/algebra/thisfile.spad.dvi
;;;
;;; @
;;; The .help files are automatically extracted by code in books/tangle.lisp
;;; and placed in the directory \verb|${HELP}|.
;;;
;;; The documentation starts off with the domain enclosed in two lines
;;; of equal signs. The documentation is free format. Generally the
;;; functions are indented two spaces, the output is indented 3 spaces,
;;; and the Type field has been moved toward the center of the line.
;;;
;;; The ``See Also:'' section lists the domain with the ``show'' command
;;; and the path to the source file in dvi format.
(defun makeHelpFiles ()
  ;; PAT1 and PAT2 are bound here so the setq forms below do not
  ;; assign to undeclared global variables
  (let ((AXIOM (si::getenv "AXIOM")) (BOOKS (si::getenv "BOOKS")) HELP PAT1 PAT2)
    (setq HELP (concatenate 'string AXIOM "/doc/spadhelp"))
    (setq PAT1 ".help")
    (setq PAT2 ".help>>")
    (allchunks PAT1 (concatenate 'string BOOKS "/bookvol5.pamphlet") HELP nil)
    (allchunks PAT2 (concatenate 'string BOOKS "/bookvol10.2.pamphlet") HELP t)
    (allchunks PAT2 (concatenate 'string BOOKS "/bookvol10.3.pamphlet") HELP t)
    (allchunks PAT2 (concatenate 'string BOOKS "/bookvol10.4.pamphlet") HELP t)
    (allchunks PAT2 (concatenate 'string BOOKS "/bookvol10.5.pamphlet") HELP t)))
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; 13 makeInputFiles
;;;
;;; The makeInputFiles function creates all of the input files in a single
;;; pass over the file. The usual method of extracting each individual
;;; input file requires hundreds of passes over the file.
;;;
;;; An example call is:
;;;
;;; (makeInputFiles)
;;;
;;; This will find all of the .input chunks in the books
;;; and write each chunk to the target directory in its own filename.
;;; So if a chunk name is <<somedomain.input>> the above call will create
;;; the file "$SPD/int/input/somedomain.input" containing the chunk value.
(defun makeInputFiles ()
  (let ((SPD (si::getenv "SPD")) (BOOKS (si::getenv "BOOKS")) INPUT PAT)
    (setq INPUT (concatenate 'string SPD "/int/input"))
    (setq PAT ".input>>")
    (allchunks PAT (concatenate 'string BOOKS "/bookvol10.2.pamphlet") INPUT t)
    (allchunks PAT (concatenate 'string BOOKS "/bookvol10.3.pamphlet") INPUT t)
    (allchunks PAT (concatenate 'string BOOKS "/bookvol10.4.pamphlet") INPUT t)
    (allchunks PAT (concatenate 'string BOOKS "/bookvol10.5.pamphlet") INPUT t)))