[climacs-cvs] CVS update: papers/ilc2005/syntax/climacssyntax.bib papers/ilc2005/syntax/climacssyntax.tex
Christophe Rhodes
crhodes at common-lisp.net
Tue May 24 14:23:45 UTC 2005
Update of /project/climacs/cvsroot/papers/ilc2005/syntax
In directory common-lisp.net:/tmp/cvs-serv7545
Modified Files:
climacssyntax.bib climacssyntax.tex
Log Message:
Add Tomita's parser reference.
Mostly trivial red-ink local modifications to the text. I have a
tendency to overly-passive sentence structure; it may be that I have
pushed it too far in that direction. Do not feel inhibited from further
changing things.
Date: Tue May 24 16:23:44 2005
Author: crhodes
Index: papers/ilc2005/syntax/climacssyntax.bib
diff -u papers/ilc2005/syntax/climacssyntax.bib:1.10 papers/ilc2005/syntax/climacssyntax.bib:1.11
--- papers/ilc2005/syntax/climacssyntax.bib:1.10 Tue May 24 11:20:19 2005
+++ papers/ilc2005/syntax/climacssyntax.bib Tue May 24 16:23:44 2005
@@ -210,7 +210,21 @@
OPTkey = {},
volume = {3},
number = {4},
- OPTpages = {553--561},
+ pages = {553--561},
+ OPTmonth = {},
+ OPTnote = {},
+ OPTannote = {}
+}
+
+@Article{tomita,
+ author = {Masaru Tomita},
+ title = "{An Efficient Augmented-Context-Free Parsing Algorithm}",
+ journal = {Computational Linguistics},
+ year = {1987},
+ OPTkey = {},
+ volume = {13},
+ number = {1--2},
+ pages = {31--46},
OPTmonth = {},
OPTnote = {},
OPTannote = {}
Index: papers/ilc2005/syntax/climacssyntax.tex
diff -u papers/ilc2005/syntax/climacssyntax.tex:1.27 papers/ilc2005/syntax/climacssyntax.tex:1.28
--- papers/ilc2005/syntax/climacssyntax.tex:1.27 Tue May 24 13:43:08 2005
+++ papers/ilc2005/syntax/climacssyntax.tex Tue May 24 16:23:44 2005
@@ -71,21 +71,22 @@
\section{Introduction}
The field of advanced text editors is a crowded one, with a long
-history and an apparent ability to cause passionate argument. Climacs
+history and an apparent ability to cause passionate argument. Climacs
is philosophically part of the Emacs editor tradition, which has
spawned many variants with many different approaches to buffer
management, incremental redisplay, and syntax analysis. Emacs itself
traces its lineage to TECO, where Emacs was originally implemented as
-a set of TECO macros. Climacs is compared to several interesting
-variants of Emacs in table \ref{table:editorcompare}; more information
-about text editing in general, and some editors we shall not discuss
-further, can be found in \cite{FinsethCraft,greenberg,Pike94,woodZ}
-and references therein.
+a set of TECO macros. A summary comparison of Climacs to a
+non-exhaustive set of Emacs variants is presented in table
+\ref{table:editorcompare}; more information about text editing in
+general, and particulars of some editors we shall not discuss further,
+can be found in \cite{FinsethCraft,greenberg,Pike94,woodZ} and
+references therein.
\begin{table}
\begin{center}
{\small
-\begin{tabular}{|c|c|c|c|c|}
+\begin{tabular}{|c|c|c|c|}
\hline
\textbf{Editor} & \textbf{Buffer Implementation} & \textbf{Syntax Analysis} & \textbf{Language}
\\
@@ -113,40 +114,39 @@
Climacs' syntax analysis is a flexible protocol which can be
implemented with a full language lexer and parser. GNU Emacs, the most
commonly used Emacs-like editor, uses regular expressions for its
-syntax analysis. Because these regular expressions are applied lazily
-and not on the whole buffer, constructs such as Common Lisp's nestable
-\verb+#| |#+ block comments will often confuse the regular
-expressions. If the parser starts after the opening \verb+#|+ then the
-closing \verb+|#+ will be treated as the start of an escaped symbol
-name. Even if the regular expression parses the whole block comment
-correctly, other expressions can still match on the contents of the
-comment, leading to issues when the first character in a column in the
-block comment is the start of a definition. Emacs users quickly learn
-to insert a space before the open parenthesis to work around Emacs'
-font-lock deficiencies.
+syntax analysis. Regular expressions cannot, in general, recognise
+non-regular constructs such as Common Lisp's nestable \verb+#| |#+
+block comments, and the lazy application of those regular expressions
+can lead to additional erroneous parses: if the parser starts after
+the opening \verb+#|+ then the closing \verb+|#+ will be treated as
+the start of an escaped symbol name. Even if the regular expression
+parses the whole block comment correctly, other expressions can still
+match on the contents of the comment. This causes problems when a line
+inside the block comment begins, in its first column, with what looks
+like the start of a definition: Emacs users quickly learn to insert a
+space before the open parenthesis to work around Emacs' font-lock
+deficiencies.
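The nesting problem can be made concrete with a short sketch. It is in Python for illustration only and is not Climacs' Common Lisp lexer. It contrasts a non-nesting regular expression with a scanner that counts nesting depth. The names `NAIVE_BLOCK_COMMENT` and `block_comment_end` are hypothetical.

```python
import re

# A non-nesting pattern: the lazy ".*?" stops at the FIRST closing "|#".
NAIVE_BLOCK_COMMENT = re.compile(r"#\|.*?\|#", re.DOTALL)


def block_comment_end(text, start):
    """Return the index just past the #| ... |# comment that opens at START.

    Nesting depth is tracked with a counter; a regular expression cannot
    do this for arbitrary depth.  Raises ValueError if the comment is
    missing or unterminated.
    """
    if not text.startswith("#|", start):
        raise ValueError("no block comment at start")
    depth, i = 0, start
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair == "#|":          # an opener, possibly nested
            depth += 1
            i += 2
        elif pair == "|#":        # a closer; finished when depth returns to 0
            depth -= 1
            i += 2
            if depth == 0:
                return i
        else:
            i += 1
    raise ValueError("unterminated block comment")


source = "#| outer #| inner |# still comment |# (defun f ())"
print(NAIVE_BLOCK_COMMENT.match(source).group())   # -> #| outer #| inner |#
print(source[block_comment_end(source, 0):])        # the code after the comment
```

The regular expression ends the comment at the inner `|#`. Anything that follows the inner comment, up to the real closer, would then be highlighted as code.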
The Climacs text editor is a combination of frameworks for buffer
representation and parsing, loosely coupled with a CLIM-based
\cite{mckayacm} display engine. It includes the Flexichain library
\cite{flexichain}, which provides an editable sequence representation
-and mark (cursor) management using a simple linked list for
-implementing the buffer protocol. Climacs also includes an
-implementation of a slight modification of the Earley parsing
-algorithm \cite{earley}, to assist in the creation of syntax-aware
-editing modes, though such modes can use any appropriate parsing
-algorithm. An application can combine a particular implementation of
-the buffer protocol, the syntax protocol, and its own display methods
-to produce a sophisticated editor for a particular language.
-
-The Climacs buffer protocol, which provides a standard interface to
-common text editor buffer operations, uses the Flexichain library; we
-discuss this protocol in more detail in section \ref{sec:buffer}. The
-syntax protocol, which we discuss in section \ref{sec:syntax},
-provides a method for interfacing a lexical analyzer and parser with
-the text editor, and provides for defining methods to draw syntax
-objects in the Climacs window. In section \ref{sec:syntaxes} we
-discuss the implementation of syntactic analyses for various
-programming languages, including Common Lisp; in section
+and mark (cursor) management. Climacs also includes an
+implementation of the Earley parsing algorithm \cite{earley}, to
+assist in the creation of syntax-aware editing modes.
+An application can combine a particular
+implementation of the buffer protocol, the syntax protocol, and its
+own display methods to produce a sophisticated editor for a particular
+language.
+
+The rest of this paper is organised as follows: we discuss the Climacs
+buffer protocol, which provides a standard interface to common text
+editor buffer operations, in section \ref{sec:buffer}. The syntax
+protocol, which we discuss in section \ref{sec:syntax}, provides a
+mechanism for attaching a lexical analyser and parser to the text
+editor, and provides for defining methods to draw syntax objects in
+the Climacs window. In section \ref{sec:syntaxes} we present some
+details of the implementation of syntactic analysis of editor buffers
+for various programming languages, including Common Lisp; in section
\ref{sec:tabeditor}, we discuss an application with Climacs at its
core to support editing a textual representation of lute tablature.
We discuss avenues for further development in section
@@ -155,99 +155,96 @@
\section{Buffer Protocol}
\label{sec:buffer}
-Climacs abstracts the implementation of editable sequences as editor
-buffers using the buffer protocol. The buffer protocol provides a set
-of methods for modifying and reading the contents of a buffer, and
-setting and retrieving marks in the buffer. Buffers can contain
-arbitrary objects. Typically these objects are characters, but the
-buffer can contain any object that the syntax protocol can display in
-the Climacs window. The buffer protocol is independent of any
-implementation of the protocol, which allows flexible representations
-of buffers.
-
-Currently Climacs uses a single Flexichain ``cursorchain'' as the
-editable sequence representation for standard buffers. A cursorchain
-is a circular gap buffer with an arbitrary number of cursors in the
-buffer. Climacs uses these cursors as the implementation of marks in
-the buffer. The single gap-buffer implementation is used by many other
-editors, including GNU Emacs. Flexichain improves on this by making
-the gap buffer circular in its underlying representation; the start of
-the sequence is stored in a separate slot, along with the beginning of
-the gap.
-
-Climacs also provides three purely functional (aka fully persistent)
-buffer implementations, all based on work in progress \cite{dessy} by
-Robert Will in Haskell, which builds upon older work by Stephen Adams
-\cite{adams}. The underlying data structure is a balanced binary tree
-with an abstracted-away rebalancing scheme, supporting sequence
-operations needed by the Climacs buffer protocol at reasonable speed
-($O(\log~N$)). The first implementation, {\tt binseq-buffer}, uses
-one tree whose leaf nodes (buffer elements) can be arbitrary objects.
-An optimized implementation, {\tt obinseq-buffer}, uses less space but
-buffer elements must be non-nil atoms. Finally, {\tt binseq2-buffer}
-combines the previous two implementations, by using a tree whose leaf
-nodes contain the optimized trees representing lines; the benefit of
-this implementation are faster ($O(\log~N)$) operations dealing with
-lines and columns. All three implementations enable simple and
-inexpensive undo/redo operations because older buffer versions are
-kept as a whole in memory. The space cost of these implementations is
-not negligible. However, significant portions of older buffer versions
-are simply shared with newer buffer versions. Also, it is not
-necessary separately to remember editing operations in undo records,
-in order to preserve precise buffer history. Besides the undo
+The Climacs buffer protocol abstracts the operations performed on an
+editor buffer -- or any editable sequence of arbitrary objects -- from
+the implementation of those operations on a given data structure.
+This protocol is a set of generic functions for modifying and reading
+the contents of a buffer, and setting and retrieving marks in the
+buffer. The protocol abstraction is independent of any particular
+implementation, allowing flexible representations of buffers.
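The separation between protocol and implementation can be sketched in Python. This is a minimal illustration: the class and method names (`Buffer`, `element_at`, `delete_range`, and so on) are hypothetical renderings, not the actual Climacs generic functions, and marks are left out for brevity. Helpers written only against the abstract operations work unchanged with any implementation.

```python
from abc import ABC, abstractmethod


class Buffer(ABC):
    """Hypothetical Python rendering of a buffer protocol.

    Callers see only these operations, never the data structure."""

    @abstractmethod
    def size(self):
        """Number of objects in the buffer."""

    @abstractmethod
    def element_at(self, offset):
        """The object at OFFSET."""

    @abstractmethod
    def insert(self, offset, obj):
        """Insert OBJ so that it ends up at OFFSET."""

    @abstractmethod
    def delete_range(self, offset, n=1):
        """Delete N objects starting at OFFSET."""


class ListBuffer(Buffer):
    """One possible implementation, backed by a Python list."""

    def __init__(self, items=()):
        self._items = list(items)

    def size(self):
        return len(self._items)

    def element_at(self, offset):
        return self._items[offset]

    def insert(self, offset, obj):
        self._items.insert(offset, obj)

    def delete_range(self, offset, n=1):
        del self._items[offset:offset + n]


# Helpers written only against the protocol work with any implementation.
def buffer_sequence(buf, start, end):
    return [buf.element_at(i) for i in range(start, end)]


def insert_sequence(buf, offset, seq):
    for i, obj in enumerate(seq):
        buf.insert(offset + i, obj)


buf = ListBuffer("hello")
insert_sequence(buf, 5, " world")
buf.insert(0, 42)                 # buffers may hold non-character objects
print(buffer_sequence(buf, 0, buf.size()))
```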
+
+Currently Climacs uses a single {\tt cursorchain} from the Flexichain
+library as the editable sequence representation for standard buffers.
+A \texttt{cursorchain} is a circular gap buffer with an arbitrary
+number of cursors in the buffer. Climacs uses these cursors as the
+implementation of marks in the buffer. A single gap buffer is the
+representation used by many other editors, including GNU Emacs.
+Flexichain improves on this by making the gap buffer circular in its
+underlying representation; the start of the sequence is stored in a
+separate slot, along with the beginning of the gap.
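The sketch below shows how a gap buffer with marks works. It is a simplified, non-circular gap buffer in Python. Unlike a Flexichain {\tt cursorchain}, it does not wrap around, and marks are plain integer offsets rather than cursor objects. The class name `GapBuffer` and its methods are hypothetical. Each edit first moves the gap to the edit point, and that move copies every element between the old and new positions.

```python
class GapBuffer:
    """Simplified, non-circular gap buffer with marks (not Flexichain)."""

    def __init__(self, capacity=8):
        self._data = [None] * capacity
        self._gap_start = 0           # first free slot
        self._gap_end = capacity      # first occupied slot after the gap
        self.marks = {}               # mark name -> logical offset

    def __len__(self):
        return len(self._data) - (self._gap_end - self._gap_start)

    def _move_gap(self, offset):
        """Move the gap to logical OFFSET, copying the elements in between."""
        while self._gap_start > offset:
            self._gap_start -= 1
            self._gap_end -= 1
            self._data[self._gap_end] = self._data[self._gap_start]
        while self._gap_start < offset:
            self._data[self._gap_start] = self._data[self._gap_end]
            self._gap_start += 1
            self._gap_end += 1

    def _grow(self):
        """Enlarge the gap by reallocating (doubling the storage)."""
        extra = max(len(self._data), 1)
        tail = self._data[self._gap_end:]
        gap = self._gap_end - self._gap_start + extra
        self._data = self._data[:self._gap_start] + [None] * gap + tail
        self._gap_end += extra

    def insert(self, offset, obj):
        if not 0 <= offset <= len(self):
            raise IndexError(offset)
        self._move_gap(offset)
        if self._gap_start == self._gap_end:
            self._grow()
        self._data[self._gap_start] = obj
        self._gap_start += 1
        for name, pos in self.marks.items():
            if pos > offset:          # marks strictly after the edit move right
                self.marks[name] = pos + 1

    def delete(self, offset):
        if not 0 <= offset < len(self):
            raise IndexError(offset)
        self._move_gap(offset)        # the doomed element is now just after the gap
        self._data[self._gap_end] = None
        self._gap_end += 1
        for name, pos in self.marks.items():
            if pos > offset:
                self.marks[name] = pos - 1

    def __getitem__(self, offset):
        if not 0 <= offset < len(self):
            raise IndexError(offset)
        if offset < self._gap_start:
            return self._data[offset]
        return self._data[offset + (self._gap_end - self._gap_start)]

    def contents(self):
        return self._data[:self._gap_start] + self._data[self._gap_end:]


buf = GapBuffer(capacity=2)
for i, ch in enumerate("hello"):
    buf.insert(i, ch)
buf.marks["point"] = 5          # a mark at the end of the buffer
buf.insert(0, ">")              # the mark after the edit moves right
print("".join(buf.contents()), buf.marks["point"])   # -> >hello 6
```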
+
+Climacs also provides three purely functional (or fully persistent)
+buffer implementations, all based on functional data structures
+\cite{adams,dessy}. The underlying data structure is a balanced
+binary tree with an abstracted-away rebalancing scheme, supporting
+sequence operations needed by the Climacs buffer protocol at
+reasonable $O(\log~n)$ efficiency. The first implementation, {\tt
+ binseq-buffer}, uses one tree whose leaf nodes (buffer elements) can
+be arbitrary objects. An optimised implementation, {\tt
+ obinseq-buffer}, uses less space but buffer elements must be
+non-{\tt nil} atoms. Finally, {\tt binseq2-buffer} combines the
+previous two implementations, by using a tree whose leaf nodes contain
+the optimised trees representing lines; the benefit of this
+implementation is faster ($O(\log~n)$, compared with $O(n)$)
+operations dealing with lines and columns. All three implementations
+enable simple and inexpensive undo/redo operations because older
+buffer versions are kept as a whole in memory, so there is no need to
+store editing operations to facilitate undo. Besides the undo
operation simplification, the persistent buffer implementations
facilitate further purely functional operations on Climacs buffers.
-
-Climacs is intended to provide other buffer implementations, one of
-which will use a sequence of lines organized into a tree for quick
-access. In this structure, a line can be considered opened or
-closed. When a line is opened, it is represented as a Flexichain
-cursorchain. All editing operations are performed on open lines. A
-fixed number of lines are kept opened according to a
-least-recently-used scheme. When a line is closed it is converted to a
-vector for efficient storage and access. If the line contains only
-base-char objects this vector is a base-string; otherwise, it is
-unspecialized.
+The space cost of these implementations is not negligible, though it
+is alleviated by sharing significant portions of older buffer versions
+with newer versions.
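The sketch below illustrates structural sharing and undo by keeping old versions, assuming a persistent binary tree of buffer elements updated by path copying. It is not the Climacs {\tt binseq} code. In particular, it omits the rebalancing that the real implementations rely on, so the $O(\log~n)$ bound does not hold for it. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    left: Optional["Node"]
    value: object
    right: Optional["Node"]
    size: int


def size(t):
    return t.size if t else 0


def node(left, value, right):
    return Node(left, value, right, size(left) + 1 + size(right))


def insert(t, index, value):
    """Return a new tree with VALUE at INDEX; T itself is left unchanged.

    Only nodes on the path to INDEX are copied; every other subtree is
    shared between the old and new versions.  (No rebalancing.)"""
    if t is None:
        return node(None, value, None)
    k = size(t.left)
    if index <= k:
        return node(insert(t.left, index, value), t.value, t.right)
    return node(t.left, t.value, insert(t.right, index - k - 1, value))


def to_list(t):
    return to_list(t.left) + [t.value] + to_list(t.right) if t else []


v0 = None
for ch in "abc":
    v0 = insert(v0, size(v0), ch)
v1 = insert(v0, 1, "X")          # a new version; v0 is untouched
history = [v0, v1]               # undo is just returning to an older version
print(to_list(history[-1]), to_list(history[-2]))
```

Because `v1` shares the unmodified suffix of `v0`, keeping both versions costs only the copied path, not a second copy of the whole buffer.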
+
+There is scope for further buffer implementations in Climacs; one
+planned implementation uses a sequence of lines organised into a tree
+for quick access. In this structure, a line can be considered open
+or closed. When a line is open, it is represented as a Flexichain
+{\tt cursorchain}. All editing operations are performed on open
+lines, and a fixed number of lines are kept open according to a
+least-recently-used scheme. When a line is closed it is converted to
+a vector for efficient storage and access. If the line contains only
+{\tt base-char} objects this vector is a {\tt base-string}; otherwise,
+it is unspecialised.
This structure has the advantage of efficient line-based access in the
-buffer. It also provides much less pessimistic behavior than that of a
-single gap buffer when a user is editing two disparate sections in a
-large file. With a single gap buffer, the gap must be moved to the
-point of edit before an edit operation is allowed. When the buffer is
-large and edit operations occur frequently at multiple locations in
-the buffer, this requires a substantial amount of copying between
-edits. In this situation single-gap-buffer editors such as GNU Emacs
-will noticably pause between edits to move the gap. A structure which
-contains a sequence of lines and keeps the most recently used lines
-open as gap buffers can operate as a multi-gap buffer with automatic
-gap placement, while not providing pessimistic performance when
-accessing a specific line in the buffer.
+buffer. It also provides much better behaviour than that of a single
+gap buffer when a user is editing two disparate sections in a large
+file. With a single gap buffer, the gap must be moved to the point of
+edit before an edit operation is allowed; when the buffer is large and
+edit operations occur frequently at multiple locations in the buffer,
+this requires a substantial amount of copying between edits. In this
+situation single-gap-buffer editors such as GNU Emacs will noticeably
+pause between edits to move the gap. A structure which contains a
+sequence of lines and keeps the most recently used lines open as gap
+buffers can operate as a multi-gap buffer with automatic gap
+placement, while not suffering poor performance when accessing a
+specific line in the buffer.
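The sketch below shows least-recently-used opening and closing of lines. It is a hypothetical Python illustration, not a planned Climacs API. It simplifies the design described above in several labelled ways:

- lines are held in a plain list rather than a tree;
- an open line is a Python list standing in for a Flexichain {\tt cursorchain};
- inserting newlines is not supported.

```python
from collections import OrderedDict


class LineBuffer:
    """Hypothetical sketch: closed lines are immutable strings; a bounded
    number of recently edited lines are 'open' as mutable lists."""

    def __init__(self, text, max_open=2):
        self._closed = text.split("\n")   # one entry per line (a list, not a tree)
        self._open = OrderedDict()        # line number -> list of chars, LRU order
        self._max_open = max_open

    def _open_line(self, n):
        if n in self._open:
            self._open.move_to_end(n)     # now the most recently used
        else:
            self._open[n] = list(self._closed[n])
            if len(self._open) > self._max_open:
                old, chars = self._open.popitem(last=False)  # least recently used
                self._closed[old] = "".join(chars)           # close it again
        return self._open[n]

    def insert(self, line, column, char):
        """Edits always happen on an open line."""
        self._open_line(line).insert(column, char)

    def line(self, n):
        if n in self._open:               # an open line supersedes its closed copy
            return "".join(self._open[n])
        return self._closed[n]

    def open_lines(self):
        return list(self._open)


buf = LineBuffer("ab\ncd\nef", max_open=2)
buf.insert(0, 2, "!")
buf.insert(2, 0, "?")
buf.insert(1, 1, "-")                     # opening a third line closes line 0
print(buf.open_lines(), buf.line(0))      # -> [2, 1] ab!
```

Each open line acts as a small, independent gap, so alternating edits between distant lines never requires moving a single gap across the intervening text.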
-The efficiency of Climacs buffers depends greatly on the
+The efficiency of Climacs buffers depends of course on the
implementation of the buffer protocol that is used. Space efficiency
will also depend on the implementation of Common Lisp which is used to
-run Climacs. In Steel Bank Common Lisp, the type character is
+run Climacs. In Steel Bank Common Lisp, the type {\tt character} is
represented with the full 21 bits used by the Unicode character space.
-External character encodings other than Unicode are converted to
-UTF-32 when a file is read in to memory. If the characters are stored
-in a specialized array, this will net a worst case space efficiency of
-four bytes of every byte in the file. However, the time advantages of
-this representation outweigh the space inefficiency. Searching for an
-individual character in a sequence of $n$ characters encoded in UTF-8
-(or other variable-length encoding) is $O(n)$, because each individual
-character must be examined to determine the number of octets which are
-stored to represent that character.
+External character encodings are converted to UTF-32 when a file is
+read into memory. If the characters are stored in a specialised
+array, this costs, in the worst case, four bytes of memory for every
+byte in the file. However, the time advantages of this
+representation outweigh the space inefficiency for our purposes.
+Locating the $i$th character in a sequence of $n$ characters encoded
+in UTF-8 (or another variable-length encoding) is $O(n)$, because
+every preceding character must be examined to determine the number
+of octets used to represent it; with a fixed-width encoding such as
+UTF-32 it is $O(1)$.
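The trade-off can be checked with Python's standard codecs. This is an illustration only: the function names are hypothetical, and nothing here describes how SBCL stores strings internally. Finding a character by index in UTF-8 requires scanning lead bytes, whereas in UTF-32 the offset is a multiplication. The last line shows the four-fold worst case for ASCII text.

```python
def utf8_char_offset(data, index):
    """Byte offset of character number INDEX in the UTF-8 bytes DATA.

    A character's width is known only from its lead byte, so every
    earlier byte must be inspected: O(n) in the length of DATA."""
    count = 0
    for pos, byte in enumerate(data):
        if (byte & 0xC0) != 0x80:     # not a continuation byte: a character starts here
            if count == index:
                return pos
            count += 1
    raise IndexError(index)


def utf32_char_offset(index):
    """With a fixed width of four bytes per character this is O(1)."""
    return 4 * index


text = "añb€c"
utf8 = text.encode("utf-8")        # 1 + 2 + 1 + 3 + 1 = 8 bytes
utf32 = text.encode("utf-32-le")   # 5 * 4 = 20 bytes
print(utf8[utf8_char_offset(utf8, 4):].decode("utf-8"))          # -> c
print(utf32[utf32_char_offset(4):].decode("utf-32-le"))          # -> c
print(len("abc".encode("utf-32-le")) / len("abc".encode("utf-8")))  # -> 4.0
```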
The Flexichain library was designed to be able to take advantage of
-specialized lisp vectors for compact storage, though this possibility
-is not used by the Climacs buffer implementation. Instead Climacs
-uses an unspecialized vector for its storage, which uses one machine
-word per element, either as an immediate value or as a pointer to a
-larger element. Climacs buffers can contain any object, so in a
-suitably complex syntax and buffer protocol implementation any buffer
-object might correspond to an arbitrary number of bytes in the
-file. For instance, it is concievable that a buffer implementation
-might compress sections of the buffer which are not in use.
+specialised Lisp vectors for compact storage, though this possibility
+is not used by the current Climacs buffer implementation. Instead
+Climacs uses an unspecialised vector for its storage, which uses one
+machine word per element, either as an immediate value or as a pointer
+to a larger element. More space-efficient buffer implementations are
+possible, should it be necessary: for instance, it is conceivable that
+a buffer implementation might choose to compress sections of the
+buffer which are not in use.
\section{Syntax Protocol}
\label{sec:syntax}
@@ -255,7 +252,7 @@
\begin{figure*}
\begin{center}
\includegraphics{parserclasses}
- \caption{Organization of classes used by a typical syntax}
+ \caption{Organisation of classes used by a typical syntax}
\label{fig:syntaxclasses}
\end{center}
\end{figure*}
@@ -264,72 +261,75 @@
buffer parsers and syntax-aware display mechanisms. The set of hooks
that Climacs provides to allow this is the syntax protocol. A syntax
in Climacs is a class and set of methods which provide a lexical
-analyzer, parser, and display methods for a buffer. The incremental
+analyser, parser, and display methods for a buffer. The incremental
parser associated with a syntax creates and updates a parse tree of a
buffer's contents, and provides a mechanism for drawing these parsed
objects in a window. The parser accepts lexemes produced by an
-incremental lexical analyzer. Display is handled by drawing methods
+incremental lexical analyser. Display is handled by drawing methods
implemented using the CLIM high-level display toolkit.
-Though a syntax is free to choose its own implementation strategy,
-lexical analysis and parsing of the buffer is typically done in an
-object-oriented fashion. The lexer operates on objects in the buffer,
-usually characters, and returns objects of various classes. Each
-parser production is represented by a class. In complex syntaxes, the
-parser rules can be quite complicated and involve arbitrary code, but
-for a simple grammar the parsing rules can be entirely represented by
-matching on the classes returned by the tokenizer and parser. Figure
-\ref{fig:syntaxclasses} shows the organization of classes in the
-TTCN-3 grammar.
+Though an implementation of a syntax is free to choose its own parser
+implementation strategy, lexical analysis and parsing of the buffer is
+typically done in an object-oriented fashion. The lexer operates on
+objects in the buffer, usually characters, and returns objects of
+various classes. Each parser production is represented by a class. In
+complex syntaxes, the parser rules can be quite complicated and
+involve arbitrary code, but for a simple grammar the parsing rules can
+be entirely represented by matching on the classes returned by the
+tokeniser and parser. Figure \ref{fig:syntaxclasses} shows the
+organisation of classes in the TTCN-3 grammar.
-The syntax analysis can be applied either in a per-window or
-per-buffer function. The per-window approach is best suited to
+The syntax analysis is divided between a per-window and a per-buffer
+function. Performing the analysis per-window is best suited to
analysis of text where the parse tree will be used only for display
and editing, as it is less important if the parse tree for off-screen
-text is up-to-date at every point during an edit. The per-buffer
+text is up-to-date at every point during an edit. The per-buffer
approach is appropriate when the parse tree will also be used for some
-other display or analysis of the text in the buffer.
-
-Climacs includes a parser that uses the Earley \cite{earley} parsing
-algorithm. There are many advantages of this algorithm in the context
-of text editing. Perhaps most importantly, it does not require any
-preprocessing of the grammar, which would make it necessary for the entire
-grammar to be known ahead of time. This means that the user can
-load Lisp files containing additional syntax rules to complete the
-existing ones without having to apply any costly grammar analysis.
-Other advantages include the possibility of using ambiguous grammars,
+other display or analysis of the text in the buffer; the per-buffer
+function is also used to invalidate part of the previous parse, based
+on the extent of the region damaged by an edit.
+
+Climacs includes a parser generator that implements the Earley
+\cite{earley} algorithm. There are many advantages of this algorithm
+in the context of text editing. Perhaps most importantly, no grammar
+preprocessing is required, so it is not necessary for the entire
+grammar to be known ahead of time. This means that the user can load
+Lisp files containing additional syntax rules to supplement the
+existing ones without having to apply any costly grammar analysis. Other
+advantages include the possibility of handling ambiguous grammars,
since the Earley parsing algorithm accepts the full class of
context-free grammars. This feature is crucial in certain
-applications, for instance in a grammar checker for natural
-languages. The Climacs syntax protocol can, but is not required
-to, use the provided Earley parser. It can use any algorithm with an
-explicit representation of the parser state, which is a necessary, but
+applications, for instance in a grammar checker for natural languages.
+Implementations of the Climacs syntax protocol may, but are not
+required to, use the provided Earley parser: any algorithm with an
+explicit representation of the parser state (which is a necessary, but
not sufficient, requirement for making the parsing algorithm
-incremental.
+incremental) is potentially suitable.
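The point about grammar preprocessing can be seen in a minimal Earley recogniser. This Python sketch is not the Climacs parser, and its names are hypothetical. It reports only whether the input is in the language and builds no parse trees. It also assumes a grammar without empty (epsilon) productions, which keeps the completion step simple. The grammar is an ordinary dictionary consulted during parsing, so a rule can be added between parses with no re-analysis.

```python
def earley_recognise(grammar, start, tokens):
    """Minimal Earley recogniser; assumes no epsilon productions.

    GRAMMAR maps each nonterminal to a list of right-hand sides (tuples);
    any symbol that is not a key is a terminal, matched by equality.
    An item is (lhs, rhs, dot, origin)."""
    chart = [set() for _ in range(len(tokens) + 1)]
    chart[0] = {(start, rhs, 0, 0) for rhs in grammar[start]}
    for i in range(len(tokens) + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            new = set()
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in grammar:                            # predict
                    new = {(sym, r, 0, i) for r in grammar[sym]}
                elif i < len(tokens) and tokens[i] == sym:    # scan
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:                                             # complete
                new = {(l, r, d + 1, o) for (l, r, d, o) in chart[origin]
                       if d < len(r) and r[d] == lhs}
            for item in new - chart[i]:
                chart[i].add(item)
                agenda.append(item)
    return any(l == start and d == len(r) and o == 0
               for (l, r, d, o) in chart[len(tokens)])


GRAMMAR = {                                   # left recursion is fine for Earley
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("n",)],
}
print(earley_recognise(GRAMMAR, "E", "n + n * n".split()))       # -> True
print(earley_recognise(GRAMMAR, "E", "( n + n ) * n".split()))   # -> False
GRAMMAR["F"].append(("(", "E", ")"))   # add a rule at run time; no re-analysis
print(earley_recognise(GRAMMAR, "E", "( n + n ) * n".split()))   # -> True
```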
-However, the Earley parsing algorithm is relatively slow compared to
-table-based algorithms such as the LR shift/reduce algorithm.
-Worst-case complexity is $O(n^3)$ where $n$ is the size of the input.
-It drops to $O(n^2)$ for unambiguous grammars and to $O(n)$ for a
-large class of grammars suitable for parsing programming langauges.
-Even so, the complexity is often proportional to the size of the
-grammar (which is considered a constant by Earley), which can be
-problematic in a text editor. We have yet to determine whether the
-implementation of the Earley algorithm that we provide will turn out
-to be sufficiently fast for most Climacs syntax modules. Other
-possibilities include the Tomita parsing algorithm which provides more
-generality than LR, but which is nearly as fast in most cases.
+It should be noted that the Earley parsing algorithm is relatively
+slow compared to table-based algorithms such as the LR shift/reduce
+algorithm. Worst-case complexity is $O(n^3)$ where $n$ is the size of
+the input. It drops to $O(n^2)$ for unambiguous grammars and to
+$O(n)$ for a large class of grammars suitable for parsing programming
+languages. Additionally, the complexity is often proportional to the
+size of the grammar (which is considered a constant by Earley), which
+can be problematic in a text editor. We have yet to determine whether
+the implementation of the Earley algorithm that we provide will turn
+out to be sufficiently fast for most Climacs syntax modules. Other
+possibilities include the Tomita parsing algorithm \cite{tomita},
+which provides more generality than LR but is nearly as fast in
+most cases.
\section{Syntaxes}
\label{sec:syntaxes}
-We describe two different approaches to syntax analysis in the Climacs
-editor. Per-window parsing is used by the provided modes for HTML,
-Common Lisp, Prolog, and a Testing Control Notation (TTCN-3). Each of
-these syntaxes is implemented with the provided Earley parser
-\cite{earley}. The lute tablature editor uses a per-buffer function
-for its syntax analysis and implements a simple state-machine parser
-for its regular notation.
+We describe examples illustrating two different approaches to syntax
+analysis in the Climacs editor. Per-window parsing is used by the
+provided modes for HTML, Common Lisp, Prolog, and the Testing and Test
+Control Notation (TTCN-3). Each of these syntaxes is implemented with the
+provided Earley parser \cite{earley}. The lute tablature editor uses
+a per-buffer function for its syntax analysis and implements a simple
+state-machine parser for its regular notation.
\subsection{Per-Window Syntaxes}
@@ -337,7 +337,7 @@
function. Of these the Prolog syntax is the most complete and
implements the entire ISO Prolog syntax. The HTML, Common Lisp, and
TTCN-3 syntaxes are somewhat less complete in their implementation.
-Each syntax is free to implement its lexical analyzer and parser in
+Each syntax is free to implement its lexical analyser and parser in
the manner which is most convenient for its grammar. All of these
syntaxes use the provided implementation of the Earley parsing
algorithm, but each provides its own set of macros for defining parser
@@ -348,9 +348,9 @@
framework. Firstly, and most importantly, the grammar of ISO Prolog
\cite{ISOProlog} is not context-free: \textit{terms} have an implicit
priority affecting their parse.\footnote{Formally, the grammar could
- be made context-free by introducing 2400 new production rules.} The
-implementation of Earley's algorithm, however, was able to address
-this additional complexity with no difficulty.
+ be made context-free by introducing a large number of new production
+ rules.} The implementation of Earley's algorithm, however, was able
+to address this additional complexity with no difficulty.
Another area of difficulty is the fact that parsing a Prolog text can
change the grammar itself through the use of the \texttt{op/3}
@@ -359,7 +359,7 @@
:- op(100,xfy,<>).
\end{verbatim}
in a Prolog text means that, after parsing this directive, the token
-\texttt{<>} must be recognized as an right-associative operator with
+\texttt{<>} must be recognised as a right-associative operator with
priority 100 in the grammar. This is achievable by keeping a cache of
parsed \texttt{op/3} directives, and maintaining and invalidating it
in parallel with the parse of the buffer itself.
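The sketch below shows how such a cache might be kept. The class `OpCache` and its methods are hypothetical. The sketch simplifies Prolog by keying operators on their name alone: real Prolog allows a name to have both a prefix and an infix definition, and uses priority 0 to remove an operator. The operator table in effect at a position is the standard table plus every directive recorded before that position. An edit forgets every directive at or after its start, and the reparse of the damaged region records them again.

```python
import bisect


class OpCache:
    """Hypothetical cache of op/3 directives, ordered by buffer offset."""

    def __init__(self, standard_ops):
        self._standard = dict(standard_ops)   # name -> (priority, type)
        self._offsets = []                    # sorted offsets of directives
        self._ops = []                        # parallel (name, priority, type)

    def record(self, offset, name, priority, optype):
        i = bisect.bisect_left(self._offsets, offset)
        self._offsets.insert(i, offset)
        self._ops.insert(i, (name, priority, optype))

    def table_at(self, offset):
        """Operators in effect for text starting at OFFSET."""
        table = dict(self._standard)
        for name, priority, optype in self._ops[:bisect.bisect_left(self._offsets, offset)]:
            table[name] = (priority, optype)
        return table

    def invalidate_from(self, offset):
        """Forget directives at or after an edit; the reparse re-records them."""
        i = bisect.bisect_left(self._offsets, offset)
        del self._offsets[i:]
        del self._ops[i:]


cache = OpCache({"+": (500, "yfx")})
cache.record(120, "<>", 100, "xfy")           # ":- op(100,xfy,<>)." at offset 120
print("<>" in cache.table_at(100), cache.table_at(200)["<>"])
```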
@@ -371,16 +371,19 @@
syntax, interpretation of quoting rules and escape sequences in
strings are very variable, while additionally treatment of operators
in currently-available Prologs can differ markedly from the standard
-requirements. Nevertheless, work is underway to use Climacs' syntax
-analysis to provide a front-end for a Prolog development environment.
+requirements; this means that working code written for these Prologs
+can be flagged as a parse error by Climacs. Nevertheless, work is
+underway to use Climacs' syntax analysis to provide a front-end for a
+Prolog development environment.
At present, a single parse error invalidates the parse of the rest of
the file. This places a burden on the mode implementor: the syntax
analyser must be both bug-free and faithful to the language as it is
used in practice, since a
-slightly-buggy or incomplete syntax mode will render the whole thing
-useless. We plan on implementing a resynchroniziation method for
-parsers, which would allow the parse to continue at the next valid
-parsable state in the buffer; see section \ref{sec:conclusions}.
+slightly-buggy or incomplete syntax mode can severely impair the
+utility of the editor. We plan to implement a resynchronisation
+method for parsers, which would allow the parse to continue at the
+next valid parsable state in the buffer; see section
+\ref{sec:conclusions}.
The Testing and Test Control Notation 3 (TTCN-3) language \cite{TTCN3}
is a language which captures detailed test specifications. TTCN
@@ -396,7 +399,7 @@
productions in the form of ``any number of'' or ``one or more of''
receive their own non-terminal entry in the grammar. In addition, this
macro defines basic display functions for the syntax objects produced
-by the parser, with language keywords appearing in a separate color.
+by the parser, with language keywords appearing in a separate colour.
Much like the other per-window syntax modules, the one for Common Lisp
uses high-level macros, some of which come with the Earley parser
@@ -442,7 +445,7 @@
|}}
\caption{An extract from `Lachrime by I. D.' from \textit{A New
Booke of Tabliture}, published by William Barley (London,
- 1596), E1r, and its \TabCode\ encoding. The parenthesized
+ 1596), E1r, and its \TabCode\ encoding. The parenthesised
characters encode the lines joining and spanning the example,
while the individual punctuation characters refer to the
fingering marks.}
@@ -453,7 +456,7 @@
\TabCode\ \cite{tabcode} is a textual format for description of lute
tablature, a form of musical notation. In its simplest form, it is a
sequence of whitespace-delimited independent words, where each word
-represents either a set of string--fret coordinates for the player's
+represents either a set of fret--string coordinates for the player's
left hand specifying the note or chord to be played or alternatively
some other element of musical notation (such as a barline); figure
\ref{fig:besfantlach} shows a fragment of tablature, and demonstrates
@@ -522,18 +525,19 @@
of the order of 200--300 words, which requires very little time to
parse on modern hardware. However, such a parsing scheme would stress
the display engine if a complete redraw were forced on every edit, so
-we have implemented the obvious optimizations: the extent of the edit,
+we have implemented the obvious optimisations: the extent of the edit,
along with its typical locality of effect, are used to limit the
damaged region as before, so preserving the identity of unaffected
tabwords; this identity can then be used in a cache informing CLIM's
incremental redisplay mechanism.
We handle parse errors on a word-by-word basis, so that even during
-editing the vast majority of a tabcode buffer can be graphically
+editing the vast majority of a \TabCode\ buffer can be graphically
presented, rather than only up to the current location; by returning
-our best guess at the intent of a particular word and resynchronizing
+our best guess at the intent of a particular word and resynchronising
at the next whitespace, we can preserve the tablature view mostly
-unchanged for most editing operations.
+unchanged for most editing operations, and highlight individual errors
+for the user's attention.
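The sketch below shows word-by-word error handling with resynchronisation at whitespace. It is written in Python, and its word grammar is a toy assumption that is not the real \TabCode\ grammar: a word is a barline `|`, or an optional rhythm letter followed by one or more fret-letter/course-digit pairs. The longest valid prefix serves as a stand-in for the editor's "best guess". An invalid word is flagged on its own, and parsing continues with the next word.

```python
import re

# Toy word grammar (an assumption, NOT the real TabCode grammar).
WORD = re.compile(r"\||[BWHQES]?(?:[a-n][1-6])+")


def parse_words(text):
    """Parse TEXT word by word, resynchronising at whitespace.

    Returns (start, word, guess, ok) tuples.  An error affects only its
    own word; GUESS is the longest valid prefix, or None if there is none."""
    results = []
    for m in re.finditer(r"\S+", text):
        word = m.group()
        if WORD.fullmatch(word):
            results.append((m.start(), word, word, True))
        else:
            prefix = WORD.match(word)
            results.append((m.start(), word, prefix.group() if prefix else None, False))
    return results


for start, word, guess, ok in parse_words("Qa1 | Ea1b2x Hc3 zz"):
    print(start, word, guess, "ok" if ok else "error")
```

The start offsets let a display layer highlight just the faulty words while drawing every other word normally.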
To assist the editorial process, we have also implemented MIDI audio
feedback: in addition to a command to render the entire tablature in
@@ -547,41 +551,43 @@
\section{Future Work and Conclusions}
\label{sec:conclusions}
-Climacs is already a very competent and stable editor, especially
-given the relatively small amount of work (only a few person-months)
-that has been put into it so far. Using CLIM (and in particular the
-McCLIM \cite{ilc2002-moore} implementation) as the display engine has
-allowed the project to progress much more rapidly than would otherwise
-have been possible. However, Climacs development has also revealed
-some serious limitations and performance problems of the McCLIM
-library. Nevertheless, we maintain that using CLIM and McCLIM was the
-best choice, and in fact advantageous to other McCLIM users as well,
-as the deficiencies in the McCLIM implementation are being addressed and
-other improvements made for use with Climacs.
+Climacs is already a very capable editor, especially given the
+relatively small amount of work (only a few person-months) that has
+been put into it so far. Using CLIM (and in particular the McCLIM
+\cite{ilc2002-moore} implementation) as the display engine has allowed
+the project to progress much more rapidly than would otherwise have
+been possible. It should be noted that Climacs development has also
+revealed some serious limitations and performance problems of the
+McCLIM library. Nevertheless, we maintain that using CLIM and McCLIM
+was the best choice, and in fact advantageous to other McCLIM users as
+well, as the deficiencies in the McCLIM implementation are being
+addressed and other improvements made for use with Climacs.
Due to its reliance on fairly well-defined protocols, the Climacs text
editor framework is flexible enough to allow for different future
directions. Turning the Common Lisp syntax module into an excellent
programming tool for Lisp programmers is high on the list of
-priorities, for many reasons. First, it will encourage further work
-on Climacs. Second, the Common Lisp syntax module is likely to become
+priorities, for several reasons: first, it will encourage further work
+on Climacs; second, the Common Lisp syntax module is likely to become
one of the more advanced syntax modules available for Climacs, given that Climacs
has unique and direct access to the features of the underlying Common
-Lisp implementation. Thus, the Common Lisp syntax module is likely to
-exercise the Climacs protocols to a very high degree. This will allow
-us to improve those protocols as well as their corresponding
-implementations.
+Lisp implementation.\footnote{The possibility of providing this level
+ of editor integration for Prolog, given the existence of Prolog
+ implementations embedded in Lisps, is also of interest.} Thus, the
+Common Lisp syntax module is likely to exercise the Climacs protocols
+to a very high degree. This will allow us to improve those protocols
+as well as their corresponding implementations.
The TTCN-3 grammar is currently defined for the core textual
language. For a large subset of this language, there is a direct
-correspondance between TTCN-3 textual notation and TTCN-3 Graphical
+correspondence between TTCN-3 textual notation and TTCN-3 Graphical
Representation (GR) diagrams. Implementing a live-updating TTCN-3 GR
display of a parsed buffer will, in addition to being a useful
application, serve as a demonstration of the utility of maintaining a
full parse tree of a buffer.
Another important future direction is the planned implementation of
-the buffer protocol. Representing a line being edited as a flexichain
+the buffer protocol. Representing a line being edited as a Flexichain
can greatly improve the performance of some crucial operations that
currently require looping over each buffer object until a newline
character is found. Other operations that are currently prohibitive
@@ -590,32 +596,32 @@
One disadvantage of the current parsing scheme is that a single parse
error prevents analysis of the rest of the buffer, which is
potentially disturbing to a user's workflow. For relatively simple
-grammars such as \TabCode, it is simple enough to resynchronize at the
+grammars such as \TabCode, it is simple enough to resynchronise at the
next token, whereas for more complex grammars the resolution is less
-clear. Providing a framework for customizeable resynchronising of the
+clear. Providing a framework for customisable resynchronising of the
parser after a parse error would allow for more user-friendly editing.
Our plans for Climacs go further than creating an improved
implementation of Emacs. We intend to make Climacs a fully-integrated
CLIM application. This implies among other things using the
presentation-type system of CLIM to exchange objects between Climacs
-and other CLIM appliations such as an inspector application, a
+and other CLIM applications such as an inspector application, a
debugger pane, etc. We also hope that implementors of other CLIM
applications such as mail readers, news readers, etc., will consider
using Climacs for creating messages.
-We are often asked whether applications such as VM and Gnus for GNU
-Emacs will be available for Climacs. Our opinion is that such
+We are often asked whether Emacs-based applications such as VM and
+Gnus will be available for Climacs. Our opinion is that such
applications currently run as GNU Emacs subsystems simply because GNU
Emacs does not have an independent substrate such as CLIM for creating
user interfaces. Climacs is itself a CLIM application, and
applications such as mail readers and news readers that do not require
editable buffers should instead be implemented directly as CLIM
-applications, perhaps calling Climacs to write messages.
+applications, optionally calling Climacs to compose and edit messages.
%\nocite{*}
-\section*{Acknowledgments}
+\section*{Acknowledgements}
The authors wish to thank Aleksandr Bakic for fruitful discussions.
B.M.\ acknowledges Motorola's generous support. C.R.\ is supported by