From bmastenb at cs.indiana.edu Fri Jul 9 15:39:56 2004
From: bmastenb at cs.indiana.edu (Brian Mastenbrook)
Date: Fri, 9 Jul 2004 10:39:56 -0500 (EST)
Subject: [mac-lisp-ide] Thoughts on syntax editing
Message-ID:

I've been thinking for several weeks about the issues involved in supporting syntax-aware editing in an editor, particularly for Common Lisp syntax. Recently I wrote an incremental parser to handle syntax coloring for lisppaste, and that experience has helped me get a better handle on the relevant issues in supporting robust syntax-aware editing in a text editor.

What I'm including under the umbrella of syntax-aware editing are the following tasks:

 * Coloring the source based on syntactic type
 * Detecting invalid syntax and (a) informing the user and (b) skipping over detectable sections of invalid syntax when performing other commands
 * Detecting sections of comments and strings, and ignoring these for other commands as appropriate
 * Commands which operate on the parse tree of the source, including the C-M-* functions which operate on an entire s-expression

There are several approaches to making this work. Two of them have been done before en masse, with various degrees of success:

 * Force the user to only edit valid syntax, and insert balanced pairs of parens / quotes / comment delimiters. This approach works fine for editing new source but does not work so well when reading in an existing, possibly invalid file, and can also feel much like a straitjacket. It also requires a large amount of adaptation to the environment. The best exemplar of this approach is Interlisp's S-Edit structural editor.
 * Maintain the view that the text is merely an octet stream, and locally use regexps to try to determine what syntax things are, setting character properties along the way. This approach is (as far as I understand it) the approach used by Emacs, which leads to massive confusion when editing unbalanced strings and comments.
Sometimes Emacs never recovers, especially with CL-style #||# multiline comments.

What I would like to see is a third path: a robust editor which always knows the syntactic type of the text, but allows the user to edit code as if they were using a plain text editor (or allows a slightly smarter mode where balanced syntax is inserted by default). This is the holy grail of useful syntactic editing. It's also very complicated.

My first approach in writing such an editor revolved around viewing a buffer as a doubly linked list of lines, each of which was itself a doubly linked list of segments of text broken up by markers (which delineate syntactic type and also include the cursor and mark). This approach quickly got very complex: it was too difficult to write general text-manipulation routines when text was constantly being broken up by markers, and there could be an unbounded number of markers between any two characters.

------------------------------------------------------
| "a" | # | "bc" | # | "def" |
------------------------------------------------------
The "markers embedded in lines" approach: too complex.

Dan Barlow had a better idea: keep the text as a doubly linked list of strings representing lines, but still use markers to represent the beginning and end of sections with a particular syntactic type. In other words, the markers would be separated out from the text itself, making it possible to have views of the same text with different collections of markers. Primitive editing operations would then be responsible for making sure that whatever marker invariants were necessary were preserved, but this could be simpler when markers are disjoint and it's easy to pull out the set of markers affected by an operation. This is, I think, the best approach, as it would allow keeping the "line" abstraction that Hemlock uses as-is.

I am envisioning that markers can be used for several different purposes - extensible at the user's request.
This would include: the cursor, the mark, delimiters for syntax types, even the beginning and end of the line. Allowing multiple cursor-type markers and setting one as primary would allow collaborative editing fairly easily. (On the subject of collaborative editing, the disjoint-markers idea is probably a good one here too: it means that updates to the text can be sent out without syntactic information, and each client updates the marker set to account for the new text on its own.)

The next question of relevance is how to use these markers to implement knowledge of the file syntax. This Ain't Easy, for several reasons. First, we need to know the raw syntactic role of each section of text. Is it a symbol? A string? A list opening or closing? But if we view s-expression editing via the likes of C-M-t as the same problem as syntax coloring, this means we need to maintain a nested view of the current syntax, because finding the end of the current s-expression means understanding not just the raw syntactic role of an element but its level of nesting inside other syntactic types. Nesting is also necessary for robust coloring in Common Lisp: Emacs famously fails to handle nested #||# comments (and even sometimes non-nested ones), leaving most of your buffer showing in the font lock comment color.

Maintaining this nested view of syntax while allowing the user to insert unbalanced syntactic elements is more than merely a SMOP, however. A simple insertion of a character might affect the syntactic type of the entire rest of the file. Inserting a closing character would then revert it back - but possibly not cleanly. Deleting a character might require restarting the parser at some point prior to the previous character or syntactic type change. Here are some specific examples to think about:

The cursor is on #\A in "#3A((1) (2) (3))". The user hits the delete key.

The cursor is on the first #\( in "() (+ 1 2))". The user hits the #\( key.
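
To make the second example concrete, here is a minimal sketch of how one typed paren shifts the nesting level of everything after it. It is written in Python purely for brevity and is not part of the original message; `paren_depths` is a hypothetical helper, and it deliberately ignores strings, comments and #\ character syntax, all of which a real parser would have to track.

```python
def paren_depths(text):
    """Return the paren nesting depth in effect after each character."""
    depths = []
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1  # may go negative: an unmatched close paren
        depths.append(depth)
    return depths


before = "() (+ 1 2))"
after = "(" + before  # the user types ( with the cursor on the first (

print(paren_depths(before))
print(paren_depths(after))
```

In this sketch every character after the insertion point ends up one level deeper, and the close paren that was unmatched before the edit now balances the new open paren. A purely local rescan around the cursor would not see either effect.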
The cursor is at the beginning of the second line of the following section of text. The user hits the #\" key.

-------------------------------
(format nil
Mary had a little lamb. Its fleece was as white as ~A.
#|" '|#snow|) ; |
-------------------------------

These examples demonstrate that robust syntax-aware editing is not a simple matter of local regular expressions or of a parser that can be run until the top-level syntactic role matches what it was before the edit. A single edit may actually have a deeper meaning - for instance, in the second example, inserting a paren means "insert a level of list-ness at this position in the syntactic role stack until an unmatched close paren is found on this level".

To solve the problem of editing unbalanced strings or multiline comments, Andreas Fuchs suggested that it be possible to revert the syntactic type of an entire region when the insertion of an unmatched opening element would otherwise destroy this information. I think this is a good idea. It can be implemented by "hiding" one set of markers when the unmatched opening element is inserted, and un-hiding them after. Any edits the user makes in the meantime can be rectified to both the hidden and unhidden syntactic markers, thus meaning zero reparsing when the close element is inserted. However, this sounds like just the type of SMOP that is far more difficult than it sounds.

I'm curious to know what other people think about this. It seems to me that this is a rather difficult problem when the various nooks and crannies of CL syntax are taken into account (unless you're willing to live with the possibility of reparsing the entire source file on many edits). If this doesn't make any sense at all I'd like to know that too.

Brian

--
Brian Mastenbrook                       "God made the natural numbers;
http://www.cs.indiana.edu/~bmastenb/     all else is the work of man."
bmastenb at cs.indiana.edu                -- Leopold Kronecker

From pw at panix.com Fri Jul 9 16:44:54 2004
From: pw at panix.com (Paul Wallich)
Date: Fri, 09 Jul 2004 12:44:54 -0400
Subject: [mac-lisp-ide] Thoughts on syntax editing
In-Reply-To:
References:
Message-ID: <40EECB86.9030407@panix.com>

Brian Mastenbrook wrote:

> To solve the problem of editing unbalanced strings or multiline
> comments, Andreas Fuchs suggested that it be possible to revert the
> syntactic type of an entire region when the insertion of an unmatched
> opening element would otherwise destroy this information. I think this
> is a good idea. It can be implemented by "hiding" one set of markers
> when the unmatched opening element is inserted, and un-hiding them
> after. Any edits the user makes in the meantime can be rectified to
> both the hidden and unhidden syntactic markers, thus meaning zero
> reparsing when the close element is inserted. However, this sounds
> like just the type of SMOP that is far more difficult than it sounds.
>
> I'm curious to know what other people think about this. It seems to me
> that this is a rather difficult problem when the various nooks and
> crannies of CL syntax are taken into account (unless you're willing to
> live with the possibility of reparsing the entire source file on many
> edits). If this doesn't make any sense at all I'd like to know that
> too.

This makes sense to me, mostly because I'm willing to accept solutions that don't always work. If things look terribly wrong after the user is done with the editing, there's always (or should be) something like "M-x-reparse-the-darn-file" to fix things. (Along these lines, I'm also assuming that you're limiting any real-time reparsing and recoloring to the region between the top-level (fsvo "top-level") markers that contains the editing point.
There's also the obvious question of which markers should be "hidden" for display purposes and which should be "visible" -- it might be nice in some situations/modes for the displayed version to be based on the assumption that there's a closing element right after the insertion point.)

paul

From david at david-steuber.com Thu Jul 22 22:28:11 2004
From: david at david-steuber.com (David Steuber)
Date: Thu, 22 Jul 2004 18:28:11 -0400
Subject: [mac-lisp-ide] Portable Hemlock and questions
Message-ID: <6A13B952-DC2E-11D8-9543-000A959DDDE0@david-steuber.com>

These are just some idle questions.

Is the source from Hemlock at all useful as far as editing functions for porting over to a proper Carbon-based application for an Aqua-compliant IDE?

Are there any other independent projects besides Clotho in this area?

Is the goal of this list to work on an implementation-portable IDE (OpenMCL, SBCL, etc), or is a particular implementation favored?

What about integration with Interface Builder?

This is a very quiet list. Could it be that many people think Emacs + SLIME (or some other non-Aqua solution) is "good enough" for now?

SLIME has done some clever stuff to support a wide variety of Lisp implementations. The techniques used in that project may be applicable for a portable implementation. How much extra work is that compared to just supporting a single implementation?

Is there a current wish list / spec for this project?

From gb at clozure.com Thu Jul 22 23:13:49 2004
From: gb at clozure.com (Gary Byers)
Date: Thu, 22 Jul 2004 17:13:49 -0600 (MDT)
Subject: [mac-lisp-ide] Portable Hemlock and questions
In-Reply-To: <6A13B952-DC2E-11D8-9543-000A959DDDE0@david-steuber.com>
References: <6A13B952-DC2E-11D8-9543-000A959DDDE0@david-steuber.com>
Message-ID: <20040722163342.B70324@clozure.com>

On Thu, 22 Jul 2004, David Steuber wrote:

> These are just some idle questions.
>
> Is the source from Hemlock at all useful as far as editing functions
> for porting over to a proper Carbon based application for an Aqua
> compliant IDE?

I had made some progress in integrating Portable Hemlock and Cocoa; I'd hoped to be able to release that a few months ago, but haven't had time to do much of anything OpenMCL- or mac-lisp-ide-related since.

I don't know how useful this would be to someone determined to use Carbon.

Hemlock is strongly oriented towards a model where a single "frame" (in the Emacs sense; this means something like "window" to the rest of the world) displays multiple buffers. I had tried to get things to fit better into the Mac "document" paradigm, so that there'd be roughly a 1:1 correspondence between a Hemlock buffer and a Mac document (and generally a single document per window). I'd managed to nudge a lot of Hemlock towards the Mac paradigm, but there was still a lot of nudging to do (some Hemlock commands/functions still think that they're involved in redisplay and layout, and don't behave correctly under Cocoa).

I think that it's fair to say that Hemlock's not very thread-aware, and some of the things that I did to try to make it behave better in a multi-window, multi-threaded environment are incredibly ugly. I suspect that many things have to be at least fairly ugly: "modal" things (like M-X and incremental search) want to block for input, but you don't really want the whole IDE to freeze because some window is waiting for the next incremental search character ...

As far as Hemlock itself goes ... I still think that it's easier than writing an editor core would have been, and it provides lots of support for S-expression-based navigation out-of-the-box. Aside from the whole "who's in charge of event-processing and redisplay" issues, the biggest piece of missing functionality that I can think of in the Hemlock code base is support for pattern-based searching (e.g., to be able to have M-.
find a DEFUN or DEFMETHOD pattern without getting confused by case-sensitivity, whitespace, and other details.)

I think that most of the functionality that Hemlock provides is reasonable (if a bit spartan), and that the major difficulties with using Hemlock in a modern Aqua environment would be shared by most other starting points (e.g., if you re-wrote [X]Emacs in CL, you'd still have to think about document/buffer/frame issues and threading and modality and ...).

If anyone wants to volunteer to fix up/finish the Hemlock/Cocoa stuff I started, I can help some; I hope to be able to work on it more in the near future, but every prediction I've made so far about how "near" has proven wildly optimistic.

From mikel at evins.net Fri Jul 23 03:52:30 2004
From: mikel at evins.net (mikel evins)
Date: Thu, 22 Jul 2004 20:52:30 -0700
Subject: [mac-lisp-ide] Portable Hemlock and questions
In-Reply-To: <20040722163342.B70324@clozure.com>
References: <6A13B952-DC2E-11D8-9543-000A959DDDE0@david-steuber.com> <20040722163342.B70324@clozure.com>
Message-ID:

On Jul 22, 2004, at 4:13 PM, Gary Byers wrote:

>
>
> On Thu, 22 Jul 2004, David Steuber wrote:
>
>> These are just some idle questions.
>>
>> Is the source from Hemlock at all useful as far as editing functions
>> for porting over to a proper Carbon based application for an Aqua
>> compliant IDE?
>
> I had made some progress in integrating Portable Hemlock and Cocoa;
> I'd hoped to be able to release that a few months ago, but haven't
> had time to do much of anything OpenMCL- or mac-lisp-ide- related
> since.
>
> I don't know how useful this would be to someone determined to use
> Carbon.
> [...]
> If anyone wants to volunteer to fix up/finish the Hemlock/Cocoa stuff
> I started, I can help some; I hope to be able to work on it more in
> the near future, but every prediction I've made so far about how "near"
> has proven wildly optimistic.

I have a new version of Clotho on the table, based on a new version of Bosco that I did a month or so ago, and I plan sometime 'soon' to try merging your Hemlock sources into it.

But I'm kind of in the same boat as you regarding time to do it. If people are patient enough, I'll eventually get it done. If they aren't, well, I'm always glad for other people to do the work I was going to do.

--me

From david at david-steuber.com Fri Jul 23 04:48:28 2004
From: david at david-steuber.com (David Steuber)
Date: Fri, 23 Jul 2004 00:48:28 -0400
Subject: [mac-lisp-ide] Portable Hemlock and questions
In-Reply-To:
References: <6A13B952-DC2E-11D8-9543-000A959DDDE0@david-steuber.com> <20040722163342.B70324@clozure.com>
Message-ID: <8A06185C-DC63-11D8-9543-000A959DDDE0@david-steuber.com>

On Jul 22, 2004, at 11:52 PM, mikel evins wrote:

> I have a new version of Clotho on the table, based on a new version of
> Bosco that I did a month or so ago, and I plan sometime 'soon' to try
> merging your Hemlock sources into it.
>
> But I'm kind of in the same boat as you regarding time to do it. If
> people are patient enough, I'll eventually get it done. If they
> aren't, well, I'm always glad for other people to do the work I was
> going to do.

Well, my present status is that I am beginning to work through "Learning Carbon" with Xcode and Interface Builder 2.4.2 (from Xcode 1.2 CD). I'm as far along as chapter 9 in Keene's CLOS book without having actually tried any CLOS out.

I was under the impression that OpenMCL's Cocoa support was in flux. What is the status of that? I'm mainly going for Carbon because it seems more stable. Also, I plan to try and incorporate OpenGL and QuickTime in my programming (some time in the future) and they both have C APIs as Carbon does.

Otherwise, Cocoa is not a problem for me. I have a tutorial book on that also that I have not worked through. Like Learning Carbon, it assumes Project Builder rather than Xcode and an older version of Interface Builder.
So I am in catch-up mode for some indefinite period of time.

I was fairly sure that someone would be using Hemlock code, at least in part, as part of a Mac Lisp IDE. Doing an editor from scratch seems hard. I also still have to learn about Lisp's error handling system. Other than that, I am confident that I can handle CS 101 sort of stuff in Lisp.

If there is no wish list yet, can I add integration with Interface Builder near the top? I think for its part, Xcode does necessary code generation and Interface Builder is simply another program that it starts up. I'm not aware of any IPC going on. Anyway, wrapping Interface Builder Services with the appropriate reusable code seems like the clear way to go. It could even be used in the IDE.