[pro] Lisp and DSLs
Daniel Pezely
dpezely at sonic.net
Fri Jul 22 07:14:33 UTC 2011
On Jul 20, 2011, at 6:32 AM, Didier Verna wrote:
> Right now, I would like to know if any of you have DSL "pearls", nice
> examples of DSLs that you have written in Lisp by using some of its
> features in a clever or elegant way. I would also gladly accept any
> point of view or comment on what's important to mention, in terms of
> design principle or anything else, things that I may have missed in the
> list above.
One perspective that I haven't seen addressed yet: know your audience, and the concepts they are already comfortable with.
Last year, I wrote a tool for processing large quantities of dirty .csv files: more data than fits into RAM, multiple files comprising a single dataset, column order changing across files, junk in cell values, differing numbers of columns across files, etc. And an extremely short turn-around time.
The idea was to process this data without having to clean it up first, so that one might gain some insights (sums, uniques, bucketing, etc.) for determining whether or not the data had any value, or clues on how to proceed further. A proper database ETL pass would have been too time-consuming.
My intended audience was a statistical analyst who favored mousing around Excel spreadsheets while letting prior SAS training go unused, out of reluctance to write scripts. (That should have been the first clue to be concerned, but oh, what we do for friends!)
Ultimately, the tool was useful to me, and that was sufficient to meet deadlines.
But the DSL code became overly complex due to this extraneous design criterion: accommodating someone who doesn't want to write scripts in the first place.
Instead of a few simple macros appropriate for a Lisp programmer, such as WITH-MESSY-CSV, I effectively created a register machine with "simple" commands: include/exclude filtering, sum, count, unique, look-up, group-by, etc.
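For the curious, here is roughly the shape WITH-MESSY-CSV could have taken. This is an after-the-fact sketch, not the tool's actual code: the field splitting is deliberately naive (no quoted commas), and matching columns by header name rather than position is the one trick that mattered for files whose column order drifted.

(defun split-csv-line (line)
  "Split LINE on commas.  Naive: ignores quoting and escapes."
  (loop for start = 0 then (1+ end)
        for end = (position #\, line :start start)
        collect (string-trim " " (subseq line start end))
        while end))

(defmacro with-messy-csv ((&rest column-names) file &body body)
  "For each data row of FILE, bind each symbol in COLUMN-NAMES to the
cell in the column whose header matches the symbol's name, or to NIL
when that column is missing from this particular file, then run BODY."
  (let ((in (gensym "IN")) (header (gensym "HEADER"))
        (line (gensym "LINE")) (fields (gensym "FIELDS"))
        (positions (gensym "POSITIONS")))
    `(with-open-file (,in ,file)
       (let* ((,header (split-csv-line (read-line ,in)))
              (,positions
                (mapcar (lambda (name)
                          (position name ,header :test #'string-equal))
                        ',(mapcar #'symbol-name column-names))))
         (loop for ,line = (read-line ,in nil)
               while ,line
               do (let ((,fields (split-csv-line ,line)))
                    (let ,(loop for name in column-names
                                for i from 0
                                collect
                                `(,name (let ((p (nth ,i ,positions)))
                                          (and p (nth p ,fields)))))
                      ,@body)))))))

A made-up use, totalling one column while tolerating junk cells (the file and column names are invented):

(let ((total 0))
  (with-messy-csv (amount region) "sales.csv"
    (let ((n (and amount (parse-integer amount :junk-allowed t))))
      (when (and n region (string-equal region "west"))
        (incf total n))))
  total)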
Otherwise, the approach was sound enough: load the relatively small .csv files as look-up tables and iterate over the entire dataset in a single pass, applying lexically scoped blocks of filtering and calculations. Convert only the named columns of interest, regardless of position changes across files; parse numbers on demand, only for those operations that require them; skip unparseable rows as a last resort; etc. There was some error in the results, but some results with reduced confidence are better than none in this case.
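And a sketch of that single pass, reusing WITH-MESSY-CSV from above. All of the file layouts, column names, and function names here are invented for illustration:

(defun load-lookup-table (file)
  "Read a small two-column CSV into a hash table, first column as key."
  (let ((table (make-hash-table :test #'equal)))
    (with-open-file (in file)
      (read-line in nil)                ; discard the header row
      (loop for line = (read-line in nil)
            while line
            do (destructuring-bind (key &optional value &rest more)
                   (split-csv-line line)
                 (declare (ignore more))
                 (setf (gethash key table) value))))
    table))

(defun sum-by-region (data-files store-regions)
  "One pass over DATA-FILES: map each row's STORE-ID to a region via
the STORE-REGIONS look-up table and sum AMOUNT per region.  Numbers
are parsed only here, and rows with junk amounts or unknown stores
are skipped rather than aborting the run."
  (let ((sums (make-hash-table :test #'equal)))
    (dolist (file data-files sums)
      (with-messy-csv (store-id amount) file
        (let ((n (and amount (parse-integer amount :junk-allowed t)))
              (region (and store-id (gethash store-id store-regions))))
          (when (and n region)
            (incf (gethash region sums 0) n)))))))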
Lessons learned (a few more while I'm here):
1. Know your audience, and build for the correct users.
2. Build the right tool. (I'm a systems programmer; a good stats person would likely have come up with a better workflow, probably using R so that rich reports could also be generated quickly.)
3. Good language design can be challenging. I would have been better off (perhaps) stealing conventions from SQL or XQuery's FLWOR expressions than inventing my own "simple" set of commands. (Syntax is another matter... as you know.)
4. Being adept at backquotes, comma substitution, and unrolling lists is not necessarily enough skill to create a good, clean DSL implementation. But keep trying. Do your best to make one for "keeps", then throw it away anyway. It's important not to hold anything back in the first version. Ah, experience! (I'll likely go at this one again just for the fun of it; a toy sketch in this style follows the list.)
e.g., unrelated project from years ago: http://play.org/learning-lisp/html.lisp
5. Collaborate: get input from others. My co-workers who also use Common Lisp were many time zones and an ocean away, busy with looming deadlines of their own. However, their ten years of CL experience to my five (and their far deeper stats familiarity) would certainly have helped here.
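Regarding point 4, here is a toy in exactly that backquote-and-unrolling style, invented for this message rather than taken from the tool: a clause list compiled into parallel accumulators around a single row loop.

(defmacro tally ((row rows) &rest clauses)
  "Unroll CLAUSES such as (SUM form) and (COUNT-IF form) into
accumulators updated once per ROW of ROWS; return the final values."
  (let ((accs (loop repeat (length clauses) collect (gensym "ACC"))))
    `(let ,(loop for acc in accs collect `(,acc 0))
       (dolist (,row ,rows)
         ,@(loop for (op form) in clauses
                 for acc in accs
                 collect (ecase op
                           (sum `(incf ,acc ,form))
                           (count-if `(when ,form (incf ,acc))))))
       (values ,@accs))))

;; (tally (r '(1 2 3 4)) (sum r) (count-if (evenp r)))  => 10 and 2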
-Daniel
--
first name at last name dot com