[toronto-lisp] dealing with strings -> symbols

Paul Tarvydas tarvydas at visualframeworksinc.com
Fri Nov 5 14:57:08 UTC 2010

Hi Alex

> Hi there,
> I have come across something that I need just a little bit of help with.
> I managed to write some lisp code to read a flat file, line by line.  I also
> managed to put those lines of code into something like this:
> (
> 	( "A,MSFT.N,D,12.4,12:43:00")
> 	( "f,IBM.N,f,108.4,11:09:023")
> )

Why do you need the extra level of lists?

If each line comes in as a string, then a simple list of strings would suffice:

(
	"A,MSFT.N,D,12.4,12:43:00"
	"f,IBM.N,f,108.4,11:09:023"
)


> All comma delineated text files.  Now, I would like to convert this into
> something that I can deal with from a lisp perspective - I would like
> to get so that these values are symbols - so this is what I really
> want:
> (
> 	( A  MSFT.N  D  12.4  12:43:00  )
> 	(  f  IBM.N  f  108.4  11:09:023 )
> )

I would load up Quicklisp and grab cl-ppcre, Edi Weitz's regular expression library, then use something like register-groups-bind to parse the strings into appropriate data structures.  (Ask me/us again if the documentation of this function leaves you wondering :-).
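As a rough sketch of that suggestion, assuming Quicklisp is already installed: cl-ppcre:split breaks a line at the commas, and cl-ppcre:register-groups-bind binds each matched group to a variable. The function names and the variable names (kind, ticker, flag, price, timestamp) are made up for illustration; the original data does not say what the fields mean, and the regex assumes exactly five fields per line.

```lisp
;; Assumption: Quicklisp is installed and can fetch cl-ppcre.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (ql:quickload "cl-ppcre"))

;; Split a comma-delimited line into a list of field strings.
(defun split-line (line)
  (cl-ppcre:split "," line))

;; Bind the five fields to variables; returns NIL if the line does not match.
;; The variable names are hypothetical labels for the sample data.
(defun parse-line (line)
  (cl-ppcre:register-groups-bind (kind ticker flag price timestamp)
      ("^([^,]*),([^,]*),([^,]*),([^,]*),([^,]*)$" line)
    (list kind ticker flag price timestamp)))
```

Both functions still return strings; converting individual fields into numbers or objects would be a separate step.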

I would not convert every item above into an actual symbol.  For example, 12:43:00 looks like a time to me.  I would create a time class and parse the components of the string into the fields of a time object, using cl-ppcre.  And, depending on what the other things in the string are, I might create classes for them, too.
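One way the time class might look, again assuming Quicklisp and cl-ppcre are available. The class name time-of-day, its slots, and parse-time are all invented for this sketch. The regex is deliberately lenient about digit counts, so a field like 11:09:023 from the sample data is accepted as written rather than rejected.

```lisp
;; Assumption: Quicklisp is installed and can fetch cl-ppcre.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (ql:quickload "cl-ppcre"))

;; A hypothetical class holding the three components of a time field.
(defclass time-of-day ()
  ((hours   :initarg :hours   :reader time-hours)
   (minutes :initarg :minutes :reader time-minutes)
   (seconds :initarg :seconds :reader time-seconds)))

(defun parse-time (string)
  "Return a TIME-OF-DAY for a STRING shaped like H:M:S, or NIL if it does not match."
  ;; (#'parse-integer h m s) asks cl-ppcre to run PARSE-INTEGER on each group.
  (cl-ppcre:register-groups-bind ((#'parse-integer h m s))
      ("^(\\d+):(\\d+):(\\d+)$" string)
    (make-instance 'time-of-day :hours h :minutes m :seconds s)))
```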

Remember, lisp allows non-homogeneous lists - you can store objects and strings and whatever within the same list.
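A small illustration of a mixed list, using only standard Common Lisp. The variable *row* and the function describe-kinds are names made up for this example, and the choice of which fields become symbols, strings, or numbers is arbitrary.

```lisp
;; One parsed row holding symbols, a string, a number, and a nested list.
(defparameter *row* (list 'a "MSFT.N" 'd 12.4 '(12 43 0)))

(defun describe-kinds (row)
  "Return a keyword naming the broad kind of each element of ROW."
  (mapcar (lambda (item)
            (typecase item
              (symbol :symbol)
              (string :string)
              (number :number)
              (list   :list)
              (t      :other)))
          row))
```

Calling (describe-kinds *row*) returns (:SYMBOL :STRING :SYMBOL :NUMBER :LIST).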

If you know that the incoming data has a fixed number of fields, it might be more efficient to create and use vectors (make-array) or your own record class instead of lists...
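Both options might be sketched like this in plain Common Lisp. DEFSTRUCT is used here as a lightweight stand-in for a record class (DEFCLASS would work as well); the structure name tick and its slot names are hypothetical, and five fields per record is assumed from the sample data.

```lisp
;; Option 1: a simple vector with one element per field.
(defun fields->vector (fields)
  (make-array (length fields) :initial-contents fields))

;; Option 2: a record type with named slots (names invented for this sketch).
(defstruct tick kind ticker flag price timestamp)

(defun fields->tick (fields)
  "Build a TICK from a list of exactly five fields."
  (destructuring-bind (kind ticker flag price timestamp) fields
    (make-tick :kind kind :ticker ticker :flag flag
               :price price :timestamp timestamp)))
```

With the vector, a field is fetched by position with AREF; with the structure, by name, for example (tick-ticker record).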

> I noticed that there is a function called INTERN which seems to convert
> a regular string to a symbol, but, I don't really have a regular string,

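A simplistic explanation of a running lisp image is that all symbols are stored in a global hash table (in Lisp 1.5 this was a list rather than a hash table, called the OBLIST; in Common Lisp there is actually one such table per package).  Every symbol name read by the reader is hashed and looked up in that table, and a new symbol is created only if none with that name exists yet.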

INTERN simply hashes the given string and, if no symbol with that name is already in the table, inserts a new symbol object whose name is the string.  If you gave it one of the above messy strings, INTERN would gladly hash it and make a symbol out of it.  If you tried to print the resulting symbol, it would probably print the string surrounded by vertical bars (|), since the string contains special characters.
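The behaviour described above can be tried at the REPL. This sketch uses only standard functions; the two wrapper function names are invented, and the printed form assumes the default readtable and printer settings.

```lisp
;; The reader upcases unescaped names by default, so reading FOO and
;; calling (intern "FOO") arrive at the same symbol object.
(defun same-symbol-p ()
  (eq (intern "FOO") 'foo))

;; INTERN accepts any string; the printer then escapes the odd characters.
(defun messy-symbol-text ()
  (prin1-to-string (intern "A,MSFT.N,D,12.4,12:43:00")))
```

Here (same-symbol-p) returns T, and with default settings (messy-symbol-text) returns the string |A,MSFT.N,D,12.4,12:43:00| including the vertical bars: one symbol whose name is the entire line, which is not the list of fields that was wanted.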

So, from your perspective, INTERN is too low-level for what you want to do.  You want to parse a string, so you need to write a string parser or bring one in.  cl-ppcre can do the parsing and it's free.

Actually, I would probably read the file and parse each incoming line on the spot, instead of creating a list of strings to be parsed later.
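A minimal sketch of parsing on the spot, using only standard Common Lisp so that it runs without any libraries. The hand-written split-on-commas stands in for cl-ppcre here, and all three function names are made up for illustration.

```lisp
(defun split-on-commas (line)
  "Split LINE at each comma, returning a list of field strings."
  (loop with start = 0
        for comma = (position #\, line :start start)
        collect (subseq line start comma)
        while comma
        do (setf start (1+ comma))))

(defun parse-stream (stream)
  "Read STREAM line by line, parsing each line as soon as it is read."
  (loop for line = (read-line stream nil)
        while line
        collect (split-on-commas line)))

(defun parse-file (path)
  "Open the file at PATH and return a list of parsed lines."
  (with-open-file (in path)
    (parse-stream in)))
```

Taking a stream in parse-stream keeps the parsing testable with WITH-INPUT-FROM-STRING, without needing a file on disk.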

