[toronto-lisp] dealing with strings -> symbols
tarvydas at visualframeworksinc.com
Fri Nov 5 14:57:08 UTC 2010
> Hi there,
> I have come across something that I need just a little bit of help with.
> I managed to write some lisp code to read a flat file, line by line. I
> managed to put those lines of code into something like this:
> ( "A,MSFT.N,D,12.4,12:43:00")
> ( "f,IBM.N,f,108.4,11:09:023")
Why do you need the extra level of lists?
If each line comes in as a string, then a simple list of strings would suffice, for example ("A,MSFT.N,D,12.4,12:43:00" "f,IBM.N,f,108.4,11:09:023").
> All comma delineated text files. Now, I would like to convert this into
> something that I can deal with from a lisp perspective - I would like
> to get so that these values are symbols - so this is what I really
> ( A MSFT.N D 12.4 12:43:00 )
> ( f IBM.N f 108.4 11:09:023 )
I would load up Quicklisp and grab cl-ppcre, Edi Weitz's regular expression library, then use something like register-groups-bind to parse the strings into appropriate data structures. (Ask me/us again if the documentation of this function leaves you wondering :-).
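To make the register-groups-bind suggestion concrete, here is a rough sketch. It assumes Quicklisp is already installed, and the function name PARSE-QUOTE-LINE, the variable names and the regular expression itself are my own guesses at what the five fields mean, not anything from the original data description.

```lisp
;; Assumes Quicklisp is installed and loaded in the running image.
(ql:quickload "cl-ppcre")

;; PARSE-QUOTE-LINE and the field names are hypothetical, for illustration only.
(defun parse-quote-line (line)
  "Return the five comma-separated fields of LINE as a list of strings,
or NIL if LINE does not have exactly five fields."
  (cl-ppcre:register-groups-bind (flag ticker kind price timestamp)
      ("^([^,]*),([^,]*),([^,]*),([^,]*),([^,]*)$" line)
    (list flag ticker kind price timestamp)))

(parse-quote-line "A,MSFT.N,D,12.4,12:43:00")
;; => ("A" "MSFT.N" "D" "12.4" "12:43:00")
```

Each parenthesised group in the regex is bound to the variable in the same position, and the body only runs when the whole pattern matches.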
I would not convert every item above into an actual symbol. For example, 12:43:00 looks like a time to me. I would create a time class and parse the components of the string into the fields of a time object, using cl-ppcre. And, depending on what the other things in the string are, I might create classes for them, too.
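A time class along those lines might look like the sketch below. The class name TIME-OF-DAY, its slots and the parser are hypothetical, and to keep the sketch free of dependencies it splits on the colons with POSITION and PARSE-INTEGER rather than cl-ppcre.

```lisp
;; Hypothetical class; the names are illustrative, not from the original post.
(defclass time-of-day ()
  ((hours   :initarg :hours   :reader tod-hours)
   (minutes :initarg :minutes :reader tod-minutes)
   (seconds :initarg :seconds :reader tod-seconds)))

(defun parse-time-of-day (string)
  "Parse a string of the form HH:MM:SS into a TIME-OF-DAY object.
Signals an error if STRING is malformed."
  (let* ((c1 (position #\: string))
         (c2 (position #\: string :start (1+ c1))))
    (make-instance 'time-of-day
                   :hours   (parse-integer string :end c1)
                   :minutes (parse-integer string :start (1+ c1) :end c2)
                   :seconds (parse-integer string :start (1+ c2)))))

;; A mixed list: strings, a number and an object side by side.
(defparameter *parsed-row*
  (list "A" "MSFT.N" "D" 12.4 (parse-time-of-day "12:43:00")))

(tod-minutes (fifth *parsed-row*)) ; => 43
```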
Remember, lisp allows non-homogeneous lists - you can store objects and strings and whatever within the same list.
If you know that the incoming data has a fixed number of fields, it might be more efficient to create and use vectors (make-array) or your own record class instead of lists...
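As a sketch of both options, assuming five fields per line: the first form copies the fields into a vector with make-array, the second uses a small record type. The name QUOTE-RECORD and its field names are hypothetical guesses at what the columns mean.

```lisp
;; Option 1: a plain vector, indexed by field position.
(defun fields->vector (fields)
  "Copy the list FIELDS into a freshly allocated vector."
  (make-array (length fields) :initial-contents fields))

(defparameter *row*
  (fields->vector '("A" "MSFT.N" "D" "12.4" "12:43:00")))

(aref *row* 1) ; => "MSFT.N"

;; Option 2: a record with named fields (hypothetical names).
(defstruct quote-record flag ticker kind price timestamp)

(defparameter *rec*
  (make-quote-record :flag "A" :ticker "MSFT.N" :kind "D"
                     :price "12.4" :timestamp "12:43:00"))

(quote-record-ticker *rec*) ; => "MSFT.N"
```

The vector gives constant-time access by index; the record gives the same with names that document what each field is.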
> I noticed that there is a function called INTERN which seems to convert
> a regular string to a symbol, but, I don't really have a regular string,
A simplistic explanation of a running lisp image is that all symbols are stored in a global hash table (in Lisp 1.5 this was a list rather than a hash table, called the OBLIST). Every atom read by the reader that is not a number or a string is hashed and converted into a symbol.
INTERN simply hashes the given string and inserts a symbol object into the hash table, with a hash index (name) consisting of the string. If you gave it one of the above messy strings, INTERN would gladly hash it and make a symbol out of it. If you tried to print the resulting symbol, it would probably print the string surrounded by vertical bars (|), since the string contains special characters.
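You can see this at the REPL with nothing but standard Common Lisp. The variable name *MESSY* is just for illustration, and the exact printed form is what implementations typically produce rather than something I would promise.

```lisp
;; Intern one of the raw lines as-is; *MESSY* is a hypothetical variable name.
(defparameter *messy* (intern "A,MSFT.N,D,12.4,12:43:00"))

(symbolp *messy*)         ; => T
(symbol-name *messy*)     ; => "A,MSFT.N,D,12.4,12:43:00"
(prin1-to-string *messy*) ; typically "|A,MSFT.N,D,12.4,12:43:00|"

;; Interning the same name again returns the very same symbol object.
(eq *messy* (intern "A,MSFT.N,D,12.4,12:43:00")) ; => T
```

So you get one symbol whose name is the whole line, commas and all, which is not the five separate values you were after.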
So, from your perspective, INTERN is too low-level for what you want to do. You want to parse a string. You need to write a string parser, or buy one in. Cl-ppcre is a string parser and it's free.
Actually, I would probably read the file and parse each incoming line on the spot, instead of creating a list of strings to be parsed later.
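Reading and parsing in one pass could be sketched like this. The function names are hypothetical, and the per-line parser here is just a plain comma splitter standing in for whatever parsing you end up writing.

```lisp
(defun split-on-comma (line)
  "Return the comma-separated fields of LINE as a list of strings."
  (loop with start = 0
        for comma = (position #\, line :start start)
        collect (subseq line start comma)
        while comma
        do (setf start (1+ comma))))

(defun parse-stream (stream)
  "Read STREAM line by line, parsing each line as soon as it is read."
  (loop for line = (read-line stream nil)
        while line
        collect (split-on-comma line)))

(defun parse-file (pathname)
  "Open the flat file at PATHNAME and parse every line of it."
  (with-open-file (in pathname :direction :input)
    (parse-stream in)))

;; Demo on an in-memory stream so the sketch runs without a file on disk.
(with-input-from-string
    (in (format nil "A,MSFT.N,D,12.4,12:43:00~%f,IBM.N,f,108.4,11:09:023"))
  (parse-stream in))
;; => (("A" "MSFT.N" "D" "12.4" "12:43:00") ("f" "IBM.N" "f" "108.4" "11:09:023"))
```

No intermediate list of raw strings is kept; each line is turned into its parsed form before the next one is read.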