[toronto-lisp] dealing with strings -> symbols

Doug doug at hcsw.org
Sat Nov 6 16:07:58 UTC 2010


I agree with everything Paul said. To parse strings like this, CL-PPCRE 
is probably your best bet.

With lol.lisp loaded, you can define a parser like this:

* (defun my-parser (line)
     (if-match (#~m/^(\w+),([\w.]+),(\w+),([\d.]+),([\d:]+)$/ line)
       (list (intern $1) (intern $2) (intern $3) (read-from-string $4) $5)
       (error "unable to parse line")))

MY-PARSER

And then run it against your input like this:

* (my-parser "A,MSFT.N,D,12.4,12:43:00")

(A MSFT.N D 12.4 "12:43:00")

* (my-parser "f,IBM.N,f,108.4,11:09:023")

(|f| IBM.N |f| 108.4 "11:09:023")
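
If you would rather not load lol.lisp, roughly the same parser can be 
sketched with CL-PPCRE's register-groups-bind, the macro Paul mentions 
below. This assumes Quicklisp is installed and cl-ppcre has been loaded 
first; the name MY-PARSER-PPCRE and the variable names are made up for 
illustration. Note that inside an ordinary Lisp string the backslashes 
in the regex have to be doubled.

```lisp
;; Load cl-ppcre before evaluating the defun below
;; (assumes Quicklisp is installed).
(ql:quickload "cl-ppcre")

;; Hypothetical equivalent of MY-PARSER without lol.lisp.
;; register-groups-bind binds one variable per regex group and returns
;; NIL when the regex does not match, so OR falls through to ERROR.
(defun my-parser-ppcre (line)
  (or (cl-ppcre:register-groups-bind (kind ticker flag price time)
          ("^(\\w+),([\\w.]+),(\\w+),([\\d.]+),([\\d:]+)$" line)
        (list (intern kind) (intern ticker) (intern flag)
              (read-from-string price) time))
      (error "unable to parse line: ~S" line)))
```

It should behave like MY-PARSER on the two sample lines above.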


Note: Lowercase symbols are printed like |blah|. Normally the reader 
(e.g. at the REPL) upcases symbol names, so blah is read as BLAH. INTERN 
does no case conversion, so a lowercase string gives you a lowercase 
symbol, which may or may not be what you want.
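
If you do want the same symbols the reader would give you, one option 
is to upcase the string before interning it. A minimal sketch, assuming 
the default readtable case (:upcase); INTERN-UPCASED is a made-up 
helper name:

```lisp
;; Hypothetical helper: upcase the field before interning so that it
;; names the same symbol the reader produces under the default
;; readtable case.
(defun intern-upcased (field)
  (intern (string-upcase field)))

(intern-upcased "f")   ; the symbol F, printed without bars
(intern "f")           ; the symbol |f|, a different symbol from F
```

Either way, the symbols are interned in whatever *PACKAGE* is current 
when the parser runs.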

Also, this parser just leaves the time field as a string. You may 
instead want to use Daniel Lowe's local-time library to construct an 
actual time object so that you can, for example, do arithmetic on it.
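
As a rough illustration of what pulling the time field apart could look 
like, here is a sketch in standard Common Lisp only. It does not use 
local-time, and both function names are made up. It assumes the field 
always has the shape HH:MM:SS.

```lisp
;; Hypothetical helper: turn "HH:MM:SS" into a list of three integers.
(defun parse-time-field (field)
  (let* ((first-colon (position #\: field))
         (second-colon (and first-colon
                            (position #\: field :start (1+ first-colon)))))
    (unless second-colon
      (error "expected HH:MM:SS, got ~S" field))
    (list (parse-integer field :end first-colon)
          (parse-integer field :start (1+ first-colon) :end second-colon)
          (parse-integer field :start (1+ second-colon)))))

;; Hypothetical helper: an integer you can do arithmetic on.
(defun seconds-since-midnight (field)
  (destructuring-bind (hours minutes seconds) (parse-time-field field)
    (+ (* 3600 hours) (* 60 minutes) seconds)))

(parse-time-field "12:43:00")         ; => (12 43 0)
(seconds-since-midnight "12:43:00")   ; => 45780
```

The second sample line ends in "023", which this sketch would read as 23 
seconds. Whether that is right depends on what that field really means 
in your data.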

Doug

http://letoverlambda.com/lol.lisp
http://common-lisp.net/project/local-time/



On 10-11-05 10:57 AM, Paul Tarvydas wrote:
> Hi Alex
>
>    
>> Hi there,
>>
>> I have come across something that I need just a little bit of help with.
>> I managed to write some lisp code to read a flat file, line by line.  I
>> also managed to put those lines of code into something like this:
>>
>> (
>> 	( "A,MSFT.N,D,12.4,12:43:00")
>> 	( "f,IBM.N,f,108.4,11:09:023")
>>
>> )
>>      
> Why do you need the extra level of lists?
>
> If each line comes in as a string, then a simple list of strings would suffice:
>
>   (
>     "A,MSFT.N,D,12.4,12:43:00"
>     "f,IBM.N,f,108.4,11:09:023"
>   )
>
>    
>> All comma-delimited text files.  Now, I would like to convert this into
>> something that I can deal with from a lisp perspective - I would like
>> to get so that these values are symbols - so this is what I really
>> want:
>>
>> (
>> 	( A  MSFT.N  D  12.4  12:43:00  )
>> 	(  f  IBM.N  f  108.4  11:09:023 )
>>
>> )
>>      
> I would load up quicklisp and grab cl-ppcre - Edi Weitz' regular expression parser, then use something like register-groups-bind to parse the strings into appropriate data structures.  (Ask me/us again, if the documentation of this function leaves you wondering :-).
>
> I would not convert every item above into an actual symbol.  For example, 12:43:00 looks like a time to me.  I would create a time class and parse the components of the string into the fields of a time object, using cl-ppcre.  And, depending on what the other things in the string are, I might create classes for them, too.
>
> Remember, lisp allows non-homogeneous lists - you can store objects and strings and whatever within the same list.
>
> If you know that the incoming data has a fixed number of fields, it might be more efficient to create and use vectors (make-array) or your own record class instead of lists...
>
>    
>> I noticed that there is a function called INTERN which seems to convert
>> a regular string to a symbol, but, I don't really have a regular string,
>>      
> A simplistic explanation of a running lisp image is that all symbols are stored in a global hash table (in Lisp 1.5, this used to be a list, not a hash table, called OBLIST).  Every atom read by the reader is hashed and converted into a symbol.
>
> INTERN simply hashes the given string and inserts a symbol object into the hash table, with a hash index (name) consisting of the string.  If you gave it one of the above messy strings, INTERN would gladly hash it and make a symbol out of it.  If you tried to print the resulting symbol, it would probably print the string surrounded by or-bars (|) since the string contains special characters.
>
> So, from your perspective, INTERN is too low-level for what you want to do.  You want to parse a string.  You need to write a string parser, or buy one in.  Cl-ppcre is a string parser and it's free.
>
> Actually, I would probably read the file and parse each incoming line on the spot, instead of creating a list of strings to be parsed later.
>
> pt
>
> _______________________________________________
> toronto-lisp mailing list
> toronto-lisp at common-lisp.net
> http://common-lisp.net/cgi-bin/mailman/listinfo/toronto-lisp
>    




