From edi at agharta.de Mon Jun 12 23:06:16 2006
From: edi at agharta.de (Edi Weitz)
Date: Tue, 13 Jun 2006 01:06:16 +0200
Subject: [cl-ppcre-devel] Re: An idea for ppcre::register-groups-bind
In-Reply-To: (Alexander Kjeldaas's message of "Sat, 3 Jun 2006 22:28:29 +0200")
References:
Message-ID:

Hi!

Sorry for the delay, I was on vacation. Oh, and please use the
mailing list - see Cc.

On Sat, 3 Jun 2006 22:28:29 +0200, "Alexander Kjeldaas" wrote:

> I'm a big fan of register-groups-bind. For some longer texts that
> I'm parsing, I have code that looks like this:
>
> The idiom is - you want to go through the text matching various
> regexps, but you want to keep on to the end-position in order to do
> incremental matching. So I added an :end-holder keyword parameter
> that is the name of a variable that is set to the end-match (I also
> added :start-holder, but I don't use it).
>
> To me, this type of parsing is relatively simple to debug, but it is
> also not too sloppy in that we don't start from the beginning all
> the time.
>
> :end-place and :start-place might be better names, I don't know..

I think that the idea is basically OK, but I'm not sure if I like that
seemingly END-POS is supposed to be an existing variable which is set
- I think it'd be more Lisp-y to bind within the body of the form.
Admittedly, the indentation would get quite nasty then.

Cheers,
Edi.

> (defun parse-stuff (string)
>   "Parse various account information"
>   (let ((result nil)
>         end-pos)
>     (register-groups-bind (foo)
>         ("Kontoinformasjon for (.*)\\s+" string :end-holder end-pos)
>       (push `(:username ,foo) result))
>     (register-groups-bind (foo)
>         ("Navn:[ \\t]+(.*)\\s+" string :start end-pos :end-holder end-pos)
>       (push `(:name ,foo) result))
>     ;; Use (?=Telefon) to not affect the end-position.
>     (register-groups-bind (addr1 addr2 addr3)
>         ("Adresse:[ \\t]+(.*)\\s+(?:(.*)\\s+)?(?:(.*)\\s+)?(?=Telefon)"
>          string :start end-pos :end-holder end-pos)
>       (push (remove-if-not #'identity `(:address ,addr1 ,addr2 ,addr3)) result))
>     (register-groups-bind (foo)
>         ("Telefon:[ \\t]+(\\+?\\d+)" string :start end-pos :end-holder end-pos)
>       (push `(:phone ,foo) result))
>     (register-groups-bind (foo)
>         ("Mobil:[ \\t]+(\\+?\\d+)" string :start end-pos :end-holder end-pos)
>       (push `(:mobile-phone ,foo) result))
>     (register-groups-bind (foo)
>         ("Epost:[ \\t]+(.*)\\n" string :start end-pos :end-holder end-pos)
>       (push `(:email ,foo) result))
>     (register-groups-bind (foo)
>         ("Epost:[ \\t]+(.*)\\n" string :start end-pos :end-holder end-pos)
>       (push `(:email ,foo) result))
>     (nreverse result)))

From ctdean at sokitomi.com Mon Jun 12 23:40:38 2006
From: ctdean at sokitomi.com (Chris Dean)
Date: Mon, 12 Jun 2006 16:40:38 -0700
Subject: [cl-ppcre-devel] Re: An idea for ppcre::register-groups-bind
In-Reply-To: (Edi Weitz's message of "Tue, 13 Jun 2006 01:06:16 +0200")
References:
Message-ID:

> "Alexander Kjeldaas" wrote:
>> The idiom is - you want to go through the text matching various
>> regexps, but you want to keep on to the end-position in order to do
>> incremental matching.

I agree with Edi that the idea is basically sound, but finding a good
syntax will be challenging.

In other languages (Perl mostly), I've used the same idiom for a quick
and dirty parser. The difference is that I've always used a replace
on the target string instead of just a match. The replacement text is
always the empty string "". That way, I remove what I match and can
continue on.

As an example, I'll invent REPLACE-REGISTER-GROUPS-BIND and use it to
parse the name, rank, and serial number out of a string.

  (defun parse-stuff (string)
    (replace-register-groups-bind (name)
        ("Name: (\\S+)" string "")
      (process-name name))
    (replace-register-groups-bind (rank)
        ("Rank: (\\S+)" string "")
      (process-rank rank))
    (replace-register-groups-bind (sn)
        ("Serial Number: (\\S+)" string "")
      (process-sn sn)))

I probably should have copied the string first, but you get the idea.

If you want to go with a non-destructive solution I think the syntax
is tough. The best I could come up with in the 30 seconds of
contemplation was the mythical REGISTER-GROUPS-BIND-2 form that binds
the start and end of the match:

  (defun parse-stuff (string)
    (let ((last-end 0))
      (register-groups-bind-2 (name) (match-start match-end)
          (*name-re* string :start last-end)
        (process-name name)
        (setf last-end match-end))
      (register-groups-bind-2 (rank) (match-start match-end)
          (*rank-re* string :start last-end)
        (process-rank rank)
        (setf last-end match-end))
      (register-groups-bind-2 (sn) (match-start match-end)
          (*sn-re* string :start last-end)
        (process-sn sn)
        (setf last-end match-end))))

BTW, Perl has some anchoring meta characters (\G), but I don't think
that is what you are looking for.

Cheers,
Chris Dean

From larsnostdal at gmail.com Thu Jun 22 22:48:10 2006
From: larsnostdal at gmail.com (Lars Rune Nøstdal)
Date: Fri, 23 Jun 2006 00:48:10 +0200
Subject: [cl-ppcre-devel] download-link on frontpage
Message-ID:

Hello lispers,

Just a quick note; it seems the cl-ppcre download-link on the
frontpage points to an old version of cl-ppcre (1.2.12) while the
newest one mentioned on the site is 1.2.14.
:)

--
Best regards,
Lars Rune Nøstdal
http://lars.nostdal.org/

From edi at agharta.de Fri Jun 23 07:04:22 2006
From: edi at agharta.de (Edi Weitz)
Date: Fri, 23 Jun 2006 09:04:22 +0200
Subject: [cl-ppcre-devel] download-link on frontpage
In-Reply-To: (Lars Rune Nøstdal's message of "Fri, 23 Jun 2006 00:48:10 +0200")
References:
Message-ID:

On Fri, 23 Jun 2006 00:48:10 +0200, "Lars Rune Nøstdal" wrote:

> Just a quick note; it seems the cl-ppcre download-link on the
> frontpage points to an old version of cl-ppcre (1.2.12) while the
> newest one mentioned on the site is 1.2.14. :)

I just tried and got 1.2.14. I assume this is some cache problem with
your browser.

Cheers,
Edi.

From seb-cl-mailist at matchix.com Fri Jun 23 08:51:10 2006
From: seb-cl-mailist at matchix.com (Sébastien Saint-Sevin)
Date: Fri, 23 Jun 2006 10:51:10 +0200
Subject: [cl-ppcre-devel] download-link on frontpage
In-Reply-To:
References:
Message-ID: <449BAB7E.2090801@matchix.com>

Hi Edi,

The link is OK for me too.

While we are there, it's the http://weitz.de/cl-ppcre/CHANGELOG that
points to 1.2.13 changelog for me (since the release of 1.2.14, i.e.
it's not new...)

Cheers,
Sebastien.

Edi Weitz wrote:
> On Fri, 23 Jun 2006 00:48:10 +0200, "Lars Rune Nøstdal" wrote:
>
>> Just a quick note; it seems the cl-ppcre download-link on the
>> frontpage points to an old version of cl-ppcre (1.2.12) while the
>> newest one mentioned on the site is 1.2.14. :)
>
> I just tried and got 1.2.14. I assume this is some cache problem with
> your browser.
>
> Cheers,
> Edi.
> _______________________________________________
> cl-ppcre-devel site list
> cl-ppcre-devel at common-lisp.net
> http://common-lisp.net/mailman/listinfo/cl-ppcre-devel
>

From edi at agharta.de Fri Jun 23 10:45:31 2006
From: edi at agharta.de (Edi Weitz)
Date: Fri, 23 Jun 2006 12:45:31 +0200
Subject: [cl-ppcre-devel] download-link on frontpage
In-Reply-To: <449BAB7E.2090801@matchix.com> (Sébastien Saint-Sevin's message of "Fri, 23 Jun 2006 10:51:10 +0200")
References: <449BAB7E.2090801@matchix.com>
Message-ID:

On Fri, 23 Jun 2006 10:51:10 +0200, Sébastien Saint-Sevin wrote:

> While we are there, it's the http://weitz.de/cl-ppcre/CHANGELOG that
> points to 1.2.13 changelog for me (since the release of 1.2.14, i.e.
> it's not new...)

Right, thanks. I always forget to update the CHANGELOG. Instead I
took the easy route now and removed the link... :)

Cheers,
Edi.