[cl-ppcre-devel] CL-PPCRE Split behaviour

Sébastien Saint-Sevin seb-cl-mailist at matchix.com
Tue Aug 28 08:38:04 UTC 2007


Thanks a lot Chris,
Very interesting feedback.

Cheers,
sebastien.

Chris Dean wrote:
> Sébastien Saint-Sevin <seb-cl-mailist at matchix.com> writes:
>> While using cl-ppcre:split recently, I discovered that when the regex
>> matches at position 0, the function returns an empty string in first
>> position, which I think it should not, as I do not consider the empty
>> string to be a substring of the original one.
>>
>> Ex : (cl-ppcre:split "\\s+" " foo bar baz ") ==>  ("" "foo" "bar" "baz")
> 
> It is an interesting question, but I believe that the current split
> behavior of returning the leading empty string is the rational
> behavior.  In my mind it comes down to the definition of split:
> "returns a list of the substrings between the matches".  
> 
> Having said that I often have real-world needs to *not* have the
> leading string around.  I wish there were explicit keyword args to
> omit any leading and trailing empty strings.  If I get motivated, I
> might even make a patch!  Perl's version of split doesn't have keyword
> args so it tries to fit several behavior changes into its arguments.
> 
> Here's some more practical advice: If you know your problem domain
> well, you can try the inverse match trick.  Instead of calling SPLIT,
> call ALL-MATCHES-AS-STRINGS with the inverse regex.  In this case:
> 
>   (all-matches-as-strings "\\S+" " foo  bar  baz ") => ("foo" "bar" "baz")
> 
> (This will skip internal empty strings in the general case, but that
> doesn't matter for your example case.)
> 
> It's also easy to write your own split that does what you want.
> An untested version is below.
> 
> Cheers,
> Chris Dean
> 
> 
> (defun simple-split (regex target-string)
>   "A simple version of split that doesn't handle registers in any
>    special way and discards leading and trailing empty matches.  
>    Untested!"
>   (let ((res nil)                       ; The result
>         (last-end 0))                   ; The end position of the last match
>     (cl-ppcre:do-matches (mstart mend regex target-string)
>       (unless (zerop mstart)
>         (push (subseq target-string last-end mstart) res))
>       (setf last-end mend))
>     (when (< last-end (length target-string))
>       (push (subseq target-string last-end) res))
>     (nreverse res)))
> 
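
For reference, here is a minimal sketch of another way to drop only the
leading empty string: post-process the list that cl-ppcre:split returns.
SPLIT-NO-LEADING-EMPTY is a hypothetical helper name, not part of
CL-PPCRE, and the sketch assumes the library is already loaded.  The
last two forms illustrate the caveat about internal empty strings
mentioned for the inverse match trick.

```lisp
;; Assumption: CL-PPCRE is already loaded, e.g. (ql:quickload :cl-ppcre).
;; SPLIT-NO-LEADING-EMPTY is a hypothetical name, not a CL-PPCRE function.
(defun split-no-leading-empty (regex target-string)
  "Like CL-PPCRE:SPLIT, but drop one leading empty string, if present."
  (let ((parts (cl-ppcre:split regex target-string)))
    (if (and parts (string= (first parts) ""))
        (rest parts)                    ; regex matched at position 0
        parts)))                        ; nothing to drop

(split-no-leading-empty "\\s+" " foo bar baz ")
;; => ("foo" "bar" "baz")

;; SPLIT keeps an internal empty string; the inverse match skips it.
(cl-ppcre:split "," "a,,b")                        ; => ("a" "" "b")
(cl-ppcre:all-matches-as-strings "[^,]+" "a,,b")   ; => ("a" "b")
```

Unlike the inverse match trick, this keeps internal empty strings, so it
may be the safer choice when empty fields are meaningful.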


