[cl-ppcre-devel] CL-PPCRE Split behaviour
Sébastien Saint-Sevin
seb-cl-mailist at matchix.com
Tue Aug 28 08:38:04 UTC 2007
Thanks a lot Chris,
Very interesting feedback
Cheers,
sebastien.
Chris Dean wrote:
> Sébastien Saint-Sevin <seb-cl-mailist at matchix.com> writes:
> >> While using cl-ppcre:split recently, I discovered that when the regex
> >> matches at position 0, the function returns an empty string in first
> >> position, where I think it should not, as I do not consider the empty
> >> string to be a substring of the original one.
>>
>> Ex : (cl-ppcre:split "\\s+" " foo bar baz ") ==> ("" "foo" "bar" "baz")
>
> It is an interesting question, but I believe that the current split
> behavior of returning the leading empty string is the rational
> behavior. In my mind it comes down to the definition of split:
> "returns a list of the substrings between the matches".
>
> Having said that, I often have real-world needs to *not* have the
> leading empty string around. I wish there were explicit keyword args to
> omit any leading and trailing empty strings. If I get motivated, I
> might even make a patch! Perl's version of split doesn't have keyword
> args so it tries to fit several behavior changes into its arguments.
>
> Here's some more practical advice: If you know your problem domain
> well, you can try the inverse match trick. Instead of calling SPLIT,
> call ALL-MATCHES-AS-STRINGS with the inverse regex. In this case:
>
> (all-matches-as-strings "\\S+" " foo bar baz ") => ("foo" "bar" "baz")
>
> (This will skip internal empty strings in the general case, but
> doesn't matter for your example case.)
>
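To make the caveat about internal empty strings concrete, here is a small
sketch with a comma separator. The input "a,,b" is only an illustration and
not from the original exchange, and the snippet assumes CL-PPCRE is already
loaded.

```lisp
;; Assumes CL-PPCRE is already loaded, e.g. with (ql:quickload "cl-ppcre").

;; SPLIT keeps the empty field between the two adjacent commas.
(cl-ppcre:split "," "a,,b")
;; => ("a" "" "b")

;; The inverse regex needs at least one non-comma character per match,
;; so it can never return an empty string and the empty field is lost.
(cl-ppcre:all-matches-as-strings "[^,]+" "a,,b")
;; => ("a" "b")
```

So the inverse match trick is only a drop-in replacement when empty fields
carry no meaning, as with whitespace-separated words.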
> It's also easy to write your own split that does what you want.
> An untested version is below.
>
> Cheers,
> Chris Dean
>
>
> (defun simple-split (regex target-string)
>   "A simple version of split that doesn't handle registers in any
> special way and discards leading and trailing empty strings.
> Untested!"
>   (let ((res nil)     ; The result
>         (last-end 0)) ; The end position of the last match
>     (cl-ppcre:do-matches (mstart mend regex target-string)
>       (unless (zerop mstart)
>         (push (subseq target-string last-end mstart) res))
>       (setf last-end mend))
>     (when (< last-end (length target-string))
>       (push (subseq target-string last-end) res))
>     (nreverse res)))
> _______________________________________________
> cl-ppcre-devel site list
> cl-ppcre-devel at common-lisp.net
> http://common-lisp.net/mailman/listinfo/cl-ppcre-devel
>
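Another possible route, sketched here under assumptions, is a thin wrapper
around CL-PPCRE:SPLIT that drops the leading empty string after the fact.
The name SPLIT-OMIT-LEADING is hypothetical and not part of CL-PPCRE. The
sketch relies on SPLIT's default behaviour of already removing trailing
empty strings, and it assumes CL-PPCRE is loaded.

```lisp
;; Assumes CL-PPCRE is already loaded, e.g. with (ql:quickload "cl-ppcre").

(defun split-omit-leading (regex target-string)
  "Hypothetical helper, not part of CL-PPCRE: like CL-PPCRE:SPLIT, but
drops the empty string produced by a match at position 0."
  (let ((parts (cl-ppcre:split regex target-string)))
    ;; Only the first element can be the empty string caused by a match
    ;; at position 0; empty strings further in are kept.
    (if (and parts (zerop (length (first parts))))
        (rest parts)
        parts)))

(split-omit-leading "\\s+" " foo bar baz ")
;; => ("foo" "bar" "baz")
```

Unlike the inverse match trick, this keeps internal empty strings, so
(split-omit-leading "," "a,,b") still returns ("a" "" "b").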