[cl-ppcre-devel] behavior of \w
Robert Brown
robert.brown at gmail.com
Mon Mar 12 15:10:04 UTC 2012
Some folks I work with are using cl-ppcre. They've run into an
incompatibility between cl-ppcre and the PCRE library that boils
down to cl-ppcre's handling of \w. The behavior is documented in
cl-ppcre's manual:
  CL-PPCRE uses ALPHANUMERICP to decide whether a character
  matches Perl's "\w", so depending on your CL implementation you
  might encounter differences between Perl and CL-PPCRE when
  matching non-ASCII characters.
This reliance on ALPHANUMERICP may be a misfeature. It means that
the same regular expression can match different strings depending
on the Lisp implementation cl-ppcre is running on.
My co-workers want cl-ppcre on SBCL (where ALPHANUMERICP follows
Unicode) to match Latin-1 encoded strings the same way PCRE does,
so that accented letters such as é are not treated as word
characters. They patched the cl-ppcre code to make \w match only
a-z, A-Z, 0-9, and underscore. Is there a better workaround for
them?
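A minimal sketch of the word-character test that the patched \w implements, written as plain Common Lisp so it does not depend on cl-ppcre internals. ASCII-WORD-CHAR-P and ASCII-WORDS are hypothetical helper names, not part of cl-ppcre, and the code assumes an implementation whose character ordering follows code points (as SBCL's does). For contrast, on SBCL (alphanumericp (code-char 233)) is true for é, which is where the difference from PCRE comes from.

```lisp
(defun ascii-word-char-p (char)
  "Return true if CHAR is one of a-z, A-Z, 0-9 or underscore.
Unlike ALPHANUMERICP, the answer does not depend on the Lisp
implementation's Unicode support."
  ;; Assumes code-point ordering of characters, as in SBCL.
  (or (char<= #\a char #\z)
      (char<= #\A char #\Z)
      (char<= #\0 char #\9)
      (char= char #\_)))

(defun ascii-words (string)
  "Return, in order, the maximal runs of ASCII word characters in
STRING, i.e. roughly what an ASCII-only \\w+ would find."
  (let ((words '())
        (start nil))
    ;; Walk one position past the end so a trailing word is flushed.
    (loop for i from 0 to (length string)
          for char = (if (< i (length string)) (char string i) nil)
          do (cond ((and char (ascii-word-char-p char))
                    ;; Inside a word: remember where it began.
                    (unless start (setf start i)))
                   (start
                    ;; Just left a word: collect it.
                    (push (subseq string start i) words)
                    (setf start nil))))
    (nreverse words)))
```

For example, with é written as (code-char 233), the string "café au_lait 42" splits into "caf", "au_lait" and "42". Inside a cl-ppcre pattern string, the same character set can be spelled out as the explicit class [A-Za-z0-9_] (and [^A-Za-z0-9_] for \W), which does not go through ALPHANUMERICP; constructs such as \b, which presumably rely on cl-ppcre's built-in word-character test, would not change.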
bob