[Langutils-devel] Period not correctly tokenized?
Jianshi Huang
jianshi.huang at gmail.com
Sun Aug 14 10:56:45 UTC 2011
Hey,
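Periods in a sentence seem to be tokenized incorrectly: a sentence-final period stays attached to the preceding word instead of becoming a token of its own.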
Here's an example:
LANGUTILS> (tokens-for-ids (vector-document-words (vector-tag "Hello world. I'm here.")))
("Hello" "world." "I" "'" "m" "here.")
I think it should be:
("Hello" "world" "." "I" "'" "m" "here" ".")
I'm using the latest langutils from GitHub and SBCL 1.0.50.
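For comparison, here is a minimal standalone sketch of the splitting I'd expect. It is hypothetical: naive-tokenize and *split-chars* are names made up for this illustration and are not part of langutils. It simply breaks on whitespace and emits every period and apostrophe as a separate token. That is deliberately naive, because it would also split abbreviations such as "e.g." and decimals such as "3.14", which a real tokenizer has to handle with more context.

```lisp
;;; Hypothetical sketch, not langutils code: split TEXT on whitespace and
;;; emit each character in *SPLIT-CHARS* as a token of its own.

(defparameter *split-chars* '(#\. #\')
  "Characters emitted as standalone tokens (an assumption for this sketch).")

(defun naive-tokenize (text)
  "Return a list of token strings from TEXT."
  (let ((tokens '())
        (current (make-string-output-stream)))
    (flet ((flush ()
             ;; GET-OUTPUT-STREAM-STRING also clears CURRENT, so each
             ;; call yields only the characters since the last flush.
             (let ((tok (get-output-stream-string current)))
               (when (plusp (length tok))
                 (push tok tokens)))))
      (loop for ch across text
            do (cond ((member ch '(#\Space #\Tab #\Newline))
                      ;; Whitespace ends the current token.
                      (flush))
                     ((member ch *split-chars*)
                      ;; Punctuation ends the current token and is
                      ;; pushed as a token by itself.
                      (flush)
                      (push (string ch) tokens))
                     (t
                      (write-char ch current))))
      ;; Emit whatever is left after the last character.
      (flush))
    (nreverse tokens)))
```

With this sketch, (naive-tokenize "Hello world. I'm here.") returns ("Hello" "world" "." "I" "'" "m" "here" "."), i.e. the list I'd expect from langutils.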
Cheers,
--
黄 澗石 (Jianshi Huang)
http://huangjs.net/