[Langutils-devel] Period not correctly tokenized?

Jianshi Huang jianshi.huang at gmail.com
Sun Aug 14 10:56:45 UTC 2011


Hey,

Periods at the end of a sentence seem to be tokenized incorrectly: they
stay attached to the preceding word.

Here's an example:

LANGUTILS> (tokens-for-ids (vector-document-words (vector-tag "Hello world. I'm here.")))
("Hello" "world." "I" "'" "m" "here.")

I think it should be:

("Hello" "world" "." "I" "'" "m" "here" ".")

I'm using the latest langutils from GitHub with SBCL 1.0.50.

Cheers,
-- 
黄 澗石 (Jianshi Huang)
http://huangjs.net/



