[Langutils-devel] Period not correctly tokenized?

Jianshi Huang jianshi.huang at gmail.com
Fri Aug 19 05:56:36 UTC 2011


Oh, I see from example.lisp that sentence-ending periods are a known bug.

But is there a reason not to add #\. to the (deftype punctuation ...) in
tokenize.lisp?
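
For reference, here is a minimal sketch of the splitting I would expect, written as a standalone post-processing step over the token strings. split-final-period and split-final-periods are hypothetical helpers, not langutils functions, and this is not a patch to tokenize.lisp:

```lisp
;; Hypothetical helpers, NOT part of langutils: split a single
;; trailing period off each token string.
(defun split-final-period (token)
  "Return a fresh list containing TOKEN with a trailing #\\. split off
as its own token. A bare \".\" or a token without a trailing period
is returned unchanged (wrapped in a list)."
  (let ((len (length token)))
    (if (and (> len 1)
             (char= (char token (1- len)) #\.))
        (list (subseq token 0 (1- len)) ".")
        (list token))))

(defun split-final-periods (tokens)
  "Apply SPLIT-FINAL-PERIOD to every token and concatenate the results.
MAPCAN is safe here because SPLIT-FINAL-PERIOD always returns fresh lists."
  (mapcan #'split-final-period tokens))

;; (split-final-periods (list "Hello" "world." "I" "'" "m" "here."))
;; => ("Hello" "world" "." "I" "'" "m" "here" ".")
```

Note that this naive version also splits abbreviations such as "Dr." or "etc.", so a real fix presumably needs more care than this.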

Cheers,

On Sun, Aug 14, 2011 at 7:56 PM, Jianshi Huang <jianshi.huang at gmail.com> wrote:
> Hey,
>
> Periods in a sentence seem to be tokenized incorrectly.
>
> Here's an example:
>
> LANGUTILS> (tokens-for-ids (vector-document-words (vector-tag "Hello world. I'm here.")))
> ("Hello" "world." "I" "'" "m" "here.")
>
> I think it should be:
>
> ("Hello" "world" "." "I" "'" "m" "here" ".")
>
> I'm using the latest langutils from GitHub and SBCL 1.0.50.
>
> Cheers,
> --
> 黄 澗石 (Jianshi Huang)
> http://huangjs.net/
>



-- 
黄 澗石 (Jianshi Huang)
http://huangjs.net/



