[Langutils-devel] Period not correctly tokenized?
Jianshi Huang
jianshi.huang at gmail.com
Fri Aug 19 05:56:36 UTC 2011
Oh, I see from example.lisp that the sentence-ending period is a known bug.
But is there a reason not to add #\. to the (deftype punctuation ...) form in
tokenize.lisp?
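
Until that is settled, one possible stopgap is to post-process the token list
and split a trailing period off each token. The following is only a minimal
sketch: SPLIT-TRAILING-PERIOD and SPLIT-TRAILING-PERIODS are hypothetical
helper names, not part of langutils, and the code does not touch
tokenize.lisp or the punctuation type at all.

```lisp
;; Hypothetical helpers, NOT part of langutils: a post-processing sketch
;; that turns a trailing period into its own token.

(defun split-trailing-period (token)
  "Return a fresh list of one or two strings.
If TOKEN is longer than one character and ends in a period, the period
is split off as a separate token; otherwise TOKEN is returned unchanged."
  (let ((len (length token)))
    (if (and (> len 1)
             (char= (char token (1- len)) #\.))
        (list (subseq token 0 (1- len)) ".")
        (list token))))

(defun split-trailing-periods (tokens)
  "Apply SPLIT-TRAILING-PERIOD to each token and flatten the results.
MAPCAN is safe here because every sublist is freshly consed."
  (mapcan #'split-trailing-period tokens))

;; Usage with the token list from the report quoted below:
;; (split-trailing-periods '("Hello" "world." "I" "'" "m" "here."))
;; => ("Hello" "world" "." "I" "'" "m" "here" ".")
```

Note that this naive rule also splits abbreviations such as "Mr." or "e.g.",
so it is probably not a substitute for a proper fix in the tokenizer.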
Cheers,
On Sun, Aug 14, 2011 at 7:56 PM, Jianshi Huang <jianshi.huang at gmail.com> wrote:
> Hey,
>
> Periods in a sentence seem to be tokenized incorrectly.
>
> Here's an example:
>
> LANGUTILS> (tokens-for-ids (vector-document-words (vector-tag "Hello world. I'm here.")))
> ("Hello" "world." "I" "'" "m" "here.")
>
> I think it should be:
>
> ("Hello" "world" "." "I" "'" "m" "here" ".")
>
> I'm using the latest langutils from GitHub and SBCL 1.0.50.
>
> Cheers,
> --
> 黄 澗石 (Jianshi Huang)
> http://huangjs.net/
>
--
黄 澗石 (Jianshi Huang)
http://huangjs.net/