From jianshi.huang at gmail.com Sun Aug 14 10:56:45 2011
From: jianshi.huang at gmail.com (Jianshi Huang)
Date: Sun, 14 Aug 2011 19:56:45 +0900
Subject: [Langutils-devel] Period not correctly tokenized?
Message-ID:

Hey,

Periods in a sentence seem to be tokenized incorrectly.

Here's an example:

LANGUTILS> (tokens-for-ids (vector-document-words (vector-tag "Hello world. I'm here.")))
("Hello" "world." "I" "'" "m" "here.")

I think it should be:

("Hello" "world" "." "I" "'" "m" "here" ".")

I'm using the latest langutils from GitHub, and SBCL 1.0.50.

Cheers,
-- 
Jianshi Huang
http://huangjs.net/

From jianshi.huang at gmail.com Fri Aug 19 05:56:36 2011
From: jianshi.huang at gmail.com (Jianshi Huang)
Date: Fri, 19 Aug 2011 14:56:36 +0900
Subject: [Langutils-devel] Period not correctly tokenized?
In-Reply-To:
References:
Message-ID:

Oh, I see from example.lisp that the sentence-ending period is a known bug.

But is there a reason not to add #\. to (deftype punctuation ...) in tokenize.lisp?

Cheers,

On Sun, Aug 14, 2011 at 7:56 PM, Jianshi Huang wrote:
> Hey,
>
> Periods in a sentence seem to be tokenized incorrectly.
>
> Here's an example:
>
> LANGUTILS> (tokens-for-ids (vector-document-words (vector-tag "Hello world. I'm here.")))
> ("Hello" "world." "I" "'" "m" "here.")
>
> I think it should be:
>
> ("Hello" "world" "." "I" "'" "m" "here" ".")
>
> I'm using the latest langutils from GitHub, and SBCL 1.0.50.
>
> Cheers,
> --
> Jianshi Huang
> http://huangjs.net/
>

-- 
Jianshi Huang
http://huangjs.net/