[Langutils-devel] Period not correctly tokenized?

Ian Eslick eslick at media.mit.edu
Tue Oct 11 16:01:31 UTC 2011


Periods are handled specially because they also appear inside numbers and abbreviations (e.g., i.e., etc.). If you split out every period naively, you lose numbers as single tokens.
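To illustrate the trade-off, here is a minimal Python sketch. It is not langutils' actual Common Lisp tokenizer. The functions `naive_tokenize` and `tokenize` and the `ABBREVIATIONS` set are hypothetical, made up for this example. The naive version treats every period as punctuation and so breaks "3.14" apart. The second version keeps decimals and a fixed list of abbreviations intact, and only splits off a trailing period on any other word.

```python
import re

# Hypothetical abbreviation list for illustration only; langutils' real rules may differ.
ABBREVIATIONS = {"e.g.", "i.e.", "etc.", "Dr.", "Mr.", "U.S."}

# Naive rule: a token is a run of word characters or a single punctuation character,
# so every period becomes its own token.
NAIVE_RE = re.compile(r"\w+|[^\w\s]")

# Alternatives are tried in order at each position:
#   1. digits with optional ".digits" groups (e.g. 3.14) are kept whole,
#   2. a word with optional internal ".word" groups and one optional trailing period,
#   3. any single punctuation character.
TOKEN_RE = re.compile(r"\d+(?:\.\d+)*|\w+(?:\.\w+)*\.?|[^\w\s]")


def naive_tokenize(text):
    """Split every period out as a separate token (loses decimal numbers)."""
    return NAIVE_RE.findall(text)


def tokenize(text):
    """Keep decimals and known abbreviations intact; split other trailing periods."""
    tokens = []
    for tok in TOKEN_RE.findall(text):
        if len(tok) > 1 and tok.endswith(".") and tok not in ABBREVIATIONS:
            # Period after an ordinary word: treat it as sentence punctuation.
            tokens.append(tok[:-1])
            tokens.append(".")
        else:
            tokens.append(tok)
    return tokens


if __name__ == "__main__":
    print(naive_tokenize("Pi is 3.14."))
    print(tokenize("Pi is 3.14, e.g. roughly."))
```

The cost of keeping abbreviations whole is that a sentence ending in "etc." leaves its final period inside the abbreviation token, so a real tokenizer has to decide where to resolve that ambiguity.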

Sent from my iPhone

On Oct 11, 2011, at 12:33 AM, Jianshi Huang <jianshi.huang at gmail.com> wrote:

> Hey Kevin,
> 
> On Fri, Oct 7, 2011 at 2:57 PM, Jianshi Huang <jianshi.huang at gmail.com> wrote:
>> Currently it works for me, but I'm not sure whether it will break
>> something else...
>> 
>> There must be a reason for not including #\. in the punctuation type.
>> 
>> Anyway, here's the patch for git.
>> 
> 
> I mixed up your repository with eslick's cl-langutils. LOL
> 
> So here's the patch for your langutils.
> 
> 
> -- 
> 黄 澗石 (Jianshi Huang)
> http://huangjs.net/
> <0001-Fix-tokenization-for-sentence-ending-periods.patch>
> _______________________________________________
> Langutils-devel mailing list
> Langutils-devel at common-lisp.net
> http://lists.common-lisp.net/cgi-bin/mailman/listinfo/langutils-devel
