[regex-coach] problem with dot '.' inside brackets

sites at brynmosher.com
Wed Apr 26 21:41:09 UTC 2006


The "|m" was just in there for the example. I'm actually trying to match HTML 
tags with short contents, like:
<tag> V </tag>
<tag> I </tag>
<tag> A </tag>
<tag> G </tag>
<tag> R </tag>
<tag> A </tag>

with an expression similar to "/(>.{1,4}<[\S\s]*){4}/", and noticed the 
behaviour when I changed the ".{1,4}" to "[.^<]{1,4}". I did notice that 
"\s" and "\S" keep their meaning inside a character class, which is what I 
think led me to assume other meta-characters would too. I've only been using 
regex for a short while, so I am stumbling along. Thanks for the man page. 
I'm reading it now. Any other advice you could give for this expression 
would be great.

Thanks,
Bryn
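
A minimal sketch of the behaviour described above, using Python's re module
as a stand-in for the Perl-style syntax Regex Coach accepts (an assumption;
the points shown here should behave the same way in both). The one-line
sample string is made up for illustration, and reading "[.^<]" as an attempt
at "any character except <" is only a guess at the intent:

```python
import re

# Hypothetical one-line sample in the shape of the tags listed above.
data = "<tag> V </tag>"

# "[.^<]" is a class of three literal characters: '.', '^' and '<'.
# The caret only negates a class when it is the first character.
literal_class = re.compile(r">[.^<]{1,4}<")

# "[^<]" (caret first) is a negated class: any character except '<'.
negated_class = re.compile(r">[^<]{1,4}<")

literal_hits = literal_class.findall(data)
negated_hits = negated_class.findall(data)
print(literal_hits)   # []
print(negated_hits)   # ['> V <']

# Shorthand classes such as \s keep their meaning inside brackets.
whitespace_hits = re.findall(r"[\s]", "a b")
print(whitespace_hits)   # [' ']
```

Note that a negated class such as "[^<]" also matches newlines, unlike the
dot in its default mode, so it may match across line breaks between tags.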

----- Original Message -----
From: Edi Weitz <edi at agharta.de>
To: sites at brynmosher.com
Cc: regex-coach at common-lisp.net
Date: Wed, 26 Apr 2006 23:17:25 +0200
Subject: Re: [regex-coach] problem with dot '.' inside brackets

> On Wed, 26 Apr 2006 12:44:30 -0700, sites at brynmosher.com wrote:
> 
> > I've been using Regex-Coach 0.8.4 on Windows to test some
> > SpamAssassin rules and noticed something odd:
> >
> > Placing the following expression:
> > [.|m]
> >
> > to match the following data:
> > bleh.com
> >
> > Matches the '.' in bleh.com and not the first non-linefeed character
> > as the '.' character in the expression should match. It's almost as
> > if I had escaped the '.' like '\.'. Using the expression '[.]'
> > yields the same result. I've also noticed that the non-match
> > character '^' doesn't work inside brackets either.
> >
> > Is this an error or am I crazy?
> 
> Well, at least it's not an error... :)
> 
> Most characters that have a special meaning in regular expressions
> (like the dot or the pipe symbol, for example) are treated like normal
> characters within character classes, i.e. within square brackets.
> 
> See 'man perlre' for details.
> 
> BTW, it seems that your understanding of character classes as a whole
> is wrong.  If the dot /would/ match every non-linefeed character, then
> "[.|m]" would be equivalent to "[.]".
> 
> Cheers,
> Edi.
> 
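
The point made in the reply above can be checked with a short sketch. It
uses Python's re module rather than Regex Coach itself (an assumption made
for convenience; dot and pipe are treated as literals inside brackets in
both syntaxes), with the "bleh.com" data from the original question:

```python
import re

data = "bleh.com"

# Outside brackets the dot is a wildcard, so it matches the first character.
wildcard_hit = re.search(r".", data).group()
print(wildcard_hit)   # b

# Inside brackets '.', '|' and 'm' are three literal alternatives.
class_hits = re.findall(r"[.|m]", data)
print(class_hits)     # ['.', 'm']

# The pipe is matched literally too; it does not mean "or" inside a class.
pipe_hits = re.findall(r"[.|m]", "a|b")
print(pipe_hits)      # ['|']

# "[.]" therefore behaves like an escaped dot.
same_as_escaped = re.findall(r"[.]", data) == re.findall(r"\.", data)
print(same_as_escaped)   # True
```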


