[regex-coach] ".+" and ".+?" with optional parenthesized text

John Clements johnjc-regex at publicinfo.net
Sun Aug 22 16:37:36 UTC 2004


That is absolutely brilliant, Edi! Thank you so much!

At 14:55 22/08/04, you wrote:
>On Sun, 22 Aug 2004 14:04:47 +0100, John Clements 
><johnjc-regex at publicinfo.net> wrote:
>
> > I ran the pattern with "i" checked.
>
>I guess you also had "s" checked because your target string contained
>line breaks.

If it had line breaks they were introduced by the mailer(s) because Regex 
Coach didn't show any.

> > What I want it to do is match the string from the beginning through
> > "between", and when there is no instance of "between", I want it to
> > match the entire string.
>
>This regex should work:
>
>^\s*An appeal.+?(Joined )?Cases? ?t ?[-­] ?\d{1,3}\/ ?\d{2}(.+?between|.*)

I was just looking over some tutorial material which was talking about what 
you enclose in parentheses and what not, and it hadn't dawned on me that it 
was relevant to my problem!

Yes, putting the ".+?" inside the parentheses does the trick. And the "|.*" 
makes perfect sense: it says directly "or the rest of the string".

I had settled for a solution that used the "greedy" version of ".+" before 
"between", which in the presence of a second instance of the word "between" 
would have brought in unwanted text. Now it's just right. I really 
appreciate this!
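
For anyone reading this in the archive, here is a minimal sketch of the difference between the two approaches. It uses Python's re module rather than Regex Coach or Perl, a simplified version of the prefix, and made-up sample strings (the real target text is not shown in this thread), so treat it as an illustration only.

```python
import re

# Hypothetical sample strings; the original target text is not in the thread.
with_between = "An appeal in Case T-12/99 between A and B, and between C and D"
without_between = "An appeal in Case T-12/99 brought by A"

# Simplified prefix followed by the alternation suggested in the thread:
# lazily match up to the first "between", or else take the rest of the string.
lazy_alt = re.compile(r"^An appeal.+?Case T-\d{1,3}/\d{2}(.+?between|.*)", re.I | re.S)

# The earlier workaround: a greedy ".+" in front of "between".
greedy = re.compile(r"^An appeal.+?Case T-\d{1,3}/\d{2}.+between", re.I | re.S)

lazy_hit = lazy_alt.match(with_between).group(0)         # stops at the first "between"
greedy_hit = greedy.match(with_between).group(0)         # runs on to the last "between"
fallback_hit = lazy_alt.match(without_between).group(0)  # no "between": whole string

print(lazy_hit)
print(greedy_hit)
print(fallback_hit)
```

The order of the alternatives matters: the engine tries ".+?between" first and only falls back to ".*" when no "between" can be found.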


>The behaviour you saw was right. (As a rule of thumb Regex Coach is
>always right as long as it does the same as Perl... :)

Yeah, that's what I thought, too. :)


>You had ".+?(between)?" which meant "match as few characters as
>possible up to ..." where ... was "the string 'between' OR ANYTHING"
>because you made 'between' optional, i.e. your regex was equivalent to
>".+?". So, the regex engine matched exactly zero characters.
>
>Does that help?

Indeed, indeed! Thanks for that explanation, too. I need to see the logic 
of something to really absorb it. I had accepted what I saw as a 
limitation of the regex engine but, without understanding its logic, hadn't 
worked out how to get the refinement I needed.
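
The behaviour explained above can be reproduced with a small sketch. It is written for Python's re module, which should behave like Perl for this pattern, and uses a made-up sample string and a shortened prefix. The lazy ".+?" takes the minimum it is allowed (one character), and because "(between)?" is optional it is satisfied by matching nothing, so the engine never has a reason to extend the match.

```python
import re

# Hypothetical sample string, not the text from the original question.
text = "Case T-12/99 and more text between A and B"

# Lazy ".+?" followed by an optional "between", as in the original attempt.
m = re.match(r"Case T-\d{1,3}/\d{2}(.+?(between)?)", text)

tail = m.group(1)           # what ".+?(between)?" consumed
between_group = m.group(2)  # None: the optional group did not take part

print(repr(tail), between_group)
```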


All the best, John

John Clements
john.clements at publicinfo.net
+44(0)20 8959-6432

http://www.publicinfo.net

PublicInfo.Net Ltd.
29 Gibbs Green
Edgware, Middlesex
United Kingdom HA8 9RS





