[regex-coach] Two Problems

Wed Nov 24 19:47:12 UTC 2004

On Wed, 24 Nov 2004 19:05:39 +0000, "Snow Squall" <snow_squall at hotmail.com> wrote:

> I have two issues that I can't seem to track down....

These are really not on topic for this mailing list because it's about
the program Regex Coach, not about learning regular expression basics.

> 1.  I'm looking for a rule that will eliminate the following.  I'm
> looking for all of my web pages that have some snippet of code
> before the <!DOCTYPE... my <!DOCTYPE should start the HTML on my web
> pages...  I've seen individuals sneak the following code in:
>
> <!-- saved from -->
> <!DOCTYPE HTML Public ..ect...
>
> So is there a regex construct that will fail if any characters are
> found before <!DOCTYPE  ???

^<!DOCTYPE

This assumes that you've read the whole HTML page into one string and
that multi-line mode is off, so that ^ matches only at the very start
of that string.
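
For anyone who wants to try the ^<!DOCTYPE check outside of Regex
Coach, here is a minimal sketch in Python. The function name and the
two sample pages are made up for illustration; only the pattern itself
comes from the answer above.

```python
import re

# Without re.MULTILINE, ^ matches only at the very start of the string,
# so any text in front of <!DOCTYPE makes the match fail.
DOCTYPE_AT_START = re.compile(r"^<!DOCTYPE")


def starts_with_doctype(page: str) -> bool:
    """Return True if the page text begins with <!DOCTYPE (hypothetical helper)."""
    return DOCTYPE_AT_START.search(page) is not None


# Made-up sample pages, each held in a single string.
clean_page = "<!DOCTYPE HTML PUBLIC ...>\n<html></html>"
tampered_page = "<!-- saved from -->\n<!DOCTYPE HTML PUBLIC ...>\n<html></html>"

print(starts_with_doctype(clean_page))
print(starts_with_doctype(tampered_page))
```

Note that even a single leading space or blank line counts as
"characters before <!DOCTYPE" and makes the check fail, which seems to
be what was asked for.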

> 2. Secondly, looking to find ONLY the .PDF's inside a test.com
> domain.  I wish to match the pattern
> http://www.test.com/snow/squall/index.pdf .  I know to start my
> regex as http://www\.test\.com but how do I ignore all the directory
> stuff and key in on the .pdf extension?

http://www\.test\.com/.*\.pdf
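
A minimal sketch of how that pattern behaves, again in Python. The
helper name and the sample URLs (other than the one from the question)
are assumptions for illustration. Two things to keep in mind: the
pattern as written is case-sensitive, and it is not anchored at the
end, so a plain search would also accept something like
index.pdf.html. The sketch sidesteps the second point by requiring the
whole URL to match.

```python
import re

# The pattern from the answer above, unchanged.
PDF_URL = re.compile(r"http://www\.test\.com/.*\.pdf")


def is_test_com_pdf(url: str) -> bool:
    """Return True if the entire URL matches the pattern (hypothetical helper)."""
    # fullmatch requires the pattern to cover the whole string,
    # so the URL has to end in ".pdf".
    return PDF_URL.fullmatch(url) is not None


# The first URL is from the question; the rest are made-up examples.
sample_urls = [
    "http://www.test.com/snow/squall/index.pdf",
    "http://www.test.com/snow/squall/index.html",
    "http://www.other.com/snow/squall/index.pdf",
    "http://www.test.com/snow/squall/index.pdf.html",
]

for url in sample_urls:
    print(url, is_test_com_pdf(url))
```

If upper-case .PDF extensions should match as well, compiling the
pattern with re.IGNORECASE is one option.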

Cheers,
Edi.