[regex-coach] Two Problems

Wed Nov 24 19:05:39 UTC 2004

Hello, first time poster...

I have two issues that I can't seem to track down.

1.  I'm looking for a rule that will catch the following.  I want to find 
all of my web pages that have some snippet of code before the 
<!DOCTYPE.  My <!DOCTYPE should be the very first thing in the HTML on my 
web pages.  I've seen individuals sneak in the following code:

<!-- saved from -->
<!DOCTYPE HTML Public ...etc...

So is there a regex construct that will fail if any characters are found 
before <!DOCTYPE?
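
One way this kind of check is often written is with a start-of-string anchor. The sketch below uses Python and assumes the whole page has been read into one string; the function name starts_with_doctype and the sample strings are made up for illustration. In a Perl-compatible engine the same idea would likely be the pattern \A<!DOCTYPE on its own.

```python
import re

# \A anchors at the very start of the string, so any leading character
# (a comment, whitespace, a stray byte) makes the match fail.
DOCTYPE_AT_START = re.compile(r"\A<!DOCTYPE", re.IGNORECASE)

def starts_with_doctype(html: str) -> bool:
    """Return True only if the text begins with <!DOCTYPE (hypothetical helper)."""
    return DOCTYPE_AT_START.search(html) is not None

# Sample inputs, invented for this sketch.
clean = "<!DOCTYPE HTML PUBLIC ...>\n<html></html>"
tampered = "<!-- saved from -->\n<!DOCTYPE HTML PUBLIC ...>\n<html></html>"

print(starts_with_doctype(clean))     # True
print(starts_with_doctype(tampered))  # False
```

Note that this treats leading whitespace as a failure too, which seems to be what "any characters" asks for.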

2. Secondly, I'm looking to find ONLY the PDFs inside the test.com domain.  I 
wish to match URLs like http://www.test.com/snow/squall/index.pdf .  I 
know to start my regex as http://www\.test\.com but how do I ignore all the 
directory stuff and key in on the .pdf extension?
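
A possible shape for this, sketched in Python: let \S* soak up the directory part and require a literal .pdf at the end. The function name is_test_pdf is made up, and the sketch assumes each URL is checked on its own and contains no spaces. In a Perl-compatible engine the equivalent would likely be ^http://www\.test\.com/\S*\.pdf$ with case-insensitive matching turned on.

```python
import re

# \S* consumes any run of non-space characters (directories and file name);
# \.pdf then requires the literal extension. IGNORECASE also accepts .PDF.
TEST_PDF_URL = re.compile(r"http://www\.test\.com/\S*\.pdf", re.IGNORECASE)

def is_test_pdf(url: str) -> bool:
    """Return True if the whole URL is a .pdf under www.test.com (hypothetical helper)."""
    return TEST_PDF_URL.fullmatch(url) is not None

print(is_test_pdf("http://www.test.com/snow/squall/index.pdf"))   # True
print(is_test_pdf("http://www.test.com/snow/squall/index.html"))  # False
print(is_test_pdf("http://www.other.com/snow/index.pdf"))         # False
```

Using fullmatch means something like index.pdf.html is rejected, because the string has to end right after .pdf.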

Thanks