I'm using a pattern described by John Gruber in this Daring Fireball article to auto-link URLs in user comments.

I'm using it with PHP to match URLs, and I want it to also match a bare domain with a single TLD (no www and no trailing slash), but that doesn't seem to work.

Here's the pattern (it's explained in more detail in the article above):

$pattern  = '#(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4})(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:\'".,<>?«»“”‘’]))#';
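For context, here is one way a pattern like this might be wired into an auto-linker in PHP. This is a minimal sketch, not the article's code: the autolink() helper name, the choice to prepend http:// to matches that have no scheme, and the sample sentence are all assumptions made for illustration.

```php
<?php
// Minimal auto-linking sketch using the pattern from the question.
// autolink() is a hypothetical helper name, not something from the article.
$pattern = '#(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4})(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:\'".,<>?«»“”‘’]))#';

function autolink(string $text, string $pattern): string
{
    // Wrap every match in an <a> tag; text outside the matches is returned unchanged
    // (escaping the rest of the comment is left to the caller).
    return preg_replace_callback($pattern, function (array $m): string {
        $url = $m[1];
        // Assumption: matches without a scheme (e.g. "www.example.org") get http:// prepended.
        $href = preg_match('#^https?://#i', $url) ? $url : 'http://' . $url;
        return '<a href="' . htmlspecialchars($href, ENT_QUOTES) . '">'
            . htmlspecialchars($url, ENT_QUOTES) . '</a>';
    }, $text) ?? $text;
}

echo autolink('See http://example.com/page and www.example.org.', $pattern), "\n";
```

Note that the trailing period after www.example.org stays outside the link, because the last character class in the pattern excludes common punctuation.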

Specifically, I'm looking at this subpattern: [a-z0-9.\-]+[.][a-z]{2,4}

This subpattern works on its own, but as part of the larger pattern it doesn't match google.com.
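A quick way to reproduce this is to run both patterns against the same string with preg_match. This sketch assumes the subpattern is tested with the same # delimiters, (?i) flag and leading \b as the full pattern.

```php
<?php
// The subpattern on its own, delimited and case-insensitive like the full pattern.
$sub = '#(?i)\b[a-z0-9.\-]+[.][a-z]{2,4}#';

// The full pattern as posted in the question.
$full = '#(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4})(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:\'".,<>?«»“”‘’]))#';

$subHit  = preg_match($sub, 'google.com', $m);  // $m[0] holds the subpattern's match
$fullHit = preg_match($full, 'google.com');     // preg_match returns 1 on a match, 0 otherwise

printf("subpattern: %d, full pattern: %d\n", $subHit, $fullHit);
```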

Even if you can get it to match google.com, it certainly won't match, for example, annebjerggaard.museum.

You know you must escape all . characters, right?

The dots are inside square brackets, so the OP is okay.
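The difference between a bare dot, a dot inside a character class and an escaped dot can be checked directly; the sample strings below are arbitrary.

```php
<?php
// Outside a character class, "." matches any character except a newline.
var_dump(preg_match('#^a.c$#', 'abc'));   // int(1)

// Inside [...], "." is a literal dot, so [.] behaves like \.
var_dump(preg_match('#^a[.]c$#', 'abc')); // int(0)
var_dump(preg_match('#^a[.]c$#', 'a.c')); // int(1)
var_dump(preg_match('#^a\.c$#', 'a.c'));  // int(1)
```

The same rule applies to the dot in [a-z0-9.\-], which is also literal.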

Is it possible that you are overcomplicating things? I am sure there are simpler patterns to match URLs.

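[a-z0-9.\-]+[.][a-z]{2,4} works as you expect, but the rest of the pattern requires at least two more characters after it: one or more for the repeated group and at least one for the closing group. Because [a-z]{2,4} can back off to two letters, google.com leaves only the m for the tail, so it fails, while google.com/ matches.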
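Running the full pattern over a few inputs (chosen for illustration) shows the effect: a bare google.com fails, one extra character such as a trailing slash is enough, and a domain with a two-letter TLD like foo.uk/ still fails, because [a-z]{2,4} has no letter left to give back to the tail.

```php
<?php
// The full pattern as posted in the question.
$full = '#(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4})(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:\'".,<>?«»“”‘’]))#';

// Hypothetical sample inputs; print what (if anything) the pattern captures.
foreach (['google.com', 'google.com/', 'google.com/maps', 'foo.uk/'] as $input) {
    $found = preg_match($full, $input, $m) ? $m[1] : '(no match)';
    printf("%-16s => %s\n", $input, $found);
}
```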



You can try making the tail optional, but it may trade the false negatives for false positives:

$pattern  = '#...“”‘’])?)#';  # '...' for brevity
#                      ^
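Here is the same suggestion spelled out in full, with the ? added after the last group. The sample strings are assumptions; they show google.com now matching, along with two false positives: a trailing sentence period is swallowed, and a file name such as file.txt is treated as a link.

```php
<?php
// The answer's variant: "?" after the final group makes that group optional.
$optionalTail = '#(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4})(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:\'".,<>?«»“”‘’])?)#';

foreach (['google.com', 'Visit google.com.', 'file.txt'] as $input) {
    $found = preg_match($optionalTail, $input, $m) ? $m[1] : '(no match)';
    printf("%-18s => %s\n", $input, $found);
}
```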

Awesome, thanks. I ended up doing what you mentioned, and instead of matching [a-z]{2,4}, I specified a list of domain suffixes so it wouldn't match false positives. - Calvin
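One way to combine an explicit suffix list with the rest of the pattern is sketched below. The com|net|org list is a hypothetical placeholder, not Calvin's actual list. With a fixed list the TLD can no longer give a letter back to the tail, so adding ? after the last group is not enough on its own; this sketch instead makes the whole tail optional and adds \b after the suffix so that google.community is not cut short at google.com.

```php
<?php
// Sketch: explicit suffix list instead of [a-z]{2,4}, with the whole tail optional.
// (?:com|net|org) is a hypothetical placeholder list; extend it as needed.
$withSuffixes = '#(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.](?:com|net|org)\b)(?:(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:\'".,<>?«»“”‘’]))?)#';

$samples = ['Visit google.com.', 'google.com/maps', 'readme.txt', 'google.community'];
foreach ($samples as $input) {
    $found = preg_match($withSuffixes, $input, $m) ? $m[1] : '(no match)';
    printf("%-18s => %s\n", $input, $found);
}
```

One side effect of making the whole tail optional is that a bare http:// or www. with nothing after it is also accepted.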

Works for me:


Are you sure there is nothing in your PHP that is tripping you up?


It's because the full pattern expects to find something before the domain name, like http:// or www.

The subpattern works when it's by itself, but not when it's part of the larger pattern; try it. - Calvin

