Why doesn't this URL pattern match?

I'm using a pattern described by John Gruber in this Daring Fireball article to auto-link URLs in user comments.

I'm using it with PHP to match URLs, and I want it to match a bare domain such as google.com, with no www and no trailing slash, but it doesn't seem to work.

Here's the pattern (it is explained in more detail in the article above):

$pattern  = '#(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4})(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:\'".,<>?«»“”‘’]))#';

Specifically I'm looking at this particular subpattern: [a-z0-9.\-]+[.][a-z]{2,4}

This subpattern works separately, but as a part of the larger pattern, it doesn't match google.com.

asked Aug 28, 2011 at 05:08

Even if you can get it to match google.com, it certainly won't match, for example, annebjerggaard.museum. -

You know you must escape all . characters, right? -

The dots are inside square brackets, so the OP is okay. -

Is it possible that you are overcomplicating things? I am sure there are simpler patterns to match URLs. -

2 Answers

[a-z0-9.\-]+[.][a-z]{2,4} works as you expect, but the rest of the pattern requires at least 1 following character:

google.com/
google.com?lang=en-us
google.com#!foo/bar

etc.
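To check this, here is a minimal sketch that runs the question's pattern, unchanged, against a bare domain and the three examples above. The names `$samples` and `$results` are only illustrative, and the file is assumed to be saved as UTF-8.

```php
<?php
// The pattern exactly as posted in the question.
$pattern = '#(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4})(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:\'".,<>?«»“”‘’]))#';

// Illustrative inputs: one bare domain plus the examples listed above.
$samples = ['google.com', 'google.com/', 'google.com?lang=en-us', 'google.com#!foo/bar'];

$results = [];
foreach ($samples as $s) {
    // Store the matched text, or null when the pattern does not match.
    $results[$s] = preg_match($pattern, $s, $m) === 1 ? $m[0] : null;
}

var_export($results);
// 'google.com'            => NULL (nothing is left over for the tail)
// 'google.com/'           => 'google.com/'
// 'google.com?lang=en-us' => 'google.com?lang=en-us'
// 'google.com#!foo/bar'   => 'google.com#!foo/bar'
```

With `google.com/`, the engine backtracks the domain part to `google.co` so that `m` and `/` can satisfy the two tail groups. The bare `google.com` has no spare character to give up, so it fails.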

You can try making the tail optional, but that may in turn produce false positives in exchange for fixing the false negatives:

$pattern  = '#...“”‘’])?)#';  # '...' for brevity
#                      ^
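As a rough illustration of that trade-off, the sketch below spells out the question's pattern with the single `?` added where the caret points. It assumes the `...` stands for the rest of the pattern exactly as posted. The variable names and sample strings are hypothetical.

```php
<?php
// Question's pattern with "?" appended to the final group (the caret position).
$optionalTail = '#(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4})(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:\'".,<>?«»“”‘’])?)#';

// Hypothetical inputs: a bare domain, a sentence ending in a domain, a file name.
$samples = ['google.com', 'See google.com.', 'notes.txt'];

$matched = [];
foreach ($samples as $s) {
    // Store the matched text, or null when the pattern does not match.
    $matched[$s] = preg_match($optionalTail, $s, $m) === 1 ? $m[0] : null;
}

var_export($matched);
// 'google.com'      => 'google.com'   (the false negative is gone)
// 'See google.com.' => 'google.com.'  (the sentence's full stop is captured)
// 'notes.txt'       => 'notes.txt'    (a file name is treated as a URL)
```

The last two results are the kind of false positives the optional tail can let through: the final group was what kept trailing punctuation out of the match.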

answered Aug 28, 2011 at 09:08

Awesome, thanks. I ended up doing what you mentioned and instead of matching [a-z]{2,4}, I just specified some domain suffixes so as not to match false-positives. - Calvin
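The comment does not show the final pattern, so the following is only one possible reading of that approach. It assumes a hypothetical suffix list (`com|net|org`) and restructures the pattern so that only a bare, whitelisted domain may appear without a tail. The names `$tlds`, `$balanced` and `$tail` are invented for this sketch.

```php
<?php
// Hypothetical suffix whitelist; the comment does not say which suffixes were used.
$tlds = 'com|net|org';

// Building blocks copied from the question's pattern.
$balanced = '\(([^\s()<>]+|(\([^\s()<>]+\)))*\)';
$tail = '(?:[^\s()<>]+|' . $balanced . ')+'
      . '(?:' . $balanced . '|[^\s`!()\[\]{};:\'".,<>?«»“”‘’])';

// http(s):// and www prefixes still require a tail;
// a bare domain with a whitelisted suffix may stand alone.
$pattern = '#(?i)\b((?:https?://|www\d{0,3}[.])' . $tail
         . '|[a-z0-9.\-]+[.](?:' . $tlds . ')\b(?:' . $tail . ')?)#';

$samples = ['google.com', 'See google.com.', 'notes.txt', 'google.com/search?q=x'];

$found = [];
foreach ($samples as $s) {
    // Store the matched text, or null when the pattern does not match.
    $found[$s] = preg_match($pattern, $s, $m) === 1 ? $m[0] : null;
}

var_export($found);
// 'google.com'            => 'google.com'
// 'See google.com.'       => 'google.com'  (trailing full stop excluded)
// 'notes.txt'             => NULL          (suffix not whitelisted)
// 'google.com/search?q=x' => 'google.com/search?q=x'
```

The `\b` after the suffix group keeps `com` from matching inside a longer label. The obvious cost is that any suffix missing from the list is never linked.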

Works for me:

http://regexr.com?2uica

Are you sure there is nothing in your PHP that is tripping you up?

EDIT

It's because the full pattern expects to find something before the domain name, like http:// or www

answered Aug 28, 2011 at 09:08

The subpattern there works when it's by itself, but doesn't when it's a part of the larger pattern, try it. - Calvin
