Regex to match the last two parts of a URL

I am trying to figure out the best regex to match only the last two dot-separated parts of a URL.

For example, with www.stackoverflow.com I only want to match stackoverflow.com

The issue I have is that some strings contain a large number of periods, for instance

a-abcnewsplus.i-a277eea3.rtmp.atlas.cdn.yimg.com 

should also return only yimg.com

The set of URLs I am working with does not include any path information, so one can assume the last part of the string is always .org or .com or something similar.

What regular expression will return stackoverflow.com when run against www.stackoverflow.com, and yimg.com when run against a-abcnewsplus.i-a277eea3.rtmp.atlas.cdn.yimg.com, under the conditions above?
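As a minimal sketch (in Python, since the question names no language), assuming the input is always a bare host name as described above: `[^.]+` matches one label (any run of non-dot characters), and the `$` anchor forces the match to be the final `label.label` pair. The helper name `last_two_labels` is hypothetical.

```python
import re
from typing import Optional

# [^.]+ = one label (no dots); \. = a literal dot; $ = end of string.
# Anchoring at the end means only the final "label.label" pair can match.
LAST_TWO = re.compile(r"[^.]+\.[^.]+$")


def last_two_labels(host: str) -> Optional[str]:
    """Hypothetical helper: return the last two labels of host, or None if it has no dot."""
    m = LAST_TWO.search(host)
    return m.group(0) if m else None


print(last_two_labels("www.stackoverflow.com"))  # stackoverflow.com
print(last_two_labels("a-abcnewsplus.i-a277eea3.rtmp.atlas.cdn.yimg.com"))  # yimg.com
```

Because `[^.]` accepts anything except a dot, hyphens and digits inside labels (as in `i-a277eea3`) are handled without extra cases.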

asked Jan 14 '13 at 06:01

Which language are you using? And what have you tried? -

Are you sure you mean URL? It sounds more like a host. -

Do you need to support domains that end in ".co.uk" or similar? -

If your language has a URL-parsing facility, you can use it to extract the host, then use a simple indexOf to pick out the part you need. -

I'd suggest looking at stackoverflow.com/questions/288810/get-the-subdomain-from-a-url, as the answer there details the intricacies of parsing domain names. -
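The two comments above can be combined in a short sketch. Here Python's standard `urllib.parse.urlsplit` stands in for the "URL facility", and plain string splitting replaces `indexOf`; the helper name `last_two_from_url` is hypothetical. The last call shows why the `.co.uk` question matters: a "last two labels" rule returns only the public suffix for such hosts, so a correct answer needs a public suffix list (such as Mozilla's Public Suffix List) rather than a regex.

```python
from typing import Optional
from urllib.parse import urlsplit


def last_two_from_url(url: str) -> Optional[str]:
    """Hypothetical helper: parse out the host, then keep its last two labels."""
    # urlsplit only fills in .hostname when a netloc is present (after "//"),
    # so bare hosts such as "www.stackoverflow.com" get a "//" prefix first.
    if "//" not in url:
        url = "//" + url
    host = urlsplit(url).hostname  # lower-cased; port and user info removed
    if not host or "." not in host:
        return None
    # Plain string operations instead of a regex: keep the last two labels.
    return ".".join(host.split(".")[-2:])


print(last_two_from_url("https://www.stackoverflow.com/questions"))  # stackoverflow.com
print(last_two_from_url("http://user@WWW.Example.COM:8080/x"))       # example.com
print(last_two_from_url("www.bbc.co.uk"))  # co.uk, not bbc.co.uk
```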

4 Answers

You don't have to use a regex; instead you can use the simple explode function.

So you want to split your URL at the periods, with something like:

$url = "a-abcnewsplus.i-a277eea3.rtmp.atlas.cdn.yimg.com";
$url_split = explode(".",$url);

Then you need the last two elements, which you can echo out from the resulting array.

//this will return the second to last element, yimg
echo $url_split[count($url_split)-2];
//this will echo the period
echo ".";
//this will return the last element, com
echo $url_split[count($url_split)-1];

So in the end you'll get yimg.com as the final output.
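The same split-and-take-the-last-two idea, sketched in Python for comparison (the helper name `last_two_parts` is hypothetical). `str.rsplit` with `maxsplit=2` splits only at the last two periods, so the long prefix stays in one piece. Unlike the PHP snippet, which would read an undefined index when the host has no period, slicing with `[-2:]` simply returns the host unchanged in that case.

```python
def last_two_parts(host: str) -> str:
    # rsplit(".", 2) splits at most twice, from the right:
    # "a-abcnewsplus.i-a277eea3.rtmp.atlas.cdn.yimg.com"
    #   -> ["a-abcnewsplus.i-a277eea3.rtmp.atlas.cdn", "yimg", "com"]
    parts = host.rsplit(".", 2)
    # Equivalent of second-to-last + "." + last; with no period at all,
    # parts[-2:] is just [host], so the host comes back as-is.
    return ".".join(parts[-2:])


print(last_two_parts("a-abcnewsplus.i-a277eea3.rtmp.atlas.cdn.yimg.com"))  # yimg.com
```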

Hope this helps.

answered Jan 14 '13 at 06:01

I don't know what you have tried so far, but I can offer the following solution:

/.*?([\w]+\.[\w]+)$/

There are a few tricks here:

  1. Use $ to anchor the match to the end of the string. This way the regex engine can't settle for a match near the very beginning of the string.

  2. Use a capturing group (...). The group means: match one or more word characters, then a dot (escaped with a backslash, because an unescaped dot means "any character" in a regex and we want a literal dot), then one or more word characters again.

  3. Use a reluctant (lazy) quantifier, .*?, at the start of the pattern; otherwise .* matches greedily. For example, if your text is:

    abc.def.gh

the greedy version puts f.gh in your group, which is not what you want.
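To see tricks 1 and 3 in action, here is a quick check in Python, whose `re` module supports the same lazy `.*?` syntax. It runs the pattern on `abc.def.gh` in three variants: greedy, lazy, and lazy without the `$` anchor.

```python
import re

text = "abc.def.gh"

# Greedy .* swallows as much as it can, leaving the group only "f.gh".
greedy = re.search(r".*([\w]+\.[\w]+)$", text).group(1)

# Lazy .*? gives up as little as possible, so the group is "def.gh".
lazy = re.search(r".*?([\w]+\.[\w]+)$", text).group(1)

# Without $, the lazy version stops at the first pair it finds: "abc.def".
unanchored = re.search(r".*?([\w]+\.[\w]+)", text).group(1)

print(greedy, lazy, unanchored)  # f.gh def.gh abc.def
```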

I assumed that your host contains only word characters (\w matches letters, digits, and underscore, but not hyphens), so for your real data you may need something more elaborate.

Here is a working Groovy example. You didn't specify which language you use, but the regex engine should behave similarly.

def s = "abc.def.gh"
def m = s =~ /.*?([\w]+\.[\w]+)$/
println m[0][1] // prints the first (and only) capture group: def.gh

Hope this helps

answered Jan 14 '13 at 06:01

And what about URLs that include digits, etc.? It doesn't seem that [\w] covers those cases. - sin aquiles
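A quick Python check of what `\w` actually covers: digits are word characters, but hyphens are not. So the weak spot of the `[\w]` answer is hyphenated labels, which get cut at the hyphen. The host `www.my-site.com` is a hypothetical example.

```python
import re

# For str patterns in Python 3, \w matches Unicode letters, digits, and underscore.
print(bool(re.fullmatch(r"\w+", "a277eea3")))    # True: digits are covered
print(bool(re.fullmatch(r"\w+", "i-a277eea3")))  # False: the hyphen is not

# Consequence for the answer's pattern: a hyphenated label is truncated.
PATTERN = r".*?([\w]+\.[\w]+)$"
print(re.search(PATTERN, "www.my-site.com").group(1))  # site.com, not my-site.com
```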

If you need a solution that uses Perl-compatible regular expressions, which work in a number of languages, you can use something like this (the example is in PHP):

$url = "a-abcnewsplus.i-a277eea3.rtmp.atlas.cdn.yimg.com";

// [a-zA-Z0-9-]: letters, digits, and a literal hyphen (placed last so it is not read as a range)
preg_match('|[a-zA-Z0-9-]+\.[a-zA-Z]{2,3}$|', $url, $m);
print($m[0]);

This regex fetches the last label before the top-level domain, plus a two- or three-letter top-level domain (so longer TLDs such as .info are not matched). For example, with a-abcnewsplus.i-a277eea3.rtmp.atlas.cdn.yimg.com it produces

yimg.com

as output, and with www.stackoverflow.com (with or without the leading www.) it gives you

stackoverflow.com

as the result.
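The PHP pattern ports directly to Python's `re` module; `re.search`, like `preg_match`, looks for the first match anywhere in the string. This sketch (with a hypothetical helper `match_domain`) runs it on the question's hosts plus `example.info`, a hypothetical host that illustrates the `{2,3}` TLD limit.

```python
import re
from typing import Optional

# Same pattern as the PHP answer. In both Python's re and PCRE, a hyphen
# placed last in a character class is a literal hyphen.
PATTERN = re.compile(r"[a-zA-Z0-9-]+\.[a-zA-Z]{2,3}$")


def match_domain(host: str) -> Optional[str]:
    m = PATTERN.search(host)
    return m.group(0) if m else None


for host in ("a-abcnewsplus.i-a277eea3.rtmp.atlas.cdn.yimg.com",
             "www.stackoverflow.com",
             "stackoverflow.com",
             "example.info"):
    print(host, "->", match_domain(host))
# yimg.com, stackoverflow.com, stackoverflow.com, None
```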

answered Jan 14 '13 at 06:01

A shorter version:

/(\.[^\.]+){2}$/
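A quick Python check of what this shorter pattern returns. It requires two dots, so the match includes a leading dot, a bare two-label host does not match at all, and the capture group holds only its last repetition.

```python
import re

SHORT = re.compile(r"(\.[^\.]+){2}$")

m = SHORT.search("www.stackoverflow.com")
print(m.group(0))  # .stackoverflow.com  (the leading dot is part of the match)
print(m.group(1))  # .com  (a repeated group keeps only its last repetition)
print(SHORT.search("stackoverflow.com"))  # None: one dot, but two are required

# One way to get the bare name from a successful match: drop the leading dot.
print(m.group(0)[1:])  # stackoverflow.com
```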

answered Feb 1 '13 at 15:02
