Extracting data from HTML tags with a regex

I'm fetching a page (via an HTTP request) and trying to extract certain data from it with a regex. For example, take this part of the HTML:

<tr><th>Continent:</th><td class='trc'>Europe (EU)</td></tr>

How can I get the 'Europe (EU)' out of this?

I've tried this regex:

/<th>Continent:<\/th><td class='trc'>(.+)\s<\/td>/

But this doesn't work.

asked May 5 '13 at 15:05

You should not be using regexes to parse HTML. Use an HTML parser for that...

This is for a mIRC script, but I assumed regexes work the same way in the mIRC scripting language as in PHP?

@plalx Depending on the intent, using a full-blown SGML parser to extract a single bit of data is like attacking a rubber boat with naval artillery. There are plenty of use cases where simply extracting a few bits of data from HTML with regular expressions is preferable to a full-blown parser. It's often more resilient too, since the regex method will survive minor changes in the source page structure.

1 Answer

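Your regex requires a whitespace character (\s) immediately before </td>. In the sample HTML, "(EU)" is followed directly by </td> with no space in between, so the pattern never matches: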

/<th>Continent:<\/th><td class='trc'>(.+)\s<\/td>/  
                                         ^^

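I would recommend using [^<>]+ to match the text between HTML tags. It matches any run of characters other than angle brackets, so it cannot run past the closing tag: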

/<th>Continent:<\/th><td class='trc'>([^<>]+)<\/td>/
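
To compare the two patterns side by side, here is a minimal sketch in Python rather than mIRC or PHP. It assumes the engine treats these particular patterns the same way, which should hold for PCRE-style engines but is worth checking in your own environment. The names html, original, fixed and extract_continent are made up for this illustration. The surrounding / delimiters are dropped because Python does not use them; the \/ escape is kept from the patterns above and is harmless here.

```python
import re

# Sample row from the question.
html = "<tr><th>Continent:</th><td class='trc'>Europe (EU)</td></tr>"

# Pattern from the question: \s demands a whitespace character right before </td>.
original = re.compile(r"<th>Continent:<\/th><td class='trc'>(.+)\s<\/td>")

# Suggested pattern: [^<>]+ matches a run of characters that are not angle brackets.
fixed = re.compile(r"<th>Continent:<\/th><td class='trc'>([^<>]+)<\/td>")


def extract_continent(page):
    """Return the captured cell text, or None when the pattern does not match."""
    match = fixed.search(page)
    return match.group(1) if match else None


print(original.search(html))    # None
print(extract_continent(html))  # Europe (EU)
```

The first pattern fails because the only whitespace in the cell sits between "Europe" and "(EU)", not in front of </td>. If the real page sometimes does have stray spaces before the closing tag, making the whitespace optional with \s* would be one way to tolerate both cases.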

answered May 5 '13 at 15:05
