Regex Remove Markup Python

I have a string:

myString = '<p>Phone Number:</p><p>706-878-8888</p>'

I'm trying to use a regex to strip out all HTML tags, in this case paragraph tags.


asked Jan 30, 2012 at 19:01

Don't use regex to parse (X)HTML. Use a parser. BeautifulSoup comes to mind.

I would link directly to the answer to that question, @Hamish :-P

2 Answers

Use BeautifulSoup, as pointed out in a comment:

>>> from BeautifulSoup import BeautifulSoup
>>> BeautifulSoup(myString).text
u'Phone Number:706-878-8888'

answered Jan 30, 2012 at 23:01

Perfect! I kept trying attribute 'string' instead of text. Much thanks! - Hikalea

Use re.sub:

>>> import re
>>> re.sub('<[^>]+>', '', '<p>Phone Number:</p><p>706-878-8888</p>')
'Phone Number:706-878-8888'

Using re is a good solution if you just want to remove tags. But if you want to do anything a little more complicated (involving HTML parsing), I suggest you look into BeautifulSoup.
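
To illustrate where the regex stops being enough, here is a minimal sketch using only the standard library. It first shows the pattern above mis-handling a `>` inside a quoted attribute value, then strips tags with `html.parser.HTMLParser` instead. The names `TextExtractor` and `strip_tags` are made up for this example and do not come from the answers above; the sketch assumes Python 3.

```python
import re
from html.parser import HTMLParser

TAG_RE = re.compile(r'<[^>]+>')

# The regex treats the first '>' as the end of the tag, even when that
# character sits inside a quoted attribute value.
tricky = '<a title="a > b">link</a>'
print(repr(TAG_RE.sub('', tricky)))  # ' b">link'


class TextExtractor(HTMLParser):
    """Hypothetical helper: keeps only the text nodes it sees."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called for text between tags; the tags themselves are skipped.
        self.parts.append(data)


def strip_tags(html):
    """Return the text content of an HTML fragment (illustrative only)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return ''.join(parser.parts)


my_string = '<p>Phone Number:</p><p>706-878-8888</p>'
print(strip_tags(my_string))  # Phone Number:706-878-8888
```

For simple, well-formed input like the string in the question, both approaches give the same result; the parser-based version is mainly worth it when the markup may contain attributes or other constructs the pattern does not anticipate.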

answered Jan 30, 2012 at 23:01
