Get the return value from the HTMLParser class into the main class
Here is my current code:
HTMLParser class:
class MyHTMLParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    print value
Main class:
html = urllib2.urlopen(url).read()
MyHTMLParser().feed(html)
Any idea how to return "value" to the main class? Thanks in advance.
1 Answer
You store information you want to collect on your parser instance:
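class MyHTMLParser(HTMLParser):
    def __init__(self):
        # HTMLParser is an old-style class in Python 2, so call the base
        # initializer directly and pass self explicitly.
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples; convert it to a dict
        # so the href attribute can be looked up by name.
        attrs = dict(attrs)
        if tag == "a" and 'href' in attrs:
            self.links.append(attrs['href'])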
Then, after you have fed the HTML into the parser, you can retrieve the links
attribute from the instance:
parser = MyHTMLParser()
parser.feed(html)
print parser.links
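As a minimal, self-contained sketch of the same idea, the parser can be wrapped in a helper that returns the collected values to the caller. The names LinkCollector and extract_links and the sample HTML string are illustrative assumptions, not part of the question's code; the import fallback is only there so the sketch runs on both Python 2.7 and Python 3.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2.7


class LinkCollector(HTMLParser):
    """Collects href values of <a> tags on the instance (hypothetical name)."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples; value may be None
        # for attributes written without a value.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html):
    """Feed the HTML to a fresh parser and return the collected links."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Illustrative input; in the question this would come from urllib2.urlopen(url).read()
sample = '<p><a href="http://example.com/a">A</a> <a name="x">no href</a> <a href="/b">B</a></p>'
links = extract_links(sample)
print(links)
```

Because the results live on the parser instance rather than being printed inside handle_starttag, the calling code decides what to do with them.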
For parsing HTML, I can heartily recommend you look at BeautifulSoup instead:
from bs4 import BeautifulSoup
soup = BeautifulSoup(html)
links = [a['href'] for a in soup.find_all('a', href=True)]
Answered 12 Feb 14, 08:02
can't use BeautifulSoup because I'm using python 2.7 version - azmilhafiz
@azmilhafiz: BeautifulSoup is an add-on that works on Python 2 and Python 3, including Python 2.7. - Martijn Pieters
Does this answer help? - miku
It doesn't really help me. - azmilhafiz