Get the return value from an HTMLParser class back to the main class

Here is my current code:

HTMLParser class:

from HTMLParser import HTMLParser  # Python 2.7 module name

class MyHTMLParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    print value

Main class:

import urllib2

html = urllib2.urlopen(url).read()
MyHTMLParser().feed(html)

Is there any way to get "value" returned to the main class? Thanks in advance.

asked Feb 12 '14 at 08:02

Does this answer help?

It doesn't really help me.

1 Answer

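Store the information you want to collect on your parser instance. The return value of handle_starttag() is ignored by the parser and feed() itself returns None, so there is nothing to "return"; collect the values in an attribute instead: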

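from HTMLParser import HTMLParser  # Python 2.7; on Python 3 use: from html.parser import HTMLParser

class MyHTMLParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []  # href values collected while parsing

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples; turn it into a dict to look up 'href'
        attrs = dict(attrs)
        if tag == "a" and 'href' in attrs:
            self.links.append(attrs['href'])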

Then, after you have fed the HTML into the parser, retrieve the links attribute from the instance:

parser = MyHTMLParser()
parser.feed(html)
print parser.links
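Putting the pieces together, here is a minimal self-contained sketch that runs the parser above on a small inline HTML string (a made-up sample standing in for the downloaded page) instead of a URL. The try/except import is an addition, not part of the original answer, so that the same file runs on both Python 2.7 and Python 3:

```python
# Pick whichever module name exists (addition, not from the original answer).
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2.7


class MyHTMLParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []  # href values collected while parsing

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])


# Hypothetical sample document standing in for urlopen(url).read()
html = ('<p><a href="/docs">Docs</a> <a name="top">no href</a> '
        '<a href="http://example.com/">Home</a></p>')

parser = MyHTMLParser()
parser.feed(html)
parser.close()  # process any data still buffered by the parser
print(parser.links)
```

On Python 3, urllib.request.urlopen(url).read() returns bytes while feed() expects text, so decode it first, e.g. parser.feed(urlopen(url).read().decode('utf-8')), assuming the page is UTF-8 encoded.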

For parsing HTML, I heartily recommend you look at BeautifulSoup instead:

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly to avoid bs4's "no parser specified" warning
links = [a['href'] for a in soup.find_all('a', href=True)]

answered Feb 12 '14 at 08:02

I can't use BeautifulSoup because I'm using Python 2.7. - azmilhafiz

@azmilhafiz: BeautifulSoup is an add-on that works on Python 2 and Python 3, including Python 2.7. - Martijn Pieters
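If adding a third-party package really is not an option, another standard-library pattern is to pass a callback into the parser, so the calling code decides what happens to each href as soon as it is found. This is a sketch of that alternative, not part of the original answer; LinkCallbackParser and on_link are hypothetical names:

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2.7


class LinkCallbackParser(HTMLParser):
    """Calls on_link(href) for every <a href="..."> tag (hypothetical class)."""

    def __init__(self, on_link):
        HTMLParser.__init__(self)
        self.on_link = on_link  # any callable that takes one argument

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href":
                self.on_link(value)


found = []
parser = LinkCallbackParser(found.append)  # the main code chooses what to do with each link
parser.feed('<a href="one.html">1</a><div><a href="two.html">2</a></div>')
parser.close()
print(found)
```

Passing found.append simply collects the links, but any function works, for example one that writes each link to a file or queues it for download.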
