Get some data from a web page

I have used this tutorial to fetch all the content of a web page via C# code.

I now want to gather into an IEnumerable collection all the strings that appear inside the following text pattern (i.e. MY-TEXT):

data-address=" MY-TEXT "></

How can I do that? I tried using string.Split() but got too much "white noise".

Any ideas?

asked Aug 27 '11 at 17:08

What webpage is that? Is it HTML (which doesn't have any data-address attribute AFAIK)? Or XML? -

3 Answers

A better solution is to use HtmlAgilityPack and let it handle the parsing/scraping for you. Here is an example:

var web = new HtmlWeb();
var doc = web.Load("http://www.stackoverflow.com");

var nodes = doc.DocumentNode.SelectNodes("//[@data-address]");

foreach (var node in nodes)
{
    Console.WriteLine(node.Attributes["data-address"].Value);
}

This fetches stackoverflow.com, finds all elements that have a data-address attribute, and prints the value of that attribute.

answered Aug 27 '11 at 21:08

A few questions: 1) I got the following error: "Expression must evaluate to a node-set". What went wrong? 2) How did you find this open-source DLL? Just so I know for next time. - Elad Benda

If the page is well formed I'd load the content into an XDocument and query over it with LINQ to XML.
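As a rough sketch of that idea, the snippet below parses a small, hypothetical sample string (not the asker's real page) and pulls out every data-address value with LINQ to XML. It assumes the markup is valid XML; XDocument.Parse throws an XmlException otherwise, which is likely for much real-world HTML. It is written with top-level statements, so it assumes C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical, well-formed sample markup standing in for the downloaded page.
string xhtml =
    "<div>" +
    "<span data-address=\" MY-TEXT \"></span>" +
    "<span data-address=\"OTHER\"></span>" +
    "<p>no attribute here</p>" +
    "</div>";

// Throws XmlException if the markup is not well-formed XML.
XDocument xdoc = XDocument.Parse(xhtml);

// Walk every element, keep only data-address attributes, and trim their values.
List<string> addresses = xdoc
    .Descendants()
    .Attributes("data-address")
    .Select(a => a.Value.Trim())
    .ToList();

foreach (string address in addresses)
{
    Console.WriteLine(address);
}
```

Descendants() skips the root element itself; if the root could carry the attribute, DescendantsAndSelf() on xdoc.Root would cover that case.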

answered Aug 27 '11 at 21:08

You (probably) can't load HTML into an XDocument, even if it is well-formed. - svick

@alexn is right. A small correction though:

  var nodes = doc.DocumentNode.SelectNodes("//*[@data-address]");

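I added the *, because an XPath step needs a node test before the [@data-address] predicate; without it the expression is not valid.

answered Aug 27 '11 at 23:08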
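To end up with the IEnumerable of strings the question asks for, the corrected XPath can be combined with LINQ. The following is a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced and C# 9 or later (top-level statements). The inline html string is a hypothetical stand-in for the downloaded page; with a real page you would keep the web.Load call from the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical sample markup standing in for the fetched page.
string html =
    "<div>" +
    "<a data-address=\" MY-TEXT \"></a>" +
    "<a data-address=\"OTHER\"></a>" +
    "<a href=\"#\"></a>" +
    "</div>";

var doc = new HtmlDocument();
doc.LoadHtml(html);

// SelectNodes returns null, not an empty collection, when nothing matches.
HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//*[@data-address]");

// Project each matching node to its trimmed attribute value.
IEnumerable<string> addresses = nodes == null
    ? Enumerable.Empty<string>()
    : nodes.Select(n => n.GetAttributeValue("data-address", string.Empty).Trim());

foreach (string address in addresses)
{
    Console.WriteLine(address);
}
```

The null check matters because iterating the result of SelectNodes directly throws a NullReferenceException on pages with no matching elements.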
