How to implement a social media / website monitoring service?

I would like to implement some kind of service my customers can use to find mentions of their company on a. blogs, forums b. Facebook, Twitter c. review sites

a. blogs, forums This can only be done by a crawler, right? A crawler that looks for the robots.txt on a forum/blog and then optionally reads the content (and of course the links) of the forum/blog. But where to start? Can I use a set of sites to start crawling from? Do I have to predefine them, or can I use some other search engine first? E.g. searching Google for that company and then crawling the SERPs? Is that legal?
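For reference, the robots.txt check described above could look roughly like the sketch below, using Python's standard `urllib.robotparser`. The robots.txt content, the `mention-bot` user agent, and the `forum.example.com` URLs are made up for illustration; a real crawler would download the file from each site before requesting anything else.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. A real crawler would fetch
# https://<host>/robots.txt before requesting any other page.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def make_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text that was already downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules let this user agent fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = make_parser(SAMPLE_ROBOTS_TXT)
# "mention-bot" is a made-up user agent name.
print(is_allowed(parser, "mention-bot", "https://forum.example.com/threads/42"))
print(is_allowed(parser, "mention-bot", "https://forum.example.com/private/inbox"))
```

This only covers what robots.txt says; it does not say anything about a site's terms of service.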

b. Facebook, Twitter They have APIs, so that should not be a problem, I think.

c. review sites I looked at the TOS of some review sites, and they state that crawling their sites with automated software is not permitted. On the other hand, the pages that are relevant to me are not disallowed in their robots.txt. Which one matters here?

Any other hints are welcome.

Thanks in advance :-)

asked Jan 8 '11 at 15:01

1 Answer

Honestly, the easiest way to do it would be to start with the search engines. They all have APIs for doing automated searches, so that'd probably give you the highest return for your time in getting back links to, and mentions of, your client's products or brand.

That won't handle things behind authentication, only public stuff (of course). But it'll give you a good baseline to start with. From there, you could (if you want) use APIs or custom-written bots that are given auth creds on the sites, but honestly I think at that point you're missing the core question.

Is the core question, "Where are we mentioned?" or is the core question really... "What sites are getting traffic to come to us?" In most cases, it's the latter, in which case you can ignore all of what I said previously and just use Google Analytics, or similar software on your client's site to determine where traffic's coming from.
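If it is the latter and you can get at raw referrer URLs (from an analytics export or a web server log, which is an assumption about your setup), a minimal sketch of tallying them by host could look like this. The sample URLs are invented.

```python
from collections import Counter
from urllib.parse import urlparse


def count_referrer_hosts(referrers):
    """Count how often each host appears in a list of referrer URLs."""
    hosts = Counter()
    for ref in referrers:
        host = urlparse(ref).netloc.lower()
        if host:  # skip empty or placeholder referrers such as "-"
            hosts[host] += 1
    return hosts


# Hypothetical referrer values, e.g. pulled from an access log.
sample_referrers = [
    "https://www.google.com/search?q=acme",
    "https://forum.example.com/threads/42",
    "https://www.google.com/search?q=acme+reviews",
    "-",
]

counts = count_referrer_hosts(sample_referrers)
for host, hits in counts.most_common():
    print(host, hits)
```

An analytics package already does this grouping for you; the sketch is only meant to show what "where traffic's coming from" boils down to.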

Edit: OK, so if it's "where are we mentioned", I'd still start with the search engines as stated. Google's API is pretty easy to use, and it has a SOAP-based one that you can pull in as a web reference if you want; example

Re: review sites. If the site's TOS says you can't use automated bots, then it's a good idea not to use automated bots. The robots.txt is not legally binding (it's sort of a good-neighbor thing), so I wouldn't take the lack of an exclusion there as permission. Some review sites (the more modern ones) might disallow automated scraping of their site but still publish RSS or Atom feeds, or have some other API that you can hook into. That's worth checking.
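As a rough illustration of the feed route, here is a sketch that scans an RSS 2.0 document for items mentioning a company name, using only the standard library. The feed content, the `reviews.example.com` URLs, and the `find_mentions` helper are all made up; a real feed would be downloaded from whatever URL the site publishes, and Atom feeds use different element names.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed; a real one would be fetched from the site.
SAMPLE_RSS = """\
<rss version="2.0">
  <channel>
    <title>Example Reviews</title>
    <item>
      <title>Acme Widgets review</title>
      <link>https://reviews.example.com/acme-widgets</link>
      <description>Solid product from Acme.</description>
    </item>
    <item>
      <title>Other Co review</title>
      <link>https://reviews.example.com/other-co</link>
      <description>Nothing relevant here.</description>
    </item>
  </channel>
</rss>
"""


def find_mentions(rss_xml: str, company: str):
    """Return (title, link) pairs for items whose text mentions the company."""
    needle = company.lower()
    root = ET.fromstring(rss_xml)
    mentions = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        description = item.findtext("description", default="")
        # Case-insensitive substring match on title and description.
        if needle in f"{title} {description}".lower():
            mentions.append((title, item.findtext("link", default="")))
    return mentions


for title, link in find_mentions(SAMPLE_RSS, "Acme"):
    print(title, link)
```

A plain substring match will produce false positives for short or common company names, so some filtering on top of this is likely needed.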

answered Jan 8 '11 at 19:01

The core question really is "Where are we mentioned?" - nogamawa

So the hint about the search engines is the right one. Any sources for that? - nogamawa

So my problem with the review sites has to be clarified. - nogamawa
