Best way to get word frequency counts for a website? Or part of a website?

Pretty simple: I'm just looking for a simple means of extracting word frequencies from a given website, or a section of a website.

I am also interested in calculating the average distance between two given words throughout a website, with the distance measured in words.

I am asking this question because, quite frankly, I haven't been able to find much information on how to approach such a task. I don't have any experience with web spidering or scraping of any kind.

Thanks (I asked this question earlier, but it wasn't well formed)

asked May 15 '13 at 03:05

Maybe you can get some ideas by searching for 'python str_word_count'. (str_word_count is a PHP function that returns the number of words in a string.)

1 Answer

You could try using Scrapy. It is quite a powerful tool for scraping websites, but it may require some knowledge of regular expressions and XPath. Try following the tutorials.
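Once a crawler has handed you the HTML of each page, the counting itself needs nothing beyond the standard library. Here is a minimal sketch, not a definitive implementation. It deliberately leaves out the fetching step (Scrapy or anything else), and the names `TextExtractor`, `tokenize`, `word_frequencies` and `average_distance` are made up for illustration. It also assumes one particular reading of "average distance": for every occurrence of the first word, take the distance in words to the nearest occurrence of the second word, then average those values. Other definitions (for example, averaging over all pairs) are just as defensible.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def tokenize(html):
    """Return the lowercased words of an HTML document, in page order."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    # Assumption: a "word" is a run of ASCII letters, digits or apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def word_frequencies(words):
    """Map each word to the number of times it occurs."""
    return Counter(words)


def average_distance(words, a, b):
    """Average, over occurrences of `a`, of the distance to the nearest `b`.

    Returns None if either word never occurs.
    """
    pos_a = [i for i, w in enumerate(words) if w == a]
    pos_b = [i for i, w in enumerate(words) if w == b]
    if not pos_a or not pos_b:
        return None
    gaps = [min(abs(i - j) for j in pos_b) for i in pos_a]
    return sum(gaps) / len(gaps)


if __name__ == "__main__":
    page = "<p>The cat sat. The dog sat near the cat.</p>"
    words = tokenize(page)
    print(word_frequencies(words).most_common(3))
    print(average_distance(words, "cat", "dog"))
```

For a whole site or a section of it, you would call `tokenize` on each page and add the resulting `Counter` objects together. Distances are probably best computed per page, since word positions on different pages are not comparable.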

answered Jun 18 '13 at 15:06
