Best way to get word frequency counts for a website? Or part of a website?

Pretty simple: I'm just looking for a simple way to extract word frequencies from a given website, or a section of a website.

I am also interested in calculating the average distance between two given words throughout a website, with the distance measured in words.
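"Average distance" can be defined in more than one way. As a rough sketch, assuming it means "for each occurrence of the first word, the number of words to the nearest occurrence of the second word, averaged over all occurrences", it could be computed on already-extracted plain text like this. The function names `tokenize` and `average_distance` and the tokenizing regex are illustrative choices, not anything from an existing library.

```python
import re
from bisect import bisect_left


def tokenize(text):
    # Assumption: a "word" is a run of letters, digits, or apostrophes, case-insensitive.
    return re.findall(r"[a-z0-9']+", text.lower())


def average_distance(text, first, second):
    """Mean distance in words from each `first` to its nearest `second`.

    Returns None if either word never occurs.
    """
    tokens = tokenize(text)
    pos_first = [i for i, t in enumerate(tokens) if t == first.lower()]
    pos_second = [i for i, t in enumerate(tokens) if t == second.lower()]
    if not pos_first or not pos_second:
        return None

    total = 0
    for p in pos_first:
        # pos_second is sorted, so the nearest neighbour is next to the insertion point.
        k = bisect_left(pos_second, p)
        candidates = []
        if k < len(pos_second):
            candidates.append(pos_second[k] - p)
        if k > 0:
            candidates.append(p - pos_second[k - 1])
        total += min(candidates)
    return total / len(pos_first)
```

Other definitions (for example, averaging over every pair of occurrences) would give different numbers, so it is worth deciding which one you actually want before running this over a whole site.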

I am asking because, quite frankly, I haven't been able to find much information on how to approach such a task. I don't have any experience with web spidering or scraping of any kind.

Thanks. (I asked this question earlier, but it wasn't well formed.)

Maybe you can get some ideas by searching for 'python str_word_count'. (str_word_count is a PHP function that returns the number of words in a string.)
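For what it's worth, Python's standard library gets close to this without anything extra: `collections.Counter` counts tokens directly. A minimal sketch, assuming the page text has already been extracted to a plain string (the `word_frequencies` helper and its regex are just illustrative):

```python
import re
from collections import Counter


def word_frequencies(text):
    # Assumption: words are runs of letters, digits, or apostrophes; case is ignored.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(words)


freqs = word_frequencies("The cat saw the other cat.")
print(len(list(freqs.elements())))  # total number of words
print(freqs.most_common(2))         # the two most frequent words with their counts
```

Counters can be added together (`total = page1 + page2`), which makes it easy to combine counts from several pages of a site.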

You could try using Scrapy. It is quite a powerful tool for scraping websites, but it may require knowledge of regular expressions and XPath. Try following the tutorials.
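If Scrapy is more than you need for a first attempt, a single page can also be handled with the standard library. This is not Scrapy; it is a rough sketch using `html.parser`, assuming you already have the page's HTML as a string and only want the visible text, skipping `<script>` and `<style>` content. The `TextExtractor` class and `visible_text` function are names made up for this example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, ignoring the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def visible_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse all runs of whitespace into single spaces.
    return " ".join(" ".join(parser.parts).split())
```

The HTML itself could come from `urllib.request.urlopen(url).read()` decoded with the page's encoding. Following links across a whole site, respecting robots.txt, and throttling requests is where a crawler framework like Scrapy starts to pay off.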

