How can I use Nokogiri to find specific text/words on a web page?

I am new to Nokogiri, but it looks like the tool I would use to scrape a webpage. I am looking for specific words on a webpage: "Valid", "Requirements Met", and "Requirements Not". I am using Watir to drive through the website. I currently have:

page = Nokogiri::HTML.parse(browser.html)

to get the HTML, but I am not sure where to go from here.

Thanks for the help!

asked Mar 9 '12 at 17:03

You can store the text you get from Nokogiri in a variable and do a regex match against the keywords you need, e.g. 'valid', ...

It will be easier if the words are in a tagged element, e.g. <p id="status"></p>: you can search for that element and then call .inner_text to grab the value.

3 Answers

If you are using Watir to drive the website, I would suggest using Watir to check for the text. You can get all the text on the page using:

ie.text      #Where ie is a Watir::IE

You could then check whether those words are included (by comparing against a regex):

if ie.text =~ /Valid|Requirements Met|Requirements Not/
  #Do something if the words are on the page
end

That said, if you are looking for specific bits of text, you can use Watir to look specifically for those elements (and avoid parsing text or HTML). If you can provide an HTML sample of what you are working on, we can help find a more robust solution.
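As a small extension of the regex check above, here is a sketch that reports which of the phrases were actually found. The variable `page_text` is a hypothetical stand-in for the string returned by `ie.text`, so that the snippet runs without a browser.

```ruby
# Hypothetical stand-in for the string returned by ie.text
page_text = "Status: Valid\nAll Requirements Met for this account"

# Same three phrases as in the regex above
phrases = /Valid|Requirements Met|Requirements Not/

# String#scan returns every match, so we can see which phrases appeared
found = page_text.scan(phrases).uniq

if found.empty?
  puts "None of the phrases are on the page"
else
  puts "Found: #{found.join(', ')}"
end
```

With a real page you would assign `ie.text` to `page_text` instead of the sample string.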

answered Mar 9 '12 at 18:03

This was perfect. I was overthinking it (typical me). I used a variation of the regex and now I am getting the output that I needed! - user1128637

I am not sure why you are using both. You could get the page using 'net/http' or mechanize if you just want to check for text. Anyway, you can check for text in Watir with browser.text.match 'Valid', and likewise in Nokogiri with page.text.match 'Valid'.
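A minimal sketch of what `match` with a string argument returns, assuming `page_text` holds the page text (it is a hypothetical stand-in for `browser.text` or `page.text`):

```ruby
# Hypothetical stand-in for browser.text or page.text
page_text = "Account status: Valid"

# A string argument is converted to a regex; the result is MatchData or nil
hit  = page_text.match 'Valid'
miss = page_text.match 'Requirements Not'

# Matching is case-sensitive unless the regex says otherwise
lower     = page_text.match 'valid'
lower_any = page_text.match(/valid/i)

puts hit ? "found #{hit[0]}" : "not found"
```

Because the result is either a MatchData object or nil, it can be used directly as the condition of an `if`.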

answered Mar 9 '12 at 18:03

I tried mechanize first, but it doesn't support JavaScript, so I was not able to "click" the button. So I switched to Watir and it is working perfectly. - user1128637

You should also be able to use the .text method from Justin's answer along with the standard Ruby string .include? method, which returns true or false.

if browser.text.include? /Valid|Requirements Met|Requirements Not/
  #code to execute if text found
else
  #code to execute if text not found
end

This also makes it easy to have a single-line validation step, if that is what you are after.

If using RSpec/Cucumber:

browser.text.should include /Valid|Requirements Met|Requirements Not/

If using Test::Unit:

assert browser.text.include? /Valid|Requirements Met|Requirements Not/

answered Mar 9 '12 at 20:03

I thought this would be possible, but when I tried I got a "can't convert Regexp into String". Is there something I am missing to allow .include? to use regex? - justin ko

.include? may not accept a regex, then, so the other methods might be easier. If .include? only takes a string, you would end up with a three-way OR in there, which is a bit cumbersome compared to Justin's answer. - Chuck van der Linden
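To make the point in these comments concrete: String#include? accepts only a string, so passing it a Regexp raises a TypeError. The sketch below shows two ways around that. The `page_text` and `keywords` names are hypothetical, with `page_text` standing in for `browser.text`.

```ruby
# Hypothetical stand-in for browser.text
page_text = "Application status: Requirements Not met yet"

keywords = ["Valid", "Requirements Met", "Requirements Not"]

# Option 1: keep include?, but ask it about each plain string in turn
found_by_include = keywords.any? { |word| page_text.include?(word) }

# Option 2: build a regex from the same list and use match? (Ruby 2.4+)
pattern = Regexp.union(keywords)
found_by_regex = page_text.match?(pattern)

# Passing a Regexp to include? raises TypeError, as reported above
begin
  page_text.include?(pattern)
  raised = false
rescue TypeError
  raised = true
end

puts "include? with strings: #{found_by_include}"
puts "match? with regex:     #{found_by_regex}"
puts "include? with regex raised TypeError: #{raised}"
```

Option 1 avoids the hand-written three-way OR by looping over the list, and option 2 is equivalent to the `=~` check in Justin's answer.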
