Get link from Mechanize/Nokogiri

I am trying to find the best way to retrieve the href of a link from a Nokogiri node. Here is where I am so far:

mech = Mechanize.new

mech.page.search('.listing_content').each do |business|
  website = business.css('.website-feature')
  puts website.class
  puts website.inner_html
end

Output =>

<a href="http://urlofsite.com" class="track-visit-website no-tracks"  onclick='omniture.callClick({"eVar6":6,"eVar9":1,"eVar21":"search_results","eVar50":null,"prop17":"cars","prop26":"64c15af0-a558-012f-a041-00215a4685f6","eVar42":"64c15af0-a558-012f-a041-00215a4685f6","prop27":6,"prop38":"search_results","prop39":1,"prop46":null,"events":"event6,event7","eVar51":optimostIDs.trialID.toString(),"eVar52":optimostIDs.segmentID.toString(),"eVar53":optimostIDs.creativeID.toString(),"eVar54":optimostIDs.subjectID.toString(),"prop47":null,"prop51":optimostIDs.trialID.toString(),"prop52":optimostIDs.segmentID.toString(),"prop53":optimostIDs.creativeID.toString(),"prop54":optimostIDs.subjectID.toString(),"prop56":"Saint+George%2C+UT","prop57":null,"prop58":false,"prop59":null,"eVar60":"relevancyTest2","prop60":"relevancyTest2","prop61":false,"prop62":null,"prop64":null,"prop67":null,"prop68":null,"prop70":null,"prop71":null});; atti_logs.attiClick({"iid":"651691e0-a558-012f-2ca7-18a9053c171a","lt":6,"ptid":"www.yellowpages.com","rid":"vendetta-236e7298-3a4f-4744-8ff5-4eb5fcc8e188","ypid":3848879,"lid":3848879,"vrid":"64c15af0-a558-012f-a041-00215a4685f6","nav":null});' rel="nofollow" target="_blank" title="Executive Service Ctr Website"><span class="raquo">»</span>  Website</a>

Basically, I just need to get http://urlofsite.com out of the inner_html, and I'm not sure how to do that. I've read about doing it with CSS and XPath, but I can't get either to work at this point. Thanks for any help.

asked Jul 1 '12 at 04:07

1 Answer

First, tell Nokogiri to get a node, rather than a NodeSet. at_css will retrieve the Node and css retrieves a NodeSet, which is like an Array.

Instead of:

website = business.css('.website-feature')

try:

website = business.at_css('a.track-visit-website.no-tracks')

to retrieve the first <a> node that has both the track-visit-website and no-tracks classes. If it's not the first instance you want, you'll need to narrow it down by grabbing the NodeSet and then indexing into it. Without the surrounding HTML it's difficult to help more.

To get the href attribute from a Node, simply treat the node like a hash:

website['href']

should return:

http://urlofsite.com

Here's a little sample from IRB:

irb(main):001:0> require 'nokogiri'
=> true
irb(main):003:0*   html = '<a class="this_node" href="http://example.com">'
=> "<a class=\"this_node\" href=\"http://example.com\">"
irb(main):004:0> doc = Nokogiri::HTML.parse(html)
=> #<Nokogiri::HTML::Document:0x8041e2ec name="document" children=[#<Nokogiri::XML::DTD:0x8041d20c name="html">, #<Nokogiri::XML::Element:0x805a2a14 name="html" children=[#<Nokogiri::XML::Element:0x805df8b0 name="body" children=[#<Nokogiri::XML::Element:0x8084c5d0 name="a" attributes=[#<Nokogiri::XML::Attr:0x80860170 name="class" value="this_node">, #<Nokogiri::XML::Attr:0x8086047c name="href" value="http://example.com">]>]>]>]>
irb(main):006:0*   doc.at_css('a.this_node')['href']
=> "http://example.com"

answered Jul 1 '12 at 05:07

Thanks for the info. Whenever I try to grab the node with at_css('a.track-visit-website no-tracks') it returns nil. I'm going to edit my post; go ahead and take a look. - ruevaughn

After going through it one more time I was able to get it exactly as you described. Thanks for helping, the tin man has a heart after all ;) - ruevaughn

I'm glad it worked. Nokogiri is an awesome XML/HTML parser, so thank that team. - the Tin Man
