Is there a Ruby gem that diffs HTML documents?

Diffing two HTML documents turns out to be an entirely different problem from diffing plain text. For example, if I do a naive LCS diff between:

Google</p>

and

Google</a></p>

the diff result is NOT:

</a>

but

/a></

I've tried most of the gems that claim to do HTML diffs, but all of them seem to implement only a text-based LCS diff. Is there a gem that diffs while taking HTML tags into account?
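
To make the distinction concrete, here is a minimal sketch of a tag-aware diff, not taken from any existing gem. It assumes simple markup that a regular expression can split into tags and text runs (no comments, no `>` inside attribute values); the names `html_tokens`, `token_diff`, and `html_diff` are hypothetical, and the `href="#"` in the sample input is a placeholder. Because each tag is a single token, the LCS can never split a tag in the middle.

```ruby
# Split HTML into tokens: each tag is one token, each run of text is one token.
# Assumption: simple markup only; a stray "<" with no closing ">" is dropped.
def html_tokens(html)
  html.scan(/<[^>]+>|[^<]+/)
end

# Dynamic-programming LCS over two token arrays.
# Returns [op, token] pairs, where op is :same, :add, or :remove.
def token_diff(old_tokens, new_tokens)
  rows = old_tokens.length
  cols = new_tokens.length
  # lcs[i][j] = LCS length of old_tokens[i..-1] and new_tokens[j..-1]
  lcs = Array.new(rows + 1) { Array.new(cols + 1, 0) }
  (rows - 1).downto(0) do |i|
    (cols - 1).downto(0) do |j|
      lcs[i][j] = if old_tokens[i] == new_tokens[j]
                    lcs[i + 1][j + 1] + 1
                  else
                    [lcs[i + 1][j], lcs[i][j + 1]].max
                  end
    end
  end

  # Walk the table from the start, emitting one operation per token.
  result = []
  i = 0
  j = 0
  while i < rows && j < cols
    if old_tokens[i] == new_tokens[j]
      result << [:same, old_tokens[i]]
      i += 1
      j += 1
    elsif lcs[i + 1][j] >= lcs[i][j + 1]
      result << [:remove, old_tokens[i]]
      i += 1
    else
      result << [:add, new_tokens[j]]
      j += 1
    end
  end
  # Whatever is left on either side is a pure removal or addition.
  old_tokens[i..-1].each { |token| result << [:remove, token] }
  new_tokens[j..-1].each { |token| result << [:add, token] }
  result
end

def html_diff(old_html, new_html)
  token_diff(html_tokens(old_html), html_tokens(new_html))
end

changes = html_diff('<p>Google</p>', '<p><a href="#">Google</a></p>')
added = changes.select { |op, _| op == :add }.map { |_, token| token }
p added # => ["<a href=\"#\">", "</a>"]
```

With this tokenization the additions come out as whole tags rather than fragments of tags. A tree-based comparison would be more robust for real-world markup; this only shows why the token boundary matters.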

asked Feb 1 '12 at 14:02

I don't know of any, but that doesn't mean that they don't exist. It would be interesting to create such a gem, using Nokogiri to generate comparable element trees and do a tree-based diff. Try searching the official gem repo at rubygems.org -

2 Answers

answered Mar 11 '15 at 19:03

Both of these only support diffing plain text and outputting HTML diffs, not diffing HTML and outputting HTML. - Ruxton

@ruxton did you end up finding what you were looking for? - Richardsondx

@Richardsondx I don't recall what, but I do recall it being the most annoying part of the system I was working on. - Ruxton

@Ruxton not true anymore, Diffy works well with html! - Ulysse BN

After much searching for a gem to do this for me, I discovered that I can simply do a string compare between two parsed Nokogiri documents:

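require 'nokogiri'

# Parse both HTML strings and compare the re-serialized documents, so the
# comparison runs on Nokogiri's normalized output rather than the raw input.
def should_match_html(html_text1, html_text2)
  dom1 = Nokogiri::HTML(html_text1)
  dom2 = Nokogiri::HTML(html_text2)
  dom1.to_s.should == dom2.to_s
end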

Then you can simply add this in your spec:

should_match_html expected_html, actual_html

The best part is that the built-in RSpec matcher automatically gives you a line-by-line diff of the mismatched lines.

answered Oct 5 '13 at 02:10
