I'm interested in finding a more-sophisticated-than-typical algorithm for finding differences between strings, that can be "tuned" via some parameters, to balance between such things as "maximize count of identical characters" vs. "maximize the length of spans" vs. "try to keep whole words intact".
Ultimately, I want to be able to make the results as human readable as possible. For instance, if a long sentence has been replaced with an entirely new sentence, where the only things it has in common with the original are the words "the" "and" and "a" in that order, I might want it treated as if the whole sentence is changed, rather than just that 4 particular spans are changed --- just like how a reasonable person would see it.
I'm actually ok with something that uses Monte Carlo methods or the like, if its results are better. Computation time is not an issue (within reason), nor is determinism.
Note: although this is beyond the scope of what I'm asking, I'll throw one more thing out there just in case: It would also be great if it could recognize changes that are out of order....for instance if someone changes the order of two paragraphs while leaving them otherwise identical, it would be awesome if it recognized it as a simple move, rather than as one subtraction and and one unrelated addition.
preguntado el 27 de agosto de 11 a las 19:08