¿Biblioteca liviana capaz de sugerir diferentes ortografías de palabras de un conjunto limitado?

I was looking for lightweight library that'd allow me to feed it a bunch of words, and then ask it whether a given word would have any close matches.z

I'm not particularly concerned with the underlying algorithm (I reckon a simple hamming distance algorithm would probably suffice, were I to undertake the task myself).

I'm just in the development of a small language and I found it nifty to make suggestions to the user when an "Undefined class" error is detected (lots of times it's just a misspelled word). I don't want to lose much time on the issue though.

Muchas Gracias

preguntado el 04 de diciembre de 12 a las 04:12

2 Respuestas

Levenshtein distance is a common way of handling it. Just add all the words to a list and then brute-force iterate over it and return the smallest distance. Here's one library with a Levenschtein function: http://commons.apache.org/lang/api-2.4/org/apache/commons/lang/StringUtils.html

If you have a large number of words and you want it to run fast, then you'd have to use ngrams. Spilt each word into bigrams and then add (bigram, word) to a map. Use the map to look up the bigrams in the target word, and then iterate through the candidates. That's probably more work than you want to do, though.

Respondido el 04 de diciembre de 12 a las 04:12

Speed is not an issue, I bet there aren't going to be at any time more than 2-3 words to search for, and at max something like 20-30. It will always be instant. - elisio devorado

not necessarily a library but i think este artículo may be really helpful. it mostly describes the general workings of how a spelling corrector works in python, but also has a link for a java implementation which you may use if that is what you are looking for specifically (note that I haven't specifically used the java one before)

Respondido el 04 de diciembre de 12 a las 04:12

No es la respuesta que estás buscando? Examinar otras preguntas etiquetadas or haz tu propia pregunta.