Elasticsearch / Lucene: mistyped whitespace

How can I make Elasticsearch correct queries in which keywords that should be separated by whitespace are instead typed together? For example:

"thisisaquery" -> "this is a query"

My current settings are:

"settings": {
    "index": {
        "analysis": {
            "analyzer": {
                "autocomplete": {
                    "tokenizer": "whitespace",
                    "filter": [
                        "lowercase", "engram"
                    ]
                }
            },
            "filter": {
                "engram": {
                    "type": "edgeNGram",
                    "min_gram": 3,
                    "max_gram": 10
                }
            }
        }
    }
}

asked May 17 '13 at 8:05

2 Answers

There isn't an out-of-the-box tokenizer or token filter that explicitly handles what you're asking for. The closest is the compound word token filter, which requires you to supply a dictionary file; in your case that may mean the full English dictionary for it to work correctly. Even then, it would likely have trouble with words that are stems of other words, abbreviations, and so on, unless you add a lot of extra logic. It may be good enough, though, depending on your exact requirements.
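As a rough sketch of what that could look like, the settings below define a `dictionary_decompounder` filter with a tiny inline `word_list`. The analyzer name `example_decompound`, the filter name `example_word_splitter`, and the four-word list are made up for illustration; a real setup would more likely point `word_list_path` at a full dictionary file. `min_subword_size` is lowered to 1 here only so that the one-letter word "a" can match.

```json
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "example_decompound": {
            "tokenizer": "whitespace",
            "filter": ["lowercase", "example_word_splitter"]
          }
        },
        "filter": {
          "example_word_splitter": {
            "type": "dictionary_decompounder",
            "word_list": ["this", "is", "a", "query"],
            "min_subword_size": 1
          }
        }
      }
    }
  }
}
```

Note that this filter keeps the original token and adds every dictionary word it finds inside it, so overlapping matches (for example "is" inside "this") are to be expected. That is the kind of noise mentioned above, and it gets worse as the dictionary grows.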

answered May 17 '13 at 16:05

This Ruby project claims to do this. You might try it if you're using Ruby, or just look at its code and copy the analyzer settings it uses :)


answered Jul 25 '13 at 6:07
