Solr filter factory syntax not working

I am trying to define a custom field in my Solr schema that is filtered and processed in a certain way, but it doesn't seem to be working.

    <fieldType name="removeWhitespace" class="solr.TextField" positionIncrementGap="100">
        <analyzer type="index">
            <tokenizer class="solr.StandardTokenizerFactory" />
            <filter class="solr.LowerCaseFilterFactory" />
            <filter class="solr.TrimFilterFactory" />
            <filter class="solr.PatternReplaceFilterFactory" pattern="\s" replacement="" replace="all" />
        </analyzer>
    </fieldType>

    <field name="whiteSpaceRmved" type="removeWhitespace" stored="true" indexed="true"/>
    <copyField source="original" dest="whiteSpaceRmved"/>

Basically, if I have a field like,

Hello World

I want to have that field, and a new field name that looks like,


But when I try it, it copies the field, but doesn't change it in any way. Any ideas?

asked Nov 8 '11 at 15:11

2 Answers

You need to move the tokenizer <tokenizer class="solr.StandardTokenizerFactory" /> to the end of your analyzer chain. Currently, it breaks the field value into tokens before the whitespace is removed. In fact, since you are removing whitespace, you might not even need a tokenizer, because it looks like you really want to store the values as strings.
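
For reference, Solr runs character filters before the tokenizer and token filters after it, so one way to have the whitespace removed ahead of tokenization is a charFilter. A minimal sketch, assuming a hypothetical field type name removeWhitespaceEarly (not from the question) and that the stock PatternReplaceCharFilterFactory is available in your Solr version:

```xml
<!-- Hypothetical field type name; adjust to your schema. -->
<fieldType name="removeWhitespaceEarly" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <!-- Runs on the raw character stream, before any tokenizer sees it -->
        <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\s+" replacement="" />
        <!-- Keeps the remaining text as one token -->
        <tokenizer class="solr.KeywordTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
    </analyzer>
</fieldType>
```

With this chain, an input of Hello World should be indexed as the single term helloworld. The stored value is not changed by it.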

answered Nov 8, 11:20

Thanks! I was wondering, if I want the field to be shortened, is that the proper syntax? The reason I ask is that when I do a Solr query and get the resulting XML response, the whiteSpaceRmved field value does not have the white space removed. It just has the field duplicated. Basically, does copyField actually run the analyzers when it copies from another field? - user611105

As per your conversation on @Jayendra's answer: Solr does not store the field in the form the analyzers produce, regardless of whether it is inserted as an original field or via a copyField rule. If you really want the stored value (i.e. the value returned from the index) in a different format, you will need to format the data before inserting it into the index. - Paige Cook
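
One possible way to do that formatting inside Solr itself, rather than in the client, is an update request processor chain in solrconfig.xml. This is only a sketch under assumptions: it relies on CloneFieldUpdateProcessorFactory and RegexReplaceProcessorFactory, which as far as I know ship with Solr 4 and later (not the version current when this was asked), and the chain name strip-whitespace is hypothetical.

```xml
<!-- solrconfig.xml; the chain name "strip-whitespace" is a made-up example -->
<updateRequestProcessorChain name="strip-whitespace">
    <!-- Copy the incoming value of "original" into "whiteSpaceRmved" -->
    <processor class="solr.CloneFieldUpdateProcessorFactory">
        <str name="source">original</str>
        <str name="dest">whiteSpaceRmved</str>
    </processor>
    <!-- Rewrite the copied value before it is stored and indexed -->
    <processor class="solr.RegexReplaceProcessorFactory">
        <str name="fieldName">whiteSpaceRmved</str>
        <str name="pattern">\s+</str>
        <str name="replacement"></str>
    </processor>
    <!-- Hands the modified document on to the index -->
    <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

If you try this, the copyField rule for whiteSpaceRmved would presumably have to go, otherwise the field receives the value twice. The chain also has to be selected for the update handler, for example through the update.chain request parameter.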

You should use KeywordTokenizer, which does no actual tokenizing, so the entire input string is preserved as a single token.

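    <fieldType name="removeWhitespace" class="solr.TextField" sortMissingLast="true" omitNorms="true">
        <analyzer>
            <!-- KeywordTokenizer emits the whole input as a single token -->
            <tokenizer class="solr.KeywordTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory" />
            <filter class="solr.TrimFilterFactory" />
            <!-- Strip every whitespace character from that single token -->
            <filter class="solr.PatternReplaceFilterFactory"
                    pattern="(\s)" replacement="" replace="all" />
        </analyzer>
    </fieldType>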

answered Nov 8, 11:21

If you see my comment to Paige below, the resulting XML still does not apply ANY of the filters to the results. It just copies it word for word with no changes. I'm not even sure if you can apply something to a copied field? - user611105

This is the indexed value and not the stored value. So you would not have HelloWorld returned as part of the response; the response will always return Hello World to you. Analysis does not affect the stored value. It only processes what gets indexed. - Jayendra

Is there a way to force the stored value to be modified in such a way? I guess a better question is: let's say I have index and query analyzers on. When I do a query for Hello World, the ideal case is that the query becomes hello_world, and that when it searches through Solr, it hits the whiteSpaceRmved field and gets an exact match with hello_world, even though the stored value is Hello World. From what you said, that is what is supposed to happen, if I understand you correctly? - user611105

Yes. Analysis is usually applied at both index and query time, so that terms are converted the same way on both sides and produce a match. Stored values are unaffected. - Jayendra
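
To make the index-time and query-time symmetry concrete, here is a sketch of the same field type with both analyzers spelled out. It reuses the names from the question; setting stored="false" on the copy is my assumption, on the grounds that the stored value would only duplicate the original anyway.

```xml
<fieldType name="removeWhitespace" class="solr.TextField" omitNorms="true">
    <!-- Applied to values as they are indexed -->
    <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.PatternReplaceFilterFactory" pattern="\s+" replacement="" replace="all" />
    </analyzer>
    <!-- Applied to query text, so it is normalized the same way -->
    <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.PatternReplaceFilterFactory" pattern="\s+" replacement="" replace="all" />
    </analyzer>
</fieldType>

<!-- Searchable copy only; the readable value is returned from "original" -->
<field name="whiteSpaceRmved" type="removeWhitespace" indexed="true" stored="false" />
<copyField source="original" dest="whiteSpaceRmved" />
```

Depending on the query parser, the query text may be split on whitespace before it reaches the analyzer, so a phrase query such as whiteSpaceRmved:"Hello World" is likely the safer way to test the match.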
