Solr DataImportHandler doesn't work with XML files

I'm very new to Solr. I succeeded in indexing data from my SQL database via DIH. Now I want to import XML files and index them via DIH as well, but it just won't work! My data-config.xml looks like this:

<dataConfig>
    <dataSource type="FileDataSource" encoding="UTF-8" />
    <document>
    <entity name="dir" 
            processor="FileListEntityProcessor" 
            baseDir="/bla/test2" 
            fileName=".*xml"
            stream="true"
            recursive="false"       
            rootEntity="false">
            <entity name="PubmedArticle"
                    processor="XPathEntityProcessor"
                    transformer="RegexTransformer"
                    stream="true"
                    forEach="/PubmedArticle"
                    url="${dir.fileAbsolutePath}">


                <field column="journal" xpath="//Name[.='journal']/following-sibling::Value/text()" />
                <field column="authors" xpath="//Name[.='authors']/following-sibling::Value/text()" />

             ..etc

And I have the following fields in schema.xml:

<field name="journal" type="text" indexed="true" stored="true" required="true" />
<field name="authors" type="text" indexed="true" stored="true" required="true" />

When I run Solr I get no errors, but no documents are indexed:

<str name="Total Rows Fetched">2000</str>
<str name="Total Documents Skipped">0</str>
<str name="Full Dump Started">2012-02-01 14:59:17</str>
<str name="">Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.

Can anyone tell me what I did wrong? I have even double-checked the path syntax...

asked Feb 1, 2012 at 14:02

2 Answers

I'd suggest reviewing the answers to a similar question:

Need help indexing XML files into Solr using DataImportHandler

Using a scripting language like Groovy is a lot less complicated and easier to test.

answered May 23, 2017 at 12:05

Well, I'm not familiar with Groovy. The example looks easy, but I still don't know what to do with that script! However, I did find out that it does have something to do with the XPath expression. The XML files look like this: <Feature> <Name className="java.lang.String">journal</Name> <Value className="java.lang.String">The Journal of organic chemistry</Value> </Feature>. Although the expression is correct, DIH only indexes documents when I change it to just '//Name', but that's not what I want. //Name[text()='journal'] doesn't work either :( I just can't understand why! - miel

I recently encountered the same problem when trying the same thing, i.e., using FileListEntityProcessor (to read multiple local .xml files) together with XPathEntityProcessor (to grab certain XML elements).

Root cause: the problem is in this line:

<field column="journal" xpath="//Name[.='journal']/following-sibling::Value/text()" />

Explanation: the value of the xpath attribute ("//Name..."), while valid XPath syntax, is NOT supported by Solr. The "Apache Solr 4.4 Reference Guide" simply says: "The XPath expression which will extract the content from the record for this field. Only a subset of Xpath syntax is supported."
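To give a rough idea of what that subset looks like: as far as I recall, the DataImportHandler documentation gives expressions of the following shapes as supported. Treat this as a sketch, not a complete list. The paths (`/a/b/c`, `subject`, `qualifier`) and the column names are placeholders and have nothing to do with the files in the question.

```xml
<!-- Hypothetical placeholders: paths and column names are illustrative only. -->

<!-- Full path from the document root down to an element -->
<field column="title" xpath="/a/b/c" />

<!-- Full path with a predicate on an attribute value -->
<field column="fullTitle" xpath="/a/b/subject[@qualifier='fullTitle']" />

<!-- Full path that selects an attribute instead of element text -->
<field column="qualifier" xpath="/a/b/subject/@qualifier" />
```

All three start at the document root, which is the main difference from the "//Name..." expression in the question.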

Solution: change the value of xpath to be the full path from the document root:

<field column="journal" xpath="/full/path/from/root/of/document/Name[.='journal']/following-sibling::Value/text()" />

answered Oct 17, 2013 at 15:10
