Import an invalid XML file (1.5 GB) with unescaped ampersands into MySQL

I can't import my large XML file (1.5 GB) into a database. When I call XMLReader->read(), I get an error wherever an element contains an ampersand. Can you help me convert the invalid XML file into a valid one?

I tried tidy, xmllint (from xmlsoft), and sed on Windows 7, but these command-line tools fail with out-of-memory errors.

PHP:

$reader = new XMLReader();
$reader->open('sm.xml');

// $mysqli is assumed to be an already-open mysqli connection.
// A prepared statement keeps quotes inside the data from breaking the query.
$stmt = $mysqli->prepare(
    'INSERT INTO tec (brand_name, brand_art, name_tov, cena, srok, kolvo)
     VALUES (?, ?, ?, ?, ?, ?)'
);

// Child elements of <article> whose text content is collected.
$fields = array('brand', 'number', 'descr', 'price', 'deadline', 'rest');
$data = array_fill_keys($fields, '');

while ($reader->read()) {
    // Only element nodes are of interest here, not attributes or #text.
    if ($reader->nodeType == XMLReader::ELEMENT) {
        if (in_array($reader->localName, $fields, true)) {
            $name = $reader->localName;
            // Advance to the text node inside the element and keep its value.
            $reader->read();
            $data[$name] = $reader->value;
        }
    } elseif ($reader->nodeType == XMLReader::END_ELEMENT && $reader->name == 'article') {
        // The closing </article> tag was reached: insert the collected row.
        $stmt->bind_param(
            'ssssss',
            $data['brand'], $data['number'], $data['descr'],
            $data['price'], $data['deadline'], $data['rest']
        );
        $stmt->execute();

        // Reset so that values do not leak into the next <article>.
        $data = array_fill_keys($fields, '');
    }
}

When this code reads an element such as <number>111&111</number>, I get an error. I could remove the ampersand with a command-line tool, but those tools run out of memory on such a large XML file.

The commands I tried:

xmllint.exe --recover --maxmem 10000000000 --noout --encode utf8 sm.xml -o smtt.xml
tidy.exe -m -utf8 -xml sm.xml
sed.exe 's/&/\&amp;/g; s/&amp;amp;/\&amp;/g; s/&amp;quot;/\&quot;/g;' sm.xml > smtt.xml <-- can't run

Alternatively, is there a way to make PHP's XMLReader skip validation?
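
One possible way around the memory limit is to repair the file as a stream instead of loading it whole. The sketch below is a rough PHP counterpart of the sed substitution above, not a tested production tool: it copies the file in fixed-size chunks and escapes every & that does not begin one of the five predefined entities or a numeric character reference. The function names, the chunk size, and the output file name are hypothetical. It assumes an ASCII-compatible encoding such as UTF-8, no DTD-defined entities, and no CDATA sections (a literal & inside CDATA would be escaped as well).

```php
<?php
// Longest reference recognised below is 10 bytes ("&#x10FFFF;"), so an
// unfinished one at the end of a chunk is always shorter than this.
const MAX_PARTIAL_REF = 10;

// Escape each '&' that does not start a predefined entity or a numeric
// character reference. Works on bytes, so multi-byte UTF-8 passes through.
function escapeBareAmpersands(string $text): string
{
    $pattern = '/&(?!(?:amp|lt|gt|quot|apos|#[0-9]{1,7}|#x[0-9A-Fa-f]{1,6});)/';
    return preg_replace($pattern, '&amp;', $text);
}

// Copy $inPath to $outPath chunk by chunk, escaping bare ampersands.
// Memory use stays close to $chunkSize regardless of the file size.
function fixAmpersandsInFile(string $inPath, string $outPath, int $chunkSize = 1048576): void
{
    $in = fopen($inPath, 'rb');
    $out = fopen($outPath, 'wb');
    if ($in === false || $out === false) {
        throw new RuntimeException('Cannot open input or output file');
    }

    $carry = '';
    while (!feof($in)) {
        $chunk = fread($in, $chunkSize);
        if ($chunk === false) {
            break;
        }
        $buffer = $carry . $chunk;
        $carry = '';

        // A reference such as "&amp;" may be cut in two by the chunk boundary.
        // Hold back a short trailing "&..." with no ';' until more data arrives.
        $pos = strrpos($buffer, '&');
        if ($pos !== false
            && !feof($in)
            && strlen($buffer) - $pos < MAX_PARTIAL_REF
            && strpos($buffer, ';', $pos) === false
        ) {
            $carry = substr($buffer, $pos);
            $buffer = substr($buffer, 0, $pos);
        }

        fwrite($out, escapeBareAmpersands($buffer));
    }

    // Flush anything still held back (only possible after a read failure).
    fwrite($out, escapeBareAmpersands($carry));

    fclose($in);
    fclose($out);
}
```

Under those assumptions, calling fixAmpersandsInFile('sm.xml', 'sm_fixed.xml') once and then passing 'sm_fixed.xml' to $reader->open() should let the import loop get past <number>111&111</number>, since the parser would then see 111&amp;111.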

asked May 5, 2013 at 20:05

Can we see your PHP code as well? Is the problem that you have an ampersand, or are you running out of memory? What is the exact error you get? -

The problem is the ampersand: can I escape or skip this character? I run out of memory when I try to re-validate the big XML file to prepare it for PHP's XMLReader. The error is a PHP warning: XMLReader->read() <number>111&111</number> -
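
To pin down the exact message and line of the first parse error, the libxml errors can be collected instead of being emitted as PHP warnings. This is only a sketch: the function name findFirstXmlError is hypothetical, and it assumes that XMLReader reports its errors through libxml_get_errors() once libxml_use_internal_errors(true) is set. It still streams the file, so it should not need much memory.

```php
<?php
// Return the first libxml error hit while streaming through $path,
// or null if the whole file was read without errors.
function findFirstXmlError(string $path): ?LibXMLError
{
    // Collect errors internally instead of raising PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $reader = new XMLReader();
    if ($reader->open($path)) {
        // read() returns false at the end of the file or on a fatal error.
        while ($reader->read()) {
            // Nothing to do: the file is only being scanned.
        }
        $reader->close();
    }

    $errors = libxml_get_errors();
    $first = count($errors) > 0 ? $errors[0] : null;

    // Restore the previous error-handling state.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $first;
}
```

For example, $e = findFirstXmlError('sm.xml'); followed by printing $e->line, $e->column and $e->message (when $e is not null) would show where the reader gives up.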

Right, so you have two errors: if you try to fix the invalid XML, you run out of memory, and if you don't fix it, you get a reader error. The second one is expected, so you should try to fix your XML. 1. Exactly which command line utility ran out of memory? 2. Can you replace that with another utility? -

xmllint and tidy run out of memory; sed.exe did not run at first, but I found an option to start it. I have not found any other utility. I thought these Unix ports would help me, but two of them run out of memory. If I cannot find another way to solve the problem, I will write my own "XML reader" that reads the file line by line. -

Who generated this bad XML? How bad is it: do you even know? Is fixing 1.5Gb of bad data really going to be easier than fixing the program that generated it? -

1 Answer

XMLMax editor (from xponentsoftware) will locate the error and allow you to fix it in its virtual text editor. 1.5 GB should be no problem.

Disclaimer: I am affiliated with the vendor.

answered June 5, 2013 at 18:06
