What is slowing down my batch file?

I have been experimenting with batch files and am wondering why there is a huge difference in the time required to write the output file in the scenarios below:

Scenario 1: A simple traversal of a log file, always taking the 5th token of every row unless it contains a filter string.

(for /f "tokens=5" %%a in (test.log) do @echo(%%a) |  findstr /v "filter_1 filter_2" > !filter!.txt

This works great: going through a 50 MB file gives me a smaller 10 MB file in 10 seconds.

Scenario 2: Do exactly the same, but add something before and after the token so I can output an XML file rather than a text file. To do so I had to rework it a bit, as below:

echo ^<rows^> > test.xml

>>test.xml (
for /f "tokens=5" %%a in (
    'findstr /v "filter1 filter2" test.log'
    ) do echo ^<r a="%%a"/^>
)

echo ^</rows^> >> test.xml

It works as expected for small files, but takes forever for large files. Is there any way to achieve what I want in scenario 2 while using the scenario 1 syntax, since that seems much more efficient?

asked May 28 '14 at 14:05

repl.bat is an efficient tool for locating and changing text, as is findrepl.bat. If you explain the task, you can get some help with them if you need it.

Please read here for a possible reason for your problem.

1 Answer

FOR /F always buffers the content of the IN() clause before beginning any iterations. This is true both when reading a file and when processing the output of a command. However, I believe there is some fundamental difference in how command output is buffered that makes it particularly slow with large output. Edit: MC ND has a nice explanation for why buffering of large output is so slow.

Most people are surprised to learn that sometimes the fastest batch solution is to write the command output to a temp file, and then use FOR /F to read the temp file. This will be fast as long as your disk drive is fast.

I believe the following will speed things up considerably:

findstr /v "filter1 filter2" test.log >test.log.mod
>test.xml (
  echo ^<rows^>
  for /f "tokens=5" %%A in (test.log.mod) do echo ^<r a="%%A"/^>
  echo ^</rows^>
)
del test.log.mod

Another option would be to add the XML wrapper to the left side of your original pipe, and then modify your FINDSTR filters appropriately. But the above solution may still be faster, depending on the number of lines that get filtered out.

(
  echo ^<rows^>
  for /f "tokens=5" %%A in (test.log) do echo ^<r a="%%A"/^>
  echo ^</rows^>
) | findstr /v /c:"modifiedFilter_1" /c:"modifiedFilter_2" > test.xml

The FINDSTR will also need the /R option if the filters are regular expressions.

But a far faster solution would be to use something like sed for Windows, or either of the JScript/Batch hybrid utilities, my REPL.BAT, or Aacini's FINDREPL.BAT.

answered May 23 '17 at 13:05

Wow, that's a huge difference. Thanks a lot ! - Wokoman
