SQL Server bulk insert: ignore malformed lines

I have to import SAP unconverted lists. These reports look quite ugly and are not well suited for automated processing, but there is no other option. The data is bordered by minus and pipe symbols, similar to the following example:

02.07.2012
--------------------
Report name
--------------------
|Header1 |Header2  |
|Value 11|Value1 2 |
|Value 21|Value2 2 | 
--------------------

I use a format file and a statement like the following:

SELECT Header1, Header2
FROM OPENROWSET(BULK 'report.txt',
                FORMATFILE = 'formatfile_report.xml',
                ERRORFILE = 'rejects.txt',
                FIRSTROW = 2,
                MAXERRORS = 100) AS report

Unfortunately I receive the following error:

Msg 4832, Level 16, State 1, Line 1
Bulk load: An unexpected end of file was encountered in the data file.
Msg 7399, Level 16, State 1, Line 1
The OLE DB provider "BULK" for linked server "(null)" reported an error. The provider did not give any information about the error.
Msg 7330, Level 16, State 2, Line 1
Cannot fetch a row from OLE DB provider "BULK" for linked server "(null)".

The rejects.txt file contains the last row of the file, which consists only of minus signs. The rejects.txt.Error.Txt file documents:

Row 21550 File Offset 3383848 ErrorFile Offset 0 - HRESULT 0x80004005

The culprit that raises the error is obviously the very last row, which does not conform to the format declared in the format file. The ugly header, however, does not cause many problems (at least the one at the very top).

Although I defined the MAXERRORS attribute, that single malformed line kills the whole operation. If I manually delete the last line containing all those minus signs (-), everything works fine. Since the import is supposed to run frequently and, above all, unattended, that extra manual treatment is not a serious solution.

Can anyone help me get SQL Server to be less picky and sensitive? It is good that it documents the lines that couldn't be loaded, but why does it abort the whole operation? Furthermore, after one execution of a statement that caused the creation of rejects.txt, no other (or the same) statement can be executed until the text files are deleted manually:

Msg 4861, Level 16, State 1, Line 1
Cannot bulk load because the file "rejects.txt" could not be opened. Operating system error code 80(The file exists.).
Msg 4861, Level 16, State 1, Line 1
Cannot bulk load because the file "rejects.txt.Error.Txt" could not be opened. Operating system error code 80(The file exists.).

I think that is weird behavior. Please help me to suppress it.
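At the moment the only way around that second problem seems to be deleting the old reject files before each run. A minimal sketch of automating that cleanup, assuming xp_cmdshell is enabled on the instance and using a placeholder path for wherever the reject files end up:

-- Sketch only: remove the previous run's reject files so the next import can create them again.
-- 'C:\import\' is a placeholder path; xp_cmdshell must be enabled for this to work.
EXEC master..xp_cmdshell 'del /Q "C:\import\rejects.txt" "C:\import\rejects.txt.Error.Txt"';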

EDIT - FOLLOWUP: Here is the format file I use:

<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" 
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 <RECORD>
   <FIELD ID="EMPTY" xsi:type="CharTerm" TERMINATOR="|" MAX_LENGTH="100"/>
   <FIELD ID="HEADER1" xsi:type="CharTerm" TERMINATOR="|" MAX_LENGTH="100"/>
   <FIELD ID="HEADER2" xsi:type="CharTerm" TERMINATOR="|\r\n" MAX_LENGTH="100"/>
 </RECORD>
 <ROW>
   <COLUMN SOURCE="HEADER1" NAME="HEADER2" xsi:type="SQLNVARCHAR"/>
   <COLUMN SOURCE="HEADER2" NAME="HEADER2" xsi:type="SQLNVARCHAR"/>
 </ROW>
 </BCPFORMAT>

asked Jul 2 '12 at 11:07

I found much better support when I use a format file with fixed column widths (CharFixed instead of CharTerm). Then you can check some columns for expected content in the WHERE clause, as sketched below. However, since SAP varies the column widths, this is not an option in my use case. -
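Roughly what that filtering could look like, assuming a hypothetical fixed-width format file named formatfile_report_fixed.xml: with CharFixed fields every line parses into the rowset, so the junk lines can be dropped in the WHERE clause (the exact predicates depend on how the columns are cut):

-- Sketch only: formatfile_report_fixed.xml is a hypothetical CharFixed format file
SELECT report.Header1, report.Header2
FROM OPENROWSET(BULK 'report.txt',
                FORMATFILE = 'formatfile_report_fixed.xml',
                MAXERRORS = 100) AS report
WHERE report.Header1 NOT LIKE '%---%'       -- drop the separator lines of minus signs
  AND report.Header1 NOT LIKE '%Header1%';  -- drop repeated column header lines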

Sad to see that it seems true that SQL Server is simply not able to handle a row that doesn't comply 100% with the provided format. Why can't it just ignore and log the row and then continue, instead of aborting the whole import immediately? Even worse, a kind of error file is created, and as long as it is present (not deleted by the user or an external program), no further import can be started! Isn't that strange behavior for professional software? -

3 Answers

BULK INSERT is notoriously fiddly and unhelpful when it comes to handling data that doesn't meet the specifications provided.

I haven't done a lot of work with format files, but one thing you might want to consider as a replacement is using BULK INSERT to drop each line of the file into a temporary staging table with a single nvarchar(max) column.

This lets you get your data into SQL for further examination, and then you can use the various string manipulation functions to break it down into the data you want to finally insert.
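A minimal sketch of that staging approach, assuming a placeholder file path, a hypothetical temp table, and that the report lines contain no tab characters (the default field terminator), so each whole line lands in the single column:

-- Hypothetical staging table: one raw report line per row
CREATE TABLE #report_raw (line nvarchar(max));

-- Placeholder path; with no tab characters in the data, each line is loaded as-is
BULK INSERT #report_raw
FROM 'C:\import\report.txt'
WITH (ROWTERMINATOR = '\r\n', MAXERRORS = 100);

-- Keep only the pipe-delimited data rows, skip the repeated header line, then split on '|'
SELECT LTRIM(RTRIM(SUBSTRING(line, 2, CHARINDEX('|', line, 2) - 2)))    AS Header1,
       LTRIM(RTRIM(SUBSTRING(line, CHARINDEX('|', line, 2) + 1,
                             LEN(line) - CHARINDEX('|', line, 2) - 1))) AS Header2
FROM #report_raw
WHERE line LIKE '|%'
  AND line NOT LIKE '|Header1%';

Because the malformed lines never match the WHERE clause, they are simply ignored instead of aborting the load.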

answered Jul 4 '12 at 2:07

Thank you for your input. Actually I have to agree that SQL Server bulk import is not very sophisticated. And not just the import but also the conversion of data is cruel. What a shame for such a commercial product. Anyway, that trick with a one-column temp table seems to be a kind of common practice. A colleague had told me the very same thing. I hesitate to do so, though, since I'm afraid of negative performance impacts! - Toby

I don't have a lot of metrics on preprocessing for BULK INSERT like this. If it's just the one line at the end of the file that's causing the issue, you might want to think about making a small console utility that looks for that final line in the file and trims it out. That way you have something you can wrap up into the automated upload process. - Mikurski

This is actually pretty much what I ended up doing. Not very charming, though! All that trouble just because of the shortcomings of both programs, SAP as well as SQL Server. I'm wondering which product is worse? (I personally would vote for SAP.) - Toby

SQL is actually really handy, but I think it focuses more on internal data handling and leaves the programmer to develop their own interfaces for data import and export. - Mikurski

I was in the same trouble, but using the bcp command line the problem was solved; it simply doesn't take the last row.

answered Sep 20 '16 at 22:09

Hi, welcome to Stack Overflow. Please describe your answer in more detail. A clear answer will help people understand what you mean and will increase the chance of it being selected as the answer. - Ashkan Sirous

I had the same problem. I had a file with 115 billion rows, so manually deleting the last row was not an option; I couldn't even open the file manually because it was too big.

Instead of using the BULK INSERT command, I used the bcp command, which looks like this (open a DOS cmd as administrator, then run):

bcp DatabaseName.dbo.TableNameToInsertIn in C:\Documents\FileNameToImport.dat -S ServerName -U UserName -P PassWord

It's about the same speed as BULK INSERT as far as I can tell (it took me only 12 minutes to import my data). Looking at Activity Monitor, I can see a bulk insert, so I guess it is logged the same way when the database is in bulk-logged recovery mode.

answered Oct 6 '17 at 16:10
