Table update takes a very long time

I have a table in SQL Server 2008 (SP2) with 30 million rows and a size of 150 GB. It has a couple of int columns and two nvarchar(max) columns: one containing text (1-30,000 characters) and one containing XML (up to 100,000 characters).

The table doesn't have any primary key or indexes (it is a staging table). I am running this query:

UPDATE [dbo].[stage_table] 
SET [column2] = SUBSTRING([column1], 1, CHARINDEX('.', [column1])-1);

The query has been running for 3 hours (and it is still not complete), which I think is too long. Is it? I can see a constant read rate of 5 MB/s and a write rate of 10 MB/s to the .mdf file.

How can I find out why the query is running so long? The "server" is an i7 with 24 GB of RAM and SATA disks in RAID 10.


The table contains one int column, two nvarchar(20) columns and two nvarchar(max) columns. Column1 and column2 in the UPDATE statement above are the nvarchar(20) columns. The "big" columns are not updated.

Thanks a lot!

asked Jan 8 '11 at 16:01

Are the updated columns indexed? -

5 Answers

Honestly, that's a huge amount of work that you're doing (text searching and replacing on 150 gigabytes). If the staged data originated outside the database you might consider doing the text operations there, without any of the database overhead.

answered Jan 8 '11 at 19:01

Thanks for your answer. I have updated the question. The column1 and column2 are nvarchar(20) columns, so the text being searched is not that huge, only the table is huge. - rrejc

I suspect it's still true that you'd be better off doing this outside the database. There's a lot of overhead in processing and updating each one of those rows — not as much as if you were operating on the entire text base, but still a lot. - Larry Lustig

You are doing some string manipulation on a field - something that SQL is notoriously bad at. Consider writing a SQL CLR function that does what you need and use that instead of SUBSTRING([column1], 1, CHARINDEX('.', [column1])-1).
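As a rough sketch of that suggestion (the assembly name, DLL path, and method names here are all hypothetical, and assume a .NET method that returns everything before the first '.' in its input), a CLR scalar function would be registered and called like this:

```
-- Hypothetical: assumes StringHelpers.dll contains a static method
-- Functions.BeforeFirstDot implementing the substring logic in .NET.
CREATE ASSEMBLY StringHelpers FROM 'C:\clr\StringHelpers.dll'
    WITH PERMISSION_SET = SAFE;
GO

CREATE FUNCTION dbo.BeforeFirstDot (@input nvarchar(20))
RETURNS nvarchar(20)
AS EXTERNAL NAME StringHelpers.[StringHelpers.Functions].BeforeFirstDot;
GO

UPDATE [dbo].[stage_table]
SET [column2] = dbo.BeforeFirstDot([column1]);
```

Note that CLR integration must be enabled on the instance (`sp_configure 'clr enabled', 1`) before the function can run.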

answered Jan 8 '11 at 19:01

If there is no selection criteria, why does the lack of indexes matter? - sgmoore

I don't see how an index can improve the query. It just has to be a full table scan. - bernd_k

Why would an index speed this UPDATE up? There's no WHERE clause. In fact, for this UPDATE an index would slow things down (because of the time needed to update the indexes). - Larry Lustig

A practical way to test if something is out of the ordinary is to only update some of the data. Write a view that selects say the top 10,000 rows, and run the update against the view.

If 10,000 updates run in what you would expect to be "normal" for your server, then it would follow that it is just "a lot of data to update".

If this small update seems unduly long, then investigate more.

At least this gives you a decent testing ground.
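On SQL Server 2008 the same experiment can be done without a view, using UPDATE TOP (a sketch; the row count of 10,000 is arbitrary):

```
-- Time an update of an arbitrary 10,000 rows, then extrapolate
-- to the full 30 million to see if 3+ hours is plausible.
SET STATISTICS TIME ON;

UPDATE TOP (10000) [dbo].[stage_table]
SET [column2] = SUBSTRING([column1], 1, CHARINDEX('.', [column1]) - 1);
```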

answered Jan 8 '11 at 22:01

I haven't done this kind of processing in SQL Server, so I'm not sure the advice fully applies. But I'm confident enough to suggest you try it.

What I usually do in Oracle is to avoid updates entirely when processing ALL rows in a situation like the one you describe (single user, batch event).

Either I migrate the logic from the update statement back to the statement that inserted the rows. Or if this is not possible, I create a new table and put the update logic in the select list. For example, instead of doing

UPDATE [dbo].[stage_table] 
SET [column2] = SUBSTRING([column1], 1, CHARINDEX('.', [column1])-1);

I would do:

create table stage_table2 as
   select column1
         ,substring(column1, 1, charindex('.', column1)-1) as column2
     from stage_table;

drop table stage_table;

alter table stage_table2 rename to stage_table;
-- re-create indexes and constraints, optionally gather statistics

I could also do this with parallel query and the nologging option to generate very little redo and no undo at all, which would outperform an update statement by such a large margin it's not even funny :) Of course this is because of Oracle internals, but I think it would be possible to replicate it in SQL Server as well. There is something in your description that may make this a less efficient approach, though: you have some really large text columns that you would have to "drag along" in the CTAS statement.
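A rough SQL Server equivalent of the Oracle CTAS approach is SELECT ... INTO, which is minimally logged under the SIMPLE or BULK_LOGGED recovery model (a sketch; the remaining column list is abbreviated, and the wide nvarchar(max) columns still have to be copied):

```
-- Build the new table with the derived column computed up front.
SELECT column1,
       SUBSTRING(column1, 1, CHARINDEX('.', column1) - 1) AS column2
       -- , ...remaining columns of stage_table...
INTO   dbo.stage_table2
FROM   dbo.stage_table;

-- Swap the tables; re-create any constraints afterwards.
DROP TABLE dbo.stage_table;
EXEC sp_rename 'dbo.stage_table2', 'stage_table';
```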

Also, you need to investigate your hardware setup, because it is not fit for the amount of data you are throwing at it. Either there is something wrong with the configuration, or you have a lot of other activity going on:

I can see that there is constant read rate of 5MB/s and write rate of 10MB/s to .mdf file.

I can beat that on my girlfriend's two-year-old laptop. Given a read speed of 5 MB/s and a table of 150 GB, it would take 8.5 hours just to scan through the table once. This assumes the database adds 0% overhead, which is not the case.

answered Jan 9 '11 at 01:01

There are a few options here. But without more information regarding what you intend to do with the data after this update is performed, Larry Lustig's answer sounds like the most appropriate. But other options follow:

  • Create column2 as a calculated column instead of a physical column.
  • Perform the calculation as you pull the data from the staging table (which is also what would take place if you go with the previous bullet).
  • Index column2 and then perform the updates in chunks of 10,000 records or so where column2 is null. This will keep the implicit transaction size down, which is probably what is currently killing your performance.
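The third bullet could look roughly like this (the batch size is arbitrary, and this assumes column2 starts out NULL):

```
-- Repeat small batches until no NULL rows remain; each batch commits
-- on its own, keeping the implicit transaction (and log growth) small.
WHILE 1 = 1
BEGIN
    UPDATE TOP (10000) [dbo].[stage_table]
    SET [column2] = SUBSTRING([column1], 1, CHARINDEX('.', [column1]) - 1)
    WHERE [column2] IS NULL;

    IF @@ROWCOUNT = 0 BREAK;
END
```

An index on column2 keeps the `WHERE [column2] IS NULL` probe from scanning the whole table on every batch, at the cost of maintaining the index during the updates.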

answered Jan 9 '11 at 01:01
