T-SQL rolling twelve-month sum per day: performance

I have looked at similar questions, but none of the answers worked well for me. The most useful was http://forums.asp.net/t/1170815.aspx/1, but with that approach my query runs for hours and hours.

I have 1.5 million product sales records (about 10k products) covering 4 years. I want a table that contains the date, the product and the rolling twelve-month sales.

This query (from the link above) works and shows what I want, but its performance makes it useless:

select day_key, product_key, price,
       (select sum(price) as R12
        from #ORDER_TURNOVER as tb1
        where tb1.day_key <= a.day_key
          and tb1.day_key > dateadd(mm, -12, a.day_key)
          and tb1.product_key = a.product_key) as RSum
into #hejsan
from #ORDER_TURNOVER as a

I tried a cursor-based running-sum function over all records, which was lightning fast, but I couldn't get it to sum only the sales from the last 365 days.
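A cursor-style pass can be limited to the last twelve months by treating it as a sliding window: walk each product's rows in date order, add each new day's sales, and subtract rows once they fall out of the window. Below is a minimal Python sketch of that logic (an illustration only, not T-SQL). It assumes rows are `(product_key, day_key, price)` tuples with `datetime.date` days; `minus_12_months` and `rolling_12m` are hypothetical helper names. It reproduces the window of the query above, `(dateadd(mm, -12, day_key), day_key]`.

```python
from datetime import date
from itertools import groupby
from operator import itemgetter


def minus_12_months(d):
    """Mimic DATEADD(mm, -12, d): same day a year earlier, Feb 29 clamped to Feb 28."""
    try:
        return d.replace(year=d.year - 1)
    except ValueError:  # only Feb 29 fails: the previous year has no Feb 29
        return d.replace(year=d.year - 1, day=28)


def rolling_12m(rows):
    """rows: iterable of (product_key, day_key, price) tuples.

    Returns (product_key, day_key, price, r12) tuples sorted by product and day,
    where r12 sums that product's prices with day_key in
    (DATEADD(mm, -12, day_key), day_key], like the correlated subquery.
    """
    out = []
    ordered = sorted(rows, key=itemgetter(0, 1))
    for product, group in groupby(ordered, key=itemgetter(0)):
        grp = list(group)
        window = 0  # sum of prices of rows grp[left:right]
        left = 0    # first row still inside the window
        right = 0   # first row not yet added to the window
        for _, day, price in grp:
            # Add every row up to and including this day (same-day rows share a total).
            while right < len(grp) and grp[right][1] <= day:
                window += grp[right][2]
                right += 1
            # Drop rows that are on or before the 12-month cutoff.
            cutoff = minus_12_months(day)
            while left < right and grp[left][1] <= cutoff:
                window -= grp[left][2]
                left += 1
            out.append((product, day, price, window))
    return out
```

Each row is added once and removed at most once, so after the sort the work grows linearly with the number of rows, instead of re-summing up to a year of rows for every row.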

Any ideas on how to solve this problem are much appreciated. Thank you.

asked May 22 '12 at 12:05

What version of SQL Server are you using? By the looks of it, you could benefit from SQL Server 2012's SUM() OVER (ORDER BY ...).

We use 2005 and will keep using it for at least another year. Thanks for the tip, though!

You're using a temp table. Please tell me you have an index on (product_key, day_key) on that temp table.

Can't you just compute the sum for each day, store it, and add up the most recent 365 daily sums? Or: compute the yearly sum once, then each day subtract the daily sum from 365 days ago and add today's daily sum. Store that along with the daily sums, so that you only have to recalculate the sum for today.
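The incremental idea in this comment (keep one total per day, then each day add today's total and subtract the total from 365 days earlier) can be sketched in Python as follows. This illustrates the logic only and is not T-SQL; `rolling_by_day` is a hypothetical name, sales are assumed to be `(product_key, day_key, price)` tuples with `datetime.date` days, and the window is a fixed 365 days, which differs slightly from `dateadd(mm, -12, ...)` around leap years.

```python
from datetime import date, timedelta

WINDOW_DAYS = 365  # "today - 365", as in the comment; not exactly DATEADD(mm, -12, ...)


def rolling_by_day(sales, start, end):
    """sales: iterable of (product_key, day_key, price), with day_key a datetime.date.

    Returns {(product_key, day): sum of that product's sales in the WINDOW_DAYS days
    ending on `day`} for every calendar day from start to end inclusive.
    """
    # Step 1: build (and, in the database, store) one total per product per day.
    daily = {}
    for product, day, price in sales:
        daily[(product, day)] = daily.get((product, day), 0) + price

    result = {}
    for product in sorted({p for p, _ in daily}):
        # Seed with the WINDOW_DAYS days that end the day before `start`.
        rolling = sum(daily.get((product, start - timedelta(days=k)), 0)
                      for k in range(1, WINDOW_DAYS + 1))
        day = start
        while day <= end:
            # Step 2: add today's total and subtract the total from WINDOW_DAYS days ago.
            rolling += daily.get((product, day), 0)
            rolling -= daily.get((product, day - timedelta(days=WINDOW_DAYS)), 0)
            result[(product, day)] = rolling
            day += timedelta(days=1)
    return result
```

Because it walks every calendar day, it also yields values for days without sales. Once the daily totals are stored, each new day costs one addition and one subtraction per product.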

Dems: Of course I have an index. With the index it takes about 30 seconds per product; without it, about 2 minutes.

2 Answers

I'd change your setup slightly.

First, have a table that lists all the product keys that are of interest...

CREATE TABLE product (
  product_key    INT NOT NULL,
  price          INT,
  some_fact_data VARCHAR(MAX),
  -- what_ever_else: any other product attributes you need
  PRIMARY KEY CLUSTERED (product_key)
)

Then, I'd have a calendar table, with each individual date that you could ever need to report on...

CREATE TABLE calendar (
  date            SMALLDATETIME NOT NULL,
  is_bank_holiday INT,
  -- what_ever_else: any other date attributes you need
  PRIMARY KEY CLUSTERED (date)
)

Finally, I'd ensure that your data table has a covering index on all the relevant fields...

CREATE INDEX IX_product_day ON #ORDER_TURNOVER (product_key, day_key) INCLUDE (price)

This would then allow the following query...

SELECT
  product.product_key,
  product.price,
  calendar.date,
  SUM(data.price) AS RSum
FROM
  product
CROSS JOIN
  calendar
INNER JOIN
  #ORDER_TURNOVER AS data
    ON  data.product_key = product.product_key
    AND data.day_key    >  dateadd(mm, -12, calendar.date)
    AND data.day_key    <= calendar.date
GROUP BY
  product.product_key,
  product.price,
  calendar.date

Done this way, each product/calendar date combination relates to a set of records in your data table that are all consecutive in the index. That makes looking up the data to be aggregated much, much simpler for the optimiser.

[Requires a single index, specifically in the order (product, date).]

If you have the index the other way around, it is actually much harder...

Example data:

Index on (product, date)         Index on (date, product)

 product | date                   date       | product
---------+-----------            ------------+---------
    A    | 01/01/2012             01/01/2012 |    A
    A    | 02/01/2012             01/01/2012 |    B
    A    | 03/01/2012             02/01/2012 |    A
    B    | 01/01/2012             02/01/2012 |    B
    B    | 02/01/2012             03/01/2012 |    A
    B    | 03/01/2012             03/01/2012 |    B

On the left, you simply read all the records that sit next to each other in one 365-day block.

On the right, you have to search for each record before you can aggregate it. Each search is relatively simple, but you do 365 of them, far more work than the single range read on the left.
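To make the left/right contrast concrete, here is a small Python model. It is an illustration, not how SQL Server works internally: a sorted list stands in for the index and a `bisect` call stands in for an index seek. The rows mirror the sample data, assuming the three dates are consecutive days; `product_first_range` and `date_first_positions` are hypothetical names.

```python
from bisect import bisect_left, bisect_right
from datetime import date, timedelta

# Hypothetical rows mirroring the sample data: products A and B on three consecutive days.
rows = [(p, date(2012, 1, d)) for p in ("A", "B") for d in (1, 2, 3)]

by_product = sorted(rows)                      # keyed (product, date): left table
by_date = sorted((day, p) for p, day in rows)  # keyed (date, product): right table


def product_first_range(keys, product, lo_excl, hi_incl):
    """(product, date) order: the window is one contiguous slice [start, stop),
    found with two binary searches (one seek, then a range scan)."""
    start = bisect_right(keys, (product, lo_excl))
    stop = bisect_right(keys, (product, hi_incl))
    return start, stop


def date_first_positions(keys, product, lo_excl, hi_incl):
    """(date, product) order: one binary search per day in the window.
    Assumes at most one row per product per day, as in the sample data."""
    positions = []
    day = lo_excl + timedelta(days=1)
    while day <= hi_incl:
        i = bisect_left(keys, (day, product))  # one seek per day
        if i < len(keys) and keys[i] == (day, product):
            positions.append(i)
        day += timedelta(days=1)
    return positions
```

For product B over these three days, the left layout gives the single slice from position 3 up to 6, while the right layout needs three separate seeks that land on positions 1, 3 and 5. With a twelve-month window that becomes roughly 365 seeks per result row.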

answered May 22 '12 at 14:05

This is how one does "running totals" / "sum subsets" in SQL Server 2005-2008. SQL Server 2012 has native support for running totals, but most of us are still working with 2005-2008 databases.

SELECT  day_key ,
        product_key ,
        price ,
        ( SELECT    SUM(price) AS R12
          FROM      #ORDER_TURNOVER AS tb1
          WHERE     tb1.day_key <= a.day_key
                    AND tb1.day_key > DATEADD(mm, -12, a.day_key)
                    AND tb1.product_key = a.product_key
        ) AS RSum
INTO    #hejsan
FROM    #ORDER_TURNOVER AS a 

A few suggestions.

You could pre-calculate the running totals so that they are not computed again and again. What you are doing in the SELECT above is a disguised loop rather than a set-based query (unless the optimizer can convert the subquery into a join).
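One way to read "pre-calculate the running totals": compute each product's cumulative sales once. Each rolling twelve-month value is then the difference of two running totals, the total through the current day minus the total through `dateadd(mm, -12, day)`. The Python sketch below shows that arithmetic only; it is not T-SQL. `build_running_totals`, `r12` and the `(product_key, day_key, price)` tuple layout are hypothetical, and storing the totals as a table of (product_key, day_key, running_total) in the database is an assumption, not something the answer specifies.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import date
from itertools import accumulate


def minus_12_months(d):
    """Mimic DATEADD(mm, -12, d): same day a year earlier, Feb 29 clamped to Feb 28."""
    try:
        return d.replace(year=d.year - 1)
    except ValueError:
        return d.replace(year=d.year - 1, day=28)


def build_running_totals(rows):
    """Pre-calculate, once, each product's sorted days and cumulative price per row."""
    per_product = defaultdict(list)
    for product, day, price in rows:
        per_product[product].append((day, price))
    totals = {}
    for product, entries in per_product.items():
        entries.sort()
        totals[product] = ([d for d, _ in entries],
                           list(accumulate(p for _, p in entries)))
    return totals


def r12(totals, product, day):
    """Rolling 12 months = running total through `day` minus running total through the cutoff."""
    days, cum = totals[product]

    def total_through(x):
        i = bisect_right(days, x)  # number of rows with day <= x
        return cum[i - 1] if i else 0

    return total_through(day) - total_through(minus_12_months(day))
```

The subtraction keeps rows with cutoff < day_key <= day, the same window as the correlated subquery, but each lookup is a search into pre-computed totals rather than a fresh sum over up to a year of rows.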

The above solution requires a few changes to the code.

Another solution you can certainly try is to create a clustered index on your #ORDER_TURNOVER temp table. This is safer because it is a local change.

-- Equality column (product_key) first, range column (day_key) second
CREATE CLUSTERED INDEX IndexName
ON #ORDER_TURNOVER (product_key, day_key)

All three predicates in your WHERE clause are SARGable, so chances are good that the optimizer will now do a seek instead of a scan.

If the index does not give enough of a performance gain, it is well worth investing in the first solution (pre-calculating the running totals).

answered May 22 '12 at 13:05

I have tried the index solution, but it is still too slow. How would I pre-calculate the running totals? Thank you. - user1410100
