SQL Server query performance mystery

I've had far too many meetings today, but I think I still have my brainware in place. In my effort to improve the performance of a query, I came across the following mystery (table and field names paraphrased):

    INNER JOIN dbo.fn_A(16) AS VD ON (VD.DId = A.ADId)
    WHERE ((A.ADId = 1) OR ((HDId IS NOT NULL) AND (HDId = 1))) AND
           (P.PS NOT IN(5,7)) AND (A.ASP IN (2, 3))
) X
WHERE (dbo.fn_B(X.ADId, 16) = 1)

As you will see, the contents of the inner query are mostly irrelevant. The whole point initially was that I wanted to avoid calling fn_B() on every record, because the records contained duplicate values for ADId, so I did a SELECT DISTINCT internally and then filtered the distinct records. Sounds reasonable, right?

Here starts the mystery...

The inner query returns NO RECORDS (for the specified parameters). If I comment out the "WHERE fn_B() = 1" then the query runs in zero time (and returns no results). If I put it back on, then the query takes 6-10 seconds, again returning no results.

This seems to beat common sense, or at least MY common SQL sense :-) If the inner query returns no data, then the outer conditions should never get evaluated, right?

Of course I took the time to check the actual execution plans, saved them and compared them very carefully. They are 99% identical, with nothing unusual to notice, or so I think.

I fooled around with some CTEs to get the query results in the first CTE, and then pass it to a second CTE that had some conditions guaranteed to filter no records, then evaluate the fn_B() call outside all CTEs, but the behavior was exactly the same.

Also other variations, like using the old query (that might call fn_B() multiple times with the same value) had the same behavior. If I remove the condition then I get no records in zero time. If I put it back, then no records in 10 seconds.

Any ideas, anyone?

Thanks for your time :-)

PS1: I tried to reproduce the situation on tempdb using a simple query but I couldn't make it happen. It only happens on my actual tables.

PS2: This query is called inside another function so putting the results in a temporary table and then further filtering them is also out of the question.

asked May 3 '12 at 17:05

2 Answers

Just as a note, the optimizer does not read a query the same way you do. Even when you think that a certain order should take place, or that short-circuiting might make the most sense, the optimizer still might evaluate CTEs / subqueries in an order you might not expect. A workaround you might try is selecting the first query into a #temp table and then running the function filter on the #temp table. This should force the order of evaluation even if it is completely unintuitive and much less elegant.
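The #temp table workaround described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `dbo.A` and its filter are hypothetical stand-ins for the real inner query (which is not shown in full in the question), `#DistinctIds` is a made-up name, and `dbo.fn_B` is the asker's own function. Note that the asker says the query lives inside a function, where #temp tables cannot be used, so this only applies if the query can be run from a batch or stored procedure.

```sql
-- Step 1: materialize the distinct IDs first.
-- dbo.A and the WHERE clause are hypothetical placeholders for the real inner query.
SELECT DISTINCT A.ADId
INTO #DistinctIds
FROM dbo.A AS A
WHERE A.ASP IN (2, 3);

-- Step 2: apply the function filter to the materialized rows only,
-- so fn_B cannot be evaluated before the inner query has finished.
SELECT T.ADId
FROM #DistinctIds AS T
WHERE dbo.fn_B(T.ADId, 16) = 1;

-- Clean up the temporary table.
DROP TABLE #DistinctIds;
```

Splitting the work into two statements means each one is optimized separately, which is why this tends to pin down the order of evaluation.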


Also, while it may perform slower, I am curious what happens if you run the query without the NOLOCK, or in RCSI instead. Different locking semantics may be tripping up the optimizer.

answered May 3 '12 at 22:05

I think most people are already aware that the optimizer rearranges things in strange ways. BUT sometimes queries are written in such a way that the programmer is in charge of what happens. If I do two SELECT DISTINCTs and then JOIN them, I am roughly sure what will happen. Anyway, today I am going to try out some invocations where the inner query actually brings data, or where I replace fn_B() with a dummy function, just to see how the behavior changes. - Dimitrios Staikos

You would think so, but there are plenty of exceptions. The optimizer is not perfect. Did you read Hugo's blog post and bug report just today? sqlblog.com/blogs/hugo_kornelis/archive/2012/05/04/… I've seen several cases where the only way to enforce the behavior I expected from the optimizer was to break the query into separate queries. It's just an idea you can try, as I suggested above. - Aaron Bertrand

Extra info. I ran the inner query with params that return 6 rows: zero time. Add the WHERE ==> 30 sec. I made 6 explicit calls to fn_B() with these IDs, total 0 sec. I put the whole thing in the profiler and here is what it shows... SQL Server begins an avalanche of Table Scans, on the SAME 5 tables, over and over and over again (approx. 100,000 entries in the profiler log) and then executes the query. All these tables appear inside fn_B(), which never gets called in the original example. Removing NOLOCK made no difference. So I am starting to figure that something is confusing SQL Server here. - Dimitrios Staikos

I don't doubt that I/we will find a way around it eventually. My point is that this appears like highly abnormal behavior, so I personally need to understand why it is happening. - Dimitrios Staikos

We submitted the issue to Microsoft support for SQL Server R2 (I must comment on their amazing response times and overall service procedures). We gave them a copy of our DB that reproduces the issue, along with our workaround. They reproduced it themselves, and after a couple of days here is the answer we got back:

I have analyzed both execution plans and would kindly ask if the workaround would be acceptable to use in production? The main reason behind it, is that a function does not have, as indexes have, statistics. And this lack of data makes the optimizer choose sometimes a not so good execution plan. If you already found a workaround it is best to implement this. The index changes we tried did not improve the execution.

This is quite a diplomatic way to say "yeah, the optimizer messes things up with your query, so please use the workaround". If you wanna call it a bug, call it a bug, it doesn't matter.

Just for the record, the workaround was to put the call to fn_B() in the SELECT list of a query one level above the SELECT DISTINCT, then filter its result on the WHERE condition. Kind of weird, but it does the trick.
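For anyone who wants to see the shape of it, here is a rough sketch of that workaround. It is not our actual query: `dbo.A` and its filter are hypothetical stand-ins for the real inner joins, and the alias `BResult` is made up for illustration. Only `fn_B`, `ADId` and the parameter 16 come from the question.

```sql
SELECT Y.ADId
FROM (
    -- One level above the SELECT DISTINCT: call fn_B in the SELECT list
    -- instead of in a WHERE clause.
    SELECT X.ADId,
           dbo.fn_B(X.ADId, 16) AS BResult   -- BResult is a hypothetical alias
    FROM (
        -- Hypothetical stand-in for the original inner query.
        SELECT DISTINCT A.ADId
        FROM dbo.A AS A
        WHERE A.ASP IN (2, 3)
    ) AS X
) AS Y
WHERE Y.BResult = 1;   -- filter on the computed column, not on the function call
```

Nothing obliges the optimizer to treat this differently from the original form, so it is worth comparing the actual execution plans before relying on it. It simply happened to produce a good plan in our case.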

answered May 10 '12 at 08:05
