MySQL query doesn't use the index when the table is large

I have a table:

`id` bigint(20) unsigned NOT NULL,  
`rtime` datetime NOT NULL,  
`d` int(10) NOT NULL,  
`n` int(10) NOT NULL,  
PRIMARY KEY (`rtime`,`id`,`d`) USING BTREE  

and I have a query:

select id, d, sum(n) from p where  rtime between '2012-08-25' and date(now()) group by id, d;

I'm running EXPLAIN on this query on a tiny table (2 records), and it tells me it's going to use my PK:

id  | select_type  | table | type   | possible_keys | key     | key_len | ref  | rows | Extra
1   | SIMPLE       | p     | range  | PRIMARY       | PRIMARY | 8       | NULL | 1    | Using where; Using temporary; Using filesort

but when I run the same query on the same table, only this time it's huge (350 million records), it prefers to go through all the records and ignores my keys:

id  | select_type  | table  | type | possible_keys  | key  | key_len | ref  | rows      | Extra
1   | SIMPLE       | p      | ALL  | PRIMARY        | NULL | NULL    | NULL | 355465280 | Using where; Using temporary; Using filesort

Obviously, this is extremely slow. Can anyone help?

EDIT: this simple query is also taking a significant amount of time:

select count(*) from propagation_delay where  rtime > '2012-08-28';

asked Aug 28 '12 at 13:08

you might want to consider asking this on, seems like they may be better equipped to explain this behaviour. Of course, I'm not knocking the l33t SQL skillz of anyone on SO :)

4 Answers

Your query:

...WHERE rtime between '2012-08-25' and date(now()) group by id, d;

employs rtime, and groups by id and d. At a minimum you ought to index on rtime. You might also try an index on rtime, id, d, n (in that order), but if you do, you'll see that the index contains more or less the same data as the table itself.

Probably the optimizer does its cost calculations and concludes that it isn't really worthwhile to use the index.

I'd leave a single index on rtime. The real clincher is how many records match the WHERE clause: if there are only a few, it pays to read the index and hop around the table; if there are many, it may be cheaper to sequentially scan the whole table and save on the back-and-forth reads.
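One way to gauge this yourself (a rough sketch, using the table and column names from the question) is to compare how many rows the range matches against the total row count:

```sql
-- How many rows does the WHERE range actually match?
SELECT COUNT(*) FROM p
WHERE rtime BETWEEN '2012-08-25' AND DATE(NOW());

-- ...compared with the total:
SELECT COUNT(*) FROM p;
```

As a rule of thumb, once the range covers a sizeable fraction of the table, the optimizer tends to prefer a full scan over index lookups.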

the query is getting a big chunk out of those 350 mil - i'd say a few millions

Okay, then it is likely that the cumulative cost of quickly extracting a few million entries from the index, plus shuttling back and forth to the main table to fetch those same rows, exceeds the cost of opening the main table and trawling through all 350M records, grouping and summing along the way.

In that scenario, if you always (or mostly) run aggregate queries on rtime, AND the table is an accumulating (historical) table, AND each pair (id, d) sees many entries per day, you might consider creating a secondary table aggregated by date. I.e., at (say) midnight you run a query such as:

INSERT INTO aggregate_table
    SELECT DATE(@yesterday) AS rtime, id, d, sum(n) AS n
    FROM main_table WHERE DATE(rtime) = @yesterday GROUP BY id, d;

The data in aggregate_table then has exactly one entry per pair (id, d) per day, holding the sum of n for that day; the table is proportionately smaller, and queries against it are faster. This assumes that you have a comparatively small number of (id, d) pairs and that each of them generates lots of rows in the main table every day.
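A minimal sketch of what that aggregate table could look like (the DDL is an assumption, not part of the original answer; the column types mirror the question's schema):

```sql
-- Hypothetical DDL for the daily aggregate table
CREATE TABLE aggregate_table (
    rtime date            NOT NULL,
    id    bigint unsigned NOT NULL,
    d     int             NOT NULL,
    n     bigint          NOT NULL,  -- holds the daily SUM(n); wider than int to avoid overflow
    PRIMARY KEY (rtime, id, d)
);

-- The original range query, now over far fewer rows
SELECT id, d, SUM(n)
FROM aggregate_table
WHERE rtime BETWEEN '2012-08-25' AND DATE(NOW())
GROUP BY id, d;
```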

With one log entry per minute per pair, aggregation should speed things up by more than three orders of magnitude (conversely, if you have twice-daily readings from a huge number of different sensors, the benefit will be negligible).

answered Aug 28 '12 at 17:08

i've tried dropping the 'group by' (aiming to do the grouping in code), but it's still not using the index. As for your question: the query is getting a big chunk out of those 350 mil, I'd say a few million. - phistakis

In your second EXPLAIN, the date range was going to return so many rows that MySQL decided not to use the index. It did this because n is not included in the index: a non-covering index still requires a lookup back into the table for each row, and doing a huge number of lookups is slower than scanning the table.

In order to utilize an index, you'll need to reduce the number of selected rows, or include n in your index so it becomes a full "covering" index.
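A sketch of such a covering index, assuming the table name p from the question (note that building an index of this size on a 350M-row table is itself a long operation):

```sql
-- Covering index: the query can be answered entirely from the index,
-- with no lookups back into the table rows
ALTER TABLE p ADD INDEX idx_rtime_id_d_n (rtime, id, d, n);
```

If the index is used as a covering index, EXPLAIN should then show "Using index" in the Extra column.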

Respondido 28 ago 12, 14:08

You can make MySQL use a particular index with the index hint syntax.
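For example (a sketch applying FORCE INDEX to the question's query; whether forcing the index actually helps here depends on the selectivity discussed in the other answers):

```sql
SELECT id, d, SUM(n)
FROM p FORCE INDEX (PRIMARY)
WHERE rtime BETWEEN '2012-08-25' AND DATE(NOW())
GROUP BY id, d;
```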

answered Aug 28 '12 at 16:08

Just a hunch, backed by a little experience: try changing the engine from MyISAM to InnoDB. MyISAM has some problems with very large numbers of records, among other issues, and InnoDB is generally better these days. Also, as of MySQL 5.5 the default engine is InnoDB.
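The conversion itself is a single statement (table name taken from the question), though on a 350M-row table it rebuilds the entire table and can take a very long time:

```sql
-- Rebuilds the table using the InnoDB storage engine
ALTER TABLE p ENGINE=InnoDB;
```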

answered Aug 28 '12 at 13:08
