I have a table:
CREATE TABLE `p` (
  `id` bigint(20) unsigned NOT NULL,
  `rtime` datetime NOT NULL,
  `d` int(10) NOT NULL,
  `n` int(10) NOT NULL,
  PRIMARY KEY (`rtime`,`id`,`d`) USING BTREE
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
and I have a query:
select id, d, sum(n) from p where rtime between '2012-08-25' and date(now()) group by id, d;
I'm running EXPLAIN on this query on a tiny table (2 records) and it tells me it's going to use my PK:
id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra
1  | SIMPLE      | p     | range | PRIMARY       | PRIMARY | 8       | NULL | 1    | Using where; Using temporary; Using filesort
but when I use the same query on the same table, only this time it's huge (350 million records), it prefers to go through all the records and ignore my keys:
id | select_type | table | type | possible_keys | key  | key_len | ref  | rows      | Extra
1  | SIMPLE      | p     | ALL  | PRIMARY       | NULL | NULL    | NULL | 355465280 | Using where; Using temporary; Using filesort
Obviously, this is extremely slow. Can anyone help?
EDIT: this simple query is also taking a significant amount of time:
select count(*) from propagation_delay where rtime > '2012-08-28';
asked 28 Aug 2012 at 13:08
...WHERE rtime between '2012-08-25' and date(now()) group by id, d;

filters on rtime and groups by id and d. At a minimum you ought to index on rtime. You might also want to try indexing on (rtime, id, d, n), in that order, but if you do, your index will contain more or less the same data as your table. Probably the optimizer runs its cost calculations and concludes that it's not really worthwhile to use the index.

I'd leave a single index on rtime. The real clincher is how many records match the WHERE: if they're just a few, it is convenient to read the index and hop around the table; if there are many, it may be cheaper to sequentially scan the whole table, saving on the back-and-forth reads.
the query is getting a big chunk out of those 350 million - I'd say a few million
OK, then it is likely that the cumulative cost of quickly extracting a few million record pointers from the index, and then shuttling back and forth to the main table to fetch those rows, is higher than the cost of opening the main table and trawling through all 350M records, grouping and summing along the way.
In such a scenario, if you always (or mostly) run aggregate queries on rtime, AND the table is an accumulating (historical) table, AND each (id, d) pair sees several scores of entries per day, you might consider creating a secondary table aggregated by date. I.e., at (say) midnight, you run a query such as:

INSERT INTO aggregate_table
SELECT DATE(@yesterday) AS rtime, id, d, SUM(n) AS n
FROM main_table
WHERE rtime >= @yesterday AND rtime < @yesterday + INTERVAL 1 DAY
GROUP BY id, d;

(Note the range predicate on rtime instead of DATE(rtime) = @yesterday: wrapping the column in a function would prevent MySQL from using any index on rtime.)
The data in aggregate_table has one entry per (id, d) pair per day, holding the sum of n for that day; the table is proportionately smaller, and queries against it are faster. This assumes that you have a comparatively small number of (id, d) pairs and each of them generates lots of rows in the main table each day. With one log entry per minute per pair, aggregation should speed things up by more than three orders of magnitude (conversely, if you have two readings per day from a huge number of different sensors, the benefit will be negligible).
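A minimal sketch of such an aggregate table, reusing the column types from the question's CREATE TABLE (the table name and the wider type for n are my assumptions, not something from the thread):

```sql
-- Hypothetical aggregate table: one row per (day, id, d) pair.
CREATE TABLE aggregate_table (
  rtime date            NOT NULL,  -- the day being summarized
  id    bigint unsigned NOT NULL,
  d     int             NOT NULL,
  n     bigint          NOT NULL,  -- SUM(n) for that day; wider than int to avoid overflow
  PRIMARY KEY (rtime, id, d)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
```

The original GROUP BY query then runs against this much smaller table instead of the 350M-row one.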
In your second EXPLAIN, the date range was going to return so many rows that MySQL decided not to use the index. It did this because n is not included in the index: a non-covering index still requires a lookup into the table row for every match, and doing a huge number of lookups is slower than scanning the table once. To get the index used, you'll need to reduce the number of selected rows, or include n in your index to make it a full "covering" index.
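Assuming the table definition from the question, the covering index could be added like this (the index name is arbitrary; on 350M rows this will take a long time and substantial disk space):

```sql
-- Covering index: the query's WHERE, GROUP BY and SUM(n) can all be
-- answered from the index alone, with no lookups back into table rows.
ALTER TABLE p ADD INDEX idx_rtime_id_d_n (rtime, id, d, n);
```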
Just a hunch, backed by a little experience: try changing the engine from MyISAM to InnoDB. MyISAM has problems with very large tables (plus other long-standing bugs), and InnoDB is generally better now. Also, as of MySQL 5.5 the default engine is InnoDB: http://dev.mysql.com/doc/refman/5.5/en/innodb-default-se.html
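If you want to try this, the conversion is a one-liner, but be aware that on a 350-million-row table it rewrites the entire table and its indexes and can take hours; test it on a copy first:

```sql
-- Rebuilds the table (data and indexes) using the InnoDB storage engine.
ALTER TABLE p ENGINE=InnoDB;
```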