Choosing indexes for a complex query

I am new to the world of databases, so I would like some help creating Postgres indexes for the following query. I have a number of queries that look similar to this one, so I have made it generic, and I am hoping to apply what I learn here to the other queries as well.

This query sums up a column of values and returns the top-100 values grouped by a certain category.

SELECT sum(col1) as sum_col, t.col10
FROM table1 as s, table2 as up, table3 as g, table4 as t 
WHERE (s.col1 >= 0) AND (s.col2 = 'f')
AND (g.col3 = 1)
AND (up.col4 = s.col5)
AND (g.id = s.col6 )
AND ((g.col7 = up.col8) OR (g.col9 = up.col8))
AND ((g.col7 = t.id) OR (g.col9 = t.id))
AND (t.id = up.col8) 
GROUP BY t.col10
ORDER BY sum_col DESC LIMIT 100

Based on the WHERE clause, this is what I have identified as the indexes for the tables. I am not sure if this is correct or if I need to add more multi-column ones. The ids are primary keys, so I have left them out of the lists below.

Table1 Index:
col1 and col2 (2-way index)
col5

Table2 Index:
col4
col8

Table3 Index: 
col3
col7
col9

Table4 Index: 
col10? 

asked Nov 27 '13 at 04:11

1 Answer

Commenting on your findings:

Table1 Index:
col1 and col2 (2-way index)
col5

Change the first index to (col2, col1). Rule of thumb: index for equality predicates first (s.col2 = 'f'), then for ranges (s.col1 >= 0). And please don't believe the "most selective first" myth.
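As a sketch of that rule, the two-column index could be declared as follows. The index name is made up for illustration; the table and column names are the generic ones from the question.

```sql
-- Equality column (col2) first, range column (col1) second.
-- The index name is hypothetical.
CREATE INDEX table1_col2_col1_idx ON table1 (col2, col1);
```

With this column order, the entries for col2 = 'f' sit next to each other in the index, and the col1 >= 0 condition narrows a contiguous range within them.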

Without an execution plan, it is impossible to tell whether you need the index on col5 (we know neither the join algorithm used nor the join order).

In general, you'd like to have only one index per table mentioned in the FROM/JOIN clauses. Hence, the correct index might be (col5, col2, col1).
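If the plan turns out to reach table1 through the join on col5 (for example, as the inner side of a nested loops join), such a single three-column index could look like the sketch below. The index name is an assumption, and whether the index helps at all depends on the plan the database actually picks.

```sql
-- Join column first, then the equality filter, then the range filter.
-- Hypothetical name; only useful if table1 is looked up by col5.
CREATE INDEX table1_col5_col2_col1_idx ON table1 (col5, col2, col1);
```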

For the same reason, it is hard to judge your index suggestions for table2 (join algorithm and order?).

Similarly for table3, except that the unconditional predicate g.col3 = 1 tells you that you should put that column first in the index. Adding col7 and col9 could be valid (depending on join algorithm and order ;)
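A minimal sketch of such an index for table3, with col3 leading and the two join columns appended. The index name is hypothetical, and the trailing columns are optional.

```sql
-- col3 first because g.col3 = 1 applies regardless of the join order.
-- col7 and col9 are optional additions; hypothetical index name.
CREATE INDEX table3_col3_col7_col9_idx ON table3 (col3, col7, col9);
```

Because col7 and col9 only appear inside OR conditions in the query, it is worth checking the execution plan to see whether the extra columns are actually used.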

table4 is joined nowhere, yet used for sorting? That doesn't make sense to me this early in the morning.

I've written an indexing guide called Use The Index, Luke. If you'd like to really know what's best, please read it: http://use-the-index-luke.com/

EDIT re join algorithms and order

In principle, the database automatically chooses the join algorithm that fits your query best. PostgreSQL uses one of the following three algorithms: nested loops join, hash join, or sort/merge join. Besides the choice of algorithm, the order in which the tables are processed can also affect performance, so the database tries to pick the best order as well.

However, indexing affects the database's choice of join algorithm and order, and vice versa. To really know which indexes to create, you'd need to know which algorithm and order are used. Unfortunately, even that doesn't guarantee the best performance, because other indexes might make other join algorithms faster than the one the database chose in the first place.
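One way to explore this interaction on a test system is to temporarily discourage one join algorithm and compare the resulting plans. The sketch below uses PostgreSQL's enable_hashjoin planner setting inside a transaction so the change does not outlive the experiment. The shortened query is only a stand-in for the real one.

```sql
BEGIN;
-- Discourage hash joins for this transaction only.
-- Meant for experimentation, not as a production setting.
SET LOCAL enable_hashjoin = off;

EXPLAIN
SELECT sum(s.col1) AS sum_col, t.col10
FROM table1 AS s
JOIN table2 AS up ON up.col4 = s.col5
JOIN table4 AS t ON t.id = up.col8
WHERE s.col1 >= 0 AND s.col2 = 'f'
GROUP BY t.col10;

ROLLBACK;
```

The related settings enable_nestloop and enable_mergejoin work the same way for the other two algorithms.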

The way to find out what the database thinks is best is to use EXPLAIN. However, the execution plan is recreated quite often and may change without notice, e.g. because the table has grown so that another join algorithm makes more sense. That's why you should never optimize against a more-or-less empty development database. That's just wasted time. You'll need realistic data to test against.
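As a sketch, this is how the query from the question could be inspected once realistic data is loaded. Note that EXPLAIN with the ANALYZE option actually executes the query, so it takes as long as the query itself.

```sql
-- Refresh planner statistics after loading the test data.
ANALYZE table1;
ANALYZE table2;
ANALYZE table3;
ANALYZE table4;

-- Plain EXPLAIN shows only the estimated plan; the ANALYZE option runs the
-- query and reports actual row counts and timings for each plan node.
EXPLAIN (ANALYZE, BUFFERS)
SELECT sum(col1) AS sum_col, t.col10
FROM table1 AS s, table2 AS up, table3 AS g, table4 AS t
WHERE (s.col1 >= 0) AND (s.col2 = 'f')
AND (g.col3 = 1)
AND (up.col4 = s.col5)
AND (g.id = s.col6)
AND ((g.col7 = up.col8) OR (g.col9 = up.col8))
AND ((g.col7 = t.id) OR (g.col9 = t.id))
AND (t.id = up.col8)
GROUP BY t.col10
ORDER BY sum_col DESC LIMIT 100;
```

The output names the join algorithm chosen for each join and shows which indexes, if any, were used.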

Pretty complex stuff, unfortunately.

answered Nov 27, 13:16

Thank you Markus! I will review the link. Some follow-up questions: when you say join algorithm and order, do you mean something I am doing but have not shared? (If so, I am using the query as-is.) If you mean that I need to specify it to make the query better, can you suggest what type of algorithm I need to specify? Or is it something totally different, where I need to run EXPLAIN on the query and see what Postgres chooses to do? Will it change over time so that I need to monitor it, or is it just a one-time setup? - Ecognium

Btw, Table4 is joined but uses only the primary key, so I did not list it in the index. I just added the GROUP BY column. - Ecognium

@Ecognium Updated above. Hope that helps. - Markus Winand

Thank you so much for the updated info. I will try to simulate some data and use that with EXPLAIN. - Ecognium
