If threads get 3000 posts each, is it better to create a new table per thread?

There are already 12 million posts, and people seem to be using the forum as a chat. I don't know whether it's more efficient to have a bunch of little tables than to have the database scan for the last 10 messages in a table with so many rows. I know I'd have to benchmark, but I'm asking whether anyone has observations or anecdotes from similar situations.

Edit: added the schema:

create table reply(
id int(11) unsigned not null auto_increment,
thread_id int(10) unsigned not null default 0,
ownerId int(9) unsigned not null default 0,
ownerName varchar(20),
profileId int(9) unsigned,
profileName varchar(50),
creationDate datetime,
ip int unsigned,
pic varchar(255) default '',
reply text,
primary key(id)) ENGINE=MyISAM;

asked Jul 1 '12 at 04:07

Don't denormalize if there is no need, and don't index blindly: tinkering with things might make them worse. Are you actually having a performance problem, or are you just being cautious? What does the schema look like? Do you log slow queries?

I do have an index on thread id in the comments table. I was just wondering whether this was a valid alternative. Recently I've been studying leveldb and other key-value stores, which often use multiple tables, so I've been questioning the setup. I've edited the question to add the schema.

Is there any open source forum or blog engine involved?

Everything is custom MySQL. I'm probably going to switch to Postgres later.

12 million is next to nothing... Don't worry, databases are made to handle billions of records.

3 Answers

It's not a good idea to use variable table names. If you've indexed the column whose values would otherwise become separate tables (here, thread_id), the database will do a better job with that index than you can by splitting the data into tables yourself. That's what the database was designed for.
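To make that concrete, here is a minimal sketch, not a benchmark. It uses SQLite through Python's sqlite3 module purely so that it runs without a server; MySQL's optimizer and its EXPLAIN output differ. The composite index idx_reply_thread_id on (thread_id, id), the trimmed-down column list, the row counts, and the helper names are all assumptions for illustration and are not part of the asker's schema.

```python
import sqlite3

# SQLite stands in for MySQL so the sketch is self-contained (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reply ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " thread_id INTEGER NOT NULL DEFAULT 0,"
    " reply TEXT)"
)

# Hypothetical composite index: thread_id first, id second, so the rows of
# one thread sit next to each other in the index, ordered by id.
conn.execute("CREATE INDEX idx_reply_thread_id ON reply (thread_id, id)")

# 20 made-up threads with 300 posts each, interleaved like real traffic.
posts = [(n % 20, f"post {n}") for n in range(6000)]
conn.executemany("INSERT INTO reply (thread_id, reply) VALUES (?, ?)", posts)

LAST_TEN = (
    "SELECT id, reply FROM reply"
    " WHERE thread_id = ? ORDER BY id DESC LIMIT 10"
)


def last_ten(thread_id):
    """Return the ten newest posts of one thread, newest first."""
    return conn.execute(LAST_TEN, (thread_id,)).fetchall()


def plan(thread_id):
    """Return SQLite's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + LAST_TEN, (thread_id,)).fetchall()
    return " ".join(str(row[-1]) for row in rows)


latest = last_ten(7)
query_plan = plan(7)
print(latest[0])
print(query_plan)
```

If the plan mentions the index, the database is jumping straight to the end of that thread's entries instead of scanning the whole table, which is the job a per-thread table was supposed to do. On MySQL the equivalent check would be running EXPLAIN on the same SELECT.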

answered Jul 1 '12 at 04:07

I assume that "thread" here means thread in a pool of postings.

The way you are going to get long-term scalability here is to develop an architecture in which you can have multiple database instances, and avoid having queries that need to be performed across all instances.

Creating multiple tables on the same DB won't really help in terms of scalability. (In fact, it might even reduce throughput ... due to increasing the load on the DB's caches.) But it sounds like in your application you could partition into "pools" of messages in different databases, provided that you can arrange that a reply to a message goes into the same pool as the message it replies to.
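A rough sketch of that kind of partitioning, under stated assumptions: each "pool" is an in-memory SQLite database standing in for a separate database instance, and the routing rule (thread_id modulo the number of pools) is just one simple choice. NUM_POOLS, pool_for, add_reply and last_replies are hypothetical names invented for this example.

```python
import sqlite3

NUM_POOLS = 4  # hypothetical number of database instances

# One in-memory SQLite database per pool stands in for a separate server.
pools = [sqlite3.connect(":memory:") for _ in range(NUM_POOLS)]
for db in pools:
    db.execute(
        "CREATE TABLE reply ("
        " id INTEGER PRIMARY KEY,"
        " thread_id INTEGER NOT NULL,"
        " ownerId INTEGER NOT NULL,"
        " reply TEXT)"
    )
    db.execute("CREATE INDEX idx_thread ON reply (thread_id, id)")


def pool_for(thread_id):
    """Route by thread_id so every reply lands in its thread's pool."""
    return pools[thread_id % NUM_POOLS]


def add_reply(thread_id, owner_id, text):
    """Insert a reply into the single pool that owns the thread."""
    pool_for(thread_id).execute(
        "INSERT INTO reply (thread_id, ownerId, reply) VALUES (?, ?, ?)",
        (thread_id, owner_id, text),
    )


def last_replies(thread_id, limit=10):
    """Read the newest replies of a thread from one pool only."""
    rows = pool_for(thread_id).execute(
        "SELECT reply FROM reply WHERE thread_id = ? ORDER BY id DESC LIMIT ?",
        (thread_id, limit),
    )
    return [row[0] for row in rows]


for n in range(12):
    add_reply(1, 100, f"reply {n}")
add_reply(2, 101, "another thread")

print(last_replies(1)[:3])
```

Because the routing key is the thread, showing the last messages of a thread never has to touch more than one instance.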

The problem that arises is that certain things will involve querying across data in all DB instances. In this case, it might be listing all of a user's messages, or doing a keyword search. So you really have to look at the entire picture to figure out how best to achieve a partitioning. You need to analyze all of the queries, taking account of their relative frequencies. And at the end of the day, the solution might involve denormalizing the schema so that the database can be partitioned.
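To illustrate why such queries are the expensive ones, here is a small sketch of listing one user's messages when the data is partitioned by thread. As an assumption for the example, in-memory SQLite databases play the role of the separate instances, and the names NUM_POOLS, add_reply and messages_by_owner, along with the sample rows, are made up.

```python
import sqlite3

NUM_POOLS = 3  # hypothetical number of database instances

pools = [sqlite3.connect(":memory:") for _ in range(NUM_POOLS)]
for db in pools:
    db.execute(
        "CREATE TABLE reply ("
        " id INTEGER PRIMARY KEY,"
        " thread_id INTEGER NOT NULL,"
        " ownerId INTEGER NOT NULL,"
        " creationDate TEXT,"
        " reply TEXT)"
    )


def add_reply(thread_id, owner_id, created, text):
    """Store a reply in the pool chosen by its thread (thread_id modulo pools)."""
    pools[thread_id % NUM_POOLS].execute(
        "INSERT INTO reply (thread_id, ownerId, creationDate, reply)"
        " VALUES (?, ?, ?, ?)",
        (thread_id, owner_id, created, text),
    )


def messages_by_owner(owner_id, limit=10):
    """Scatter-gather: ask every pool, then merge the partial results."""
    found = []
    for db in pools:  # a user's posts can be in any pool, so all are queried
        found.extend(
            db.execute(
                "SELECT creationDate, thread_id, reply FROM reply"
                " WHERE ownerId = ? ORDER BY creationDate DESC LIMIT ?",
                (owner_id, limit),
            )
        )
    found.sort(reverse=True)  # newest first across all pools
    return found[:limit]


# Made-up sample rows: user 42 posts in threads that live in different pools.
add_reply(1, 42, "2012-07-01 04:00:00", "first")
add_reply(2, 42, "2012-07-01 04:05:00", "second")
add_reply(3, 7, "2012-07-01 04:07:00", "someone else")
add_reply(5, 42, "2012-07-01 04:10:00", "third")

for created, thread_id, text in messages_by_owner(42):
    print(created, thread_id, text)
```

Every call costs one query per instance plus a merge in the application, which is why how often such queries run should drive the choice of partitioning key.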

answered Jul 2 '12 at 00:07

Dynamic tables are typically a very bad idea in a relational schema. Key/value stores make different trade-offs, so some are better at things like dynamic tables, but at the cost of things like weaker data integrity and consistency guarantees. You don't appear to have defined any foreign key references and you're using MyISAM, so data integrity and reliability probably aren't a priority. The important thing to understand is that different designs are good at different things, so what's good design for one DB can be bad design for another.

I can't help with much else as I focus on Pg and this is a MySQL question. Untagging.

(Note that in PostgreSQL at least, many operations on the relation set are O(n), so huge numbers of relations can be quite harmful.)

answered Nov 5 at 13:02
