Why do .NET dictionaries resize to prime numbers?

According to this question, a .NET dictionary resizes its allocated space to a prime number that is at least twice the current size. Why is it important to use prime numbers and not just twice the current size? (I tried to use my google-fu powers to find an answer, but to no avail.)

asked Jan 9 '11 at 09:01

As an aside to your question, does anyone know of a balanced tree-like data structure that resizes to prime sizes? Maybe I should post another question.

What is the tree data structure behind .NET's dictionary, then?

I asked the question over here: stackoverflow.com/questions/4639122/…

@costy None; it's a hash table, not a tree.

3 Answers

It is an implementation detail related to choosing a good hash function, one that provides a uniform distribution. A non-uniform distribution increases the number of collisions and the cost of resolving them.

answered Jan 9 '11 at 12:01

Choosing a prime number does not provide a uniform distribution; no need to oversimplify. With hashsize = prime_number, you have exactly the same chance of getting collisions as with hashsize = 2^k or any other size. It's just that some hash sizes make collisions look 'unpredictable', 'random' or 'uniformly distributed'. On the other hand, having hashsize = 2^k would mean that any hash function based on xor will suck. - Nikita Rybak

The bucket in which an element is put is determined by (hash & 0x7FFFFFFF) % capacity. This needs to be uniformly distributed. It follows that if many entries have hashes that are multiples of a certain base (hash1 = x1 * base, hash2 = x2 * base, ...), and base and capacity aren't coprime (greatest common divisor > 1), some slots are overused and some are never used. Since a prime number is coprime to every number except its own multiples, prime capacities have relatively good chances of achieving a good distribution.
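To make the coprimality argument concrete, here is a minimal sketch in Python rather than C#. It assumes the bucket formula quoted above with the 31-bit mask 0x7FFFFFFF; the helper names `bucket` and `used_buckets`, the base 8, and the capacities 32 and 37 are illustrative choices, not anything taken from the .NET source.

```python
def bucket(hash_code: int, capacity: int) -> int:
    # Clear the sign bit, then reduce modulo the table capacity.
    return (hash_code & 0x7FFFFFFF) % capacity


def used_buckets(hashes, capacity: int) -> int:
    # Count how many distinct buckets the given hashes land in.
    return len({bucket(h, capacity) for h in hashes})


# Hypothetical input: 1000 hash codes that are all multiples of 8.
base = 8
hashes = [x * base for x in range(1000)]

composite_used = used_buckets(hashes, 32)  # gcd(8, 32) = 8
prime_used = used_buckets(hashes, 37)      # gcd(8, 37) = 1

print(composite_used, "of 32 buckets used")
print(prime_used, "of 37 buckets used")
```

With capacity 32 only every eighth slot can ever be hit, while with capacity 37 the same hashes reach every slot.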

One particularly nice property of this is that for capacity > 30 the contribution of each bit to the hashcode is different. So if the variation of the hash is concentrated in only a few bits, it will still lead to a good distribution. This explains why capacities which are powers of two are bad: taking a value modulo 2^k keeps only its low k bits, so the high bits are masked out. A set of numbers where only the high bits are different isn't that unlikely.
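The power-of-two problem can be sketched the same way in Python. The hashes below are a made-up example that differ only in bits 16 and above, and the capacities 64 and 67 are arbitrary picks for illustration; the mask is assumed to be the 31-bit 0x7FFFFFFF.

```python
def bucket(hash_code: int, capacity: int) -> int:
    # Clear the sign bit, then reduce modulo the table capacity.
    return (hash_code & 0x7FFFFFFF) % capacity


# Hypothetical hash codes whose low 16 bits are all zero.
hashes = [i << 16 for i in range(100)]

# Modulo 64 keeps only the low 6 bits, which are identical for every hash.
pow2_buckets = {bucket(h, 64) for h in hashes}

# Modulo a prime mixes the high bits into the result.
prime_buckets = {bucket(h, 67) for h in hashes}

print(len(pow2_buckets), "distinct buckets with capacity 64")
print(len(prime_buckets), "distinct buckets with capacity 67")
```

Every entry collides in bucket 0 for the power-of-two capacity, whereas the prime capacity spreads the same entries over all of its buckets.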

Personally I think they chose that function badly. It contains an expensive modulo operation, and if the entries are multiples of the prime capacity, its performance breaks down. But it seems to be good enough for most applications.
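The failure case mentioned here is easy to reproduce in a short Python sketch. It again assumes the bucket formula with the mask 0x7FFFFFFF, and the capacity 37 and the input hashes are hypothetical values chosen for illustration.

```python
def bucket(hash_code: int, capacity: int) -> int:
    # Clear the sign bit, then reduce modulo the table capacity.
    return (hash_code & 0x7FFFFFFF) % capacity


capacity = 37  # a prime capacity, picked arbitrarily

# Hypothetical worst case: every hash code is a multiple of the capacity.
hashes = [k * capacity for k in range(1000)]

buckets_hit = {bucket(h, capacity) for h in hashes}
print(buckets_hit)
```

All 1000 entries end up in a single bucket, so lookups degrade to a linear scan of one long collision chain.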

answered Jul 4 '14 at 13:07

Because of the mathematics of prime numbers. They cannot be factored into smaller numbers. When you divide the hash of each stored item by a prime table size, the remainders are therefore spread evenly. If the size were not a prime number, the distribution might not be even, depending on the objects.

answered Jan 9 '11 at 12:01
