How do hashes work in programming?

How do hashes work in programming? The way I think of a hash is as something that lets me use some unique value to retrieve some data. For example, if I have an array and start putting things in it, and I have another variable that keeps track of which item is in slot 0, 1, 2, ..., then I can find an item instantly. Is that hashing?

What is the purpose of a hash?

When should a hash be implemented? What's a hash similar to in terms of data structure?

What I think I know about hashes is that they let us retrieve an item in O(1) time. Is that correct?

asked Jan 9, 2011 at 03:01

Take care: there are hashing algorithms, and there are hash tables, which are data structures that use a hashing algorithm (specifically, one way to implement a map/associative array). You mean the latter, but you say "hash", which usually refers to a hashing algorithm or to its output.

3 Answers

A hash map / dictionary is a key/value data structure that stores objects in buckets based on the value of a hash function. These keys must be unique but the hash function values (sometimes called hashcodes) aren't necessarily unique.
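To make the "buckets" idea concrete, here is a minimal sketch of such a structure. The class name `BucketMap`, its method names, and the bucket count of 8 are made up for illustration; real dictionary implementations are considerably more sophisticated.

```python
class BucketMap:
    """Toy hash map: a list of buckets, each a list of (key, value) pairs."""

    def __init__(self, bucket_count=8):
        # bucket_count=8 is an arbitrary choice for this illustration.
        self.buckets = [[] for _ in range(bucket_count)]

    def _bucket_for(self, key):
        # The hash code picks the bucket; different keys may share one.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket_for(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # keys are unique: overwrite
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        # Only the one bucket chosen by the hash code is searched.
        for existing_key, value in self._bucket_for(key):
            if existing_key == key:
                return value
        return default


m = BucketMap()
m.put("apple", 1)
m.put("pear", 2)
m.put("apple", 3)  # same key again: replaces the old value
```

After these calls, `m.get("apple")` returns the latest value stored for that key, and a key that was never stored falls back to the default.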

For example, if I have an array and start putting things in it, and I have another variable that keeps track of which item is in slot 0, 1, 2, ..., then I can find an item instantly. Is that hashing?

No. A hash function is a deterministic function that always gives the same value for an object. The hash code does not change depending on where the object is stored.
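As a rough illustration of "deterministic", the sketch below uses a hypothetical `simple_hash` function (a basic polynomial string hash chosen only for this example, not the one any particular language uses). The value depends only on the object's contents, not on which slot it sits in.

```python
def simple_hash(text):
    """Hypothetical polynomial string hash, for illustration only."""
    h = 0
    for ch in text:
        # Mix in each character; keep the result within 32 bits.
        h = (h * 31 + ord(ch)) % 2**32
    return h


slots = ["pear", "kiwi", "apple"]
before = simple_hash(slots[2])   # "apple" is in slot 2

slots.reverse()                  # now "apple" is in slot 0
after = simple_hash(slots[0])

same_hash = before == after      # the hash did not change with the slot
```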

What I think I know about hashes is that they let us retrieve an item in O(1) time. Is that correct?

Nearly. A dictionary has O(1) complexity for lookups if there are not too many hash code collisions. However, if the hash function is poor and every object has the same hash value, then every object lands in the same bucket and a lookup could take O(n) time instead.

answered Jan 9, 2011 at 12:01

Also note that the keys don't have to be strings or characters. They usually are, but they can also be pointers (and in some languages a string is itself a pointer), structs, or other data types. - user142019

A hash is like a person's first name -- it's a short way of remembering a person, even though it doesn't have to be unique. If you need to find some information about someone, you can just look them up by their name, and you only need to perform other checks if two or more people have the same name.

That's the power of hashing, and just as remembering people is much easier by name than by Social Security Number, finding an object by its hash code is much easier than actually comparing the object to everything already in your collection.

Now, in this example, if you're looking someone up in a phone book by name, you'd probably find them in O(log n) time, because the names are sorted alphabetically, and because you need to do a binary search. If, however, you instead "hashed" 100 people born in the 1900s by their years of birth, then you'd only need at most 4 comparisons in the hashtable/phonebook (one per digit) to find any one year by hash, which is constant time. Then, if two people are born in the same year, you can use other information to find the person you need, and on average, if your table isn't too full (say, if you have at most 50 people for 100 different years of birth), your lookups will be constant-time.

(If your table gets more than, say, 50% full, you can always double its size, to keep the number of collisions low and hence to keep your lookups fast.)
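Here is a small sketch of that resizing rule, using years of birth as keys. The class name `ResizingSet`, the starting size of 4 buckets, and the rehash-everything strategy are assumptions made for the example; only the 50% threshold and the doubling come from the text above.

```python
class ResizingSet:
    """Toy hash set that doubles its bucket count above 50% load."""

    MAX_LOAD = 0.5  # the "50% full" threshold from the answer

    def __init__(self, bucket_count=4):
        self.buckets = [[] for _ in range(bucket_count)]
        self.size = 0

    def add(self, key):
        bucket = self.buckets[hash(key) % len(self.buckets)]
        if key in bucket:
            return
        bucket.append(key)
        self.size += 1
        if self.size > self.MAX_LOAD * len(self.buckets):
            self._grow()

    def _grow(self):
        # Double the table and re-insert every key, since the bucket
        # index depends on the number of buckets.
        old_keys = [k for bucket in self.buckets for k in bucket]
        self.buckets = [[] for _ in range(2 * len(self.buckets))]
        for k in old_keys:
            self.buckets[hash(k) % len(self.buckets)].append(k)

    def __contains__(self, key):
        return key in self.buckets[hash(key) % len(self.buckets)]


years = ResizingSet()
for year in range(1900, 1950):  # 50 distinct years of birth
    years.add(year)

longest_bucket = max(len(b) for b in years.buckets)
```

Because the table keeps growing as items arrive, it never ends up more than half full, and each lookup only has to scan one short bucket.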


More information:

If you've ever heard of MD5, SHA-1, or SHA-2 hashes for files, they're like the "fingerprints" of the file. While it's possible for two different files to have the same hash, this is made so unlikely that, for practical purposes, it's impossible; hence, if you have the hashes of two files, you can compare the files by their fingerprints rather than by their data, which is immensely faster.
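As a minimal sketch of fingerprint comparison, the snippet below uses SHA-256 (a member of the SHA-2 family) from Python's standard `hashlib` module. The helper name `fingerprint` and the in-memory byte strings standing in for file contents are assumptions for the example.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


# In-memory stand-ins for the contents of three files.
original = b"hello world\n" * 1000
copy = bytes(original)
modified = original[:-1] + b"!"  # differs only in the last byte

same = fingerprint(original) == fingerprint(copy)
different = fingerprint(original) != fingerprint(modified)
```

Each fingerprint is a fixed 64-character hex string no matter how large the input is, so comparing two stored fingerprints is cheap. Computing a fingerprint still requires reading the whole file once.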

answered May 17, 2018 at 23:05

A hash table makes lookups fast because you don't have to iterate over an array or walk a tree. It makes it possible to search in O(1) time with little memory overhead.

answered Jan 9, 2011 at 06:01
