Data structure for a dictionary? [closed]

Which is the best data structure for storing a dictionary: a hash table or a trie? Consider the possibility that more words may be added to the dictionary later on.

asked Dec 24 '12 at 21:12

What is the best way to learn from a computer science class? Posting the question from your homework verbatim on StackOverflow, or trying to understand what's actually going on and then talking with others if you run into problems, providing context as to what those problems are? Consider the possibility that this comment may be snarky but is still written with your (eventual) best interests in mind. -

Both are used in the standard library: std::map (probably a tree) and std::unordered_map (probably a hash table). -

@LokiAstari Trie != tree (actually, a trie is a tree, but it's very different from the trees that are suitable for std::map). -

You seem to ask lots of very similar questions. Two have even been closed. Please think it through and spend some time on Wikipedia and on tutorials. Then come back with a specific question. -

No need to close for that, @Cheers. Just fix the tag. -

2 Answers

A std::unordered_map or std::map would be the best data structure for a dictionary. std::unordered_map is the C++11 hash table, while std::map is the ordered associative container (typically implemented as a balanced tree).
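As a rough illustration of the difference, here is a minimal sketch. The helper names `make_sample_dictionary` and `keys_in_iteration_order` and the sample entries are made up for this example; only the standard containers themselves are real. It assumes a C++11 or later compiler.

```cpp
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical sample data: word -> definition.
std::map<std::string, std::string> make_sample_dictionary() {
    return {
        {"trie", "a tree keyed by successive characters"},
        {"hash", "a function that maps keys to buckets"},
        {"map", "an associative container"},
    };
}

// Hypothetical helper: collects the keys of any map-like container
// in whatever order the container iterates them.
template <typename Map>
std::vector<std::string> keys_in_iteration_order(const Map& dict) {
    std::vector<std::string> keys;
    for (const auto& entry : dict) {
        keys.push_back(entry.first);
    }
    return keys;
}
```

With std::map the keys come back sorted ("hash", "map", "trie"). A std::unordered_map built from the same entries answers lookups such as `dict.count("trie")` in constant time on average, but its iteration order is unspecified. Both accept new words later through `operator[]` or `insert`.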

answered Dec 24 '12 at 21:12

I believe this is about a dictionary in the literal, non-computer sense, not the alternative term for associative arrays. With that in mind, are those two really the best options? - user395760

@delnan I assumed based on context of the sentence, i.e. "hash table or [tree]" - Rapptz

It says trie. A trie is a kind of tree, but more importantly, it's almost but not quite entirely unlike the trees used in std::map and their ilk. - user395760

@delnan thanks for the info :) - Rapptz

+1 To the degree that the question has anything to do with C++ (which it's tagged as), this is the only sensible answer. - Cheers and hth. - Alf

Neither of these data structures is "better" than the other. It depends completely on what your needs are.

A hash table for strings is good if you are primarily interested in answering the question "does string X exist in my hash table?" It supports (usually) fast lookups and has a low memory footprint; each string is stored exactly once. However, it relies on the existence of a good hash function, is susceptible to hash collisions for pathological inputs, and does not let you do searches like "what string is closest to my string?"
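To make that trade-off concrete, here is a small sketch using std::unordered_set. The function names `contains_word` and `words_with_prefix_by_scan` are invented for this example. Exact membership is a single hash lookup, while a prefix query gets no help from the table and has to inspect every stored word.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Exact membership: one hash lookup, O(1) on average.
bool contains_word(const std::unordered_set<std::string>& words,
                   const std::string& word) {
    return words.find(word) != words.end();
}

// Prefix query: the hash table cannot narrow the search,
// so this scans every stored word (O(n) string comparisons).
std::vector<std::string> words_with_prefix_by_scan(
        const std::unordered_set<std::string>& words,
        const std::string& prefix) {
    std::vector<std::string> matches;
    for (const auto& word : words) {
        if (word.compare(0, prefix.size(), prefix) == 0) {
            matches.push_back(word);
        }
    }
    return matches;  // order is unspecified
}
```

Adding words later is just `words.insert("cat")`; the container rehashes itself as it grows.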

A trie is a good data structure for storing strings that gives good worst-case lookups (you need only look at each character of the input string once). It also has the advantage that strings with similar prefixes can be stored compactly. Additionally, the trie allows you to search for strings with a given prefix, or to do regex searches efficiently, or to find nearby words efficiently. It has the drawback that the memory usage of a trie tends to be much higher than that of a hash table due to the number of pointers being stored.
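Here is a minimal trie sketch to show what that looks like in practice. The class and member names are made up for this example, and it stores children in a std::map per node, which is just one of several possible layouts (a fixed array of 26 pointers is another). It assumes C++14 for std::make_unique.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

class Trie {
public:
    // Walks or creates one node per character, then marks the last node as a word.
    void insert(const std::string& word) {
        Node* node = &root_;
        for (char c : word) {
            auto& child = node->children[c];
            if (!child) child = std::make_unique<Node>();
            node = child.get();
        }
        node->is_word = true;
    }

    // Looks at each character of the query exactly once.
    bool contains(const std::string& word) const {
        const Node* node = find_node(word);
        return node != nullptr && node->is_word;
    }

    // Returns all stored words that start with prefix, in sorted order.
    std::vector<std::string> words_with_prefix(const std::string& prefix) const {
        std::vector<std::string> out;
        const Node* node = find_node(prefix);
        if (node != nullptr) {
            std::string current = prefix;
            collect(*node, current, out);
        }
        return out;
    }

private:
    struct Node {
        bool is_word = false;
        std::map<char, std::unique_ptr<Node>> children;
    };

    // Follows key from the root; returns nullptr if the path does not exist.
    const Node* find_node(const std::string& key) const {
        const Node* node = &root_;
        for (char c : key) {
            auto it = node->children.find(c);
            if (it == node->children.end()) return nullptr;
            node = it->second.get();
        }
        return node;
    }

    // Depth-first walk that appends every complete word below node.
    static void collect(const Node& node, std::string& current,
                        std::vector<std::string>& out) {
        if (node.is_word) out.push_back(current);
        for (const auto& entry : node.children) {
            current.push_back(entry.first);
            collect(*entry.second, current, out);
            current.pop_back();
        }
    }

    Node root_;
};
```

Note the per-node map and pointers: that is where the extra memory goes compared with a hash table that stores each string once.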

There are other data structures besides these that you could consider. Radix tries and Patricia trees give a more condensed representation of tries but at some additional programming complexity. BK-trees can be used if you are interested primarily in finding all strings "close" to some initial string efficiently.
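For the "close string" case, a BK-tree can be sketched roughly as follows. This assumes Levenshtein edit distance as the metric (any true metric would do), and the names `edit_distance`, `BKTree`, and `within` are invented for this example. It assumes C++14.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Levenshtein distance, computed with a single rolling row.
int edit_distance(const std::string& a, const std::string& b) {
    std::vector<int> row(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) row[j] = static_cast<int>(j);
    for (std::size_t i = 1; i <= a.size(); ++i) {
        int diagonal = row[0];  // distance for (i-1, j-1)
        row[0] = static_cast<int>(i);
        for (std::size_t j = 1; j <= b.size(); ++j) {
            int above = row[j];  // distance for (i-1, j)
            int substitution = diagonal + (a[i - 1] == b[j - 1] ? 0 : 1);
            row[j] = std::min({above + 1, row[j - 1] + 1, substitution});
            diagonal = above;
        }
    }
    return row[b.size()];
}

class BKTree {
public:
    // Each child edge is labeled with the distance between parent and child.
    void insert(const std::string& word) {
        if (!root_) { root_ = std::make_unique<Node>(word); return; }
        Node* node = root_.get();
        while (true) {
            int d = edit_distance(word, node->word);
            if (d == 0) return;  // already stored
            auto& child = node->children[d];
            if (!child) { child = std::make_unique<Node>(word); return; }
            node = child.get();
        }
    }

    // Returns all stored words within max_distance of query, sorted.
    std::vector<std::string> within(const std::string& query, int max_distance) const {
        std::vector<std::string> out;
        if (root_) search(*root_, query, max_distance, out);
        std::sort(out.begin(), out.end());
        return out;
    }

private:
    struct Node {
        explicit Node(std::string w) : word(std::move(w)) {}
        std::string word;
        std::map<int, std::unique_ptr<Node>> children;
    };

    static void search(const Node& node, const std::string& query,
                       int max_distance, std::vector<std::string>& out) {
        int d = edit_distance(query, node.word);
        if (d <= max_distance) out.push_back(node.word);
        // Triangle inequality: matches can only sit under edges labeled
        // in [d - max_distance, d + max_distance]; other subtrees are skipped.
        auto it = node.children.lower_bound(d - max_distance);
        for (; it != node.children.end() && it->first <= d + max_distance; ++it) {
            search(*it->second, query, max_distance, out);
        }
    }

    std::unique_ptr<Node> root_;
};
```

The pruning in `search` is what lets a lookup skip subtrees without computing the distance to every stored word.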

In short, if memory is at a premium or you don't need to do "close string" searches, a hash table is a good choice. If you need to look for nearby strings or do other string operations, a trie is probably a better choice.

Hope this helps!

answered Dec 24 '12 at 21:12

Considering the speeds of modern PCs, and since it is only a semester project, do you really think I should be that worried about the memory? And are radix trees constructed after creating tries? I was also thinking about the DAWG structure. - Usman Amjed

@UsmanAmjed- I would suggest coding both up and comparing them against one another. It would be a good learning experience, and if it's a semester project then you would learn a lot more by doing both and presenting the one that had better performance characteristics. - templatetypedef
