Hash Table Overview
A hash table, also known as a hash map, is a data structure that allows for the efficient storage and retrieval of key-value pairs. It utilizes a hash function to compute an index into an array of slots or buckets, from which the desired value can be found.
Key Components of a Hash Table
- Keys: Unique identifiers used to store and retrieve values.
- Values: Data associated with a particular key.
- Hash Function: A function that takes a key as input and returns an index into the array of buckets. Ideally, it distributes the keys uniformly across the array.
- Buckets/Slots: Storage spaces in the array where the key-value pairs are stored.
- Collision: Occurs when two different keys produce the same index using the hash function. Common techniques to handle collisions include separate chaining (where each bucket contains a list of key-value pairs) and open addressing (like linear probing, where we search for the next available slot).
Hash Table Cheat Sheet
Operations and their average time complexities
- Insert: O(1) on average, O(n) worst-case due to collisions.
- Delete: O(1) on average, O(n) worst-case.
- Lookup: O(1) on average, O(n) worst-case.
Handling Collisions
- Separate Chaining: Each bucket contains a list or another data structure. If two keys hash to the same index, their key-value pairs are stored in the same list.
- Open Addressing: If a bucket is already occupied, search for the next available slot. Variants include linear probing, quadratic probing, and double hashing.
Resizing
When the hash table becomes too full (often determined by a load factor), it may need to be resized to maintain efficiency. This involves creating a larger table and rehashing all existing key-value pairs into it.
Choosing a Good Hash Function
- Distributes keys uniformly across the bucket array.
- Is consistent, i.e., it returns the same output for the same input.
- Is efficient and computes the hash in constant time.
Common Uses
- Databases for quick data retrieval.
- Caching to store and retrieve data rapidly.
- Associative arrays in programming languages like Python’s dictionaries.
Potential Issues
- Collisions can degrade performance.
- Poorly chosen hash functions can lead to clustering of keys.