Hashing Techniques
Direct Address Table
- Direct Address Table is a data structure that has the capability of mapping records to their corresponding keys using arrays. In direct address tables, records are placed using their key values directly as indexes. They facilitate fast searching, insertion, and deletion operations.
- We can understand the concept using the following example. We create an array of size equal to maximum value plus one (assuming 0 based index) and then use values as indexes. For example, in the following diagram key 21 is used directly as an index.
- Advantages:
- Searching in O (1) Time: Direct address tables use arrays which are random access data structures, so, the key values (which are also the index of the array) can be easily used to search the records in O (1) time.
- Insertion in O (1) Time: We can easily insert an element in an array in O(1) time. The same thing follows in a direct address table also.
- Deletion in O (1) Time: Deletion of an element takes O (1) time in an array. Similarly, to delete an element in a direct address table we need O (1) time.
- Limitations:
- Prior knowledge of the maximum key value
- Practically useful only if the maximum value is very less.
- It causes wastage of memory space if there is a significant difference between total records and maximum value.
Hashing can overcome these limitations of direct address tables.
Hashing
- Hashing is used to index and retrieve items in a database because it is faster to find the item using the shortest hashed key than to find it using the original value. It is also used in many encryption algorithms.
- Hashing is a technique in which a given key field value is converted into the address of the storage location of the record by applying the same operation on it.
- The advantage of hashing is that allows the execution time of basic operation to remain constant even for the larger side.
- Hashing is an important Data Structure that is designed to use a special function called the Hash function which is used to map a given value with a particular key for faster access of elements. The efficiency of mapping depends on the efficiency of the hash function used.
Hash Table
- A hash table is a data structure that is used to store keys/value pairs. It uses a hash function to compute an index into an array in which an element will be inserted or searched. By using a good hash function, hashing can work well.
- Each position in the hash table is called a slot, can hold an item, and is named by an integer value starting at 0. The mapping between an item and a slot where the item belongs in a hash table is called a Hash Function. A hash Function accepts a key and returns its hash coding or hash value.
- Application of Hash Tables:
- Associative arrays: Hash tables are commonly used to implement many types of in-memory tables. They are used to implement associative arrays (arrays whose indices are arbitrary strings or other complicated objects).
- Database indexing: Hash tables may also be used as disk-based data structures and database indices (such as in dBm).
- Caches: Hash tables can be used to implement caches i.e., auxiliary data tables that are used to speed up the access to data, which is primarily stored in slower media.
- Object representation: Several dynamic languages, such as Perl, Python, JavaScript, and Ruby use hash tables to implement objects.
- Hash Functions are used in various algorithms to make their computing faster
Collision
- According to the Hash Function, two or more items would need in the same slot. This is said to be called Collision.
- How to handle Collisions?
- Separate chaining - Separate chaining is one of the most commonly used collision resolution techniques. It is usually implemented using linked lists. In separate chaining, each element of the hash table is a linked list. To store an element in the hash table you must insert it into a specific linked list. If there is any collision (i.e., two different elements have the same hash value) then store both the elements in the same linked list.
- Open Addressing Like separate chaining, open addressing is a method for handling collisions. In Open Addressing, all elements are stored in the hash table itself. So, at any point, the size of the table must be greater than or equal to the total number of keys (Note that we can increase table size by copying old data if needed).
- Open Addressing is done in the following ways:
- Linear Probing - In linear probing, we linearly probe for the next slot. Let’s assume that the hashed index for a particular entry is an index. The probing sequence for linear probing will be:
- Quadratic Probing - Quadratic probing is similar to linear probing and the only difference is the interval between successive probes or entry slots. Let hash(x) be the slot index computed using the hash function.
- If the slot hash(x) % S is full, then we try (hash(x) + 1*1) % S.
- If (hash(x) + 1*1) % S is also full, then we try (hash(x) + 2*2) % S.
- If (hash(x) + 2*2) % S is also full, then we try (hash(x) + 3*3) % S.
- This process is repeated for all the values of i until an empty slot is found.
- Double hashing - Double hashing is similar to linear probing and the only difference is the interval between successive probes. The double hashing technique uses two hash functions. The second hash function comes into use when the first function causes a collision. It provides an offset index to store the value. The formula for the double hashing technique is as follows:
- Where i is the offset value. This offset value keeps incrementing until it finds an empty slot.
- Example: Consider inserting the keys 76, 26, 37,59,21,65 into a hash table of size m = 11 using double hashing. Consider that the auxiliary hash functions are h1 (k)=k mod 11 and h2(k) = k mod 9.
Hash function
- The hash function is a function that is applied on a key by which it produces an integer, which can be used as an address of the hash table. Hence one can use the same hash function for accessing the data from the hash table. In this, the integer returned by the hash function is called the hash key.
- A hash function is a function or algorithm that is used to generate the encrypted or shortened value to any given key.
- Characteristics of the good hashing function
- The hash function should generate different hash values for a similar string.
- The hash function is easy to understand and simple to compute.
- The hash function should produce the keys which will get distributed, uniformly over an array.
- The number of collisions should be less while placing the data in the hash table.
- The hash function is a perfect hash function when it uses all the input data.
- Types of Hash Functions:
- Division method: - In this, the hash function is dependent upon the remainder of a division.
- For example:-
- Pros:
- This method is quite good for any value of M.
- The division method is very fast since it requires only a single division operation.
- Cons:
- This method leads to poor performance since consecutive keys map to consecutive hash values in the hash table.
- Sometimes extra care should be taken to choose the value of M.
- Mid square method:
- Suppose we want to place a record of key 3101 and the size of the hash table is 2000.
- Here, firstly the key is squared, and then the mid part of the resultant number is taken as the index for the hash table. So in our case: key = 3101 => (3101)2 = 3101*3101 = 9616201 i.e.
- Pros:
- The performance of this method is good as most or all digits of the key-value contribute to the result. This is because all digits in the key contributor to generating the middle digits of the squared result.
- The result is not dominated by the distribution of the top digit or bottom digit of the original key value.
- Cons:
- The size of the key is one of the limitations of this method, as the key is of big size then its square will double the number of digits.
- Another disadvantage is that there will be collisions but we can try to reduce collisions.
- Digit folding method:
- Here, the key is divided into separate parts and by using simple operations these separated parts are combined to produce a hash.
- Consider a record of key 12345678. Divide this into parts say 123, 456, 78. After dividing the parts combine these parts by performing add operation on them. h(key) = h(12345678) = 123 + 456 + 78 = 657
- Note: - The number of digits in each part varies depending upon the size of the hash table. Suppose for example the size of the hash table is 100, then each part must have two digits except for the last part that can have a lesser number of digits.
- Multiplication method:
- This method involves the following steps:
- Choose a constant value A such that 0 < A < 1.
- Multiply the key value with A.
- Extract the fractional part of kA.
- Multiply the result of the above step by the size of the hash table i.e. M.
- The resulting hash value is obtained by taking the floor of the result obtained in step 4.
- Pros:
- The advantage of the multiplication method is that it can work with any value of between 0 and 1, although there are some values that tend to give better results than the rest.
- Cons:
- The multiplication method is generally suitable when the table size is the power of two, then the whole process of computing the index by the key using multiplication hashing is very fast.
Comments
Post a Comment