Gentle introduction to Hashing

Introduction

Suppose we have a list of employee records and we need to search for an employee.  What are the data structures that come to your mind ? Linked List, Trees, Arrays ?

Let’s see the time complexities for each.

• Binary Search Trees: O(log n)
• Arrays: O(1) provided we know the index of each record.

As you can see using Arrays is the most efficient way. Hashing exploits this feature of Arrays.

So how do we achieve this?

For each record we calculate a corresponding index in the Array. To do this we can use a Hash Function. A very basic Hash Function can be H(k) = k mod 11, where k is the input value.

Example – suppose we have the following list of Employee Id’s to be inserted {101, 220, 452, 321, 600} The Array is of size 5 and Hash Function to be used is H(k) = k mod 5. Let’s calculate Hash Value for each input.

H(101) = 101 mod 5 = 1
H(220) = 220 mod 5 = 0
H(304) = 304 mod 5 = 4
H(303) = 303 mod 5 = 3
H(202) = 202 mod 5 = 2

As you can see we get the following mappings (101, 1), (220,0), (304,4), (303,3), (202,2). The elements are stored at these respective indices in the Array.

At the time of retrieval we again use the Hash Function. So if we want user with id 304 we perform H(304) = 4 followed by Employee[4]. Simple isn’t it.

Hash table is often the best data structure to maintain dictionary. There are other applications as well.

Problems with Hashing

You may have observed that if we have more than one element which gets the same Hash Value then what do we do ? Like if we had a number 600 then it’s H(600) = 0 but we already have 220 at index 0. Now what. This problem is termed as Collision. No matter how good the hash function is collision is kind of bound to happen.

In order to resolve this we have a few ways:

1. Chaining is a simple solution. We can store elements in the same bucket by using Linked List approach. So Hash Table becomes an array of linked lists (reminds me of Adjacency Lists used to store Graphs). A problem is that space is consumed by the pointers.
2. Open Addressing is another popular approach which uses Probing. We keep on recomputing the Hash Value till we find an empty bucket. Searching an element can also be done in a similar fashion. Deletion can have some issues though. Like when we delete an element then the slot is empty. Consider trying to find an element whose location was supposed to be next to the deleted one. A solution can be to flag the deleted slot with something and use it in next insertion.

Probing can be classified as Linear Probing, Quadratic Probing and Double Hashing.

I won’t go into the details of these approaches in this article. But each has it’s own pros and cons.

Summary

To conclude we saw a data structure which is very efficient. It does have some problems but those can be dealt with.

Author: Ankur

I am Enthusiastic about Learning new things and sharing my Knowledge. I like programming and have a pretty good background in Computer Science.