Gentle introduction to Hashing

Introduction

Suppose we have a list of employee records and we need to search for an employee.  What are the data structures that come to your mind ? Linked List, Trees, Arrays ?

Let’s see the time complexities for each.

  • Linked Lists: O(n)
  • Binary Search Trees: O(log n)
  • Arrays: O(1) provided we know the index of each record.

As you can see using Arrays is the most efficient way. Hashing exploits this feature of Arrays.

So how do we achieve this?

For each record we calculate a corresponding index in the Array. To do this we can use a Hash Function. A very basic Hash Function can be H(k) = k mod 11, where k is the input value.

Example – suppose we have the following list of Employee Id’s to be inserted {101, 220, 452, 321, 600} The Array is of size 5 and Hash Function to be used is H(k) = k mod 5. Let’s calculate Hash Value for each input.

H(101) = 101 mod 5 = 1
H(220) = 220 mod 5 = 0
H(304) = 304 mod 5 = 4
H(303) = 303 mod 5 = 3
H(202) = 202 mod 5 = 2

As you can see we get the following mappings (101, 1), (220,0), (304,4), (303,3), (202,2). The elements are stored at these respective indices in the Array.

At the time of retrieval we again use the Hash Function. So if we want user with id 304 we perform H(304) = 4 followed by Employee[4]. Simple isn’t it.

Hash table is often the best data structure to maintain dictionary. There are other applications as well.

Problems with Hashing

You may have observed that if we have more than one element which gets the same Hash Value then what do we do ? Like if we had a number 600 then it’s H(600) = 0 but we already have 220 at index 0. Now what. This problem is termed as Collision. No matter how good the hash function is collision is kind of bound to happen.

In order to resolve this we have a few ways:

1. Chaining is a simple solution. We can store elements in the same bucket by using Linked List approach. So Hash Table becomes an array of linked lists (reminds me of Adjacency Lists used to store Graphs). A problem is that space is consumed by the pointers.
2. Open Addressing is another popular approach which uses Probing. We keep on recomputing the Hash Value till we find an empty bucket. Searching an element can also be done in a similar fashion. Deletion can have some issues though. Like when we delete an element then the slot is empty. Consider trying to find an element whose location was supposed to be next to the deleted one. A solution can be to flag the deleted slot with something and use it in next insertion.

Probing can be classified as Linear Probing, Quadratic Probing and Double Hashing.

I won’t go into the details of these approaches in this article. But each has it’s own pros and cons.

Summary

To conclude we saw a data structure which is very efficient. It does have some problems but those can be dealt with.

Traversing a Graph using Depth First Traversal

Earlier we have seen how to build a simple Graph using Adjacency List and how to traverse it using breadth first traversal. The Graph which was used is presented below.

Friend_Graph

Building on that we are going to see how to traverse that Graph using another popular approach known as “depth first traversal”, or DFT. This video acts as a supplement to this article.

At the end of this Post I have presented code written in C to show you how can DFS be implemented. After all we should be able to write the code and not just talk about the approach.

A popular application of depth first traversal is to do Topological Sorting or Ordering.

Understanding DFT

In DFT we pick a vertex to start with. In our example friendship graph we start with Ankur. We visit this and mark it as visited.

Next we explore the adjacent vertices to this vertex, like Reena. But we don’t explore the next adjacent vertex to Ankur i.e. Dikshit, instead we explore further and visit Nidhi, who is connected to Reena. So what we are doing is traversing the depths till we reach a leaf or a node that has been visited. Then we backtrack and start exploring the next adjacent node i.e. Dikshit.

Doesn’t this sound recursive? It is and as you will see our code makes use of recursion to perform DFT.

Continue reading “Traversing a Graph using Depth First Traversal”

Data Structures using C

Lately I have been implementing the standard Data Structures in Computer Science using the C Programming Language. These have the minimal functions to demonstrate the basic usage.

Going forward I will be adding more and also try to explain these.

As of now the list includes:

In the Binary Search Tree program the delete node functionality hasn’t been implemented fully.

The basic algorithm has been presented though which is as follows:

Algorithm to delete a Node from Binary Search Tree (BST)

1. Find the Node in the BST.
2. If Node is a Leaf, then go ahead and delete it.
3. If not find the “in-order successor / predecessor”. Say we use the in-order successor Node_S.
4. Swap the node with Node_S. Now delete Node_S and if it has any children then update the pointers.

For Step 3 we can find wither the minimum node in the right sub tree or the maximum node in the left sub tree.