Gentle introduction to Hashing

Introduction

Suppose we have a list of employee records and we need to search for an employee.  What are the data structures that come to your mind ? Linked List, Trees, Arrays ?

Let’s see the time complexities for each.

  • Linked Lists: O(n)
  • Binary Search Trees: O(log n)
  • Arrays: O(1) provided we know the index of each record.

As you can see using Arrays is the most efficient way. Hashing exploits this feature of Arrays.

So how do we achieve this?

For each record we calculate a corresponding index in the Array. To do this we can use a Hash Function. A very basic Hash Function can be H(k) = k mod 11, where k is the input value.

Example – suppose we have the following list of Employee Id’s to be inserted {101, 220, 452, 321, 600} The Array is of size 5 and Hash Function to be used is H(k) = k mod 5. Let’s calculate Hash Value for each input.

H(101) = 101 mod 5 = 1
H(220) = 220 mod 5 = 0
H(304) = 304 mod 5 = 4
H(303) = 303 mod 5 = 3
H(202) = 202 mod 5 = 2

As you can see we get the following mappings (101, 1), (220,0), (304,4), (303,3), (202,2). The elements are stored at these respective indices in the Array.

At the time of retrieval we again use the Hash Function. So if we want user with id 304 we perform H(304) = 4 followed by Employee[4]. Simple isn’t it.

Hash table is often the best data structure to maintain dictionary. There are other applications as well.

Problems with Hashing

You may have observed that if we have more than one element which gets the same Hash Value then what do we do ? Like if we had a number 600 then it’s H(600) = 0 but we already have 220 at index 0. Now what. This problem is termed as Collision. No matter how good the hash function is collision is kind of bound to happen.

In order to resolve this we have a few ways:

1. Chaining is a simple solution. We can store elements in the same bucket by using Linked List approach. So Hash Table becomes an array of linked lists (reminds me of Adjacency Lists used to store Graphs). A problem is that space is consumed by the pointers.
2. Open Addressing is another popular approach which uses Probing. We keep on recomputing the Hash Value till we find an empty bucket. Searching an element can also be done in a similar fashion. Deletion can have some issues though. Like when we delete an element then the slot is empty. Consider trying to find an element whose location was supposed to be next to the deleted one. A solution can be to flag the deleted slot with something and use it in next insertion.

Probing can be classified as Linear Probing, Quadratic Probing and Double Hashing.

I won’t go into the details of these approaches in this article. But each has it’s own pros and cons.

Summary

To conclude we saw a data structure which is very efficient. It does have some problems but those can be dealt with.

Traversing a Graph using Depth First Traversal

Earlier we have seen how to build a simple Graph using Adjacency List and how to traverse it using breadth first traversal. The Graph which was used is presented below.

Friend_Graph

Building on that we are going to see how to traverse that Graph using another popular approach known as “depth first traversal”, or DFT. This video acts as a supplement to this article.

At the end of this Post I have presented code written in C to show you how can DFS be implemented. After all we should be able to write the code and not just talk about the approach.

A popular application of depth first traversal is to do Topological Sorting or Ordering.

Understanding DFT

In DFT we pick a vertex to start with. In our example friendship graph we start with Ankur. We visit this and mark it as visited.

Next we explore the adjacent vertices to this vertex, like Reena. But we don’t explore the next adjacent vertex to Ankur i.e. Dikshit, instead we explore further and visit Nidhi, who is connected to Reena. So what we are doing is traversing the depths till we reach a leaf or a node that has been visited. Then we backtrack and start exploring the next adjacent node i.e. Dikshit.

Doesn’t this sound recursive? It is and as you will see our code makes use of recursion to perform DFT.

Continue reading “Traversing a Graph using Depth First Traversal”

Traversing a Graph using Breadth First Traversal

Earlier I have shown you how to build a simple Graph using Adjacency List. We used a Friendship Graph to demonstrate that. If you haven’t read that I will recommend you to take a look at the following two posts that cover Graphs and Queues, since we will make use of both these structures.

Friend_Graph

Building on that we are going to see how to traverse that Graph using a very popular approach known as “breadth first traversal”, or BFS.

At the end of this Post I have presented code written in C to show you how can BFS be implemented. After all we should be able to write the code and not just talk about the approach.

Few applications of Breadth First Traversal (BFT)

  • BFT can be used to generate a Tree. This tree can help us find out shortest distances from root node to other nodes. Also it can be used to find Paths between vertices.
  • BFT can be used to determine the connected components in a Graph. A connected component means that any vertex in that component can be reached from any other vertex in the same component.
  • It can help us solve Vertex Coloring problem, which has applications like assigning resources etc.

Understanding BFT

Let’s start by understanding BFT. In BFT we pick a vertex to start with. In our example friendship graph we start with Ankur. We insert this in a Queue (Q) and mark it as visited.

Continue reading “Traversing a Graph using Breadth First Traversal”

Basic Graph Data Structure to demonstrate Friendships

Graphs have numerous applications out of which Social Networks is quite popular.

In this Article I have tried to present a simple use case to demonstrate how to represent it using Adjacency Lists.  Idea is to help you get started with Graphs.

Since we are discussing friendships the Graph used is an undirected one.

Consider a simple case, we have a really small town with only 6 people. The figure below shows how they are connected to each other. So for example Ankur is friend with Dikshit and Reena.

Friend_Graph.png

So how can we capture this data?

A Graph is a set of Vertices and Edges. In our case the people are the vertices and the relationship between them are the Edges.

There are two popular ways to represent a Graph. One is the Adjacency Matrix approach and another one is the Adjacency List approach. The second one is more popular because of it’s efficiency. If you need more details on these two representations kindly visit the following links Adjacency Matrix  &  Adjacency List

Please drop a comment if you want me to explain these approaches as well. For now I want to keep things short and simple.

In the List approach we maintain an Array of Vertices and each Vertex in the Array has a Linked List which points to it’s Edges. See the figure below. I have shown one of the Linked List that shows Ankur’s connections.

AL

Now let’s see how to implement this using C. If you are not quite familiar with Arrays and Linked List I will recommend you to visit Wiki for the same. I have written sample programs for Array and Linked List here and I am planning to explain them soon.

Continue reading “Basic Graph Data Structure to demonstrate Friendships”

Data Structures using C

Lately I have been implementing the standard Data Structures in Computer Science using the C Programming Language. These have the minimal functions to demonstrate the basic usage.

Going forward I will be adding more and also try to explain these.

As of now the list includes:

In the Binary Search Tree program the delete node functionality hasn’t been implemented fully.

The basic algorithm has been presented though which is as follows:

Algorithm to delete a Node from Binary Search Tree (BST)

1. Find the Node in the BST.
2. If Node is a Leaf, then go ahead and delete it.
3. If not find the “in-order successor / predecessor”. Say we use the in-order successor Node_S.
4. Swap the node with Node_S. Now delete Node_S and if it has any children then update the pointers.

For Step 3 we can find wither the minimum node in the right sub tree or the maximum node in the left sub tree.