Blog

Why study Relational Algebra?

I am familiar with SQL so when I had to study Relational Algebra(RA) I found it difficult to understand why should I study RA or how is it related to SQL? Ideally it should have been other way round, not quite in my case.

I couldn’t find good analogies so I came up with a couple of them. Hope this helps you get an idea.

I present two simple analogies here

  1. Say you need an Algorithm to sort a set of numbers. And you write an algorithm, say a version of quick sort. Next you need to implement it. You can code it in any programming language – Java/C etc. using language specific constructs. As long as the idea is retained the code should get you the desired result. Similarly you can think of RA as the theoretical model to query a relational databases. It defines operations like Select, Project etc. This idea has been implemented by a query language like SQL.
  2. Consider specification document written for a Server. Any vendor who plans to build a server should comply with this specification document. As long as the server follows the rules / guidelines specified in the document it’s all good. On similar lines you can think of RA and SQL

The dangers of AI are far more than that of a Nuclear War — Elon Musk

We know Elon Musk is pretty close to the cutting edge stuff in Artificial Intelligence (AI). In this talk he discusses his concerns about AI.

He answers some of the pressing questions and concerns and why some kind of regulation is much needed.

He understands the threats that the so called AI experts fail to see. And he admits that it scares the hell out of him.

Just 5 mins into this talk and you get an idea of the kind of impact AI can have. Let me add here that he mention the difference between Narrow or Weak AI and Digital Super Intelligence, discussed later.

I am sharing some of the highlights from his talk.

Several AI “experts” think more than they do and they think they are smarter than they actually are. They don’t understand the repercussions. He mentions that the rate of improvement is exponential in this area.

  1. Consider this AlphaGO, in a span of 6–9 months, was able to defeat the champions in the Game. AlphaGo Zero crushed AlphaGo. It learnt by playing itself. You can put in rules for any game and it can pretty much beat the best players in that. Question is did the experts predict that?
    Similarly for self driving cars. They are predicting it to be 100–200% safer in an year or two.
  2. Narrow or Weak AI does not pose a risk to species. It will result in lost jobs, better weaponry etc. But Digital Super Intelligence does. Thats why we need to do it very very carefully.

A super intelligence is a hypothetical agent that possesses intelligence far surpassing that of the brightest and most gifted human minds. –Wiki

3. He talks about regulations to ensure everyone is developing AI safely. Even though the dangers of AI are far more why no regulations?

To conclude he wishes that these developments are symbiotic with Humanity. And that we don’t create systems that pose a threat to us.

I will highly recommend that you watch this talk.

What is Insertion Sort?

 

Introduction

When we need to perform sorting on a list of elements then Insertion Sort is one of the basic algorithms which can be used. It’s simple but as we will see it’s not very efficient.

Insertion sort is a simple sorting algorithm that builds the final sorted array (or list) one item at a time. It is much less efficient on large lists. –Wiki

It’s quite similar to the way we sort playing cards in our hand. It involves comparison and movement for each of the elements. It uses two loops. That’s the reason worst case complexity is O(N²) (you can read about “Big O” notation here). The best case scenario means that the list is already sorted. So we only perform comparison and no movements. That gives us a time complexity of O(N).

Let’s look at the Algorithm

Suppose we have a list of numbers  4,2,3,5,1,7 and we want to sort these.

  • Put elements in an Array. Say array name is input. Code : int input = {4,2,3,5,1,7};
  • Start with the second index i.e. input[1] which has a value 2. Let’s denote this as “current_number“.
  • Run a loop in reverse.
  • Compare current_number with each of the numbers prior to it. If it’s less than any then shift or move the element to right. At the end of this iteration we will find a place for current_number.
  • Repeat this Procedure for all elements in the Array.

The C Program below will help you understand this better.

Continue reading “What is Insertion Sort?”

Gentle introduction to Hashing

Introduction

Suppose we have a list of employee records and we need to search for an employee.  What are the data structures that come to your mind ? Linked List, Trees, Arrays ?

Let’s see the time complexities for each.

  • Linked Lists: O(n)
  • Binary Search Trees: O(log n)
  • Arrays: O(1) provided we know the index of each record.

As you can see using Arrays is the most efficient way. Hashing exploits this feature of Arrays.

So how do we achieve this?

For each record we calculate a corresponding index in the Array. To do this we can use a Hash Function. A very basic Hash Function can be H(k) = k mod 11, where k is the input value.

Example – suppose we have the following list of Employee Id’s to be inserted {101, 220, 452, 321, 600} The Array is of size 5 and Hash Function to be used is H(k) = k mod 5. Let’s calculate Hash Value for each input.

H(101) = 101 mod 5 = 1
H(220) = 220 mod 5 = 0
H(304) = 304 mod 5 = 4
H(303) = 303 mod 5 = 3
H(202) = 202 mod 5 = 2

As you can see we get the following mappings (101, 1), (220,0), (304,4), (303,3), (202,2). The elements are stored at these respective indices in the Array.

At the time of retrieval we again use the Hash Function. So if we want user with id 304 we perform H(304) = 4 followed by Employee[4]. Simple isn’t it.

Hash table is often the best data structure to maintain dictionary. There are other applications as well.

Problems with Hashing

You may have observed that if we have more than one element which gets the same Hash Value then what do we do ? Like if we had a number 600 then it’s H(600) = 0 but we already have 220 at index 0. Now what. This problem is termed as Collision. No matter how good the hash function is collision is kind of bound to happen.

In order to resolve this we have a few ways:

1. Chaining is a simple solution. We can store elements in the same bucket by using Linked List approach. So Hash Table becomes an array of linked lists (reminds me of Adjacency Lists used to store Graphs). A problem is that space is consumed by the pointers.
2. Open Addressing is another popular approach which uses Probing. We keep on recomputing the Hash Value till we find an empty bucket. Searching an element can also be done in a similar fashion. Deletion can have some issues though. Like when we delete an element then the slot is empty. Consider trying to find an element whose location was supposed to be next to the deleted one. A solution can be to flag the deleted slot with something and use it in next insertion.

Probing can be classified as Linear Probing, Quadratic Probing and Double Hashing.

I won’t go into the details of these approaches in this article. But each has it’s own pros and cons.

Summary

To conclude we saw a data structure which is very efficient. It does have some problems but those can be dealt with.

Traversing a Graph using Depth First Traversal

Earlier we have seen how to build a simple Graph using Adjacency List and how to traverse it using breadth first traversal. The Graph which was used is presented below.

Friend_Graph

Building on that we are going to see how to traverse that Graph using another popular approach known as “depth first traversal”, or DFT. This video acts as a supplement to this article.

At the end of this Post I have presented code written in C to show you how can DFS be implemented. After all we should be able to write the code and not just talk about the approach.

A popular application of depth first traversal is to do Topological Sorting or Ordering.

Understanding DFT

In DFT we pick a vertex to start with. In our example friendship graph we start with Ankur. We visit this and mark it as visited.

Next we explore the adjacent vertices to this vertex, like Reena. But we don’t explore the next adjacent vertex to Ankur i.e. Dikshit, instead we explore further and visit Nidhi, who is connected to Reena. So what we are doing is traversing the depths till we reach a leaf or a node that has been visited. Then we backtrack and start exploring the next adjacent node i.e. Dikshit.

Doesn’t this sound recursive? It is and as you will see our code makes use of recursion to perform DFT.

Continue reading “Traversing a Graph using Depth First Traversal”

Traversing a Graph using Breadth First Traversal

Earlier I have shown you how to build a simple Graph using Adjacency List. We used a Friendship Graph to demonstrate that. If you haven’t read that I will recommend you to take a look at the following two posts that cover Graphs and Queues, since we will make use of both these structures.

Friend_Graph

Building on that we are going to see how to traverse that Graph using a very popular approach known as “breadth first traversal”, or BFS.

At the end of this Post I have presented code written in C to show you how can BFS be implemented. After all we should be able to write the code and not just talk about the approach.

Few applications of Breadth First Traversal (BFT)

  • BFT can be used to generate a Tree. This tree can help us find out shortest distances from root node to other nodes. Also it can be used to find Paths between vertices.
  • BFT can be used to determine the connected components in a Graph. A connected component means that any vertex in that component can be reached from any other vertex in the same component.
  • It can help us solve Vertex Coloring problem, which has applications like assigning resources etc.

Understanding BFT

Let’s start by understanding BFT. In BFT we pick a vertex to start with. In our example friendship graph we start with Ankur. We insert this in a Queue (Q) and mark it as visited.

Continue reading “Traversing a Graph using Breadth First Traversal”

Queue Data Structure Basics

In this Article let’s explore the basics of Queue and how to implement it.

In computer science, a queue is a particular kind of abstract data type or collection in which the entities in the collection are kept in order. – Wiki

Why use a Queue?

Consider a website that manages movie tickets. Priority is given on a first come basis. It’s pretty obvious that someone who came earlier should get the ticket first. Here we are talking about “First In First Out”. In such scenarios Queue Data Structure is used.

What are the primary operations that can be performed on a Queue?

We can insert elements to a Queue – known as enqueue, delete elements from a Queue – known as dequeue and check elements in a Queue without deleting – known as peek.

For ease of understanding the Queue DS shown below has enqueue and dequeue methods.

How to implement Queue in C

Below is the code that can be used to perform basic operations on a Queue. The program is in C, but the idea can be used to implement this in other popular languages like Java or Python.

Continue reading “Queue Data Structure Basics”