I'm pretty excited, we're going to be talking about the open addressing collision resolution technique for hash tables. So let's get going. First, let's just do a quick recap on hash tables so that everyone's on the same page. So the goal of the hash table is to construct a mapping from a set of keys to a set of values, and the keys need to be hashable. Now, what we do is we define a hash function on the keys to convert them into numbers, then we use the number obtained through the hash function as a way to index into the array or the hash table. However, this isn't a foolproof method because from time to time, we're going to have hash collisions, that is two keys that hash to the same value.
So we need a way to resolve this and open addressing is a solution for that. Alright, so whenever you're going to be using the open addressing collision resolution technique, the one thing to keep in mind is the actual key value pairs themselves are going to be stored in the table itself. So as opposed to say, an auxiliary data structure like in the separate chaining method we saw in the last video. So this means that we care a great deal about the size of the hash tables and how many elements are currently in hash table. Because once there are too many elements inside the hash table will also be really hard to find an open slot or a position to place our elements. So just an important piece of terminology, we say that the load factor is the ratio between the number of items in a table and the size of the table.
So this means we need to keep tabs on the load factor. Here's a neat chart from wicked pedia so on the chart, what you can see are two different methods. One of them is chaining that is separate chaining and linear probing and open addressing technique. And we can see from the linear probing one is that once it gets to a certain threshold, it gets exponentially bad. So you don't want it to go anywhere near that say point eight mark. In fact, we're going to be keeping it a lot lower than that usually.
And what this says is we always need to keep the load factor which we denote by the Greek letter alpha, below a certain threshold, and we need to grow the size of our table once that threshold is met. Right, so when we want to insert a key value pair into our hash table, here's what we do. We take the key, we use our hash function defined on it. Key and we hash the value. And this gives us an original position inside the hash table for where the key should go. But suppose as a hash collision, and there's already a key in that slot, well, we can't have two keys in the same slot.
That's not how arrays work. So what we do is we use a probing sequence, which I will define as p of x. So now on to tell us where to go next. So we hashed to the value, H of K, the original position and now we're going to probe along using this probing function in such a way that we're going to eventually find an open slot along the way. So as it turns out, there are actually an infinite amount of probing sequences to choose from. So here are a few of them.
We have linear probing, which probes via a linear function. So given an input parameter x, so when we're probing we start, usually x at zero or one. And as we're unable to find free slots, then we just increment x by one, and it works the same for all of these probing functions. for linear probing, we use a linear function for quadratic probing, we use a quadratic function. And then there's double hashing, which is super neat. Actually, what we do is we define a secondary hash function on our key, find its value, and then use that inside the probing function.
And the last one is the pseudo random number generator probing function that we can use. So given a random number generator, we're going to see that using the hash value of our key which we know is deterministic. So it's always gonna be the same thing. And then we can use that inside our probing function, which is pretty neat and increment by x each time. And we know x increments by one. So we're just getting the next number in the random number generator, and then the next one after that.
All right, so here's a general insertion method for open addressing. Suppose we have a table of size n. And here's how the algorithm goes. First, we initialize x to be one. So x is a constant, or sorry, a variable that we're going to use for the probing. And we're going to increment x each time we fail to hit a free position, then we get the key hash just by hashing our key and that is actually going to be the index where we're going to look at a table first. So while The table indexes occupied meaning it's not equal to No, we're going to say our new index is the key hash or the original position, we hash to plus the probing function, mod n, and so that we always land back inside the table.
And then we're going to increment x, so that the next time we run this loop, well, we probe at a different position. And then eventually, we're going to find a free position. We always set up our probing function in such a way that we will always find a probing a sorry, we will always find a free slot because we know that the load factor is being kept below a certain amount. All right, so here's the big issue with open addressing and it's that more probing sequences that we choose module and are going to end up producing some sort of cycle shorter than the table size itself. So imagine your probing sequence just hops between three different values and cycles. And your table is of size 10.
But you're only ever able to hit three slots because it's stuck in a cycle. And all of those three slots are full. Well, you're stuck in an infinite loop. So this is very problematic and something we absolutely absolutely need to handle. Alright, so let's have a look in the example. So right here I have a hash table and using open addressing, it's got some key value pairs already inserted, and assumed that the circle with the bar through it is the null token.
Further assume that we're using the probing sequence p of x equals four for x, and suppose we will insert a new key value pair into the table and that the key hashes to eight. So that means we want to insert that key value pair at position eight. But oh, it's already occupied because there's key five value five already there. So what we do? Well, we probe, so we compute P of one, which is four x at one, so we get eight plus four my 12. Well, that's zero.
So then we go slot zero, and we see Oh, that is also occupied, because key one and value one is already there. So now we compute p f2. And then that gives us 1612, which is four and then, oh, that cell is already occupied. And then we keep probing and as you see right now we've entered a cycle. So we Keep probing and probing and probing, and we will always be getting back to the same position. So although we have a proving function, it does not work.
In this particular situation, the probing function is flawed. So that's quite concerning. Because not all probing functions are viable. They produce cycles are shorter than the table size. How do we handle this? And in general, the consensus is that we don't handle this issue.
Instead, we try to avoid it altogether by restricting the domain of probing functions that we choose to be those which produce a cycle of exactly like n, and those probing functions do exist. I have a little Asterix here and says there are a few exceptions and this is true. There are some probing functions we can use which don't produce a full cycle but still work. We're going to have a look at I think one of those in the quadratic probing video. All right. So just to recap, techniques such as linear probing, and quadratic probing and double hashing, they're all subject to this issue of the cycles.
And what we need to do is we need to find probing functions which are very specific that produce a cycle of length and to avoid not being able to insert an element and being stuck in an infinite loop. So this is a bit of an issue with the open addressing scheme, but it's something we can handle. Although notice that this isn't something you have to worry about if you're in the separate chaining world, just because we have that auxiliary data structure that just captures all our collisions. So in the next video, we're going to be going into great detail on linear probing. So guys, if you learn something Please like this video and subscribe. I'll catch you in the next video and drop a comment.
I always love reading those. Thank you