Hello, and welcome back. My name is William. Today I want to talk about a fascinating topic, and that is strongly connected components and how we can use Tarjan's algorithm to find them. So what are strongly connected components, or SCCs? I like to think of them as self-contained cycles within a directed graph, where for every vertex in a given cycle, you can reach every other vertex in the same cycle. For example, in the graph below, there are four strongly connected components.
I've outlined them here in different colors. If you inspect each strongly connected component, you'll notice that each has its own self-contained cycle, and that for each component, there's no way to find a path that leaves the component and comes back. Because of that property, we can be sure that strongly connected components are unique within a directed graph. To understand Tarjan's strongly connected components algorithm, we're going to need to understand the concept of a low-link value. Simply put, the low-link value of a node is the smallest node ID reachable from that node, including itself. For that to make sense, we're going to need to label the nodes in our graph using a depth first search.
Suppose we start at the top left corner and label that node with an ID of zero. Now we continue exploring the graph until we've visited all the edges and labeled all the nodes. Alright, now that we're done labeling the nodes, inspect the graph and try to determine the low-link value of each node. Again, the low-link value of a node is the smallest node ID reachable from that node, including itself. For example, the low-link value of node one should be zero, since node zero is reachable from node one via some series of edges. Similarly, node four's low-link value should be three, since node three is the lowest node that is reachable from node four.
So if we assign all the low-link values, we get the following setup. From this view, you'll notice that all nodes which have the same low-link value belong to the same strongly connected component. If I now assign colors to each strongly connected component, we can clearly see that for each component, all the low-link values are the same. This seems too easy, right? Well, you're not wrong, there is a catch. The flaw with this technique is that it is highly dependent on the traversal order of the depth first search, which for our purposes is effectively random.
For instance, in the same graph, I rearranged the node IDs as though the depth first search had started at the bottom middle node. In such an event, the low-link values will be incorrect. In this specific case, all the low-link values are the same, but there clearly are multiple strongly connected components. So what is going on? Well, what's happening is that the low-link values are highly dependent on the order in which the nodes are explored in our depth first search.
So we might not end up with the correct arrangement of node IDs for our low-link values to tell us which nodes are in which strongly connected component. This is where Tarjan's algorithm kicks in with its stack invariant to prevent strongly connected components from interfering with each other's low-link values. To cope with the random traversal order of the depth first search, Tarjan's algorithm maintains a set, often as a stack, of valid nodes from which to update low-link values. How the stack works is that nodes are added to the stack as they are explored for the first time, and nodes are removed from the stack each time a strongly connected component is found. Taking a step back, if the variables u and v are nodes in our graph and we are currently exploring node u, then our new low-link update condition is this: to update node u's low-link value to node v's low-link value, there has to be a path of edges from u to v, and node v must be on the stack. Another small change we're going to make to finding the correct low-link values is that instead of finding all the low-link values after the fact, we're going to update them as we do our depth first search, on the fly, if you will.
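To make that update condition concrete, here is a tiny Java fragment showing just the rule by itself. It is only a sketch, not the full algorithm (that comes later), and the array names low and onStack are placeholders that match the narration:

```java
// Sketch of the update rule only; 'low' and 'onStack' are placeholder arrays.
// After exploring the edge (at -> to) and returning from the recursion,
// node 'at' may adopt node 'to''s low-link value only while 'to' is still
// on the stack, i.e. only while 'to' belongs to an SCC that isn't finished yet.
static void onCallback(int at, int to, int[] low, boolean[] onStack) {
  if (onStack[to]) {
    low[at] = Math.min(low[at], low[to]);
  }
}
```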
This will allow us to obtain a linear time complexity. We'll be doing an example in the following slides, but this is Tarjan's algorithm in a nutshell. Start out and mark each node as unvisited. Start the depth first search somewhere and don't stop until all the nodes are visited. Upon visiting a node, assign it an ID and a low-link value. Additionally, also mark the node as visited and add it to the seen stack.
On the depth first search callback, after the recursion comes back, if the previous node is on the stack, then min the current node's low-link value with the last node's low-link value. This is essentially what will allow low-link values to propagate throughout cycles. After visiting all of a node's neighbors, if the current node started a strongly connected component, then pop off all the nodes from the stack which are in that strongly connected component. You know a node started a strongly connected component if its ID is equal to its low-link value. I'll let you think about that a bit more, and it'll start making sense.
Let's do an example. I'm going to mark unvisited nodes as blue, nodes for which the depth first search is still exploring some neighbors as orange, and nodes for which the depth first search has explored all of their neighbors as gray. Note that if a node is orange or gray, then it is on the stack and we can update its low-link value. I will also be tracking the nodes which are on the stack in the left column, so keep your eyes peeled on that as well. So let's start our depth first search.
So just randomly pick a node and start there. As we explore unvisited nodes, give each node an ID and a low-link value equal to the ID. So now we're at node two and our only option is to visit node zero. Since node zero is already visited, we don't want to visit it again, so now we backtrack. On the backtracking, since node zero is on the stack, we take the minimum of the current node's low-link value and node zero's low-link value.
Similarly, now min the low-link value of the node we were just at, which is node one, with node two, and do the same for node zero. Upon returning back to node zero, we realize that we've actually finished a strongly connected component, since we've visited all the neighbors of node zero and its ID is equal to its low-link value. This means we need to remove all the nodes associated with that strongly connected component from the stack. However, we're not done exploring the graph, so pick another node at random. Let's start at node three and go right.
Now, our only option is to go down. Now we're at node five. Let's take the edge to node zero. Node zero is already visited, so we can't go there. On the callback, we notice that node zero is not on the stack at the moment, so we can't min node five's low-link value against node zero's.
This is actually very, very good, because if we did, then we would contaminate the strongly connected component node five is part of with the lower low-link value which node zero has to offer. So let's go to node six. Now we have three edges to choose from. Let's take the one on the right. Node two is not on the stack, so we don't min with its low-link value. Now let's take the left edge to node four. Node four is on the stack, so we can min with its low-link value, giving node six also a low-link value of four. The last edge we need to visit is the one going to node zero.
This is a situation where node zero is not on the stack, so we can't min with its low-link value. On the callback, node five can min with node six's low-link value because it is on the stack, and similarly for node four. Coming back to node four, we've visited all its neighbors and its ID is equal to its low-link value, so it marks the start of a strongly connected component. We now have to remove all the associated nodes in the strongly connected component from the stack. These would be all of the purple nodes.
Now coming back to node three, we cannot min its low-link value with node four's, because we just removed node four from the stack. You will also notice that node three's ID is equal to its low-link value, so it should be the start of a strongly connected component. However, we have not finished visiting all of node three's neighbors, so we cannot make that assessment just yet. Now take the downward edge to visit node seven. Then take the edge to node five.
On the callback, notice that node five is not on the stack, so we don't min with its low-link value. Now hop up to node three. On the callback, node seven can min with node three's low-link value, since node three is on the stack. Then, coming back to node three, also min node three with node seven. So now we've finished with the last strongly connected component. All we need to do is remove all the associated nodes from the stack.
And that's how Tarjan's algorithm works to find strongly connected components. Very beautiful, isn't it? Let's look at some pseudocode for how this works. I think it will solidify your understanding. To get started, in the global or class scope, I define a few variables that we'll need.
The first is a constant to represent unvisited nodes. Then comes n, the number of nodes in the graph, and g, an adjacency list of directed edges; both n and g are inputs to this algorithm. Then come two variables: id, to give each node an ID, and sccCount, to track the number of strongly connected components. After that, I define a few arrays which store auxiliary information about the nodes in our graph. The first array is ids, which stores the ID of each node; then low, to store the low-link values; and finally onStack, to track whether or not a node is on the stack. Last is the stack data structure itself, which should at minimum support push and pop operations.
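Rendered as Java, that setup might look roughly like this. The names follow the narration, but this is a sketch rather than the exact code from the slides, and the class name TarjanSccSketch is my own:

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// A sketch following the narration; not the exact code from the video's slides.
public class TarjanSccSketch {

  private static final int UNVISITED = -1; // marker for nodes not yet visited

  private final int n;                 // number of nodes in the graph
  private final List<List<Integer>> g; // adjacency list of directed edges

  private int id;       // used to give each node an id
  private int sccCount; // number of strongly connected components found

  private int[] ids;            // ids[i] = id of node i (or UNVISITED)
  private int[] low;            // low[i] = low-link value of node i
  private boolean[] onStack;    // onStack[i] = is node i currently on the stack?
  private Deque<Integer> stack; // supports push and pop

  public TarjanSccSketch(int n, List<List<Integer>> g) {
    this.n = n;
    this.g = g;
  }

  // findSccs() and dfs(int at) are filled in below.
}
```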
Inside the findSccs method, the first thing I do is assign the ID of each node to be UNVISITED; the ids array will serve to track whether or not a node has been visited, as well as what a node's ID is. In the following loop, I iterate through all the nodes in the graph and start a depth first search on node i if node i has not yet been visited. At the end, I return the array low, an array of low-link values, which will be the final output of the algorithm.
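Continuing the same sketch, that driver method could look something like this (again, an approximation of what's being described, not the exact slide code):

```java
// Continuing the sketch above: the driver method described in the narration.
public int[] findSccs() {
  ids = new int[n];
  low = new int[n];
  onStack = new boolean[n];
  stack = new ArrayDeque<>();
  id = 0;
  sccCount = 0;

  // Mark every node as unvisited; ids doubles as the "visited" marker.
  Arrays.fill(ids, UNVISITED);

  // Start a depth first search from every node that hasn't been visited yet.
  for (int i = 0; i < n; i++) {
    if (ids[i] == UNVISITED) dfs(i);
  }

  // The low-link values are the output: nodes sharing a value share an SCC.
  return low;
}
```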
Now let's look at what's happening inside the depth first search method, which is really where all the magic happens. The input argument to the depth first search method is a variable called at, which I use to denote the ID of the node we are currently at. On the first three lines, I do some housekeeping, which is: add the current node to the stack, mark the current node as being on the stack, and give an ID and a low-link value to the current node. Then comes the part where I visit all the neighbors of the current node. To do this, I reach into our graph, stored as an adjacency list, and loop over a variable called to, which represents the ID of the node we're going to. The next line says that if the node we're going to is unvisited, then visit it. Remember, the ids array tracks the ID of node i, but also whether or not node i has been visited. The next line is very important. In fact, it's probably the most important line on the slide.
The first thing to notice is that this line happens after the recursive call to the depth first search method, meaning that it gets called on the callback from the depth first search. The line says that if the node we just came from is on the stack, then min the current node's low-link value with the low-link value of the node we were just at. This is what allows the low-link values to propagate throughout a cycle. After we've finished the for loop that visited all the neighbors of the current node, we need to check if we're at the start of a strongly connected component. To do that, check whether the ID of the current node is equal to the low-link value for that node.
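Here's how that depth first search method might be written, continuing the same sketch; the important callback line is called out in a comment, and the popping step is pulled into a helper called popScc (my own restructuring, shown in the next snippet):

```java
// Continuing the sketch: 'at' is the id of the node we are currently at.
private void dfs(int at) {
  // Housekeeping: push the node, mark it as on the stack,
  // and give it an id and a low-link value equal to that id.
  stack.push(at);
  onStack[at] = true;
  ids[at] = low[at] = id++;

  // Visit all the neighbours of the current node.
  for (int to : g.get(at)) {
    if (ids[to] == UNVISITED) dfs(to);

    // The important line: on the callback, if the node we just came from
    // is on the stack, min the current low-link value with its low-link value.
    if (onStack[to]) low[at] = Math.min(low[at], low[to]);
  }

  // After visiting all neighbours, check whether this node started
  // a strongly connected component: ids[at] == low[at].
  if (ids[at] == low[at]) {
    popScc(at); // pops the completed component off the stack, shown below
  }
}
```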
After we have identified that we're at the beginning of a completed strongly connected component, pop off all the nodes inside that strongly connected component from the stack. As we're popping nodes from the stack, also mark them as no longer being on the stack. One more critical thing we need to do while we're removing nodes from our stack is make sure that all nodes which are part of the same strongly connected component end up with the same value, so here I just assign each popped node the low-link value of the node which started the strongly connected component. The last things are to stop popping nodes off the stack once we reach the node that started the strongly connected component, and also to increment the strongly connected component count, if you want to track the number of strongly connected components that were found.
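That popping step, as a helper method in the same sketch (again an approximation of the described pseudocode, with popScc being a name I've introduced):

```java
// Continuing the sketch: called when ids[at] == low[at], i.e. node 'at'
// started a now-completed strongly connected component.
private void popScc(int at) {
  for (int node = stack.pop(); ; node = stack.pop()) {
    onStack[node] = false; // the node leaves the stack
    low[node] = ids[at];   // give every member the id of the component's root node
    if (node == at) break; // stop once we pop the node that started the SCC
  }
  sccCount++;              // one more strongly connected component found
}
```

With all of that in place, calling findSccs() on a graph returns the low array, and grouping node indices by their value in that array gives you the strongly connected components, while sccCount tracks how many were found.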
Well, that's all I have for Tarjan's strongly connected components algorithm. As always, there's working source code for this algorithm at github.com/williamfiset/algorithms. The link should be in the description. And folks, thanks for watching. Drop a comment if you have any inquiries, like this video, and subscribe for more mathematics and computer science videos. Cheers!