When you say that you want to fine tune your SSIS package, you need to know that how your components actually execute. So these components, which are part of our SSIS package, you should know that how they execute, how they consume the memory, do they run in a multi threaded manner or not. So in other words, you know, you need to understand how your components actually execute. And the logic what you see over here, how it is actually executing and can be improved on that. So, in order to understand your component better, in order to understand your logic better what you're doing, you need to understand five important key things synchronous as synchronous, full blocking, semi blocking, and no blocking all these five keywords I have discussed in the previous part of the video as well. But in this video, I'm going to go deep dive in these five keywords so that you know we can understand SSIS in a better manner.
So let's first start with that. synchronous and synchronous components. Okay, what exactly these two keywords are. In order to understand these two keywords, you know, let us go back to our one of our previous videos where we discussed about buffer size tuning. If you remember in that video I said that when data moves from one component to another component, it does not go row by row, but it goes in buffers. But this is not so simple as it looks like and there are more complications to how this blues move in the buffers.
There are two scenarios. Let us see now data is moving from one component to another component, right. So one is the other component where the data is moving, he can either create new buffers or he will reuse the existing buffers. So for example, now you can see over here, let us say this area dotnet sends data and he creates 10 buffers and those 10 buffers due to the conditional split. So, there is a high possibility that the conditional split can create new buffers or he can just use the existing buffers and updated. So, if the component creates new buffers, it is treated as a synchronous components.
And if the component reuses, the existing buffer and does not create new ones, then we that means you know those components are synchronous components. So, in short, we need to go and identify the as synchronous components. Because if a synchronous components are going to create new buffers, so that's a big problematic thing, right? For example, think about it from a dotnet source let us say 10 buffers come out. The conditional split says that I might have synchronous components and it creates more 10 buffers. After that data set the salt component is also as synchronous he creates more than buffers.
So if we have not have such kind of assets synchronous components, then the buffer will just get multiplied. So, somehow we need to identify these components when your SSIS package is running right. So, the first thing is that how do we identify a synchronous components? Because you can see over here that we have something like 5678 you know six, eight components, right? How do we identify that which one of these components are synchronous and which one of these components are as synchronous. So, to identify that the component is synchronous or asynchronous, we need to go to the control flow and we need to enable logging.
We have already seen how to enable logging in the buffer size tuning video, but you know, you just want to repeat this So, over here we will go and enable logging. And we will say that, we want to enable logging into a text file. And this text file will actually exist so, We'll keep this text file over here. So I'm going to create a text file here called as log txt right. And now, you can see when you enable logging, it is enabled at the package level right and after that it is also enabled at the data flow level. So this is a control flow enable enabling of the logging and also on the data flow level.
So yes, I want to enable on the data flow level as well. And in this, I'm going to go and select two events. One is the pipeline execution plan and the pipeline execution tree. These two events will actually help us to identify if the component is executing in a synchronous mode or is it executing in a synchronous mode, right. So remember, enable the logging on the control flow level as well as enable the logging on the Dataflow level and ensure that you select both of these events. So that we do okay and let us try to run this package and let us see the output so I'm going to go and run this package.
So that you can see my package is executing and if you see at this moment in our package, we have a conditional split we have a derived column we have aggregate column, we have a salt column, we have a copy component and so on right. So, we do not know you know from these components, you know, which are synchronous and which are as synchronous, so that the program has executed. So now let us go to our folder where the log file is created. So our log file is yours. Right. Okay, and we see it on if we go to the folders, then the log file is let us open this log file and not back so I'm going to go and edit this in notepad and I see a lot of things over here confusing right?
Let us go one by one, let us understand what each one of these things here means. So first, let us start with this thing over here, you can see that let me delete all this. So we can see there is this thing, you know where we are seeing some kind of ID, and it is saying that, no, I'm excluding this package, okay. So this ID number here indicates actually this control flow. So, this ID number 126, t 09 e is nothing but this control so you can see this ID, every component in SSIS is assigned a unique identifier. So, this text here the first thing what you see here is saying is that he's trying to execute this package with this ID number, right.
So, that is the first thing. Second thing is, you know, this control flow actually executes this data flow task right even the data flow task also has an ID you can see 4054 So, if you see the next line in the text file is saying is that once this package is executed after that is trying to execute this data flow task, okay. So, the first two paragraphs should you see that indicates which control flow task is running and which data flow task is running. So, I'm going to go and delete all of this you know, so, that is easy to understand. Let us concentrate on things you know which you see from this beginning part here, this is where the the the main thing is, so, you can see here there is a begin part zero then there is a begin part one and then there is a begin Part Two right.
So, let us remove all the other things and let us just concentrate on the picking paths because that is where you know we will get the answer that this is a synchronous component or this is a As synchronous compliment Now, let's first try to understand these lines what they are saying here because if you understand what these lines are saying, then we will understand things better. This line over here, just put a comment, there is no comment like this. I'm just saying that it is that just for notes okay. So, it is not a C sharp program or a Java program where you put a slash comment like that, but this line here, what is C says that there is a do dotnet source component who is sending output who is sending output. So, there is a do dotnet source component with ID one, look at this right hand side with ID one he's sending output two, output number six and talk about what is number six Now, the way every control flow task and data flow task are identified uniquely in the same way, every component over here is also identified with a unique ID.
So if you select this ad or dotnet source, you can see this ID one on the video dotnet source. So that's, that's what exactly is that number one over there. So, this is that component A do dotnet source number one is sending the output to six, what is the six? So, if you go here, to Show Advanced Options, you will see that this a do dotnet source has only output via output, because this is the source right? So it does not have an input, isn't it? So it is saying that that output number is six, you can see the identification string also.
Right. So, in other words, this one line here indicates that what is a component and is he sending in, is he taking the input or is he giving output Right, second one, look at the second line, conditional split component number 25. So here is the conditional split, you can see the ID is 25. Right now, this person is actually taking an input, you can see. So the output of the A dotnet source goes to the input of the conditional split, right? So conditional split number is 25.
And if you go to the advanced editor, now, because this is a transformation component, it has an input as soon as it has an output, so it is saying that I'm sending the input to 26. So look at this 26 right. So you can see that you know, the eight In short, what it's saying is that this a do dotnet source output is sent to the input split component. Now, from the inputs, the split component, there are two paths, one part which goes to the sword and the other part which goes to the derived column, right? So we can see and he's saying that okay there are two paths sub paths, one is sub path zero and the other one is sub path one. And in the same way, if we go inside the sub path for example, you can see here we are saying that the output of condition one is sent to the sought input.
So you can see the output of this conditional split is sent to the SOT here and the output of this conditional split is sent to a derive column right. So this output of this conditional split is sent also to the derive column. And further that derive columns input is sent to the output of the other derived column. So you can see that we are able to navigate the complete tree of our SSIS from this thing over here, right? But the most important thing why we enable this because we want to know that are we creating New buffers here or are we creating new buffers here or where are the new buffers getting created that is what our goal here is. Now, wherever you see this begin path zero right wherever you are seeing this begin path word in those places new buffers are getting created.
So, what it is saying over here, this begin path zero till this end path zero they are working on the same buffer, but from this begin Part One till this end path one they are working on new buffers. So, here new buffers are getting created and on this begin part two new buffers are getting created. So in short over here, first buffer is getting created here when the ad dotnet source reads the input, right. So that means this is the place where the first buffer is getting created. Let me go and Okay, so these are people When the first buffer is getting created, now look at this begin part one look at the first component it is sought output 35. So, that means, this is the sort component if you go to the properties, so, that is 33 so, that is this.
So, again a new buffer is getting created here. So, the way it is going is like this from this ad or dotnet till this conditional split the same buffer is used, but the time he goes to the salt a new buffer is getting created. After that again look at begin Part Two over here, new buffer is getting created when the aggregate output is component is getting executed that means over here. So, in short, we are having new buffer first thing this is the buffer for reading what is getting created. After that the buffer is again new buffers are getting created here and new buffers are getting created here. So, we are creating buffers three times.
So here it is Begin part 01 and two. So, in other words first one the begin part zero is understood because for reading purpose you need some buffer right. But after that this begin part one and begin Part Two are additional buffers which are created by those components. So, in other words here now, in this case, the source component and aggregate components are a synchronous components, because they are creating new buffers, even the dotnet sources are a synchronous component, okay. So, a dotnet source sought an aggregate, they actually create new buffers while derived column copy column conditional split they are using the existing buffers. So now that we know that the buffer is getting created here, the buffer is as new buffers are getting created here on the salt and also aggregate.
So there are three places in Nevada the buffers are getting created. Let us see that if we can remove some of these components you know, so that We don't need those extra buffers right. So, first thing either open source we cannot remove because we have to read data, right. So this buffer creation is understood, let us give a thought on this sort, this sort of synchronous component can be removed, at least in this condition. Because what you can do is on the source itself, you know when the data is read, read, you can go and say that do a salt. And you can say that do a salt on the ID column.
So, I can say that salt key position one. So it says that this is sorted on this ID column, right. So, we can go and remove this sword you can see now, here it is also giving a warning saying that this data is already sorted. So, do we want to really put this extra salt component over here, right? So I can now go and remove this Yes. And I'm going to go and directly connect from this conditional split.
Now to this derive column right And you can see that time we give this input to the derive column there are some errors here which are coming in. So let us go and fix this error. So I'm going to say Edit. So it says that okay these things are as it is. So let me say Apply. Right.
So looks like the errors have got fixed. But so now if we go and run this, so what we will do is let us go and delete the lock txt file, let us create a new lock txt file. And so I'm going to go and delete this text file at this moment what we have here right and let us read on the application and let us see that what kind of output we are getting in the in the new run. So let us run this that our application is running. Right, so I'm done. Now, let us go back to our log txt file.
So let us go back to our log txt file. Let us edit it again. See, so, the first two paragraphs I have already explained. So, let us go and delete all these two paragraphs what we have here and let us just concentrate on the beginning part thing, right. So you can see now, there are only two paths, path zero and path one, you can see that the part two has gone off. In other words, there are only two buffers getting created one by the age of 10 souls and one by the aggregate column.
So, in other words, you know by looking at your your log files and looking at the way your application is running, you can make out that how many buffer new buffers are getting created, and at which place they're getting created. And if the logic permits, you know, you can replace those as synchronous components and make your application less memory stressed. For example, over here I said that I would like to sort the component right you know, when the ad or dotnet reads the records. So, with that, you know, the extra buffer created by the source component is now removed. So, that was all about how to identify synchronous and asynchronous component. But now, we also like to identify components you know, which are going into semi blocking mode or full blocking mode.
In other words, for example, let us say when we execute this application you can see over here data has arrived till the aggregate but aggregate it has been blocked. So, in other words until the aggregate completely finishes the work, it will not give the data to the copy column, but you can see other column for example, conditional split and derived column they are taking the data and then that they are giving it right ahead. Right. So, in other words, I would like to also go and identify that you know if that component is doing any kind of blocking or not if it is doing any kind of blood Then probably I would like to do multi threading. Remember, in the previous video we talked about multi threading. So, probably, I would like to not have only one aggregate component, but I would like to have multiple aggregate components.
So, we would like to also identify now, that each of these components are going in a blocking mode or in a in a full blocking mode or semi blocking mode. Now, let me make a statement here it is very easy to identify if a component is synchronous or if a component is a synchronous, but it is extremely difficult to identify if a component is a non blocking component or it is a semi blocking or blocking component. So, the best ways that you can see that I have put up a chart here and in this chart, you know all the components are categorized, you know as per that they are blocking or they're semi blocking. But let me tell you, that how you can identify in general that if a component is a blocking or semi blocking or no blocking, if a component is doing activities like more Merge merge join in also union in other words, it is taking data from two places and it is trying to give it ahead those count components you know, they try to be they they tend to be semi blocking.
So, in other words they don't fully block things, but they tend to be semi blocking but components like SOT aggregate grouping these components are fully blocking okay. So, so, the way to identify between semi blocking and blocking is we are not worried about non blocking because non blocking is good right, it is not blocking anything it it just takes the data and sends it to the output, but we need to identify new components which are semi blocking and blocking. So, remember that you know components like SOT things which are doing joints, you know these kind of components you will always find that they are either semi blocking or they are either blocking certain them that may also give you one more way of identifying semi blocking and blocking. You know this way is not the perfect way, but it still gives you idea that Do you know if that component is in one of these categories, because, for example, now in SQL 2014, again, new components have been introduced in SQL 2016.
Again, new components have been introduced. So you don't get these kind of charts from Microsoft saying that this is blocking and semi blocking, right? So we need to have a way of identifying these components. And second thing about if a component is made in C sharp, you know, which is a custom component. So how do you know that that custom component is doing a blocking transformation or it is doing a semi blocking transformation? So I'm going to give you a methodology here to identify them.
But as I've said that this methodology is not foolproof, but yes, it does give you an idea that if the component is blocking or semi blocking, now, when any component executes in SSIS, it has five important events which happens or there are five important steps which happens the first one is it goes and it validates the components so it checks like you know if The connection is happening or not or must be there are some XML checks etc. The second thing what happens is pre execution process. So, pre execution process is nothing but the initialization process where a component gets initialized, it gets it has some necessary data that is filled up finally, the process input happens and the last is the post execution happens. Now, once all of these four events are finished, it goes and it fills the output and gives it to the next input.
Now, from all of these events, this last event is the most important one. This last event is where you know we can recognize that this is a synchronous component or not. So, if you find that any component is taking a lot of time in this event, that means, you know that that component is a synchronous component and there is a high probability that that component can also be a blocking component or a semi blocking component or a full blocking component right So, let us go ahead and let us see that how do we identify the prime output and let us try to identify 70 blocking and full blocking component. So, what we'll do is let us go to our control flow. So, in other words what we have to do is now we have to go and figure out the time needed in each one of these and especially the time needed in prime output right.
So, let us again go back to our logging and over here you will find that there is a very interesting event called as pipeline component time. So, this pipeline component time informs about each fees like validation etc okay. So, let us go ahead and enable this component. Let's save this and let us run this so that it is running and first thing it is very it's visually here you can see that it is blocking an aggregate but remember that in bigger projects, you know, it is very difficult to find out this. So, definitely we need some more science. way of identifying these components.
So, visually Yes, I can figure out that yes this is blocking over here right. So, there it is everything green everything fine. So, let us go now to our log file and let us try to read this and understand it. So, you can see that there we have the output. So, you can see for example, you can see the validate event for every component and it and here you can see the time spent on it. So, on conditional split it has spent zero milliseconds etc.
So, you can see validate event after that probably, you will also find pre execute event and so on right, but our interest is the final one the prime output filling buffer So, there it is. So, you can see over here now, this prime output filling buffer is happening for both if you see at the top, you know prime output is not happening for derived column it is not happening for copy column it is happening only for the A do dotnet So as you can see a do dotnet source prime output filling buffers and then for aggregate again there is prime output filling buffers. So that clearly means that both of these components are taking time to fill that output. If it was no blocking, then these components won't take any time right. So, we won't find any time duration from Prime output, but if a component is taking time to fill the output buffer that means that it is first to a synchronous component.
And second, you know, there is a high suspicion that it is one of the semi blocking components of blocking component. So I can easily see over here that aggregate is taking approximately some time here, right there is a prime output filling buffer. And also I can see that ad or dotnet sources also taking some time, right. This clearly indicates that both of these components are a synchronous, and there is a probability that both of these components are either blocking or semi blocking. Right so I hope that you enjoyed this video. In this video.
We were trying to understand filing important concepts as synchronous component, synchronous component, blocking, fully blocking or no blocking. So once you know how to go and see these log files, so once you know how to go and read the log files and you enable logging, you'd get a very nice idea of how internally SSIS package is executed. And then you can go and fine tune on that on that SSIS package. So remember, you have to go and enable logging. So you have to go to the package level and enable pipeline component time pipe and execution plan and pipeline execution please. So I hope that you enjoyed this video.
Thank you very much.