Lab 5: - For dimension, measures, star schema, snow flake, shared connection managers & packages tasks.(SSIS)

MSBI Step by Step Training Lab 5: - For dimension, measures, star schema, snow flake, shared connection managers & packages tasks.(SSIS)
30 minutes
Share the link to this page
Copied
  Completed
This video is Lab 5 of Learn MSBI in 4 days with Project series where we will learn dimension, measures, star schema, snow flake, shared connection managers & packages tasks.

Transcript

So welcome to lab number five. And in lab number five, we are going to do a lot of things because the requirement has increased tremendously. If you see till now you know we have been capturing customer code, customer name, customer amount and sales date. But now, the customer has added four new fields. So one is country name, where the sales has happened, states name where the sales has happened, product name, what has been sold, and the final one is the salesperson who is selling it. So there are four new fields which has been added country states product and salesperson.

And if you watch these new fields very closely, they are actually master data. You can see the country data over here. This clearly says that this has to go into some kind of a country master table. You can see state's table This should clearly go into some table like state's table right, and so on. So basically, this new fields, which has been added here are nothing but you know, they are master data. That means that in our database at this moment, if you see in our database, we have only one table, right?

So we need to go and redesign our database in such a way that we can accommodate these changes, right. So, first thing is we need to go and create the necessary master tables to accommodate all of these new fields. Now, when we talk about database systems like OLAP, or when we design database for data warehouse, we don't design by using those normal normalization techniques. A lot of developers or a lot of people you know, when they do database design, they follow first normal form, second Normal Form third normal form. In case you're new to these data, As design techniques of normalization, I would suggest you to go and watch the video from the SQL Server section where I've explained all these three normalization techniques. But when it comes to OLAP databases, your people prefer star schema or snowflake.

They don't prefer normalization techniques. Now, you must be wondering that, why shouldn't we be doing normalization? The purpose of normalization database design is that to avoid redundancy, to avoid duplicate data. And that makes sense when the data is coming into the system. So when you're doing insert updates and deletes, yes, you'd like to have a normalization database design you know so that you avoid duplicate values you avoid redundancy. But if you look at OLAP systems and if you look at SQL Server Data Warehouse or any kind of database House system, the main goal there is reading the records, fetching the records getting the records as soon as possible doing analysis as soon as possible.

So, for those kinds of systems, people prefer star schema or snowflake. So, when you talk about talking about OLTP systems, Yes, I understand normalization design is the best way to go because data is coming into the system, you want to ensure that you get clean and proper data, but when we are talking about fetching, analyzing, and that time, reading the records is the most important thing that you want to be better in terms of performance, right. So at that time, star schema and snowflakes are a better design. Now, in order to understand that why star schema and snowflake is designed in this way, we need to understand the concept of measures and dimensions first, because if you don't understand the concept measures and dimensions, it would be very difficult to understand why star schema and snowflakes have that kind of a view or design. Now, the whole purpose of business intelligence is to derive intelligent information from your data, when I say intelligent information means to do forecasting to do analysis.

So, basically the whole purpose why why an end user or a customer I will say wants to do msbi is to derive intelligent information from a data. Now, normally, when you say you want to do analysis, when you want to do forecasting, when you want to derive intelligence from a data you normally do on numbers, numbers means, like financial data must be financial data means the profit of the company sales of a company population. So normally when you say you want to do forecasting or you want to do analysis or you want to do intelligence, it is normally on numbers. But standalone numbers don't have any value until you don't have a context. For example, if you look at our data at this moment, so we have yes we have total sales amount, but I would like to say okay sales amount as for what I need a context, who can say okay sales amount as per India, right, or sales amount as for the year 2012, sales amount as per the product like shoes or shirts.

So in other words, one is you have that number, on, on on what you want to do the forecasting and analysis, and the other is the context to that number. Right this number on video The customer intense or wants to do analysis is termed as measures. The context which talks more about that measure is termed as dimensions. In other words, in our database now, we have to create two categories of tables. One category of table, which will store measures, and one category of table which will store dimensions. So in other words, you know, we have to go and revamp or database design.

So let's go back to our SQL Server. So at this moment, if you see we have only one table here called as the TBL customer. So this TBL customer at this moment has the customer sales amount. So fundamentally speaking, this is the measure table. Why this is a measure table, because it has the amount it has that field on which the customer wants to do it. Testing.

Now, the measure tables like at this moment this TBL customer table people also turn this table as a fact table. A fact table is again the same thing it is synonym. So, you can think about then facts and measures of synonym words. So, they have fields in a which will help you to do forecasting on which you can do analysis right. So, what I will do is in order to sync up with our vocabulary what we have just talked, I will go and rename this TBL customer table to fact customer. So, this means that this table has has columns you know, which will help you to do forecasting, analysis and etc.

Okay. Now, I do know that you know once I'm going to rename this table, I'm going to have a lot of problem with Master size. So we'll fix that that's not a worry, but let us try to make this a professional project. So I'm going to go and rename this table to affect customer. So that is done. Okay.

And we need to go and create now couple of dimension tables, which will connect with this fact table, right? So very quickly if I see here at this moment if I go and see here, so we need at least four tables, one is country. One is states. One is products, and one is the employee name or the salesperson name, right? So we need four dimension tables, which can connect to this fact table, right? So let me go and start creating the tables one by one.

So let me go and right click and say I'm gonna go and create a new table here. So let me start here with first first step, let us go and create the country dimension, right? So I'll say country ID. So we need one primary key. So I'm going to just make this is a primary key or set as primary key and country name, right? In the same way we create stayed stable, I will create a product table and I will create a salesperson table.

Right? So just ID and Name ID and Name. So with the N var car data type, and we'll name this table as dimension y dimension, because these are the master tables which will connect to the fact right. So I'll say this is dim country. And in the same way, in the same way, let us go and create the other tables as well right the dim states and dim employ and sorry, dim salesperson and dim product, right? So let me go and create all the tables one by one and sales force.

So you can see I'm done with creating. This is the CCE for some of the final tables. So What I've done now is I've created all the necessary dimension tables, you can see here I've created a country with a country ID and a name. Right? Then I have created a product with a product ID and a name, then I've created salesperson with the salesperson ID and Name and so on and states, right? Good.

But at the end of the day, these dimensions are context to the fact table. So we need to connect them, right. So that means that you know, in our fact table we need to go and create foreign keys to the dimension table, isn't it? So basically, this fact table will connect to the foreign key. So basically over here, what I will do is I will say, okay, every sales will have a context of country ID. It will have a context of states ID.

So these are all foreign keys you can see it will have a context of product. product ID underscore fk. So context of product, it will also have the salesperson who's sold. Right? So you can see that you know, what I've done is you know, this is a centralized measure table of fact table. And by using these foreign keys, he will connect to the dimension table.

So we need to go and establish the relationship between the fact table and the dimension table. So let me go back here. View Object Explorer. And let me create a new database diagram. So we say add all of these tables and select all of them. Add and add add Yeah.

So I've created a new database diagram with all the tables. And now I will go and create a foreign key and primary key relationship between them. So remember, our fact table is our centralized table. And all of these other tables actually surround around this fact table. Right? agree or not?

Right? So basically, the states we go here as a foreign key. So yes, so you can see our dem states is a primary key and the fact table has a foreign key that is right. In the same way the salesperson. Right, same way the country so you can see how I'm establishing the primary key and the foreign key relationship and product and don't forget Don't forget to press Ctrl S to save this if you don't save this this relationship is not saved okay. So, customer okay nice.

This design what you see at this moment before you is nothing but the star schema design in star schema design at the central you have your measure table or fact table and they are connected by a primary key and foreign key relationship to the dimension table. While in snowflake, we will do snowflake later on in snowflake it is the same design as star schema, but even the dimension tables have connection for example, you can see here it is very much possible that one country can have multiple states but still you can see the country and the state stables are not connected. But if the country and state state were connected by any way by using a foreign key or a primary key, then it becomes snowflake. So, in star schema, the dimension tables further do not have any kind of relationship they do not have any kind of more drill down, while in snowflake, the dimension tables can have relationship with other tables.

So, now the first thing is, we need to go and load all these dimension tables first and then load the fact table. So, in other words, our SSIS program now has to go through huge changes, new additions, wherein first step would be to load all the dimension tables, and then finally load the fact table. So, first thing our SSIS package this package you know, definitely would give a lot of problems now, right because it is pointing to the TBL customer table and the TBL customer People namely notice change, but also even new fields have been added, right. So definitely at this moment, this complete program here would go for a toss. So what I would do is I'm going to go and disable this at this moment. So at this moment, I know that I need to do a lot of things here.

So I'm going to go and disable this so that I don't get any error. So I have disabled this complete component here, right. So the first step would be to load the dimension table and then load the transaction tables right or then load the fact table. So I'm going to go and make both of these processes complete separate, you know, so that I don't mix up all the code in one DTS package. So this package, which is loading, currently the customer table, I will rename this package as fact, dot DTS x. So I'm going to go and rename this package as fact.dt x x so it's a fact customer by fact customer because this DTS package will load To my customer data, right?

And let me go and add a new package. So you can see here, we know you can go and add a new SSIS package. Remember, we have package, then we have project and then we have solution. Right? So I'm gonna go and add a new package here. New SSIS package and this package I will name it as let's first do for country master.

Okay, so let us load the country master database first. Okay, so this package I will say country dot DTS x, okay. So this is your country dot DTS x. And we will have one main package, right, so we'll have one one main package, which will first execute all the master packages that is country master state master and all masters and then he will exclude the fact customer. So let us add one One more package here called as mean or something. So, this package one here, I will rename this and this I will say main package, right.

So in this package, we would first like to go and execute the master packages right so this main package this main dot DTS x will define the flow of the packages right. So, I will go here and say SSIS Toolbox So, in this I will say okay first execute the package which is related to the Masters So, at this moment I only one country right. So, I'll say Edit So, let's let's put some nice names here or else later on this will become confusing, right? So, this is country master loading, whatever it is, and this is your fact customer loading. Okay. So you can see that you know, when you watch with bigger SSIS projects, how to go and organize your packages logically, right?

So over here, I'll say Okay the first thing it will go and it will run is the package which is country, right? And okay. And second here this will run customer, right. Great. So you can see now first the country packages will run and then the customer transaction package will run. Great.

So now if we run this program here, just quickly, let me just remove all breakpoints right. So now if I go and run this program, nothing will happen. So you can see the main DTS x package has ran. That is because there is no code at this moment. Because if you remember I have disabled the fact customer DTS, right? DTS x.

Right? Good, but you can see that at least the flow is happening the country master DTS x package is first running, and then the fact customer is running. So, all these things are working at least the flow is running. So, let's first go and try to load the country master into the dim country table. So, if you remember, we had this dim country table here, let us try to go and load the dim country data for so we'll false load dim country, then we will load must be dim states then we will load dim product you know, then we will load them salesperson, so, all the Masters would load first. So, what the end customer has done as you know, he has given us all the master files.

So, we can see for the country Master, you know, there is a CSV here, so, this actually loads the country, right in the same way. For the product, you know, there is a product CSV here. Right. So actually what what they have done is, you know, they have provided all the Master csvs to us. So the first thing, what we have to do is we have to go go to these folders and load all the Masters first. And then finally go and load the customer transaction table.

And also, before I move ahead, one of the things probably which would have come to your mind is that we have your three DTS x package at this moment. So how does SSIS know that he has to run the main first, right? So remember that, you know, in order to run one of the backups as a startup package, you have to right click on this and say, set as startup object. So if I make this set of startup object, then this guy actually runs first. So remember this menu here, right click Set as startup object. So this will make the main dot DTS x package, run at the start.

Okay? So even if I am on fact customer here you can see if I go and run, what it will do is it will try to run the main DTS x package. So you can see that the main DTS x package is running. So now let us go to the country dot DTS, DTS x. And let us go and load that CSV file. Okay.

So first thing we need to go and put a data flow task. This process is all same as you have done for the customer text file. Okay? So load country. And inside this, the first thing what I would need here is I would need the source so my source is a flat file source. My destination is SQL Server, right?

So all the steps are all same. So I'll say this is Country text. So you can see that I have configured this file input control. The configuration steps are all same as I've explained in lab one, okay? So that is go and configure the A do dotnet destination. Now remember that the A do dotnet destination is the same database, and we need to go and connect to the country table, right?

So now so let's go and do an edit. And let us say new. Right. And if you remember, in this, we go and give the server name and all those things right. But before I go ahead and configure this Connection Manager, just one small thing. If you see at this moment, all the DTS X Files here will connect probably to the same database right so if you remember In fact, customer, right will connect to the same database even probably will be creating other DTS x package, which will load the salesperson, which will load states which will load product, everyone will connect to the same database, right?

So rather than going and creating connections again and again, or Connection Manager again and again, you can see, for example, when I'm on the customer failure, I have a connection manager here. In the same way when I'm doing country, right. Again, I will create one more Connection Manager. So is it possible that we can go and just create one global Connection Manager and reuse it? Right? And yes, we can do that.

So what we can do is because we are connecting to the same database, right, what we can do is we can go and create a global Connection Manager. You can see that There is a connection managers folder here. So we can say add a new Connection Manager and we say okay, I want to use a do dotnet and I will say add and I will use this database, right. And there it is. So we can see now that we have a global Connection Manager, and I will just reuse this. So, tomorrow if you go and change the database you have to just change at this place and not in every individual DTS x file right.

So, now, let me go and again do edit. And you can see that now, I find this Connection Manager in my global section here and my interest is to connect to the country table that it is the mappings Yes, all these steps are same. Great. So country dot txt is loaded. I will just go and test this. So let us go and just run our main TTS.

Exe file and let us test Once. So, let me go to my Object Explorer. And let me just go and see if the country has loaded into the country table. So you can see all the country in the text file. So if you just go and see the text files of the country master they have all been loaded into their school. So now, right?

So we have three countries and three country records have been loaded over here. Great. So the same way, probably you'd like to repeat the same process also for the other tables as well right for other master tables as well. So the way we have created country dot DTS x, we would like to create state Master we would like to create product master we would like to create sales table and so on right. So, we would like to go and add probably all the loading of the of the Masters accordingly. So, that brings us to the end of last five Now I know that this is a half done lab because we have a lot of things over here we have to actually go and fix off or customer fact table customer fact loading I'm sorry, we have to go and load all the Masters etc.

But you know, let us try to keep the lapse of 3030 minutes because you know if I try to give you a one hour lab it would be very difficult to digest in one go right. So, let us keep these learnings 3030 minutes let us digest them with small small pieces so that you know it is manageable. Now I would like to give you a small homework at this moment I have done ETL for country master. So in the same way You know if you can do four states master if you can do four products Master, and also if you can do for the sales Master, right? So at this moment I've done for country Master, I leave it as a homework for the other masters to you guys. So what have we done in this lab?

In this 30 minutes of the lab we have talked about dimensions and measures. We talked about star schema and snowflake design. We also talked about the differences between them and why they are best suited when it comes to OLAP systems. We also talked about something called a shared connection managers. So rather than creating the same Connection Manager again and again, in different packages, we can just create a shared Connection Manager. And also we saw how we can invoke packages right DTS packages, why our task.

So before we complete lab five or before we end lab five Let's talk about the future roadmap for lab six and lab seven. Lab six and lab seven will try to complete lab five. So in lab six unit we'll talk about something called as SCD. So, we will talk about two types of SCD and how they will help us to load the Masters in a more proper way. And in lab seven, we will use merge merge joins lookups you know, to complete that fact table. So, lab six and lab seven will be more of things trying to complete lab five.

And I think you know, the coming to labs are very important because whatever components will be covered in the coming to labs are the most used components in SSIS. Thank you. So that brings us to the end of this 30 minutes and yes as usual. We would like to hear from you on which lab you are on currently. Are you facing any issues? Do you really like this learn MSP section and any feedbacks, anything, you know, it's most welcome.

So you can go to facebook.com slash questpond. And you can put the feedback over there. With that, you know, what happens is we as trainers get encouraged. Okay, so that brings us to three and a half hours of our training and a lot of things done, but the coming two hours are very, very crucial. So hope the best and keep moving ahead. Thank you so much.

Sign Up

Share

Share with friends, get 20% off
Invite your friends to LearnDesk learning marketplace. For each purchase they make, you get 20% off (upto $10) on your next purchase.