Okay, so, in this data mining process: this is the CRISP-DM data mining process, here from IBM. You can also find CRISP-DM on Wikipedia, where you can learn more about it. First comes business understanding. In the business understanding stage we try to understand the requirements: what is the goal, and what business objectives do we want to achieve with this data mining project? We can also define the people who will carry out this data mining project.

Okay, then we have the data understanding stage. We have data, and we want to understand it, so in this stage we try to understand the data. In the data understanding stage we use descriptive statistics, inferential statistics, or regression analysis to help us understand and learn more about the data.
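To make the data understanding stage a little more concrete, here is a minimal sketch using a small, hypothetical two-column DataFrame (the column names x and y and their values are illustrative assumptions, not from the lecture). It uses pandas describe() for descriptive statistics and NumPy's polyfit for a simple least-squares regression line; inferential tests are left out to avoid extra dependencies.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not from the lecture): y is exactly 2 * x + 1.
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [3.0, 5.0, 7.0, 9.0, 11.0],
})

# Descriptive statistics per column: count, mean, std, min, quartiles, max.
summary = df.describe()
print(summary)

# Simple linear regression y = slope * x + intercept, fitted by least squares.
slope, intercept = np.polyfit(df["x"], df["y"], deg=1)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

On real data the regression line is only a first look at how two variables relate, not a finished model.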
So, data understanding: this stage means that we try to understand the data. For descriptive statistics, the mean, median, and mode are the most common measures of central tendency. A measure of central tendency is a single value that best summarizes the data. The mode is the value in the data with the highest frequency, and it is useful when the values are non-numeric (categorical), where a mean or median cannot be computed. So in Python, we can get the mode of the data by using this code here.
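As a quick illustration of the three measures, here is a small sketch on hypothetical sample values chosen so that the mean, median, and mode all differ. It also shows that the mode still works on non-numeric data, where the mean and median do not apply.

```python
import pandas as pd

# Hypothetical sample values, chosen so mean, median, and mode differ.
values = pd.Series([1, 2, 2, 3, 7])

print(values.mean())    # arithmetic mean: 15 / 5 → 3.0
print(values.median())  # middle value after sorting → 2.0
print(values.mode())    # most frequent value(s), returned as a Series

# The mode also works for categorical data, where a mean would make no sense.
colors = pd.Series(["red", "blue", "red", "green"])
print(colors.mode())
```

Note that mode() always returns a Series, because a column can have more than one most frequent value.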
So I have tried this code. We need to import NumPy, and then we need to import pandas; then we can read a CSV file. We read the iris CSV file here: this is our data, and this is the delimiter. Then we can get the mode using df.mode(). We can also get the mode of a single variable using something like df['sepal_length'].mode(), where sepal_length is the variable (column) name. You put the column name in quotes inside square brackets, which selects the sepal length variable from the data set, and then .mode() gives the mode of that variable. You can write this code in Spyder: import numpy as np, import pandas as pd, and then we can read the data.
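The lecture reads the iris file from disk. To keep the sketch below runnable anywhere, it instead parses a few hypothetical iris-style rows from a string with io.StringIO; the column names (sepal_length and so on) are an assumption about the file's header. NumPy is not needed just to compute the mode, so it is left out here.

```python
import io

import pandas as pd

# A few hypothetical iris-style rows stand in for the CSV file on disk.
# With the real file you would pass its path to pd.read_csv instead.
csv_text = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
5.1,3.0,1.7,0.5,setosa
7.0,3.2,4.7,1.4,versicolor
"""

# sep="," sets the delimiter explicitly (comma is also the default).
df = pd.read_csv(io.StringIO(csv_text), sep=",")

# Mode of every column at once (returns a DataFrame).
print(df.mode())

# Mode of a single column, selected by name with square brackets.
print(df["sepal_length"].mode())
```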
So the data is a DataFrame: df = pd.read_csv(...), reading the file iris.csv from the D drive with a comma as the separator, or delimiter. Then we can get the mode with the mode method, so I write df.mode(). If I want to print the mode for sepal length, I can write print(df['sepal_length'].mode()). Okay, so I run this program, and I get the mode for all the variables, that is, for the whole data set. The species column holds the categories, and its mode in the first row is setosa. For the numeric columns the output shows their modes, for example 1.5 for this column here and 0.2 for this column here. In the rows for the other categories, some cells show NaN. That does not mean the column has no mode: df.mode() lists every tied mode in its own row and fills the remaining cells of a column with NaN. In the standard iris data all three species occur equally often (50 times each), so all three are tied and appear as modes of the species column here.
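The NaN cells in the output of df.mode() are easy to misread. The small sketch below uses a hypothetical DataFrame where one column has a single mode, one has a two-way tie, and one has three equally frequent categories, so the padding is visible.

```python
import pandas as pd

# Hypothetical data: "a" has one mode, "b" has two tied modes,
# and "label" has three equally frequent categories.
df = pd.DataFrame({
    "a": [1, 1, 2, 3, 4, 5],
    "b": [1, 1, 2, 2, 3, 4],
    "label": ["x", "x", "y", "y", "z", "z"],
})

# One row per tied mode; columns with fewer modes are padded with NaN.
modes = df.mode()
print(modes)

# To read the modes of one column without the padding, drop the NaNs.
print(modes["b"].dropna().tolist())
```

Because the column with the most ties (label) has three modes, the result has three rows, and the NaN cells simply mean "no further mode for this column".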