Dataset Processing and Analysis (Sample-1)

Microsoft Azure Machine Learning Studio
56 minutes

Transcript

Welcome back. This is my fifth video on Microsoft Azure Machine Learning Studio, our fifth class on machine learning. Today is a completely new day, so let's start with a recap of what we did in the previous classes. We have four experiments here, and the reason it is showing two is that we have web services for them.

If you look at it, you have these two prediction experiments, and it makes more sense when we expand them. We did income prediction, and then we deployed a web service for income prediction with a predictive experiment. Then we did the automobile price prediction tutorial. And as you already know, you have this document, which takes some time to open.

This document has the machine learning tutorials you can work through. I'm not performing the one I actually wanted to do today, because it's not running for me, probably because my internet is slow or the data is too large. So, we have done this: we predicted a price and we cleaned the data. For most predictions, I would assume you spend most of your time in the data, cleaning it or scrubbing it, whatever you want to call it.

As an example of scrubbing, let's assume you have a name column where some values are in lowercase, but the values should always be uppercase. Fixing that is data scrubbing, which is again a kind of data cleaning. That's not possible with this module; you would probably want to use a SQL query for it. The SQL option shows up only when an experiment is open, and I cannot open one here, so I will show you later. So you can also do data scrubbing that way, which we didn't discuss.

So, we used the linear regression algorithm. We split the data at 0.5, and we didn't set a random seed; we left it at the default. We produced the predicted values, got this result, this graph and these values, and then we validated the model. There is one more tutorial I wanted to show you, but unfortunately it's not working, which is too bad from my side. Actually, it's not that it doesn't work; it works, but it takes too long. It shows about 79 KB of data, yet the run was taking five minutes, maybe ten, I don't know. I let it run for about five minutes, and you cannot spend five minutes waiting, right?
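
The recap steps (split the data at 0.5, optionally fix a random seed, fit a linear regression, produce predicted values) can be sketched in plain Python. This is only an illustration under stated assumptions, not what Studio runs internally: the function names `split_rows` and `fit_line` are hypothetical, the data is made up, and only a single input column is used.

```python
import random

def split_rows(rows, fraction=0.5, seed=None):
    """Shuffle the rows and split them; `fraction` of them go to the first part."""
    rng = random.Random(seed)  # a fixed seed makes the split repeatable
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]

def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical (engine size, price) pairs, made up for illustration.
data = [(x, 100.0 * x + 2000.0) for x in range(60, 200, 10)]

train, test = split_rows(data, fraction=0.5, seed=42)
slope, intercept = fit_line(train)

# "Scoring": predicted price for each held-out row.
predicted = [slope * x + intercept for x, _ in test]
```

Leaving `seed=None` mirrors the default described in the recap: the split then differs from run to run, so the metrics do too.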

Or probably there is some data mismatch in it. You can try this tutorial yourself, since you have the document. It uses an R script, which we probably won't use, but let's see how we can do that. This was the tutorial I wanted to show you, to see how we can use an algorithm. A lot of things are used here: they split the data, of course, which is common.

Here they use Edit Metadata, because you want to edit the metadata of the columns. What is metadata? It is data about data, which here means the columns. That's the reason you're using it. Then you train the model, score the model, and evaluate it. See, this is a complex process, and here you also have the R script.

You're running this, analyzing the data and cleaning it. You're training the model, scoring it, and evaluating the results. This is a pretty complex scenario. Nevertheless, deploying the web services is something we will do. Since this German credit card tutorial is not working, what we'll do today is take some of the samples. So this is for your data set.

So, what are the data sets? We will look at an example that is already available here, so not to worry. If you have any questions, you can reach out to me. Here we already have some of the data sets.

As I told you, go to the Experiments section. You also have Projects, and you can see we have imported two projects. We have income prediction and the income predictive experiment, which is the web service version we got when we deployed it, because the flowchart completely changes. That's the reason the predictive one looks completely different. We trained two models, and the column doesn't have to be income and it doesn't have to be price; it can be any column you want to predict. I chose those from a tutorial point of view.

It doesn't have to be price, and it doesn't have to be income. It could be age, or it could be marital status. If you're taking the automobile data, it could be engine size or make. You can predict almost anything, and with the number of algorithms you have, you can do as many experiments as you like. You don't have to stick to linear regression; you've seen a lot of methods. I'll show you once again to help your understanding. We also did transformations, and transformations are very important.

The transformations matter because the data was pretty dirty. For me this is interesting because it relates to my ETL background. Although I'm not using Informatica here, this flowchart is similar to what Informatica calls mapplets. We use a similar process in industry. Then you ran that experiment. That's how we did it, so let's start today's class.

That one was pretty easy. If we go back, we did the automobile price prediction. Let's go back and look, in case you have not understood it. We took our raw data.

This is the raw data, which you can visualize. As I told you, you don't have to stick to price; I used price from a tutorial point of view. You can predict any of these columns. You can predict horsepower, for example. You can also predict make, but the problem is that make is a string feature.

As you can see, this is a string, not a number, and that becomes really complex. Price is a numerical feature; if you want to predict something like a string feature, that's a different story. You can still do it, but it's a different process. So what we did is predict price. We cleaned the missing data by removing the entire row wherever the price was missing. Then we selected the columns that are required here; I don't need every one of them.

Technically, I can use all of them, but these are the ones I need, and that's the reason. Then we split the data and trained the model using linear regression. Here you can use any kind of model, not just a linear model: you can use anomaly detection, or classification, which is what we used for income prediction. You can use clustering, such as K-means clustering, or regression. Sorry, in the previous one we used a boosted decision tree.

These are the algorithms used to predict. You pick the model and you train it, and you can predict one column at a time. If you open the second one, you could use it in the same way. Then you go to Score Model and visualize the prediction again: it generates a new column, the scored labels, holding a predicted value based on the existing price.

In a predictive model like this one, you can use additional operations to remove rows you don't want, and you have the transformations for that. But we didn't do that; we kept it simple. It depends on your requirement: what you want and how you want to do it. That's the whole point. And then we evaluated it.

Evaluation gives you a graph and the mean absolute error, which is very important. You can probably publish many papers on this, it's that interesting, and I'll be publishing a few papers on it myself. So this was the regression model. That was a little recap of our earlier work.
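
The mean absolute error mentioned here, and the root mean squared error that comes up later in the class, can be computed by hand. A minimal sketch in plain Python, with made-up actual and predicted values (the numbers are assumptions for illustration only):

```python
import math

def mean_absolute_error(actual, predicted):
    """Average size of the errors, ignoring their sign."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def root_mean_squared_error(actual, predicted):
    """Square root of the average squared error; large errors weigh more."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

# Hypothetical values, not taken from the experiment in the video.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]

mae = mean_absolute_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
```

The root mean squared error is never smaller than the mean absolute error, so a large gap between the two suggests a few rows with big errors.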

With this visual tool you can create a lot of predictions, so it is really good to know. Let's go to New and check out a few of the samples. We'll go through some of these and probably close this out. We probably also want to work on a notebook.

To be honest, I haven't looked at this before, but we'll see what we can do with it. Let's see here. We'll use the sample called Dataset Processing and Analysis. Let's open it in Studio and see what it is about.

It will take some time to open. Okay, so this is the experiment. It's complex, because it contains a lot of things. Let's find out. First, Import Data.

You cannot visualize it yet, which is interesting, but we'll see what this is all about. Import Data means they are importing data from this URL. That's interesting, so you can do that too. Let's see whether it downloads and what the data is. The file is in .data format, so I guess I can open it.

So this is the data. Well, not my data. Okay, great, so this is what the module is reading.

Here, what's happening is they're taking the data from a different source. They set the data source and the data format, CSV, so it is probably converted on import. Next, they're editing the metadata. Let's see what they're trying to edit. They've selected column one, and they're making its data type categorical.

We can visualize that, right? I won't explain this part without the data, because you will not understand this flowchart if you're not looking at the data. To understand what a flowchart is trying to do, you have to look at the data. They've put in some descriptions, which is good. We'll run this, because I cannot visualize anything while the experiment is still in draft. Running will take some time. What we are doing is using someone else's flowchart and trying to visualize and understand their method.

Using this, you can manipulate it and do your own research, and run a lot more tests. You don't have to invent something new or create everything from scratch. But first you need to understand what they're trying to do here, which is data set processing and analysis. So they're doing processing and analysis.

We are not able to see the data while this is running. I guess it takes around one or two minutes to run. Meanwhile, we'll look at the flowchart and see what it is trying to do. This gives us a little bit of an idea.

So they are doing data analysis. They use Evaluate Probability Function, they select particular columns based on the requirement, and they compute the linear correlations. Okay, it ran successfully; you can see the green tick mark. Can we hide this panel? Good, I want to make some space here so we can look at the flowchart. Great, now let's visualize it.

Now I can visualize it, because it has been run. You have 205 rows and 26 columns. So this is the raw data. They didn't name the columns properly; they only have column one, column two, column three, and so on.

We don't know exactly what each column is; they were supposed to name these properly. Probably that's the reason they have Edit Metadata, so we'll check what metadata they are editing. Column one, we don't know what it is: 3, 1, some code. Column two, we don't know either: some numbers. Column three looks like the make of an automobile. Column four is, I would say, gas or diesel, some kind of engine type. So we don't know exactly what's going on here, since this is just raw data. Let's click on this one: it is a numeric feature.

So this one is a numeric feature, and this one is a string, of course. All of these are either numeric or string. String means the value contains letters, whereas a number like 10 is numeric. Good. Now let's see what they're trying to do with this Edit Metadata.

They have this description, which is good. They've done a really good job of putting in descriptions; I did not do that in our previous class. What they're doing in this Edit Metadata is changing the data type of column one to integer, making it categorical, and naming it symboling. Let's visualize it and see what they've done.

They're changing it from numerical to categorical, as we've seen, and they've changed the column name to symboling. That is what you do with metadata: it is data about data, the columns, and that is what you can change. So that's what Edit Metadata does. If you see here, the feature has changed from numerical to categorical. The strings remain the same; we don't change the strings. I guess they've done the change for only one column. Let's find out.
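
As a rough analogy for this step, here is a plain-Python sketch that renames one column and marks it as categorical. It is not how the Studio module is implemented. The function name `edit_metadata`, the column names `Col1` and `Col3`, and the choice to represent categories as strings are all assumptions made for illustration.

```python
def edit_metadata(rows, column, new_name=None, categorical=False):
    """Return new rows with `column` optionally renamed and turned into labels."""
    target = new_name or column
    result = []
    for row in rows:
        new_row = {}
        for key, value in row.items():
            if key == column:
                # Assumption: a category is represented here as a string label.
                new_row[target] = str(value) if categorical else value
            else:
                new_row[key] = value
        result.append(new_row)
    return result

# Hypothetical rows with placeholder column names.
rows = [{"Col1": 3, "Col3": "audi"}, {"Col1": 1, "Col3": "bmw"}]

edited = edit_metadata(rows, "Col1", new_name="symboling", categorical=True)
```

The original rows are left untouched, which matches the idea that only the description of the column changes, not the source data.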

Yeah, you see here, column one has changed to symboling. Let's go to the third block. Again, they are changing the metadata. In the third block they take column two and column 26, set them to integer, and make them non-categorical.

Let's find out the reason they're doing it: it's to clean the data, because there is probably some data mismatch or some data is missing. See here, there's missing data. That's the reason they are changing column two and column 26, and column 26 is given the name price. This is a good example. I hope you understand what I'm trying to say: we are changing the metadata.

Here there is missing data. We don't know what the predefined content of this column is, but they've renamed column 26 to price, and it is a numerical feature. That's good. So that's what they did in this third block.

Let's look at the rest of the blocks. They are cleaning the data. Let's see what they're trying to do. Remember, we have done this cleaning work before, when there were some missing values. Let's visualize it again.

If you see here, there are missing values. What they're doing in this block is cleaning by replacing with the median; you can either remove the missing values or replace them with the median. And for which columns? All numeric columns. That's really interesting. Now they are doing this in three different varieties.

This is very interesting. They have three different models here: this is the first model, this is the second model, and this is the third model. Okay, great.

Does that make sense? They're doing three different types of cleaning. In the first, they use custom substitution, replacing with zero. In the second, they replace with the median. In the third, they replace using a probabilistic method, which is really interesting. So the whole point, as I understand this flow, is that they use three different cleaning types and then analyze the data. They're not merging the results; I don't think they're merging anything.

They're trying to understand what happens to the data after three different types of cleaning: first, substituting zero values; second, replacing with the median; third, replacing with a probabilistic estimate. That's very interesting.
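
The first two cleaning types can be sketched in plain Python to show why they lead to different statistics. The price values below are made up, `None` stands in for a missing value, and the helper names are hypothetical. The third, probabilistic replacement is not reproduced here.

```python
from statistics import mean, median

def fill_with_zero(values):
    """Custom substitution: every missing value becomes 0."""
    return [0.0 if v is None else v for v in values]

def fill_with_median(values):
    """Every missing value becomes the median of the values that are present."""
    present = [v for v in values if v is not None]
    middle = median(present)
    return [middle if v is None else v for v in values]

# Hypothetical price column with two missing entries.
prices = [13495.0, None, 16500.0, 13950.0, None, 17450.0]

zero_filled = fill_with_zero(prices)
median_filled = fill_with_median(prices)

# The same column now has two different means, depending on the cleaning.
mean_after_zero = mean(zero_filled)
mean_after_median = mean(median_filled)
```

Filling with zero pulls the mean down, while filling with the median keeps it close to the observed values, which is why each flow in the sample reports different summary numbers.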

I have not seen this kind of thing before, but it is really interesting: they analyze the data by cleaning it in three different ways. Have you ever heard of such a thing? It's a very interesting perspective. You can do something similar as homework. Remember, you have these models: linear regression, Bayesian linear regression, Poisson regression. As your homework, use a similar setup with three different regression algorithms.

Then evaluate the models on the same example I've given you. That's your homework, and it should be interesting for you. To be honest, I have not seen this sample before; I'm looking at it for the first time right now. So the homework is to use three different algorithms; sorry, I said models, but I meant algorithms. Use linear regression as I've shown you. We will do more with clustering methods later, so leave them for now unless you know what you're doing.

Use linear regression and Bayesian linear regression, or use three linear regressions with different parameter values. Clean the data first, train the model, score it, and evaluate it. Or you can use three different methods of cleaning the data, then train the model, score it, and evaluate it. That will be good homework for you, and it should be simple: you are doing the same thing from three different perspectives and seeing what happens. Okay, let's go back.

Here, the first cleaning is done, and then comes Summarize Data. Let's visualize what it does. In this flow they have replaced all missing values with zeros. Let's see what that gives.

This is the summary. For this column, the minimum is zero, the maximum is 256, the mean is about 97, and you also get the mean deviation and so on. So they're finding the mean, median, and mode and viewing them. It summarizes the data from the data set. This is a separate step that feeds nothing else; it just summarizes the data so you can view it.
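
The kind of per-column summary described here can be sketched with Python's standard `statistics` module. The values are made up, and the function name `summarize` and the exact set of statistics are assumptions; the Studio module reports more than this.

```python
from statistics import mean, median, mode, pstdev

def summarize(values):
    """Basic descriptive statistics for one numeric column."""
    centre = mean(values)
    return {
        "min": min(values),
        "max": max(values),
        "mean": centre,
        "median": median(values),
        "mode": mode(values),
        # Mean deviation: average absolute distance from the mean.
        "mean_deviation": sum(abs(v - centre) for v in values) / len(values),
        "std": pstdev(values),
    }

# Hypothetical column; the 0 plays the role of a zero-filled missing value.
column = [0, 65, 95, 95, 122, 256]

summary = summarize(column)
```

A minimum of zero in a column that should never be zero is a quick sign that missing values were substituted with zeros.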

How does the data look after the cleaning? That's fine. Next is Evaluate Probability Function, with a mean of 12946. Very interesting. Let's find out what this is.

The standard deviation there is 8079. I'm not exactly sure how they came up with these numbers. Probably they used some random value, or maybe not; I have no idea. But let's visualize it and see if we can understand something. So, what they are setting here is the mean and the standard deviation.

This is what they're using: a mean of 12,949 and a standard deviation of 807. This is an independent step. After cleaning the missing values with zero, this is what they're getting. That's a very interesting perspective. This part is completely separate and has nothing to do with the rest. Let's see here.

Now we visualize the step that keeps numeric columns only. You will have only numeric data here; no string columns remain. Fair enough. I don't see any strings: make, engine type, and the rest are all gone, because they are selecting only the numeric data. Then they compute the linear correlation between the columns. This sounds fine, but something is missing here. Can anyone tell me what is missing? There is obviously something missing. I'll add something here and show you; it may not work, as I've not tried it before.

Let's go to the second flow. They clean the data by replacing with the median, and the result will be completely different, because the minimum, maximum, mean, mean deviation, median, and mode all change; the median and mode are 115 here. You get a different result because the cleaning method was different. In the first flow they substituted zero; here they substitute the median. Very interesting. Then they keep only numeric columns. Fair enough.

The computed statistics will be different here. Let's visualize it. It is completely different because the way they cleaned the data was different. Then they compute the linear correlation. The values here are completely different. Look, some of the correlations with price are negative. Wow.
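
Keeping only the numeric columns and then computing the linear correlation between them can be sketched as follows. This uses the Pearson correlation coefficient, which is an assumption about what the module calls linear correlation. The table values and the helper names `numeric_columns` and `pearson` are made up for illustration.

```python
import math

def numeric_columns(table):
    """Keep only the columns whose values are all int or float."""
    return {
        name: col
        for name, col in table.items()
        if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in col)
    }

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical table with one string column and two numeric columns.
table = {
    "make": ["audi", "bmw", "dodge", "honda"],
    "horsepower": [102, 101, 68, 76],
    "price": [13950.0, 16430.0, 6377.0, 7295.0],
}

numeric = numeric_columns(table)
names = sorted(numeric)
matrix = {(a, b): pearson(numeric[a], numeric[b]) for a in names for b in names}
```

Each entry lies between -1 and 1. A negative entry means one column tends to fall as the other rises, so a negative correlation with price is possible and not necessarily an error.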

Okay, that's something odd. So this was the second flow. Let's go to the third flow. In the third, they're cleaning the missing values with the probabilistic method, and the summary of this data will again be completely different.

Let's visualize the data. The values will be different because the missing data was cleaned using the probabilistic method. They again select only the numeric data and compute the linear correlation. Let's visualize it.

So all three sets of values will be different. Now, there is something missing here, right? Let me add what's missing: I'll try to evaluate this model with Evaluate Model. I am not sure it will work.

This is an experiment, of course, so I'm not sure whether it will work. I'm being frank with you: this is my first time. If it goes wrong, I'm sorry, because I have not practiced this.

If it goes wrong, I'm really sorry. So what we're trying to do is evaluate it. Let's run this. It will not rerun everything; it only runs the new part. I hope it runs quickly.

Yeah, it doesn't go through everything. We are only looking at one flow; I could add Evaluate Model for all three, but for now we're looking at the first one. This sample is Dataset Processing and Analysis, and the analysis is done with three different cleaning methods. It's a very interesting thing. For your homework, as I told you, do the same kind of example.

Do income prediction or price prediction with three different algorithms. You don't have to use three different cleaning methods; use only one cleaning method but three different algorithms. Ah, that's what I thought: it failed. We cannot evaluate here because this is not a scored model. You cannot score this data set directly, so you cannot evaluate it. I wanted a graph here.

So this will not work as is. You can only evaluate a model that has been trained and scored. So let's train a model: we add Train Model, and we go with a very simple algorithm, linear regression. Let's find out.

Train Model needs a data set, so we take it from here. Good. This becomes really complex here. We are training this for, let's say, price.

Since we are predicting price, we only select that one column. Let's run this now; it will be a pretty complicated thing. What I'm trying to do is get a graph from this, because there is no graph in the sample, if you see here. They didn't train a model, so I have trained one. What they have done is good; everything is good.

I want to get a graph for the cleaning method they used for missing values, and I can do this for all three. Then I get three different graphs, and I want to compare them. Remember the root mean squared error; you get four or five such metrics. I want to visualize them and see what the graph looks like and what the errors and accuracy are. It might take some time.

So we use the same model here. Again, I think the problem was that it has NaN values. So the problem is the columns are really not good. Let's visualize it. The columns should have no NaN values. Okay, let's run it again.
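As a rough illustration of why NaN values matter and what a cleaning method does, the sketch below applies three common ways of handling missing values to one column. The three methods shown (drop the rows, fill with the mean, fill with the median) are assumptions chosen for the example; the transcript does not say which three methods the sample actually uses, and the column values are made up.

```python
import math
from statistics import mean, median


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_missing(values):
    """Cleaning method A: remove missing entries entirely."""
    return [v for v in values if not is_missing(v)]


def fill_missing(values, statistic):
    """Cleaning methods B and C: replace missing entries with a statistic
    (for example mean or median) computed from the present values."""
    fill = statistic(drop_missing(values))
    return [fill if is_missing(v) else v for v in values]


# Hypothetical numeric column with two missing entries.
column = [10.0, float("nan"), 20.0, 60.0, None]

dropped = drop_missing(column)
mean_filled = fill_missing(column, mean)
median_filled = fill_missing(column, median)
```

Each cleaned version of the column can then be fed to the same training step, which is how you end up with one graph per cleaning method.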

I hope this does not cause any problem, but if it does, then probably we can recreate this module. That's what I can say. We are using linear regression. If this works for this one, then it works for all the others too, and I'll get different graphs for them. What I'm trying to do is get the graph for this. Everything looks good. There are three different methods here, and each one is an experiment; everything is an experiment here.

So in this experiment I'm going to create a graph, and this model is still running. Ah, finally, my model has been successfully trained. Let me visualize it. Sorry for the wait; this is my first time doing this. Okay, now I got this graph. It shows the frequency of the error. This is what I was looking for: the mean absolute error and the root mean squared error. So what I did is train a model on the output of this cleaning method, and I can do the same thing for the other two. Okay?
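For reference, the two error metrics named here can be computed by hand as in this minimal sketch. The actual and predicted values are made up for illustration and are not taken from the experiment.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the average squared difference; penalizes large misses more."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(actual))


# Hypothetical actual prices and model predictions.
actual = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]

mae = mean_absolute_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
```

Both metrics are in the same units as the label, so for a price model a lower value means the predictions sit closer to the true prices.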

I did this only for price, using this cleaned dataset. It's pretty complex, so it's challenging. Okay, so what I did is train a model based on the cleaning method. They used three different cleaning methods and computed the correlation. What I did is train on price; not just price, I can train on any of the data here, but unfortunately these columns don't have proper names. That's the worst part.

Okay, guess what? At this point in time we can only train on a numeric value. Price is a numeric feature, and there are also column 23 and column 27. Let's try column 25. We did price, that's fine.

Let's go to column 25, select it, then evaluate this and run it. All right. Doing this for a string would be a little bit complex. I have not done it with a string; it would take hours for me to see how it works for a string. You might also train for a string too; it's not a big deal, but the algorithm that should be used will be different.

So let's see the graph for column number 25. I don't know what that column is, because they just numbered the columns and didn't name them properly. So let's find out. You can do the same thing for their other cleaning methods. Okay, great. We have run it successfully, and that's good news.

Okay, let's visualize it. So here you get the mean absolute error, 715244, and the root mean squared error. So what I did is train a model for column number 25, and this is what I got: this graph, a histogram. And guess what?

You can get a different graph based on different models. Okay, so let's keep it simple. I'll leave the other two; the other two are also there to do. Probably you can do them, if you know what you're trying to do. What I tried to do here, based on the sample you already have, is train on a numeric value: we have done it for price, and we have done it for column number 25.

Now let's run the complete model. This will take more time, but let's run this one. This is pretty interesting; actually, I have not done this before. This will run the whole flowchart. It is similar to an ETL process, actually; the only difference is that we're doing it in machine learning.

This is machine learning, and the good thing is that Azure Machine Learning is free now. I assume that after viewing my videos you will publish some papers on this, okay? I'm not sure about you, but I've learned a lot here and I'll probably be publishing papers on this. Machine learning is pretty interesting.

There are so many examples that you can run. You can use R script, you know; you can do a lot more things. My recommendation for you is to publish a paper on this, okay? I have papers published, though not many in machine learning, but I will be doing this. So I'm expecting you to publish a few papers too. There are quite a few journals that allow you to publish papers on this. So this should be free.

So you can use your experiment and show the results of what you're doing and the graphs you're getting. Okay, good. We have successfully run this experiment and everything looks good. We will deploy this as a web service, a predictive web service. So here I'm creating a predictive experiment. This will take some time. It's automatically changing, you see here. Oh, lucky me, we didn't get any error.

Very fortunate; I thought we were running into an error again. Okay, great. So we run the predictive experiment and then deploy it. Okay, there is an error, but that's okay. We just need to run this. You know, it is pretty interesting to see this model of dataset processing.

I've just trained this model, and we deploy it as a predictive experiment. We deploy this as a web service, and I hope it works. Let's find out. Okay, great. So let's test it.

This is the web service page, where we see the API key. Let's test it. Unfortunately, everything is zero. Okay, so that's pretty much it. This takes a lot of time. Okay, so this is the web service; you don't really need this at this point in time.
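For anyone who wants to call the deployed service from code instead of the Test button, the sketch below only builds the HTTP request and does not send it. Everything specific in it is an assumption: the endpoint URL and API key are placeholders, the column names are made up, and the JSON layout (an "Inputs" object with "ColumnNames" and "Values") is the shape commonly shown for request-response services in the classic Studio, so check it against the sample code on your own web service page.

```python
import json
import urllib.request

# Placeholders: copy the real values from your own web service page.
ENDPOINT_URL = "https://example.invalid/your-service/execute"
API_KEY = "YOUR_API_KEY"


def build_scoring_request(url, api_key, column_names, rows):
    """Build (but do not send) a POST request carrying rows to score."""
    payload = {
        "Inputs": {
            "input1": {
                "ColumnNames": column_names,
                "Values": rows,
            }
        },
        "GlobalParameters": {},
    }
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + api_key,
    }
    return urllib.request.Request(url, data=body, headers=headers)


# Hypothetical input row; real column names come from the experiment's input schema.
request = build_scoring_request(
    ENDPOINT_URL, API_KEY, ["Col23", "Col25", "Col27"], [["111", "21", "27"]]
)
# To actually call the service: urllib.request.urlopen(request).read()
```

Building the request separately from sending it makes it easy to print the body and compare it with what the test page expects before spending a call.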

And you can download this and view it. And you can see your experiments right here. So you've got two experiments now for the sample 2 dataset. One is for the web service, and the other is for the data.

I'm sorry: one is the predictive experiment, and the other is the training experiment. Right. So that's pretty much it. I'm showing you linear regression here, but we can also use the Bayesian linear method or, you know, the Poisson method. Oh, yeah, here you go. So here we have this Bayesian Linear Regression method. You can use Bayesian linear regression and do that as your homework.
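As background for the homework, here is a minimal sketch of the idea behind Bayesian linear regression for a single weight with no intercept. It assumes a zero-mean Gaussian prior on the weight with precision alpha and Gaussian noise with precision beta; those parameter names and the toy data are assumptions for illustration, and the Studio module's internals may differ.

```python
def bayesian_weight_posterior(xs, ys, alpha, beta):
    """Posterior over w in y = w * x + noise.

    Prior: w ~ Normal(0, 1 / alpha). Noise precision: beta.
    Returns (posterior_mean, posterior_variance).
    """
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    precision = alpha + beta * sum_xx      # posterior precision
    posterior_mean = beta * sum_xy / precision
    posterior_variance = 1.0 / precision
    return posterior_mean, posterior_variance


# Hypothetical data generated by y = 2 * x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w_mean, w_var = bayesian_weight_posterior(xs, ys, alpha=1.0, beta=1.0)
```

With these numbers the posterior mean comes out a little below 2, because the prior pulls the weight toward zero; a weaker prior (smaller alpha) moves it closer to the ordinary least squares answer.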

Okay? Take the same experiment that we performed for price, and do it only for price: use the same cleaning method, use three different regression methods, train each one, and evaluate it. Get the graph. Okay. So that's pretty much it from my side for sample 2. The new thing we added was that we used linear regression, we didn't try to modify it, and we made a web service out of that. We will do that for the next experiment too.

I also wanted to try this in the notebook. If I have time, I will do that too. We will probably do more experiments. The rest I leave to you and your creativity. There are so many algorithms, and you can do a lot more things just by using the samples. These are the samples; I used those samples.

I have not invented something new from nowhere and put it here. You can do that too. We did that for one of those experiments, remember. But this was a completely different thing: I used a dataset that other people had already created, and I trained a model on the output of one of the cleaning methods. There are three different cleaning methods here; you might use 10 different cleaning methods and analyze the data. You can do that, and you can publish a paper on it. I'm expecting that, because this is a pretty interesting thing. So I hope you gained some knowledge of Azure Machine Learning from this experiment. This is data processing and analysis, okay, sample number two.

So I end this class here. In the next lecture we will use a different dataset. It's all about data. We will view that and train a model on it. I'll probably use a different model; we have used linear regression too much, so I'll probably not use it anymore and use a different thing.

And then we create the web service, deploy it, and get the results. So that's what we learned, right? That's pretty much it. I hope you liked this video, and thanks for watching.
