Cross Validation for Regression (Sample-2)

Microsoft Azure Machine Learning Studio
27 minutes

Transcript

Welcome back to the sixth lecture on Microsoft Azure Machine Learning Studio. Previously we looked at the dataset processing and analysis sample. There we trained a model using linear regression, once for each version of the cleaned data. That's what we were doing, but that's not the point.

The point is that it became a little more complex, so we'll revise it and then go on to the new model. That sample had three different ways of cleaning the data: substituting zeros, removing the columns, and substituting the median. What I did was use that data to train a model with the linear regression algorithm, score the model, evaluate it, and look at the graph. That's all I needed. I kept everything else constant; I didn't change anything.

So that's what we did. We can do a lot more than this, but we used one of the samples that were already available. So let's use another sample. I'm assuming you're getting used to Azure Machine Learning Studio by now and will be using it, so here is some homework.

Homework number one: use the automobile price prediction sample with three different algorithms. Any algorithm will do: decision tree regression, linear regression, Bayesian, or Poisson; it's up to you. Homework number two... sorry, I've lost track of homework number two for the moment. So do this first homework, and next we'll see how it works.

So we'll use one more model. Oh yes, homework number two: publish a paper on any of these samples, whichever you like. There are a lot of samples here, and some of them use neural networks too, so you can publish a paper on one of those. You don't have to put my name on it; I'm just using this to train you. But I would like you to publish a paper somewhere; that's the homework.

I mean, there are a lot of free journals. I forgot the name of the one I had in mind; I'll get back to you on that. There is one place where the journals are generally free and open. Here you can find a lot of free journals.

Write a paper and publish it in any of these journals; you can search from here. This should be good for you. You can describe one model, or two or three models, and how each model works. There are a lot of options here. So that was homework number two. You don't have to put my name on it; that's completely optional and not required. Now let's train another model.

After that we'll move to the notebook and close this class. So just one more model. Let's see which one, and how we use it. The dataset processing and analysis sample was pretty good. It had three different cleaning methods, which was fascinating; I really liked that one.

Let's be honest, that was the first model I showed you, and it was your first too, and I really enjoyed it. Okay, let's see here. There's binary classification, cross validation... Okay.

So we'll use this. Let's view it in the gallery and see what it looks like. This one has four likes; interesting. That doesn't necessarily mean it's good, but it's cross validation for regression. Let's open it in Studio.

Okay, let's see how this works out. This is a third scenario for us. Don't copy it, but it will give you some idea for your paper. I really want you to publish a paper. Okay, so they did this already.

This is a little bit different, but it's similar to your homework. They have this data, and we can visualize it. Then again, is there something missing here? Can anyone tell what's missing? No, actually, I don't think anything is missing.

First thing, let's run this model. It will take some time. Then we'll visualize it and try to see what's going on. We run the model first to check that there are no errors. This is cross validation for regression. They use the same data we used before. If you want different data, you can use that too.

You have different data available. If you don't like the automobile data, you can use healthcare data. It's just that my Excel is not working well. You also have the option to import your own data as a .csv file. Okay, good: the experiment has run successfully, but my Excel is still not responding.

Okay, let me show you this first. Don't worry about that; it's my system. The file is not corrupted. So you have this data for free. But this is too many rows, something like 17,000. The columns are okay.

But I think there are too many rows; you can delete some of them and keep around 100. You only have 10 GB of data storage, and running on all of it will take too much time, so keep only about 100 rows. From here down you can remove all this data; just delete it. It's not that you cannot use it; it's just too much data to process for machine learning here, and it will take time. I intentionally used this one because I wanted it for my own experiments.

So what you could do is use just a hundred of them out of the 17,000. Oops, I deleted a column by mistake; I meant to discard the rows. Okay, so just use 100 rows. Once you have the data, you can upload it: create a new dataset from a local file, choose the file, and then go ahead and use it.
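
Trimming the file before uploading can also be done outside Excel. The following is only a minimal sketch using Python's standard library; the function name `keep_first_rows` and the generated sample data are made up for illustration and are not part of the Studio sample.

```python
import csv
import io


def keep_first_rows(csv_text, max_rows=100):
    """Return CSV text containing the header plus the first max_rows data rows."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = []
    for row in reader:
        if len(rows) >= max_rows:
            break
        rows.append(row)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical stand-in for a large file with about 17,000 rows.
sample = "make,price\n" + "".join(f"car{i},{1000 + i}\n" for i in range(17000))
trimmed = keep_first_rows(sample, max_rows=100)
print(len(trimmed.splitlines()))  # header line plus 100 data rows
```

The trimmed text could then be saved as a .csv file and uploaded as a new dataset from a local file.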

We covered uploading to Azure Machine Learning Studio in lecture number two. It's that simple. So you can do that if you don't like this automobile data, which we have been using for almost everything.

All right, so let's visualize. We already know what's there. This is the automobile data; we have visualized it a hundred times. We know price and all those columns. Good. Next, Clean Missing Data.

Of course, we have to clean the data. What they have done is use a custom substitution value: wherever data is missing, they put a zero. That's very interesting. So instead of null you will have zero. You know the difference between null and zero, right? Null means nothing is there. You could remove those entries, but instead they put zero, which is interesting because the resulting values will be very different.
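
To see why the choice of cleaning method matters, here is a small sketch in plain Python comparing zero substitution with dropping and median substitution. The price values and function names are invented for illustration, and this is not how Studio implements its Clean Missing Data module.

```python
import statistics


def substitute_zero(values):
    """Replace missing entries (None) with 0."""
    return [0.0 if v is None else v for v in values]


def drop_missing(values):
    """Remove missing entries entirely."""
    return [v for v in values if v is not None]


def substitute_median(values):
    """Replace missing entries with the median of the known values."""
    median = statistics.median(drop_missing(values))
    return [median if v is None else v for v in values]


# Hypothetical price column with two missing entries.
prices = [13495.0, None, 16500.0, 13950.0, None]

zero_filled = substitute_zero(prices)
dropped = drop_missing(prices)
median_filled = substitute_median(prices)

mean_zero = statistics.fmean(zero_filled)
mean_dropped = statistics.fmean(dropped)
print(mean_zero, mean_dropped)  # the zero-filled mean is pulled down
```

With zeros substituted, the rows stay in the data but drag the average down, which is why the trained model can end up with quite different values.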

That's because the row stays in the data with a zero in it. That's a good thing. All right. Now, what's happening next is they're doing a linear regression, a boosted decision tree regression, and a Poisson regression. Very interesting.

Three different algorithms. That's what I told you to do for the homework, but here they already have cross validation. You can do a similar thing for the price prediction. So now let's visualize the Cross Validate Model output. All three branches have a Cross Validate Model module. And what happens here is very good.

They've got this little graph. Here the Scored Labels column was generated by Cross Validate Model. Can anyone say what is missing here? There is something missing. What we'll use here is Evaluate Model. You know what?

Let me try a different thing altogether. Why don't I use this Evaluate Probability Function module? I'm not sure what it does, but all right. Let's see what comes out. The column is price.

Fair enough: we'll select the price column here and run the model. Let's see what happens. I don't think there will be an error. Okay, there is no error. That's good, because I know what I'm trying to do and what I'm trying to see. So this is what showed up: one column with a standard deviation.

This isn't working for what I want, so I'll go back to Evaluate Model. The reason Evaluate Model is useful is that it gives me the mean error, the root mean squared error, and all those things, along with a graph that shows the values. So that's good. We've done this; let's visualize. This is what I was looking for.
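
For reference, the two error numbers mentioned here can be computed by hand. This is a minimal sketch with made-up actual and predicted values; it shows the standard definitions of mean absolute error and root mean squared error, not Studio's internal code.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the average squared difference."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical values, for illustration only.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]

mae = mean_absolute_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
print(mae, rmse)
```

The root mean squared error penalizes large misses more heavily than the mean absolute error, so it is never smaller than the mean absolute error on the same data.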

So this is what I was looking for, and that's one. The error histogram has only one graph, but there should be three graphs, one for each model. So I will use Evaluate Model for the other models too. Sorry, this one. We'll just run the whole thing and find out what's happening. Now I have three different graphs with three different sets of values, because this experiment has three different algorithms for the same dataset.

And it's validating; good, that was pretty fast. So we get the frequency and the error, and these are the values: 1932243. Let's find out whether this value is the same for the other models. No, the values will not be the same. What's happening here is that we are doing cross validation for regression, evaluating each model, and reviewing those graphs.
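
The "frequency and error" graph is an error histogram: each bar counts how many predictions fall in a given error range. Here is a rough sketch of that idea; the bin width, values, and function name are assumptions chosen for illustration.

```python
def error_histogram(actual, predicted, bin_width):
    """Count absolute prediction errors per bin, keyed by the bin's lower edge."""
    counts = {}
    for a, p in zip(actual, predicted):
        error = abs(p - a)
        bin_start = int(error // bin_width) * bin_width
        counts[bin_start] = counts.get(bin_start, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical values, for illustration only.
actual = [10.0, 20.0, 30.0, 40.0, 50.0]
predicted = [11.0, 19.5, 36.0, 41.0, 62.0]

hist = error_histogram(actual, predicted, bin_width=5)
for bin_start, frequency in hist.items():
    print(f"error {bin_start} to {bin_start + 5}: {frequency}")
```

A model whose bars pile up in the lowest bins is making mostly small errors, which is what you compare across the three graphs.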

Okay, that makes sense. So this was a simple one, but it had three branches. Let me explain this flow chart again for people who missed it or are a little confused. What they're using, as you can see, is the automobile price data. They're cleaning the data using custom substitution.

They're not removing the row; they're replacing the missing value with zero, which is a completely different thing. Putting in a zero will probably give a different result, because removing a row is one thing, putting in a median is another, and you have a lot of other cleaning options. The whole point is how you want to do this, and then to visualize it and see how it works out. So this is a good thing. Then they're using a linear regression model with online gradient descent, with a learning rate of 0.1, 10 training epochs, and an L2 regularization weight of 0.001. That's very good. For boosted decision tree regression they're using a single parameter set, with a maximum of 20, a minimum setting, a learning rate of 0.2, and a number of trees.
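
To make those linear regression settings concrete, here is a minimal sketch of online (stochastic) gradient descent for a one-feature linear model, using the same three numbers: learning rate 0.1, 10 epochs, and L2 weight 0.001. The data, the function names, and the exact update rule are assumptions for illustration; Studio's own implementation may differ.

```python
import random


def train_linear_sgd(xs, ys, learning_rate=0.1, epochs=10, l2_weight=0.001, seed=0):
    """Fit y ~ w * x + b, updating after every single example."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the examples in a new order each epoch
        for i in order:
            error = (w * xs[i] + b) - ys[i]
            # Gradient of the squared error plus the L2 penalty on the weight.
            w -= learning_rate * (error * xs[i] + l2_weight * w)
            b -= learning_rate * error
    return w, b


def mean_squared_error(xs, ys, w, b):
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical, already-scaled data following y = 2x + 1.
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [1.0, 1.5, 2.0, 2.5, 3.0]

mse_before = mean_squared_error(xs, ys, 0.0, 0.0)
w, b = train_linear_sgd(xs, ys)
mse_after = mean_squared_error(xs, ys, w, b)
print(mse_before, mse_after)
```

With only 10 epochs the fit is approximate, which is one reason the three algorithms in this experiment produce different error values on the same data.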

And then there's Poisson regression, on the same data. So they're using three different algorithms on the same data to generate the results. For this one there's a tolerance of 1e-7, regularization weights of 1 and 1, and a memory size, and they're cross validating against price. At the end, all three are used, right? The label doesn't have to be price; you can select a different column, but it has to be numeric. I don't think this works if you use a string.
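
The cross validation step itself can be sketched in a few lines. This example assumes a simple one-feature least-squares line as the model and 5 folds assigned by index; the data and function names are invented, and the fold assignment is not necessarily how Cross Validate Model partitions rows.

```python
import math
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def k_fold_rmse(xs, ys, k=5):
    """Train on k-1 folds, test on the held-out fold, and return one RMSE per fold."""
    n = len(xs)
    fold_scores = []
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        slope, intercept = fit_line([xs[i] for i in train_idx],
                                    [ys[i] for i in train_idx])
        squared = [(slope * xs[i] + intercept - ys[i]) ** 2 for i in test_idx]
        fold_scores.append(math.sqrt(sum(squared) / len(squared)))
    return fold_scores


# Hypothetical numeric label: a noisy line y = 3x + 2.
xs = [float(i) for i in range(20)]
ys = [3.0 * x + 2.0 + (1.0 if i % 2 else -1.0) for i, x in enumerate(xs)]

scores = k_fold_rmse(xs, ys, k=5)
mean_rmse = statistics.fmean(scores)
std_rmse = statistics.pstdev(scores)
print(mean_rmse, std_rmse)
```

The mean and standard deviation across folds are the kind of summary you see in the cross validation results: a low standard deviation means the model behaves similarly no matter which rows are held out.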

Let's try this. I don't think it will work if it's a string, but let me change price to some string column. All right, which one should we use? Let's say make; make is a string, right?

Now let's run just this particular flow. There are three different flows, right? One is for linear regression, another is for boosted decision tree regression, and the third is for Poisson regression. Let's see how this works. It would be very interesting if it did.

This is fantastic. It ran with a string value. Very interesting. So it does work for a string; this is the proof. Let's visualize it and see what happens with the graph.

Really interesting. So we did this for the string and it runs, but the Scored Labels are now for the string values. This may not be what you're looking for, because the values have been calculated for the string. It takes the make, but make is something like Mercedes-Benz, Audi, or Lamborghini; that's a string value. So it's taking that string and converting it, and I believe that is not something you'd be looking for. So we can go back to price, which makes more sense to me, and run the whole thing again. With string values I assume it's calculated a little differently: it's not taking a mean; it's taking the string value and converting it into a category.
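
What "converting a string into a category" can look like is sketched below. This is only an assumption about the general idea, mapping each distinct string to an integer code; the transcript does not confirm how Studio actually handles a string label, and the function name and values are made up.

```python
def encode_categories(values):
    """Assign each distinct string an integer code in order of first appearance."""
    codes = {}
    encoded = []
    for value in values:
        if value not in codes:
            codes[value] = len(codes)
        encoded.append(codes[value])
    return encoded, codes


# Hypothetical make column.
makes = ["audi", "bmw", "audi", "mercedes-benz", "bmw"]
encoded, codes = encode_categories(makes)
print(encoded)  # [0, 1, 0, 2, 1]
print(codes)    # {'audi': 0, 'bmw': 1, 'mercedes-benz': 2}
```

The codes are arbitrary labels, so a regression score computed on them has no real meaning, which is why going back to the numeric price column makes more sense.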

So that is a little bit different. Okay, now let's set up a web service. Oops, sorry, I did something wrong. Okay, so we deployed this as a web service. There might be some errors because I did something wrong, but it still returned some values. You can enter values here.

And you can distribute it. You can also download this one. So that's pretty much it for this project. There are a lot of samples; if you want, we can do another one, but I think you've gotten used to it by now, so you can pick any of the samples and try them yourself. This one was pretty simple. So that's pretty much it for this one. They also have a sample for clustering.

I'll cover the clustering algorithm in the next class, and with that we'll end this one. Oh, I promised the notebook too, so we'll get to the notebook as well. Clustering is very important, so we'll go through the clustering algorithm in the next class. Let me quickly check how much storage is left. I've used a lot of data here, so I'll be running low. I'll have to delete some of it if I go beyond 10 GB.

So that's pretty much it for cross validation for regression. It was simple if you have gone through all the earlier lectures, so you should get through this one too. Let me go over it again for people who missed it. From the predictive experiment, we'll go back to the training experiment.

So what happened in this model... oops, sorry, wrong one, my mistake. What happened was that we used the automobile data. Cleaning the data is always required, of course. We cleaned the data and used three different algorithms: linear regression, boosted decision tree regression (with its own learning rate and settings), and Poisson regression. They cross validate all three models. They use Partition and Sample; you could also use splitting here, but they use Partition and Sample. Very interesting.

What I added here is Evaluate Model, to see the graph and those numbers. Remember, I cannot visualize the numbers until I run the experiment; you always have to run it before you can visualize. Here you can view the data as a graph. What I'm always interested in is the information: the root mean squared error, what the cleaned data looks like when they substitute zero, and the frequency and the error. So that was pretty much it. If you've gone through the first and second classes, it should be a piece of cake.

It's not a big deal. If you have questions, you can refer to the documentation, which I've included in the PDF. You can also still go to the Azure Machine Learning Studio documentation, and you can use this guide to create your first experiment.

I've included that in the PDF document. You don't have to use it, but you can. If you have any questions, you can reach out to me. This is probably a good time to write this up and publish a paper; that's a very good thing to do.

Okay, so work on the homework. Remember, homework number one: do this with three algorithms on the automobile price data: linear regression, boosted decision tree regression, and Poisson regression. It's similar to this one, though not identical, because there we were splitting the data, and for cleaning you can use whatever method you want. Then train the model, predict, and evaluate it. That's homework number one. Homework number two: publish a paper on this. That should be good. I hope you liked this class.

That's pretty much it. Thanks for watching.
