Finding Correlations

12 minutes
Share the link to this page
Copied
  Completed

Transcript

Hello, and welcome to this bonus video. My name is Pato Majumdar. As we have been discussing previously during the course, we are required to find the value of correlation for various analysis of mutual funds. So, for those who are not very familiar with what is correlation, this video will give you a small introduction to the same correlation is a Statistical Association. correlations are very useful because they can indicate predictive relationship that can be exploited in various analysis. correlation is synonymous to dependence and it is represented by the Greek alphabet row.

So, when we say row x y it means that the correlation between the variables x and y now we discuss the mathematical formula for correlation So, correlation row x y is equal to summation of x i minus mu x where mu x is the mean of x into y minus b y where y is the mean of y divided by n minus one into sigma x into sigma y where sigma x and sigma y are standard deviations of X and Y this can be rewritten as summation of x i minus mu x into y minus mu y divided by square root of summation of x i minus mu x whole square into y minus mu y equals word. For the sake of completeness, let us understand that mu x or mean of x is equal to one by N of summation of excise, similarly, mu of y is equal to one by N of summation of y's.

Now, correlation row x y is always between minus one and one. If it is one It means it is absolutely positive correlation if it is minus one, it means it's absolute negative correlation. If the value of rho x, y is zero, it means there is no correlation between the two variables. If the value of rho x, y is greater than zero, then we say that x and y are positively correlated is the value of rho x, y is less than zero, we say the x&y variables are negatively correlated. Now let's see how we can calculate the value of correlation using the formula that we have just discussed. So here we see the data which we'll use for the illustration, for calculation of correlation.

We have the data regarding printer cartridges and school supplies, we have data for 35 days. So you can see there are 75 data points that we have got. The first thing we will do is we will calculate the mean for each of the data that is printer cartridges and school supplies. Now to calculate the mean or Mew, we can do some of all the data points that we have for printers and divided by the Total number of data points that we have got. So, we can find the total number of data points by using the function count in Excel. So we can say count and give the range of data.

So sum divided by count should give us the value of meal. We could have found this by using the average function of Excel also, but we are using the raw functions as we have been discussing so far. I'll set the calculation mode to automatic so that the numbers get automatically calculated. Now, we have found the value of mew that is the mean. Next, as for the formula, we go on to calculate x i minus mu x. So this is x i minus the mean I anchored it today Have it now I can calculate x i minus mu x and y i minus mu y and that i minus music by just copying this formula across.

So, there we have it. Here we representing printers by x cartridges by why and school supplies buys that. Next we find x i minus mu x and y i minus nu y similarly, we'll find exci minus mu x and tau di minus meals that and lastly we will find y minus mu y into z Minus music. I will calculate these values for all the data points by just copying this formula across. So there we have it. So we have got the terms which we require for the numerator.

Now let's calculate the terms for the denominator. So we have to find xi minus mu x squared. So we say, excited by this matrix carry two, so we get that number. Now I can copy this value across to get all the values for x i minus mu x squared y minus mu y whole squared minus the music whole square. So we have that now. We're supposed to find the sum of X i minus mu x squared.

So here we find the sum. You copy this formula to get y minus mu y whole square. And some have said it minus news at all squared. Now we find the product of excite minus new x squared and Y minus Y squared. Similarly for the other two terms also. Next we need to find the square root so we use the Sq RT function Now we need to find the sum of X i minus mu x into y minus new y.

So we sum up all these numbers to get that value, similarly to find the other two sums also. So now we have all the values for the numerator and the denominator. Let's find the correlation. So for printers versus cartridge it is a versus white. So we take the value of the numerator and the value of the denominator. And then we have the correlation between printer and cartridges.

Similarly, we find the correlation between printer and does multiplies And lastly we find the correlation between cartridges and school supplies. You must remember that the correlation of the same term against itself is always one. So correlation of printer to printer cartridge to cartridge as food supplies to food supplies is always going to be equal to one. We saw that to find the correlation we had to do a number of calculations. Now fortunately, Excel provides a function called curl using which you can find the correlation in a very straightforward manner. We will see how to use this function in Excel.

Now we see how to use the current function to find correlation. So we have the same data with 75 data points. So we first put the labels we want to find the correlation between the printers cartridges and school supplies. So we say printer cartridges school supplies Sam replant on the x axis also. Now first we find the correlation between the printers and the cartridges. So we can say coral, co RR e L. The first parameter is we give all the values of the printers.

The second parameter we give all the values of the cartridges. So there we have the correlation between the printer cartridges. We can repeat the same exercise for printer and supplies. Now we can do the same for cartridges and supplies. There we have all the correlations calculated, we found that this method saves us a lot of effort. We found that using the current function finding correlation became much simpler.

However, if in a data set we have a very large number of variables, then finding correlation for each pair of variables can be very tedious using the current function. Fortunately, Excel provides something known as data analysis tool pack using which we can find correlations. there let's see how to use data analysis toolpak to find correlations. Now, use data analysis toolpak to calculate correlation data analysis toolpak and found under data tab it's an add on So, in To add it on in Excel. In the data analysis tool pack, you will find there is a option for correlations. Select that press OK.

It gives a dialog where it asks for input range in the input range provide all the data for which you require to find the correlation. Now here you don't need to go pair by pair you can give all the data in a single row. After you've done that, you say that the data contains label because we included the first row also in our data which was defined. Then we specified output range and click OK. So there you have the correlations calculated. That concludes our discussion.

Thank you for listening

Sign Up

Share

Share with friends, get 20% off
Invite your friends to LearnDesk learning marketplace. For each purchase they make, you get 20% off (upto $10) on your next purchase.