Correlation Analysis in R

Name: Correlation Analysis in R
Uploaded: 2024-05-16T07:12:13+00:00
Duration: 9 min 54 s

0:00
Welcome to the Data Hall YouTube channel
0:01
In this video, we are going to talk about how to test or apply correlation in R
0:08
There are different types of correlation. We are going to discuss Pearson correlation, Spearman correlation, and Kendall's tau correlation
0:16
These are the correlations that we apply to numerical data,
0:22
continuous data. And then we have correlations for categorical variables, where we will
0:29
look at tetrachoric correlation and then polychoric correlation. We are also going
0:38
to discuss how to do list-wise (case-wise) and pairwise deletion in correlation
0:44
So let's start with Pearson correlation. This is the most commonly applied correlation
0:50
We apply it to continuous data. First, let's load the data
0:57
We are going to use the iris data, which contains the length and width of the sepals and petals of different flowers, along with their species
1:08
So we are going to work with these length and width columns
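A minimal sketch of the loading step, assuming the built-in iris data set that ships with base R (the exact commands typed in the video are not shown in the transcript):

```r
# iris ships with base R (datasets package), so no download is needed
data(iris)

# Four numeric measurement columns plus the Species factor
str(iris)

# First few rows: Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species
head(iris)
```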
1:17
So when we do Pearson correlation, we use the cor function. Let's apply this correlation to all the data
1:26
So what we do is we use the iris data frame. We do not filter any rows
1:33
So all the rows would be taken, along with columns one to four. So let's apply this to columns one to four
1:42
And we are going to apply the Pearson correlation. If I run this, we can see that we have got the correlation matrix
1:49
This is the correlation between sepal length and sepal length. It is one because it is the correlation of the variable with itself
2:01
This is the correlation between sepal width and sepal length. And these are the correlations of petal length with sepal width and sepal length
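The call described here could look roughly like the following. This is a sketch, not necessarily the exact line typed in the video; the object name cor_matrix is made up for illustration.

```r
# All rows, columns 1 to 4 (the numeric measurements); Species is left out
cor_matrix <- cor(iris[, 1:4], method = "pearson")

# Rounded for readability; the diagonal is 1 because each variable
# is perfectly correlated with itself
round(cor_matrix, 2)
```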
2:10
Any value greater than 0.7 would be considered a high correlation
2:16
Any value between 0.[...] and 0.[...] would be considered a moderate correlation, and anything lower than 0.[...] is considered a low correlation. Also, when we talk about correlation, there is no dependent or independent variable. It is just an association between two variables
2:34
We cannot say that petal length would have an impact on sepal length, or that sepal length
2:40
would have an impact on petal length. It is just an association between these two variables
2:46
So there is no dependent or independent variable. There is no causality. We are just looking at the correlations, the associations
2:53
Let's say we just wanted to do the correlation between two variables
2:58
So we would select the first variable and the second variable, and the rest of the command would be exactly the same
3:04
Run this and we get that the correlation between sepal length and petal length is 0.87
3:10
This is the same value that we got over here. It is a positive correlation
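As a sketch, the two-variable version might be written like this (the variable name pearson_r is an assumption for illustration):

```r
# Pearson correlation between two specific columns of iris
pearson_r <- cor(iris$Sepal.Length, iris$Petal.Length, method = "pearson")

# Matches the 0.87 quoted in the video after rounding
round(pearson_r, 2)
```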
3:15
Next, let's move to Spearman correlation. Spearman correlation is applied when we have
3:22
ordinal data or ranks in our data. So again, the only change that we would make is that instead of the word pearson
3:31
we would use the word spearman, and the rest of the command would be exactly the same
3:38
So we get that the Spearman correlation is 0.88 between these two variables
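A sketch of the same call with the method switched to Spearman (the name spearman_rho is hypothetical):

```r
# Spearman correlation works on the ranks of the values rather than the raw values
spearman_rho <- cor(iris$Sepal.Length, iris$Petal.Length, method = "spearman")

# Matches the 0.88 quoted in the video after rounding
round(spearman_rho, 2)
```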
3:43
But we can also get the significance test, that is, whether this correlation is statistically significant or not,
3:50
and that is what we get using this line of code. The only difference in the arguments is that we set exact equal to FALSE
3:57
The rest of the command is the same, except for the function, because now instead of the cor function
4:03
we are using the cor.test function. And this is the p-value that we are interested in
4:07
If the p-value is less than 0.05, then we reject the null hypothesis,
4:14
which states that there is no correlation, that is, the correlation between these two variables is equal to zero. The alternative is that the
4:22
correlation is not equal to zero. So we reject the null hypothesis in favor of the alternative hypothesis,
4:28
and that means that, statistically speaking, there is a significant positive
4:35
correlation between these two variables. So a p-value less than 0.05 suggests a significant correlation
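The significance test described above could be sketched as follows. The object name spearman_test is an assumption; exact = FALSE asks for an approximate p-value, which avoids the warning R gives when the data contain ties.

```r
# cor.test returns the estimate together with a hypothesis test
spearman_test <- cor.test(iris$Sepal.Length, iris$Petal.Length,
                          method = "spearman", exact = FALSE)

spearman_test$estimate   # the Spearman correlation (rho)
spearman_test$p.value    # p-value for H0: correlation equals zero

# TRUE means we reject the null hypothesis at the 5% level
spearman_test$p.value < 0.05
```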
4:42
Let's move to Kendall's tau. Kendall's tau is also used when we have ordinal data
4:49
The rest of the command is exactly the same. We just change the word to kendall, and we get the correlation and the p-value as well. So this correlation
5:02
test would give you the p-value as well as the correlation value, right
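A sketch of the Kendall version, assuming the same two iris columns are used (kendall_test is a made-up name):

```r
# Same call as before, with the method switched to Kendall's tau
kendall_test <- cor.test(iris$Sepal.Length, iris$Petal.Length,
                         method = "kendall", exact = FALSE)

kendall_test$estimate    # Kendall's tau
kendall_test$statistic   # test statistic
kendall_test$p.value     # p-value for H0: tau equals zero
```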
5:07
So if you are interested in the test statistics, then just apply the cor.test
5:14
function. We can also see that it is statistically significant. Now let's move to the correlation
5:19
of categorical variables, where we are going to use the tetrachoric correlation. For that, we are
5:26
going to load the psych library. I have already installed this library, but if you haven't
5:32
installed it, then you would install it using install.packages and specify the name of the library
5:38
that you are going to load. Press Ctrl+Enter and it would be installed. But I have already
5:43
installed it, so I am just going to comment this out. So let's load this library. The library is loaded
5:49
We are just going to create a table, a matrix that contains certain data
5:58
So this is what the data looks like. We are going to apply the correlation to this data
6:04
Let's assume that these are certain categories, right? So we just use the tetrachoric function and then specify the data that we want to apply it to
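A minimal sketch of this step. The counts in the 2 x 2 table are invented for illustration (the video's actual numbers are not in the transcript), and the call is guarded so the script still runs if the psych package is not installed.

```r
# install.packages("psych")   # uncomment if psych is not installed yet

# Hypothetical 2 x 2 table of counts for two yes/no variables
tab <- matrix(c(40, 10,
                15, 35),
              nrow = 2, byrow = TRUE)
tab

if (requireNamespace("psych", quietly = TRUE)) {
  # tetrachoric() accepts a 2 x 2 table of cell counts
  tetra <- psych::tetrachoric(tab)
  print(tetra$rho)   # estimated tetrachoric correlation
} else {
  message("Package 'psych' is not installed; skipping tetrachoric()")
}
```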
6:17
And this is the correlation that we get. Then we have polychoric correlation
6:25
Now this is applied when we have ordinal categorical variables with more than two categories, right
6:31
So there should be some order in the categorical values, rather than them just being different categories
6:42
For example, gender is just categories. There is no order in that
6:46
So there we use the tetrachoric correlation. But if we look at customer satisfaction or service quality, then obviously a five means either more or less customer satisfaction, right
6:59
So there is an order in the data. So let's just create this data
7:05
So I have created two vectors, customer satisfaction and service quality
7:09
And I want to look at the correlation between these two variables
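A sketch of this step under stated assumptions: the ratings below are invented, and the polycor package with its polychor() function is assumed here because it accepts two vectors directly. The transcript does not make clear which package the video loads, so treat the package choice as a guess.

```r
# install.packages("polycor")   # uncomment if polycor is not installed yet

# Hypothetical ratings on a 1 to 5 scale (ordered categories)
customer_satisfaction <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 4, 2, 1, 3, 5)
service_quality       <- c(1, 1, 2, 3, 2, 3, 4, 5, 5, 4, 3, 2, 2, 3, 5)

# Cross-tabulation that the polychoric estimate is based on
table(customer_satisfaction, service_quality)

if (requireNamespace("polycor", quietly = TRUE)) {
  # polychor() takes two ordered categorical variables (numeric codes are accepted)
  rho <- polycor::polychor(customer_satisfaction, service_quality)
  print(rho)
} else {
  message("Package 'polycor' is not installed; skipping polychor()")
}
```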
7:13
So I use the polychoric function and then specify both the vectors, or the columns if your data was in a data frame. Press Ctrl+Enter and... okay, so I haven't loaded the library. For that, we are going to use the polychoric library
7:33
and press Ctrl+Enter, and we get the correlation. Last, we are going to
7:39
discuss list-wise and pairwise correlation. List-wise and case-wise are one and
7:44
the same thing. List-wise means that we just apply
7:49
the correlation to the rows where all the observations are available, right? So for example, if we
7:59
have four columns and data is missing in any of those columns, then that specific
8:06
row would be excluded if we use list-wise or case-wise deletion. So let's generate some data, right? And let's
8:14
input it. So we have customer satisfaction, product quality and sales performance. These
8:18
are randomly generated data. Let me show you this data. So these are certain numbers
8:26
It is numerical data, continuous data. And let me introduce some missing values, right
8:31
So we have introduced some missing values in our data set. So what I'm going to do is I'm going to apply the list-wise correlation
8:40
And for that, within the cor function, I'm going to use the
8:46
use parameter and specify complete.obs, that is, complete observations. That means the correlation is applied only to the rows where all the data is available
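A sketch of list-wise deletion with made-up data. The seed, sample size, distribution parameters, and positions of the missing values are all assumptions, since the transcript does not show the generated numbers.

```r
set.seed(123)   # assumed seed, only so the sketch is reproducible
n <- 20

# Hypothetical randomly generated continuous data
survey <- data.frame(
  customer_satisfaction = rnorm(n, mean = 70, sd = 10),
  product_quality       = rnorm(n, mean = 60, sd = 12),
  sales_performance     = rnorm(n, mean = 50, sd = 8)
)

# Introduce some missing values in different rows of each column
survey$customer_satisfaction[c(2, 7)]  <- NA
survey$product_quality[5]              <- NA
survey$sales_performance[c(11, 15, 18)] <- NA

# Rows 2, 5, 7, 11, 15 and 18 each have a missing value, so 14 rows remain
sum(complete.cases(survey))

# List-wise (case-wise) deletion: only fully observed rows are used
listwise_cor <- cor(survey, use = "complete.obs")
round(listwise_cor, 2)
```

With use = "complete.obs", every entry of the matrix is computed from the same 14 rows.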
8:55
If I were to do pairwise correlation, that means, for example,
9:01
when it is going to apply the correlation between customer satisfaction and product quality,
9:05
then we just need to include the rows that contain the data for both product quality and
9:14
customer satisfaction. This is what we call pairwise. So for each pair we should have the data
9:19
So the number of observations that we would have for this specific correlation,
9:23
the correlation between product quality and customer satisfaction, can be different from the number of observations that we have for,
9:30
let's say, sales performance and customer satisfaction. So this is what we call pairwise deletion or pairwise correlation
9:40
We just use the word pairwise, and the rest of the command is exactly the same
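A sketch of pairwise deletion on hypothetical data (the seed, sizes, and missing-value positions are assumptions). In base R the full value of the use argument is "pairwise.complete.obs". The pair_n matrix is an extra illustration, not from the video, showing how many rows each pair of columns actually shares.

```r
set.seed(123)   # assumed seed, only so the sketch is reproducible
n <- 20

# Hypothetical randomly generated continuous data with some missing values
survey <- data.frame(
  customer_satisfaction = rnorm(n, mean = 70, sd = 10),
  product_quality       = rnorm(n, mean = 60, sd = 12),
  sales_performance     = rnorm(n, mean = 50, sd = 8)
)
survey$customer_satisfaction[c(2, 7)]   <- NA
survey$product_quality[5]               <- NA
survey$sales_performance[c(11, 15, 18)] <- NA

# Pairwise deletion: each correlation uses the rows observed for that pair
pairwise_cor <- cor(survey, use = "pairwise.complete.obs")
round(pairwise_cor, 2)

# Number of rows available for each pair of columns
present <- (!is.na(as.matrix(survey))) + 0   # 1 = observed, 0 = missing
pair_n  <- t(present) %*% present
pair_n
```

Here customer satisfaction and product quality share 17 rows, while customer satisfaction and sales performance share 15, which is the difference in observation counts that the video describes.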
9:44
Let me press Ctrl+Enter, and we get the pairwise correlation. So I hope this was useful
9:51
Do subscribe to this channel and do hit the back

thedatahall.com
