How to Perform ANOVA in R | One-Way and Two-Way ANOVA
May 16, 2024
In this video we discuss how to perform one-way and two-way ANOVA in R.
0:00 Intro to video
0:27 One-way ANOVA
4:26 Two-way ANOVA
Website: thedatahall.com
0:00
Welcome to the Data Hall YouTube channel
0:02
In this video, we are going to talk about how to perform one-way and two-way ANOVA in R
0:08
So let's start with one-way ANOVA. One-way ANOVA is used to compare means when we have more than two categories
0:16
If we have two categories, then we can do a t-test, and with two categories, whether we perform a t-test (assuming equal variances) or an ANOVA, they give us exactly the same p-value
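The two-category equivalence mentioned above can be checked with a small sketch. The data here are hypothetical (the seed, means and standard deviation are assumptions, not values from the video), and the match holds for the pooled-variance t-test, so `var.equal = TRUE` is set explicitly because R's `t.test` defaults to the Welch version.

```r
# Hypothetical two-group data; seed, means and SD are illustrative assumptions
set.seed(1)
two_groups <- data.frame(
  height = c(rnorm(30, mean = 175, sd = 7), rnorm(30, mean = 162, sd = 7)),
  group  = factor(rep(c("men", "women"), each = 30))
)

# Student's t-test with pooled variance (the default would be Welch's test)
t_res <- t.test(height ~ group, data = two_groups, var.equal = TRUE)

# One-way ANOVA on the same two groups; [[1]] is the ANOVA table
a_res <- summary(aov(height ~ group, data = two_groups))[[1]]

p_t    <- t_res$p.value
p_f    <- a_res[["Pr(>F)"]][1]
f_stat <- a_res[["F value"]][1]

# The two p-values agree, and the F statistic equals the squared t statistic
print(c(t_test = p_t, anova = p_f))
print(all.equal(unname(t_res$statistic)^2, f_stat))
```

With more than two groups there is no single t statistic to compare against, which is why the video moves on to ANOVA.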
0:25
So let's perform ANOVA, because we have more than two categories
0:29
Let's set the seed so that we can generate random data, and when you run this command along with me, you will get exactly the same data if you use exactly the same seed
0:44
So next, let's generate the heights of different groups. We have three groups, and we generate normally distributed heights, 30 observations, with a mean and a standard deviation for each of these three groups
0:58
And let's create a data frame that contains two variables. The first one is height, and it contains the heights of these three groups that we generated using the random normal distribution function
1:15
And we have three groups, that is, men, women and children
1:20
Let's run this command, and we have this heights data, where we have the heights of different individuals
1:27
men, women and children, and we will perform the one-way ANOVA on these three categories
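The data-generation steps described above can be sketched as follows. The transcript does not give the seed, the means or the standard deviations, so every number below is an assumption, as are the object names `heights_data`, `height` and `group`; 30 observations per group is also assumed.

```r
# Seed value is an assumption; the video does not state the one it uses
set.seed(123)

# 30 simulated heights (cm) per group; means and SDs are illustrative guesses
men      <- rnorm(30, mean = 175, sd = 7)
women    <- rnorm(30, mean = 162, sd = 6)
children <- rnorm(30, mean = 120, sd = 10)

# Long format: one row per individual, with a factor recording the group
heights_data <- data.frame(
  height = c(men, women, children),
  group  = factor(rep(c("men", "women", "children"), each = 30),
                  levels = c("men", "women", "children"))
)

str(heights_data)           # two columns: numeric height, factor group
table(heights_data$group)   # observations per group
```

Keeping all heights in one column and the group labels in another is the layout that formula-based functions such as `aov` expect.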
1:38
So now we have three categories. We cannot perform the t-test. We have to perform ANOVA
1:44
And the command for ANOVA is aov, this aov function. We specify the variable of interest, which is height, and then after the tilde sign we specify the variable that contains the group information. We also need to give it the data frame that we are going to use, the one we generated over here
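A minimal sketch of the `aov` call described above. The data are regenerated so the block runs on its own; the seed, means, standard deviations and the names `heights_data` and `height_anova` are assumptions rather than the video's exact values.

```r
# Simulated data (all numeric settings are illustrative assumptions)
set.seed(123)
heights_data <- data.frame(
  height = c(rnorm(30, 175, 7), rnorm(30, 162, 6), rnorm(30, 120, 10)),
  group  = factor(rep(c("men", "women", "children"), each = 30))
)

# response ~ grouping variable, plus the data frame that holds both columns
height_anova <- aov(height ~ group, data = heights_data)

class(height_anova)   # an "aov" object, which also inherits from "lm"
print(height_anova)   # sums of squares and degrees of freedom
```

Printing the fitted object shows only sums of squares and degrees of freedom; the F test and p-value come from `summary()`.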
2:06
Now, before we discuss the results of the ANOVA, we need to check the normality of the residuals, that is, whether the residuals are normally distributed or not
2:19
Because that is one of the assumptions of ANOVA. And for that, let's first extract the residuals from this model
2:27
We extract the residuals using the residuals function. And using the Shapiro-Wilk test, let's test whether the result is significant or not
2:39
So let's display the results, and we have the p-value. The null hypothesis of the Shapiro-Wilk test is that the data is normally distributed
2:48
and the alternative hypothesis is that the data is not normally distributed. Because our p-value is greater than 0.05, we fail to reject the null hypothesis, which is that the data is
2:59
normally distributed. Let's move on to display the results of this ANOVA that we have performed
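The residual check described above can be sketched like this. The data and model are rebuilt with assumed values (seed, means, standard deviations and object names are not from the video), so the p-value printed here will not match the one shown on screen.

```r
# Simulated data and model (all numeric settings are illustrative assumptions)
set.seed(123)
heights_data <- data.frame(
  height = c(rnorm(30, 175, 7), rnorm(30, 162, 6), rnorm(30, 120, 10)),
  group  = factor(rep(c("men", "women", "children"), each = 30))
)
height_anova <- aov(height ~ group, data = heights_data)

# Residuals: each observation minus its own group mean
anova_residuals <- residuals(height_anova)

# Shapiro-Wilk test; the null hypothesis is that the residuals are normal
shapiro_result <- shapiro.test(anova_residuals)
print(shapiro_result)

# TRUE means we fail to reject normality at the 5% level
shapiro_result$p.value > 0.05
```

A p-value above 0.05 does not prove normality; it only means the test found no evidence against it.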
3:06
So we have saved the results into this object called height ANOVA, and let's present the summary
3:13
statistics of that object. Now, we have several different statistics, but what we are mainly interested in is this p-value
3:22
Before we interpret this p-value, let's first discuss what the null and alternative hypotheses are
3:29
So our null hypothesis is that the mean values of all the groups are the same
3:34
So that means whether it's men, women or children, their mean height is the same
3:40
That is the null hypothesis. So what we mean to say is that there is no impact of this group on the height of an individual. And the alternative hypothesis is that the mean of at least one of these categories is different
3:56
That means there is an impact of this group on the variable of interest, which is height in this case
4:04
Now, in this case, we get the p-value of 0.05. And we know that if the p-value is less than 0.05, what we do is we reject the null hypothesis, which
4:14
is to say that the means are different
4:18
So if the p-value is less than 0.05, our conclusion would be that the means are not all the same across these categories
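The decision rule above can be sketched by pulling the p-value out of the ANOVA table. The simulated data use assumed means that differ strongly between groups, so this sketch rejects the null hypothesis; the seed, numbers and object names are illustrative and not the video's.

```r
# Simulated data and model (all numeric settings are illustrative assumptions)
set.seed(123)
heights_data <- data.frame(
  height = c(rnorm(30, 175, 7), rnorm(30, 162, 6), rnorm(30, 120, 10)),
  group  = factor(rep(c("men", "women", "children"), each = 30))
)
height_anova <- aov(height ~ group, data = heights_data)

# summary() returns a list; its first element is the ANOVA table
anova_table <- summary(height_anova)[[1]]
print(anova_table)

# First row is the group term; "Pr(>F)" holds its p-value
p_value <- anova_table[["Pr(>F)"]][1]

if (p_value < 0.05) {
  message("Reject H0: at least one group mean differs")
} else {
  message("Fail to reject H0: no evidence that the group means differ")
}
```

Rejecting the null hypothesis says only that at least one mean differs; it does not say which groups differ from each other.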
4:26
Let's move to two-way ANOVA. Now, the difference between one-way ANOVA and two-way ANOVA is that in one-way
4:32
ANOVA we only have a single factor, such as in this case
4:37
we only had these groups that contain men, women, and children. But in two-way ANOVA, we would have
4:44
more than one factor. We would have two factors. Let's generate new data. Let's set the seed
4:51
And before we move forward, let me clear our environment so that we do not have any clutter
4:58
over here and we can easily understand what's going on. So let's generate this data, exam score,
5:04
that would contain 90 observations with a specific mean and standard deviation. This would be normally
5:10
distributed data. Now, we have two different factors. One is the student's grade in another subject, let's just say
5:20
And then we have the number of hours of study that the student has done
5:24
So what we are saying is that students' grades do have an impact on the exam score
5:30
People with different grades would have different exam scores, and students with a low number of study hours and students with a high number of study hours would have different exam scores, so their means would be different for
5:50
these different categories. So let's generate this categorical data and let's create a
5:56
data frame that would contain three columns: the exam score that is coming from here, the grades,
6:02
and the study hours that are coming from here. We have used the factor function so
6:07
that R knows that these are factors rather than plain character strings
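The two-factor data set described above can be sketched as follows. The seed, the mean and standard deviation of the scores, the grade labels, the study-hour levels and the names `exam_data`, `exam_score`, `grades` and `study_hours` are all assumptions, since the transcript does not state them.

```r
# rm(list = ls())  # the video clears the workspace first; left commented here

# Seed and all numeric settings are illustrative assumptions
set.seed(123)
exam_score <- rnorm(90, mean = 70, sd = 10)   # 90 normally distributed scores

# Hypothetical levels for the two factors, arranged as a balanced design
grades      <- rep(c("A", "B", "C"), each = 30)
study_hours <- rep(c("low", "high"), times = 45)

# factor() marks both columns as categorical rather than character data
exam_data <- data.frame(
  exam_score  = exam_score,
  grades      = factor(grades),
  study_hours = factor(study_hours)
)

str(exam_data)
table(exam_data$grades, exam_data$study_hours)   # cell counts per combination
```

Because the scores are drawn without reference to either factor, any differences between cell means in this sketch are due to chance alone.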
6:17
Okay, so how do we perform the two-way ANOVA? We use the same command, that is, the same function,
6:25
that is, the aov function, and we specify our dependent variable, the tilde sign, and our
6:32
independent variables separated by this asterisk, which in an R formula asks for both main effects and their interaction, and then we have this
6:42
exam data that we have created over here. If we run this and show
6:48
the summary statistics, we have the p-value
6:53
of the grades, which shows the impact of grades on the exam score. In this case
7:00
it is greater than 0.05, so we fail to reject the null hypothesis that the means are the same for different grades. And for the study hours,
7:09
the number of hours of study that the student had done, again the p-value is greater than 0.05, so we cannot
7:15
reject the hypothesis that the mean exam score, whether the person had done a
7:22
low number of hours of study or a high number of hours of study, would
7:28
be the same. So that's about ANOVA. I hope this was useful. Do subscribe to this channel and do hit the bell icon
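The two-way model discussed above can be sketched as follows. The data are simulated with assumed settings (seed, score distribution, factor levels and object names are not from the video), so the p-values will differ from those on screen. The sketch also shows that `grades * study_hours` expands to the two main effects plus the `grades:study_hours` interaction.

```r
# Simulated two-factor data (all settings are illustrative assumptions)
set.seed(123)
exam_data <- data.frame(
  exam_score  = rnorm(90, mean = 70, sd = 10),
  grades      = factor(rep(c("A", "B", "C"), each = 30)),
  study_hours = factor(rep(c("low", "high"), times = 45))
)

# `*` requests both main effects and their interaction
exam_anova <- aov(exam_score ~ grades * study_hours, data = exam_data)

# Terms actually fitted by the model
model_terms <- attr(terms(exam_anova), "term.labels")
print(model_terms)

# One row per term plus residuals; "Pr(>F)" holds each term's p-value
exam_table <- summary(exam_anova)[[1]]
print(exam_table)
```

Using `+` instead of `*` in the formula would fit the two main effects without the interaction term.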
7:36
Thanks for watching this video