0:00
Welcome to the DataHall YouTube channel. In this video we are going to talk about
0:04
how to perform a two-way interaction in R. We have been discussing categorical
0:10
variable regression for the past couple of videos. In our first video we discussed how to
0:14
perform a regression with a single categorical variable and how to create its margins and margins plot.
0:21
Then in our next video we discussed what to do if we have multiple categories in a categorical variable.
0:26
We discussed multiple categorical variables, and a categorical and a continuous variable, in a regression. In this video we are going to focus on different types of
0:36
two-way interaction: what if we have a categorical and a continuous variable,
0:41
two categorical variables, or two continuous variables? How do we create the margins and
0:46
margins plot, and how do we interpret the results? Let's start with the packages that we are going
0:52
to use. We are going to use the ggeffects and tidyverse packages. So let's load these packages
0:58
and then we are going to load our sample data. Let me show you this data. We have different
1:04
individuals with their age, their marks, their salary, their gender and their level of education. Gender and education
1:12
are categorical variables, age and marks are continuous variables, and salary is our
1:18
dependent variable. First we are going to look at the interaction between a categorical and a continuous
1:23
variable. Remember, by interaction we
1:30
mean that there is a moderating effect of a variable. For example, we say that
1:36
age does have an impact on salary, but this relationship is moderated by gender. So there is a
1:43
different slope for each category of the gender variable: the relationship between age and
1:50
salary is different for each category of the gender variable. Now how do we specify the interaction
1:57
term? First we write the function, then the dependent variable, then all the independent variables,
2:03
and then the interaction term. The interaction term is written using the colon sign. If I press
2:11
Ctrl+Enter and show you the summary of these results, you will see that these two are
2:18
our main effects, this is our intercept, and this is the interaction term. If this interaction
2:24
term is significant then we can conclude that there is a moderating effect of gender on the
2:31
relationship between age and salary. The rest of the interpretation is what we have already
2:37
discussed in our previous videos. Let me also show you one more way of specifying this
2:43
interaction. We have used the colon, and with the colon you list
2:50
the main effects separately and then add the interaction of these variables. With the asterisk,
2:56
you simply write the two variables with an asterisk between them, and
3:04
it automatically includes both the main effects and the interaction term. Again,
3:12
the results are the same; we just wanted you to know that there are different methods and that there is this
3:18
difference between the colon and the asterisk. Now, interpreting these results from a table is somewhat difficult,
3:25
so we will create graphs to visualize and interpret these results.
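The two formula styles described above can be sketched roughly as follows. This is only an illustration: the data frame `df` and its simulated values are hypothetical stand-ins for the video's sample data, which the transcript does not show. Only the column names age, gender and salary come from the video.

```r
# Hypothetical stand-in for the video's sample data (simulated, not the real file)
set.seed(1)
n <- 200
df <- data.frame(
  age    = sample(30:50, n, replace = TRUE),
  gender = factor(sample(c("Female", "Male"), n, replace = TRUE))
)
# Assumed data-generating process: males start lower but gain more per year of age
df$salary <- 20000 + 300 * df$age +
  ifelse(df$gender == "Male", -10000 + 250 * df$age, 0) +
  rnorm(n, sd = 1500)

# Colon: main effects are listed explicitly, then the interaction term
fit_colon <- lm(salary ~ age + gender + age:gender, data = df)

# Asterisk: shorthand that expands to both main effects plus the interaction
fit_star <- lm(salary ~ age * gender, data = df)

# Coefficient table: intercept, age, genderMale, age:genderMale
print(summary(fit_colon))

# Both specifications should give the same coefficients
same_fit <- isTRUE(all.equal(coef(fit_colon), coef(fit_star)))
print(same_fit)
```

The row for `age:genderMale` is the interaction term: it estimates how much the age slope for males differs from the age slope for the reference category (here, females).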
3:30
For this we are going to fit the interaction model again and store the results in this
3:37
object. Next we are going to create a sequence of values for age, as we did
3:45
in our previous video, where we have already explained it. We are going to compute
3:51
the predicted values for males at the different ages that we have used
3:58
here, between 35 and 45. If I show you these values, these are the predicted values for
4:06
each age. So for age 35 the predicted salary for males is 30 3097, for age 36 we have its predicted
4:16
salary, and so on and so forth. Similarly we create the predicted salary for females.
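A minimal sketch of this prediction step, together with the combining and plotting steps that follow, might look like this. The data are simulated and hypothetical, the object names (`pred_male`, `pred_female`, `plot_data`) are illustrative rather than the ones used in the video, and the plot is drawn only if ggplot2 happens to be installed.

```r
# Hypothetical stand-in for the video's sample data (simulated)
set.seed(1)
n <- 200
df <- data.frame(
  age    = sample(30:50, n, replace = TRUE),
  gender = factor(sample(c("Female", "Male"), n, replace = TRUE))
)
df$salary <- 20000 + 300 * df$age +
  ifelse(df$gender == "Male", -10000 + 250 * df$age, 0) +
  rnorm(n, sd = 1500)

fit <- lm(salary ~ age * gender, data = df)

# Sequence of ages between 35 and 45, as in the video
age_seq <- seq(35, 45, by = 1)

# Predicted salary for males at each age
pred_male <- data.frame(
  gender = factor("Male", levels = levels(df$gender)),
  age    = age_seq
)
pred_male$predicted_salary <- unname(predict(fit, newdata = pred_male))

# Predicted salary for females at each age
pred_female <- data.frame(
  gender = factor("Female", levels = levels(df$gender)),
  age    = age_seq
)
pred_female$predicted_salary <- unname(predict(fit, newdata = pred_female))

# Combine both sets of predictions into one data frame for plotting
plot_data <- rbind(pred_male, pred_female)
print(head(plot_data))

# One line per gender; a steeper line means a larger effect of age
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(
    plot_data,
    ggplot2::aes(x = age, y = predicted_salary, color = gender)
  ) +
    ggplot2::geom_line() +
    ggplot2::labs(x = "Age", y = "Predicted salary",
                  title = "Age and salary by gender", color = "Gender")
  print(p)
}
```

Because the model is linear, each gender's line is straight, and the difference between the two slopes equals the interaction coefficient.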
4:23
We convert these predictions into data frames and combine them into one data frame. If I
4:33
show you this data frame, it is plot data interval. Now you can see that we have gender, age and
4:42
the predicted salary. Now we have all the data that we need to create a plot. The only
4:48
thing left to do is create the plot. We take this data frame; on the x axis we have age,
4:56
on the y axis we have the predicted salary, this column, and for the color
5:04
we have a separate line with a different color for each group, and
5:09
these colors are based on the gender category. We give a name to each axis and a title to
5:16
this graph. If I execute this, you can see that the slope for males is quite a bit steeper compared
5:25
to the slope for females. That means that for each one
5:33
unit increase in age, the salary of males increases more than the salary of females. That is what this
5:40
graph means. Similarly, at the starting point the male salary is lower compared to
5:48
females, so males start with a lower salary than females, but the
5:55
rate of change of the male salary is greater than that of the female salary. Let's move to the second topic,
6:02
which is the interaction term between two categorical variables. We have these two
6:07
categorical variables, which are gender and education. Let's execute this regression, and then we
6:14
are going to use expand.grid to create the different combinations of gender and education.
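As a rough sketch of this step and of the prediction and plotting steps described next, assuming simulated data: the education level names other than "Primary" are assumptions, as are the object names `fit_cat` and `grid`, and the plot is drawn only if ggplot2 is installed.

```r
# Hypothetical stand-in for the video's sample data (simulated)
set.seed(2)
n <- 300
df <- data.frame(
  gender    = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  education = factor(
    sample(c("Primary", "Secondary", "Tertiary"), n, replace = TRUE),
    levels = c("Primary", "Secondary", "Tertiary")  # level names are assumed
  )
)
edu_step <- as.integer(df$education) - 1
df$salary <- 30000 + 5000 * edu_step +
  ifelse(df$gender == "Male", -2000 + 3000 * edu_step, 0) +
  rnorm(n, sd = 1500)

# Interaction between two categorical variables
fit_cat <- lm(salary ~ gender * education, data = df)

# Every combination of gender and education level
grid <- expand.grid(
  gender    = levels(df$gender),
  education = levels(df$education)
)

# Predicted salary for each combination
grid$predicted_salary <- unname(predict(fit_cat, newdata = grid))
print(grid)

# Education on the x axis, one colored line per gender
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(
    grid,
    ggplot2::aes(x = education, y = predicted_salary,
                 color = gender, group = gender)
  ) +
    ggplot2::geom_line() +
    ggplot2::geom_point() +
    ggplot2::labs(x = "Education", y = "Predicted salary", color = "Gender")
  print(p)
}
```

With a full interaction between two categorical predictors, each predicted value is simply the mean salary observed in that gender and education cell, so non-parallel lines in the plot indicate that the gender gap changes across education levels.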
6:22
Let me show you these values. We have the different combinations: for each level of education we have
6:29
each gender category. Now we are going to compute, for each of these combinations,
6:37
the predicted values. Let me show you these predicted values. Now we have a predicted
6:43
value for primary level of education for males, a predicted value for primary level of education
6:50
for females, and so on and so forth. Once we have these predicted values, the next step is
6:55
to plot them. So next let's use ggplot: on the x axis we have education, on the y axis we have salary,
7:04
and the lines are color coded by gender. We are going to create a line graph, and the groups are
7:11
based on gender. Let's execute this, and you can see that we have the different levels of education,
7:19
and across these levels we have one line for the male salary and one for the female salary. For
7:26
example, we can see that females earn more than males at the primary level of education,
7:35
but that is different at the other levels of education. Do remember that this is hypothetical
7:42
data. Lastly we are going to do an interaction of two continuous variables. We have
7:49
age and marks; these are two continuous variables. Let's run the regression,
7:56
and then we are going to take the unique values of age and the unique values of marks.
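A minimal sketch of this step, assuming simulated data and illustrative object names (`fit_cont`, `grid`), could be:

```r
# Hypothetical stand-in for the video's sample data (simulated)
set.seed(3)
n <- 200
df <- data.frame(
  age   = sample(30:50, n, replace = TRUE),
  marks = sample(50:95, n, replace = TRUE)
)
# Assumed data-generating process with a positive age-by-marks interaction
df$salary <- 5000 + 200 * df$age + 100 * df$marks +
  20 * df$age * df$marks + rnorm(n, sd = 1000)

# Interaction between two continuous variables
fit_cont <- lm(salary ~ age * marks, data = df)
print(coef(fit_cont))

# Every combination of the observed ages and the observed marks
grid <- expand.grid(
  age   = sort(unique(df$age)),
  marks = sort(unique(df$marks))
)
grid$predicted_salary <- unname(predict(fit_cont, newdata = grid))

print(head(grid))
print(nrow(grid))  # number of age-by-marks combinations
```

The grid has one row per combination of age and marks, which is why a plot with one line per marks value quickly becomes crowded.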
8:05
We calculate the predicted values, and let me show you these predictions. So for age 34 and marks 82 we have
8:14
this predicted salary; for each combination we have a predicted salary, and
8:20
now we are going to plot it. But this plot looks quite messy, right? We cannot make any sense out
8:27
of it, so a better way is to reduce the number of categories of age and marks. Let me re-execute
8:34
this regression, and now we just need to have five categories of age and five categories of marks.
8:45
Then we are going to take the values where age is greater than this first
8:52
limit and less than the second limit, so between 40 and 45. We are going to do the same
8:59
for the marks, and then predict the salary for these values. So for
9:08
age 40 and marks 60 we have this predicted salary; similarly for age 40 and marks 61 we
9:16
have this predicted salary. Now we have limited these values, and lastly we are going to plot them.
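One way to sketch the reduced grid is shown below. The transcript does not make the video's exact binning recoverable, so this example makes its own assumptions: ages 40 to 45 on the x axis and five representative marks values taken from quantiles of the simulated data. All object names are illustrative, and the plot is drawn only if ggplot2 is installed.

```r
# Hypothetical stand-in for the video's sample data (simulated)
set.seed(3)
n <- 200
df <- data.frame(
  age   = sample(30:50, n, replace = TRUE),
  marks = sample(50:95, n, replace = TRUE)
)
df$salary <- 5000 + 200 * df$age + 100 * df$marks +
  20 * df$age * df$marks + rnorm(n, sd = 1000)

fit_cont <- lm(salary ~ age * marks, data = df)

# Restricted age range, and a handful of marks values instead of all of them
age_vals   <- 40:45
marks_vals <- unique(unname(round(
  quantile(df$marks, probs = seq(0, 1, length.out = 5))
)))

small_grid <- expand.grid(age = age_vals, marks = marks_vals)
small_grid$predicted_salary <- unname(predict(fit_cont, newdata = small_grid))
print(head(small_grid))

# Implied age slope at each marks value: b_age + b_interaction * marks
slopes <- coef(fit_cont)[["age"]] + coef(fit_cont)[["age:marks"]] * marks_vals
print(data.frame(marks = marks_vals, age_slope = slopes))

# One line per marks value; steeper lines mean a larger effect of age
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(
    small_grid,
    ggplot2::aes(x = age, y = predicted_salary,
                 color = factor(marks), group = factor(marks))
  ) +
    ggplot2::geom_line() +
    ggplot2::labs(x = "Age", y = "Predicted salary", color = "Marks")
  print(p)
}
```

The `slopes` table makes the interaction explicit: with a positive interaction coefficient, the effect of one more year of age grows as marks increase.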
9:23
We can see a nice graph: as age increases, salary increases for all the different
9:29
levels of marks, but the one with the higher marks earns a higher salary and its slope is steeper. So
9:37
as age increases, the one with the higher marks gains more salary for
9:45
each unit increase in age. I hope this video was useful. Stay tuned to this channel, because in
9:51
our next video we are going to discuss three-way interaction. So do subscribe to this channel,
9:57
do hit the bell icon, and thanks for watching this video.