In this video we discuss the different types of t-test that we can apply, e.g. the one sample t-test, the two sample (independent sample) t-test, and the paired sample t-test.
00:00 Introduction to video
1:00 One sample t-test
2:50 Independent sample t-test
5:30 Paired sample t-test
Website: thedatahall.com
0:00
Welcome to the Data Hall YouTube channel. In this video we are going to talk about
0:03
t-tests in R. A t-test is
0:07
what we call a mean comparison. We test whether a mean is
0:15
significantly different from a given value, or whether the means of two different groups
0:20
are the same or different. For that we are going to use the auto data. So let's
0:25
first load the data. We have the data over here. We have
0:29
different firms, different cars, their prices, their mileage, their weight, length, etc.
0:36
Then there is one variable of interest, which is whether the car is
0:40
domestically produced or foreign produced. We have different types of t-tests. One is called the
0:49
one sample t-test. Then we have the two sample, or what we also call the independent sample, t-test. Then we have the dependent sample or
0:58
paired sample t-test. Let's start with the one sample t-test. We have this variable, the price of the car. What we want to do
1:06
is take that variable and see whether
1:10
its mean value is different from 6000 or not. Our null hypothesis is that
1:16
the mean price of the car is 6000 in our data set. We are going to use the t.test function,
1:25
pass it the specific column from the data, specify the mean that we want to test against,
1:31
and press Ctrl+Enter. What we are interested in is the p-value. This p-value is greater than 0.05.
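The one sample test described here can be sketched in R roughly as follows. This is only a minimal illustration: the transcript does not show how the auto data is loaded, so `auto_demo` and its ten prices are made-up stand-ins, not the real data set. Only `t.test()` and its `mu` argument are standard R. The sketch also includes the same test against a mean of zero.

```r
# Hypothetical stand-in for the price column of the auto data (made-up values).
auto_demo <- data.frame(
  price = c(4100, 4750, 3800, 4800, 7800, 5800, 4450, 5200, 10400, 4100)
)

# H0: the mean price equals 6000; H1 (two-sided): it does not.
res_6000 <- t.test(auto_demo$price, mu = 6000)
print(res_6000)

# Same test against zero (mu = 0 is also the default).
res_0 <- t.test(auto_demo$price, mu = 0)

# Decide at the 5% level from the p-value.
alpha <- 0.05
cat("mu = 6000: reject H0?", res_6000$p.value < alpha, "\n")
cat("mu = 0:    reject H0?", res_0$p.value < alpha, "\n")
```

A p-value above 0.05 means the data give no evidence against the hypothesized mean; it does not prove that the mean equals that value.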
1:40
That means we fail to reject the null hypothesis. In this case the null hypothesis is that
1:48
the mean value is equal to
1:53
6000. Because we are not able to reject the null hypothesis, we have no evidence that the mean value
2:02
differs from 6000. The alternative hypothesis is specified over here, which says
2:07
that the true mean value is not equal to 6000. So our conclusion is that the mean price in this data set
2:13
is not significantly different from 6000. If we were to check whether the mean
2:21
value is different from zero, we would specify mu as zero. Here the p-value is quite low; it is
2:31
less than 0.05. So we reject the null hypothesis in favor of the alternative hypothesis, and the alternative hypothesis says that the
2:38
mean value is not equal to zero. So that was the one sample t-test. We apply
2:42
it when we have data and we want to check its
2:47
mean value against a given number. OK, let's move to the two sample t-test. It is also called the independent sample t-test for a reason: we consider
2:56
that these samples are independent of one another. We have this variable called foreign,
3:02
and it records whether the car is domestically produced or foreign
3:05
produced. We want to compare the mean price between the two categories of this foreign variable. That is,
3:14
we want to see whether the average price of a car is the same whether the car is produced domestically or
3:23
in a foreign country. So what we do is use the t-test:
3:28
we take the price variable, use the tilde sign, then specify the
3:32
categorical variable, and of course specify the data. If I press Ctrl+Enter, I can see that our p-value is
3:40
greater than 0.05. In this case the null hypothesis is that the
3:46
average price of foreign and domestically produced cars is the same, and in this case we
3:54
fail to reject that null hypothesis. The alternative is that the true difference in means
4:00
between group domestic and group foreign is not equal to zero. That is,
4:05
their means are different. So the null is that the means are the same, and the alternative is
4:10
that the means are different. Now, in this case we had our data stacked:
4:14
the categories were stacked on top of one another in a single column. But in some cases we would have, let's say, a column
4:20
called foreign price and another column called domestic price. In that case what we do
4:27
is specify the two columns directly. First, let's generate that data using the split function.
4:32
Now we have this groups object; it can be a data frame or a list.
4:40
The idea is that now we have two columns, one for domestic cars and one for foreign cars. We have
4:46
52 observations for domestic cars and 22 observations for foreign cars, and we want to compare their means. The result would be
4:56
exactly the same, but the way we specify it is different: we just
5:00
specify these two columns, and R would
5:05
generate the exact same p-value and the exact same mean values. Previously the
5:13
mean for domestic cars was 6072, and now it is also
5:20
6072. So there are different ways of doing this two sample t-test, and this
5:29
is what we have demonstrated. Next is the paired sample t-test, which we also call the dependent sample t-test. For that I'm going to use a
5:37
different data set, the chicken weights data. What we have over here are the weights of the chicks
5:47
and the day on which each weight was taken. That is,
5:54
zero means the day of birth and two means
5:58
two days after birth. So let's filter the data
6:04
to keep just two categories, weight on day zero and weight on day two. We want to compare whether the
6:13
mean weight is different or not. But this case is different from our previous data. In the previous data we had two different
6:20
categories and they were totally independent, but in this case the subject is the same, that is, the same chicks, and we want to compare their
6:28
weights. So we use the paired sample t-test. We apply the
6:32
function, specify the weight column and then the categorical variable, and specify that we want a
6:42
paired sample t-test. If you do not specify this parameter, it would apply the
6:52
two sample, independent sample t-test. So this argument is important for
6:58
doing the paired sample t-test. Let me apply this, and again we are going to look at the p-value.
7:06
It is less than 0.05. In this case the null hypothesis is that the
7:11
two means are the same and the alternative is that the two means are different.
7:16
So our conclusion, based on the p-value, is that the two
7:21
means are different, because when the p-value is less than 0.05
7:25
we reject the null hypothesis in favor of the alternative hypothesis. I hope this was useful. Do subscribe to this channel
7:33
and do hit the bell icon.
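As a recap of the two sample and paired sample tests discussed in the video, here is a rough sketch in R. It rests on several assumptions. `auto_demo` is a made-up stand-in for the auto data. The chick weights are taken to be R's built-in `ChickWeight` data set (columns `weight`, `Time`, `Chick`), which the transcript does not confirm. The paired test is written with two vectors rather than a formula, because newer R versions may reject `paired = TRUE` in the formula interface.

```r
# --- Two sample (independent) t-test -------------------------------------
# Hypothetical stand-in for the auto data (made-up prices).
auto_demo <- data.frame(
  price   = c(4100, 4750, 3800, 4800, 7800, 5800, 6300, 5900, 4500, 7100),
  foreign = factor(rep(c("Domestic", "Foreign"), each = 5))
)

# Stacked layout: one price column plus a grouping column.
res_long <- t.test(price ~ foreign, data = auto_demo)

# Side-by-side layout: split() returns one price vector per group.
groups   <- split(auto_demo$price, auto_demo$foreign)
res_wide <- t.test(groups$Domestic, groups$Foreign)

# Both calls run the same (Welch) test, so the p-values agree.
cat("stacked p:", res_long$p.value, " split p:", res_wide$p.value, "\n")

# --- Paired sample t-test ------------------------------------------------
cw   <- as.data.frame(ChickWeight)
day0 <- cw[cw$Time == 0, c("Chick", "weight")]
day2 <- cw[cw$Time == 2, c("Chick", "weight")]

# Match each chick's day 0 weight with its own day 2 weight.
paired <- merge(day0, day2, by = "Chick", suffixes = c("_day0", "_day2"))

# paired = TRUE tests the mean of the within-chick differences.
res_paired <- t.test(paired$weight_day2, paired$weight_day0, paired = TRUE)

# Without paired = TRUE the same vectors are treated as independent samples.
res_unpaired <- t.test(paired$weight_day2, paired$weight_day0)

print(res_paired)
```

Merging on `Chick` before testing is a deliberate choice: a paired test is only meaningful when the two vectors are aligned subject by subject.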


