Working With Missing Values in R
May 16, 2024
In this video we discuss explicit and implicit missing values.
Exercise file: https://payhip.com/b/gVCJO
Working with missing values in R: https://thedatahall.com/dealing-with-missing-values-in-r/
00:00 Intro to Topic
0:28 Explicit missing values
4:57 Implicit missing values
Website: thedatahall.com
0:00
Welcome to The Data Hall YouTube channel
0:02
In this video, we are going to talk about how to work with missing values in R
0:09
Now, before we move forward, I'm going to work with the tidyverse package. So if you haven't installed the tidyverse package, you can install it using this line of code
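The line of code shown on screen is not part of this transcript. A likely form, assuming the standard install from CRAN, is sketched below. The install call is commented out so that the block only loads the package.

```r
# Install once from CRAN (uncomment if the package is not installed yet)
# install.packages("tidyverse")

# Load the tidyverse, which attaches dplyr and tidyr among others
library(tidyverse)
```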
0:17
Because I have already installed it, I'm just going to load this package. Now, there are two types of missing values
0:22
One is called an explicit missing value, and the second one is called an implicit missing value
0:27
So let's first work with explicit missing values. So what is an explicit missing value?
0:33
If we look at this data, this is a data frame that I have created, this is sample data
0:38
We have GDP of different countries for different years. So we have the years 2011, 12, 13, 14, 15, 16, 17, and 18
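The data frame on screen is not reproduced in the transcript. A minimal sketch of similar sample data is shown below, assuming one growth rate per year with a single value recorded as NA. The object name, the column names, and the numbers are invented for illustration; only the years 2011 to 2018 come from the video.

```r
library(tibble)

# Hypothetical sample data: names and values are assumptions,
# only the year range 2011 to 2018 is taken from the video
time_series <- tibble(
  year        = 2011:2018,
  growth_rate = c(3, 4, 2, NA, 5, 6, 4, 3)
)

# is.na() returns TRUE wherever R stores an explicit missing value
is.na(time_series$growth_rate)
```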
0:49
Now these are all in sequence. There is no gap in them. And these are GDP growth rates for
0:57
all these years. Now you see this NA over here, this NA represents a missing value in R. So if I can
1:06
show you from this object that I have created, you can see that this is an NA over here. This means
1:12
that this is an explicit missing value. We know that it is a missing value and R tells us that
1:18
it is a missing value. Now there are different ways of filling these values. One is using
1:27
the fill function. So what we do is we take this time series data and use the fill function
1:35
and fill the growth rate. What it would do is it would take the previous value and drag it down
1:41
So whenever there is a missing value, it would take the previous value and
1:46
replace the missing value with it. For example, in this case, it would get a value of
1:53
two over here. So if I execute this and show you the new object that has been created, you can see that this explicit missing value has been filled. Now there are different ways of working with this fill function
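The fill step described above can be sketched as follows, using tidyr's fill(). The data, the object names, and the column names are assumptions made for illustration. The value before the NA is set to 2 so that it matches the result mentioned in the video.

```r
library(tibble)
library(tidyr)

# Hypothetical sample data (names and values are assumptions)
time_series <- tibble(
  year        = 2011:2018,
  growth_rate = c(3, 4, 2, NA, 5, 6, 4, 3)
)

# The default direction is "down": each NA takes the last value above it
time_series2 <- fill(time_series, growth_rate)

# The formerly missing fourth value now repeats the previous value
time_series2$growth_rate[4]
```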
2:07
You can look at its help page and you can see that we have "down", "up", "downup" and "updown"
2:18
Now, these are different ways of working with it. We already have a whole video that explains this in detail
2:25
But by default, the direction is down. So for example, if I set the direction to up, then what it would do is this: we have this missing value
2:38
Instead of taking the two and dragging it down, it would take the number below and drag it upwards
2:46
So if I can execute this and show you the values, then you can see that we have five instead of two in this case
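A sketch of the same call with the direction changed is given below. The .direction argument is part of tidyr's fill(); the data and the names are again assumptions, with 5 placed after the NA to match the result described.

```r
library(tibble)
library(tidyr)

# Hypothetical sample data (names and values are assumptions)
time_series <- tibble(
  year        = 2011:2018,
  growth_rate = c(3, 4, 2, NA, 5, 6, 4, 3)
)

# With .direction = "up", each NA takes the next value below it
time_series_up <- fill(time_series, growth_rate, .direction = "up")

time_series_up$growth_rate[4]
```

The other two options, "downup" and "updown", fill in the first direction and then use the second direction for any NA values that remain at the edges.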
2:55
Anyway, you can work with this fill function and look into it
3:00
Now, there is one more function, and I'm not sure how to pronounce its name, but what this function does is take a missing value and replace it with any non-missing value that you want
3:17
In this case, I am passing it zero. So what I want is that if there is a missing value, then replace it
3:25
with zero. So if I execute this, and I'm saving this in time three, it is replaced with zero, right
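The function name is not captured in this transcript. One tidyverse function that matches the description is dplyr's coalesce(), so the sketch below assumes it. The data and the object names are hypothetical.

```r
library(tibble)
library(dplyr)

# Hypothetical sample data (names and values are assumptions)
time_series <- tibble(
  year        = 2011:2018,
  growth_rate = c(3, 4, 2, NA, 5, 6, 4, 3)
)

# coalesce() keeps each non-missing value and substitutes 0 for every NA
time_series3 <- mutate(time_series, growth_rate = coalesce(growth_rate, 0))

time_series3$growth_rate[4]
```

tidyr's replace_na(growth_rate, 0) would give the same result for this case.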
3:33
Then we can look at missing value representation. What this means is that in some cases
3:41
missing values are not explicitly missing. I mean, they are represented by a certain number
3:48
or a certain letter, right? So in this case, for example, this missing value is represented by
3:54
the number 99. So this 99 means it is a missing value, right? Now, in this case we have this time series, and the value is 99. From R's perspective it is not missing, but from our human perspective it is
4:09
Some data sets were previously coded in other software that does not have the capability to work with missing values
4:18
What they do is they assign a number, a special number that we know that this data would not have
4:25
and that special number is considered a missing value. So for that we can use na_if, and what it would do is take that specific number
4:36
If there is that number, it would convert it into a missing value
4:40
So convert it into missing value if it is this number. This is how we can explain it in plain English
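A minimal sketch of this conversion with dplyr's na_if() follows. The data and the object names are assumptions; only the sentinel value 99 comes from the video.

```r
library(tibble)
library(dplyr)

# Hypothetical data in which 99 was used as a code for "missing"
time_series_coded <- tibble(
  year        = 2011:2018,
  growth_rate = c(3, 4, 2, 99, 5, 6, 4, 3)
)

# na_if() turns every value equal to 99 into NA and leaves the rest alone
time_series4 <- mutate(time_series_coded, growth_rate = na_if(growth_rate, 99))

is.na(time_series4$growth_rate)
```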
4:49
So if I can execute this, I get time series four and that is converted into an explicit missing
4:55
value. Now let's move to implicit missing values. What that means is this: if you look at
5:02
this data, we have different countries. So let's say we have country A, country B and country C. And
5:08
for each of these countries, we have two years of GDP. But in the case of B, we just have
5:15
GDP growth for year 2010, and 2011 is missing. But this missing value is not represented. Like for example
5:24
this NA for country C for year 2011 is an explicit missing value
5:30
But in case of country B, we do not have even the year 2011
5:35
So if I can show you this data, this is T explicit. You can see that year 2011 is missing
5:43
This country B, 2011 data is an implicit missing value, whereas this country C, 2011 is an explicit missing value
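The difference can be sketched with a small data set shaped like the one described. The object name, the column names, and the growth values are assumptions; the country and year layout follows the video.

```r
library(tibble)

# Hypothetical long-format data: country B has no row for 2011 (implicit),
# while country C has a 2011 row whose value is NA (explicit)
gdp <- tribble(
  ~country, ~year, ~gdp_growth,
  "A",      2010L, 2.1,
  "A",      2011L, 2.5,
  "B",      2010L, 1.8,
  "C",      2010L, 3.0,
  "C",      2011L, NA
)

# The explicit missing value is visible to is.na()
sum(is.na(gdp$gdp_growth))

# The implicit one is not: there is simply no row for B in 2011
nrow(gdp[gdp$country == "B" & gdp$year == 2011L, ])
```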
5:52
So the issue with implicit missing values is, as the name suggests, that we cannot see them. So one way of working with them is to convert this data. This is currently in the long format. We can convert it into a wide format using this pivot_wider function
6:10
We take the names from the country column and the values from the GDP growth column
6:17
If we convert this into a wide format, then those implicit missing values would become explicit and we can fill them
6:25
So in this case, country B, year 2011 becomes explicit, and now we can fill these explicit missing values using any of the methods that we have discussed
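The reshaping step can be sketched with tidyr's pivot_wider(). The data, the object names, and the column names are assumptions that mirror the description above.

```r
library(tibble)
library(tidyr)

# Hypothetical long-format data with an implicit gap for country B in 2011
gdp <- tribble(
  ~country, ~year, ~gdp_growth,
  "A",      2010L, 2.1,
  "A",      2011L, 2.5,
  "B",      2010L, 1.8,
  "C",      2010L, 3.0,
  "C",      2011L, NA
)

# One column per country, one row per year; cells with no source row become NA
gdp_wide <- pivot_wider(gdp, names_from = country, values_from = gdp_growth)

gdp_wide
```

In the wide table the cell for country B in 2011 holds NA, so the gap is now an explicit missing value.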
6:39
We can also make this dataset complete without even converting it into a wide format, using this complete function
6:48
So for example, we take the T explicit missing value, and we complete the data for each country and each year
7:02
If I can show you the complete data, you can see now we have complete data
7:07
It is still in the long format. We didn't convert it into a wide format,
7:13
but now the implicit missing values have become explicit missing values
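A sketch of this step with tidyr's complete() is shown below. The data and the names are assumptions; the call expands the data to every combination of country and year that appears in it.

```r
library(tibble)
library(tidyr)

# Hypothetical long-format data with an implicit gap for country B in 2011
gdp <- tribble(
  ~country, ~year, ~gdp_growth,
  "A",      2010L, 2.1,
  "A",      2011L, 2.5,
  "B",      2010L, 1.8,
  "C",      2010L, 3.0,
  "C",      2011L, NA
)

# complete() adds a row for each missing country and year combination,
# with NA in the value column; the data stays in long format
gdp_complete <- complete(gdp, country, year)

gdp_complete
```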
7:19
Okay, there is one more way of using this complete function. Let's say we know the year range that we are going to work with, and that
7:30
year range is still missing from our main year variable then we can specify the year range
7:37
and that would give us all the years in that range. In this case, for example, year 2012 is missing
7:44
for countries A, B and C, so now this year, which was an implicit missing value, has become an explicit
7:52
missing value. So I hope this was useful. Stay tuned to this channel, do hit the subscribe button, do hit the like icon
8:00
and thanks for watching this video
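For reference, the year-range variant of complete described at 7:19 can be sketched as follows. The data and the names are assumptions; the range 2010 to 2012 is chosen so that 2012, which appears in no row, is added for every country.

```r
library(tibble)
library(tidyr)

# Hypothetical long-format data covering only 2010 and 2011
gdp <- tribble(
  ~country, ~year, ~gdp_growth,
  "A",      2010L, 2.1,
  "A",      2011L, 2.5,
  "B",      2010L, 1.8,
  "C",      2010L, 3.0,
  "C",      2011L, NA
)

# Supplying the full range for year adds rows for years absent from the data
gdp_range <- complete(gdp, country, year = 2010:2012)

gdp_range
```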
#Computer Education
#Computer Science
#Educational Software