Fill Gaps in Time Series and Panel Data in R
May 16, 2024
When we work with time series or panel data in R, there are often missing values or gaps in the data. This video discusses how to deal with these gaps. Exercise file: https://payhip.com/b/E5MHp Fill gaps in time series and panel data: https://thedatahall.com/filling-gaps-in-time-series-and-panel-data-using-r/ 0:00 Intro to video 0:27 Time series gaps 2:56 Panel data gaps Website: thedatahall.com
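The time series approach covered from 0:27 in the video (generate the full sequence of years, merge it with the data, then interpolate) might look like the following sketch. The data values and the object names time_series, all_years, full_series and filled_series are assumptions made for illustration; only seq, merge, mutate and zoo's na.approx come from the video.

```r
library(dplyr)
library(zoo)

# Hypothetical series: 2015, 2019 and 2020 are absent (implicit gaps)
time_series <- data.frame(
  year = c(2010, 2011, 2012, 2013, 2014, 2016, 2017, 2018, 2021),
  gdp_growth = c(2.5, 3.0, 2.8, 3.2, 3.1, 2.9, 3.4, 3.0, 3.6)
)

# Placeholder with every year from the minimum to the maximum
all_years <- data.frame(
  year = seq(min(time_series$year), max(time_series$year))
)

# all.x = TRUE keeps every placeholder year; years with no data get NA,
# which turns the implicit gaps into explicit missing values
full_series <- merge(all_years, time_series, by = "year", all.x = TRUE)

# Linear interpolation of the explicit NAs
filled_series <- full_series %>%
  mutate(gdp_growth = na.approx(gdp_growth, na.rm = FALSE))

print(filled_series)
```

Passing na.rm = FALSE keeps the column the same length even if the series starts or ends with a missing value; those edge values would stay NA because there is nothing to interpolate between.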
0:00
Welcome to the DataHall YouTube channel. In this video we are going to discuss how
0:04
we fill gaps in panel data or time series data using R. I'm going to work with the
0:12
tidyverse package and the zoo package in this video. If you haven't installed these packages
0:16
you can install them using this line of code. I have already installed them, so I'm just going
0:21
to load these packages, the tidyverse package and the zoo package. So let's start with
0:28
filling gaps in time series data. Let's say we have time series data over here where
0:33
we have different years and then there are certain missing years. For example over here
0:39
we can see that year 2015 is missing, and then years 2019 and 2020 are missing. So how
0:47
do we fill these gaps? The first thing is that we need to generate a placeholder that
0:54
can have an explicit missing value. If you are not familiar with what implicit and explicit
0:59
missing values are, I have already made a video on that. I will put that link in the description,
1:05
so if you haven't watched it, do watch it and you will understand what explicit and
1:10
implicit missing values are. One way of doing it is to use the seq function, and what
1:15
we can do is generate this sequence of years. For example, we can start with year
1:22
2010: take the time series data, take the year column, take its minimum value and
1:29
maximum value, and give them to seq, and the sequence that is generated would have just these years,
1:39
right. And what we can do is we can take this time series object and merge it with
1:48
the sequence of years that we have created, and we merge it by year. That way,
1:54
once we are done, we have this full series that contains all the years, because we
2:00
had all the years generated by the seq function and now we have merged them using
2:06
the merge function. Now this full series does contain years 2015, 2019 and 2020, but their
2:13
values are missing. What we have done is we have converted these implicit missing
2:18
values into explicit missing values. Now we can use certain functions: either we can
2:23
interpolate or use some other function to fill these missing values. So I will just give
2:29
you an idea of how to work with interpolation. To interpolate, we take the full series data and mutate
2:35
this GDP growth column, which is already there, so it gets replaced. From the zoo package we are
2:40
using na.approx, which is used for missing value approximation, and if I
2:48
show you the full series now, the missing values have been replaced using an interpolation
2:56
method. How do we fill these missing values if we have panel data? For this we
3:01
are going to first load the tsibble package. Now this is similar to the tibble package, but
3:10
it is for time series or panel data. I have already installed it, so I am just going
3:14
to load this package and create a data frame that contains panel
3:20
data. We have different firms, different years, and stock prices; we have 3 firms,
3:26
their stock prices, and the years. Let me show you this panel data; this is how it looks
3:32
like. Now there are certain missing values in this panel data. For example, for
3:39
firm 2 the data starts from 2005, but for firm 1 the data starts from 2010, so
3:45
the rest of the years are missing, and there are some missing values in between these years;
3:51
for example, for firm 2 we do not have year 2014. So these are again implicit missing
3:58
values. How do we convert them into explicit missing values and fill the values? One
4:03
way of doing that is to use the fill_gaps function. We take the panel data that we have just created,
4:09
but first we have to convert it into a tsibble. It is currently a data
4:14
frame, so we have to convert it into a tsibble. What we do is we take this panel data
4:21
data frame, and this function requires that we tell it what the key is; it is
4:27
the firm in this case, and what the index is, which would be any time variable. Because panel
4:32
data contains a cross sectional variable and a time variable, we pass the cross sectional
4:38
variable to the key parameter and the time variable to the index parameter. So if I
4:42
execute this, it does nothing visible; it just converts the data into an object of a different class.
4:49
It would still look similar to the panel data that we previously had; only the internal
4:55
structure of this panel_data is different. Now what we do is we use the
5:02
fill_gaps function, take the panel data that we just created, and set .full
5:09
equal to TRUE, which means that all the missing values now become explicit.
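The conversion and gap-filling steps described above can be sketched as follows. The firm, year and stock_price values are made up for illustration, and the object names panel_tsibble and missing_panel are assumptions; as_tsibble and fill_gaps with .full = TRUE are the tsibble functions being described.

```r
library(tsibble)

# Hypothetical panel: each firm covers different years, with gaps inside
panel_data <- data.frame(
  firm = c(1, 1, 1, 2, 2, 2, 3, 3),
  year = c(2010, 2011, 2013, 2005, 2006, 2008, 2009, 2010),
  stock_price = c(50, 52, 55, 20, 21, 24, 70, 72)
)

# key = cross sectional variable, index = time variable
panel_tsibble <- as_tsibble(panel_data, key = firm, index = year)

# .full = TRUE pads every firm over the whole time span of the data,
# so both leading/trailing and in-between gaps become explicit NA rows
missing_panel <- fill_gaps(panel_tsibble, .full = TRUE)

print(missing_panel, n = 30)
```

Without .full = TRUE, fill_gaps only fills gaps between each firm's own first and last year, so firm 1 in this sketch would not get rows for 2005 to 2009.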
5:15
So if I show you this missing panel, now you can see that we didn't have 2005 till
5:21
2009 in the case of firm one. Similarly, we didn't have years 2012, 13, 18 and 19 in the case of firm
5:30
one, and there were other missing values in the other firms. So now we have converted
5:36
these implicit missing values into explicit ones. What we need to do now is fill them;
5:40
we can fill them using the fill function. So take this missing panel data, group by firm,
5:47
fill stock prices, and the direction should be downup. What that means is that in cases
5:54
where there is a value preceding the missing value, it carries that value downwards; in cases
6:02
where there is no preceding value, it takes the next value and fills it upwards. That is
6:09
what downup does. Again, let me show you the downup result: because we didn't have all
6:16
these values, and they were missing, it took the succeeding value, but in the case of other data
6:25
where we didn't have preceding values, it took succeeding values. What if we used just down?
6:35
What would happen? Let me show you this, and let me show you the data now. In this case
6:42
I told R just to replace it with the preceding value; where there was no preceding
6:48
value, it still remains a missing value. The last thing is how to fill these missing values
6:54
using a tidyverse function. For the tidyverse we have the complete function;
7:00
we tell it what the firm variable and the year variable are, and it replaces the gaps with missing
7:06
values, converting these implicit missing values into explicit missing values, and this is how
7:12
it looks. Lastly, we can convert those explicit missing values into certain meaningful
7:20
values; in this case I'm just going to replace them with zero. Let me show you this panel
7:25
data: now we have these zeros over here. I hope this video was useful. Do subscribe to this channel,
7:31
hit the bell icon, and thanks for watching this video.
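As a rough illustration of the last two steps in the transcript, the sketch below compares fill with the downup and down directions and then uses complete to replace gaps with zero. The data and the object names (explicit_panel, filled_downup, filled_down, zero_panel) are assumptions for illustration; the video applies fill to the tsibble output, whereas this sketch uses a plain data frame made explicit with complete.

```r
library(dplyr)
library(tidyr)

# Hypothetical panel: firm 1 has no 2010, firm 2 has no 2012
panel_data <- data.frame(
  firm = c(1, 1, 1, 2, 2, 2),
  year = c(2011, 2012, 2013, 2010, 2011, 2013),
  stock_price = c(10, 11, 12, 20, 21, 23)
)

# complete() adds a row for every firm-year combination found in the
# data, so the implicit gaps become explicit NA values
explicit_panel <- panel_data %>%
  complete(firm, year) %>%
  arrange(firm, year)

# "downup": carry the preceding value down, then fill what is left upwards
filled_downup <- explicit_panel %>%
  group_by(firm) %>%
  fill(stock_price, .direction = "downup") %>%
  ungroup()

# "down" only: a gap with no preceding value inside its firm stays NA
filled_down <- explicit_panel %>%
  group_by(firm) %>%
  fill(stock_price, .direction = "down") %>%
  ungroup()

# complete() can also write a replacement value instead of NA
zero_panel <- panel_data %>%
  complete(firm, year, fill = list(stock_price = 0))

print(filled_downup)
print(filled_down)
print(zero_panel)
```

Note that complete(firm, year) only uses years that appear somewhere in the data; if a year were missing for every firm, something like year = full_seq(year, 1) would be needed to generate it.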