
In this video we discuss how to combine, or concatenate, strings in R. We use the stringr package, which is part of the tidyverse.
0:00 Intro to video
0:50 String concatenation
3:37 Missing value in string
6:44 str_glue function
7:34 str_flatten function
Website: thedatahall.com
0:00
Welcome to the Data Hall YouTube channel. We have started a series on different aspects
0:05
of string variables, or strings, in R: working with strings in R. In our previous
0:12
video we discussed how to create a string variable. In this video we are going
0:16
to look at how to combine different strings, and in the next video we are going to talk
0:21
about how to extract data from a string variable. As discussed in the previous video, we are working
0:26
with the tidyverse package. Let's load the tidyverse package. In this video we are going
0:31
to look at how to concatenate string variables and how to work with missing
0:36
values in a string variable, and there are the str_glue and str_flatten
0:42
functions that we are going to work with in this video. So let's start with string
0:47
concatenation. Let's say we have these two strings. We want hello and Adam to be combined.
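The code shown on screen is not part of the transcript. As a minimal sketch of this step, assuming the stringr function `str_c()` and guessing at the exact strings typed in the video, combining two strings and combining one string with a vector of two might look like this:

```r
library(stringr)

# Two separate strings (the exact strings typed in the video are assumed)
greeting <- "Hello"
name <- "Adam"

# str_c() joins its arguments; sep sets what goes between them
combined <- str_c(greeting, name, sep = " ")
print(combined)
# [1] "Hello Adam"

# A single string is recycled against a longer vector
hi_each <- str_c("Hi ", c("Adam", "Rose"))
print(hi_each)
# [1] "Hi Adam" "Hi Rose"
```

The variable names `greeting`, `name`, `combined` and `hi_each` are illustrative only.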
0:55
Obviously they are two different strings. So what we do is we use str_c, which
1:00
stands for string concatenation. When I press Ctrl+Enter you can see that we have
1:07
hello Adam printed over here, and that is what string concatenation has done. It has
1:14
combined these two different strings. Now let's say we have hi over here
1:19
and then a vector of two strings, that is Adam and
1:27
Rose. We know that when we combine two vectors of different
1:33
sizes, it repeats the smaller vector. So what this would do is print hi
1:38
with Adam and hi with Rose. So this is how it would look. But normally we are working
1:45
with data frames. So let's say we have this names data frame. We are creating a data frame
1:51
over here. We have two columns in that data frame. The first one is the first name, the second
1:56
one is the last name. We have three different people in it. Let's
2:02
press Ctrl+Enter. We have this object called names, and what we want to do is
2:08
create a full name that contains the first name and the last name. So what we do
2:14
is take the names object, mutate, and create a new column called full name that
2:20
contains the combined first name and last name, and
2:25
we know that we can use the string concatenation function to combine these two columns. So
2:31
let's press Ctrl+Enter and look at the names object, and we can see that we have
2:37
the full name, but there is no space between them. What we can simply
2:41
do is put a space over here as its own string and separate the arguments using commas. Now if you press Ctrl+Enter
2:49
and recreate that column, you would have a space over here. Now do remember that string
2:56
concatenation is essentially the same method that we have in base R, which is called paste or
3:02
paste0. So string concatenation is essentially a stand-in for paste0. There are some
3:07
differences with string concatenation, and as we move forward it will become evident
3:12
what those differences are. So let's do this with paste0. We have exactly the same command
3:18
as over here, except I'm creating a new column called full name two and I'm using the paste0
3:24
function. The rest of the command is exactly the same. You can see that the
3:29
output is exactly the same. So here string concatenation does exactly the same thing as
3:35
paste0 in base R. But let's look at how missing values are treated in both
3:42
these functions. We have over here the same data frame, but instead of having
3:48
the third name, the third person has a missing last name. We do not know what
3:55
his last name is, or there might not be any last name, but whatever the case is, we do not
4:00
have that data. So let's create this object names_m that contains the
4:06
missing value. Now if we use the exact same command and concatenate using string
4:13
concatenation and using the paste0 function, what you see is that string concatenation
4:21
does respect the missing value. It does not print NA inside the string: the whole
4:28
result is missing if there is a missing value. So with string concatenation we see this result, but
4:36
with the paste0 function, although we didn't have a space between them, we
4:43
can add the space just to make them comparable. What you can see is that it has combined
4:53
NA with the first name. That is not what we would have expected. So this is where string
4:59
concatenation might be helpful. There is one
5:07
way of dealing with this missing value in this specific case. Let's say we
5:12
have this function and we want to create a column greeting that contains
5:17
Hi and the last name. Let's print this, and if we look at it, we know that
5:26
string concatenation would not print anything, not even the Hi, when one of the objects
5:34
we are combining is missing. One way of working with this is to use this specific
5:39
function, coalesce (I'm not sure how to pronounce this). What it does is take the first
5:45
non-missing value. We know that string concatenation, if there is any missing value, would return
5:51
a missing value. So what we want is, instead of this NA, just a Hi salutation.
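The missing-value behaviour described up to this point can be sketched as follows. This is a hedged reconstruction, not the code from the video: the first names are hypothetical, the last names Curry and Paul are taken from the greeting example in the transcript, and the function that returns the first non-missing value is assumed to be dplyr's `coalesce()`.

```r
library(stringr)
library(dplyr)

# Hypothetical data: the third person has no recorded last name
names_m <- data.frame(
  first_name = c("Stephen", "Chris", "Adam"),
  last_name  = c("Curry", "Paul", NA)
)

names_m <- names_m %>%
  mutate(
    # str_c() returns NA whenever any piece is NA
    full_name   = str_c(first_name, " ", last_name),
    # paste0() turns NA into the literal text "NA"
    full_name_2 = paste0(first_name, " ", last_name),
    # coalesce() keeps the first non-missing value, so a plain "Hi"
    # is used when the personalised greeting is NA
    greeting    = coalesce(str_c("Hi ", last_name), "Hi")
  )

print(names_m)
```

With this data, `full_name` is `NA` for the third row, `full_name_2` is `"Adam NA"`, and `greeting` is `"Hi"`.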
5:58
Let's say we are sending an email. If we know the last name we would send
6:02
Hi Curry or Hi Paul, but if we do not know the last name we would just send an email
6:08
with a Hi greeting. So what we are doing is this: if this expression
6:14
gives us a non-missing value, then
6:22
this function takes the value from this object. Otherwise, if
6:27
this object is a missing value, it would print Hi. So what it
6:32
does is return the first non-missing value. Let me reopen that, and we can see that, as
6:39
we expected, we get Hi when there is a missing last name. Next let's move to the string
6:46
glue function. String glue does the same thing as string concatenation: it combines
6:51
different objects or different strings. But the issue with string concatenation is that we
6:57
have to put all the strings within inverted commas, and it gets somewhat
7:02
tricky to work with. The way around this is to use the string glue function, and now you
7:08
can see we have just written the strings by themselves. And what about
7:16
the variable name or the object name or the column name? We just enclose it in
7:22
curly brackets. That frees us from the extra inverted commas. It would print
7:29
the exact same thing, but we now have easier code to work with. Let's move to the last function
7:36
of this video, which is string flatten. Let's say we have this data frame where we have
7:41
two different students from a BS program and two students from an MS (master's) program.
7:48
What we want to do is print a list of all the names that are in
7:54
the BS program and all the names that are in the MS program. So what we do is
7:59
take the students data frame. Let me create this data frame. Let me show you the data
8:04
frame. This is how the data frame looks. We want to group the data. Take the
8:10
data frame, group it by program, and now we want to summarize. When we summarize,
8:17
we cannot use the string glue function; we have to use the string flatten function.
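A minimal sketch of the grouping and summarizing step, assuming dplyr's `group_by()` and `summarise()` together with stringr's `str_flatten()`. The names Mike and Edwards come from the transcript; the two MS students and the column names are hypothetical.

```r
library(stringr)
library(dplyr)

# Two BS students (named in the video) and two MS students (hypothetical)
students <- data.frame(
  name    = c("Mike", "Edwards", "Sara", "Lee"),
  program = c("BS", "BS", "MS", "MS")
)

# str_flatten() collapses all names in a group into one string,
# which gives summarise() the single value per group it needs
students_by_program <- students %>%
  group_by(program) %>%
  summarise(names = str_flatten(name, collapse = ", "))

print(students_by_program)
```

Each program ends up on one row, with its students joined by a comma and a space, for example `"Mike, Edwards"` for BS.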
8:23
This works as a summary function, so it suits the summarize function. What we do is take the names
8:28
and separate them with a comma and a space. So let me press Ctrl+Enter and let
8:33
me show you the result. So we have summarized. We have the BS program and these
8:39
are the two students in it. The names Mike and Edwards are
8:45
separated by a comma and a space. The same happens for the MS program. So I hope that was
8:52
useful. Do subscribe to this channel. Do hit the bell icon, and thanks for watching this video.
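For reference, the data-frame steps from the string concatenation and string glue parts of the video can be sketched together as below. This is an illustration under assumptions: the data values are hypothetical, the object is called `names_df` here (the video calls it names) to avoid clashing with base R's `names()`, and the exact greeting text used with `str_glue()` is a guess.

```r
library(stringr)
library(dplyr)

# Hypothetical three-person data frame with first and last names
names_df <- data.frame(
  first_name = c("Stephen", "Chris", "Adam"),
  last_name  = c("Curry", "Paul", "Smith")
)

names_df <- names_df %>%
  mutate(
    # stringr version: the space is passed as its own string
    full_name   = str_c(first_name, " ", last_name),
    # base R version: same result when nothing is missing
    full_name_2 = paste0(first_name, " ", last_name),
    # str_glue(): literal text is written directly, columns go in {}
    greeting    = str_glue("Hi {first_name} {last_name}")
  )

print(names_df)
```

With complete data, `full_name` and `full_name_2` are identical; the two functions only diverge once a missing value is involved.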


