Using the mutate() function to create a variable in a data frame

In an R data frame, you can create or modify variables either with base R or with the mutate() function from the dplyr package, which is part of the tidyverse. Let's first discuss the base R method. Load the following data set in R to follow along:

`data(mtcars)`

The data set contains information about cars, including their mileage, weight, displacement, number of gears, and so on. The wt variable records weight in thousands of pounds. To create a new variable that holds the weight in pounds instead, we can use the following command:

`mtcars$weight <- mtcars$wt * 1000`

This command takes the wt variable, multiplies it by 1000 to convert the weight from thousands of pounds to pounds, and stores the result in a new column named weight.

Similarly, to create another variable that contains the ratio of two variables, we use the following command:

`mtcars$mpg_hp <- mtcars$mpg / mtcars$hp`

The above command divides the mpg variable by the horsepower (hp) of each car to create their ratio.
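The two base R steps above can be tried together. This is a minimal sketch that works on a copy named cars_base (a name chosen here for illustration, not part of the original commands) so that mtcars itself stays unchanged.

```r
# Base R: add columns with the $ operator.
data(mtcars)

# cars_base is a hypothetical name for a working copy of the data.
cars_base <- mtcars

# wt is in thousands of pounds, so multiplying by 1000 gives pounds.
cars_base$weight <- cars_base$wt * 1000

# Ratio of miles per gallon to horsepower, computed row by row.
cars_base$mpg_hp <- cars_base$mpg / cars_base$hp

# Show the original and derived columns side by side.
head(cars_base[, c("wt", "weight", "mpg", "hp", "mpg_hp")])
```

Each new column needs its own assignment, and the data frame name is repeated on both sides of every line.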

These commands use base R syntax, in which the data frame name and the dollar sign must be repeated for every variable, so longer expressions become error-prone. The mutate() function makes it easier to create or modify variables in R.

Create a variable using the mutate() function

A simpler way to create or modify a variable in R is the mutate() function, which comes from the dplyr package and is loaded along with the tidyverse. To use mutate(), install and load the tidyverse with the following commands:

`install.packages("tidyverse")`

`library(tidyverse)`

Because we already modified the mtcars data by creating new variables, we load the data set again before using the mutate function. The command for loading the data set is

`data(mtcars)`

Once the data is loaded, we again create a new variable for the weight measured in pounds instead of thousands of pounds, as we did earlier. With the mutate function, the command becomes simpler:

`mutate(mtcars, weight = wt*1000)`

This command multiplies wt by a thousand and returns a copy of the data frame that includes the new weight column, but it does not change mtcars itself, because the result is not assigned to any object. Note that we refer to wt directly instead of repeating the data frame name and the dollar sign, as we did previously.

To store the variable named weight in the data frame, we assign the result back to mtcars:

`mtcars <- mutate(mtcars, weight = wt*1000)`
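The difference between calling mutate() with and without assignment can be checked directly. The sketch below loads only dplyr, the tidyverse package that provides mutate(); the object name preview is an assumption made for illustration.

```r
library(dplyr)
data(mtcars)

# Without assignment to mtcars, the result lives only in `preview`.
preview <- mutate(mtcars, weight = wt * 1000)

"weight" %in% names(mtcars)   # FALSE: the original data frame is untouched
"weight" %in% names(preview)  # TRUE: the returned copy has the new column

# Assigning the result back to mtcars makes the change permanent.
mtcars <- mutate(mtcars, weight = wt * 1000)
"weight" %in% names(mtcars)   # TRUE
```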

Creating Multiple Variables in a data frame using mutate function

Multiple variables can also be created at once in a data frame using the mutate() function. Before creating multiple variables simultaneously, load the mtcars data set again to start from a clean copy, then recreate the weight variable with the command used earlier. The commands for both operations are as follows:

`data(mtcars)`

`mtcars <- mtcars %>% mutate(weight = wt*1000)`

Now suppose we want to convert the weight from pounds to kilograms (one pound is about 0.4536 kg, which the command below rounds to 0.453) and also store the mean of the weight variable. In the same call we can add two more variables, horsepower per cylinder and displacement per cylinder. All of this can be done in a single command:

`mtcars <- mtcars %>% mutate(weight_kg = weight * 0.453, mean_weight = mean(weight), hp_per_cyl = hp / cyl, disp_per_cyl = disp / cyl)`

Thus, instead of creating these variables one at a time, a single mutate() call creates them all. The new variables are now columns of the data frame, and `head(mtcars)` displays them.
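mutate() evaluates its arguments in order, so a column defined earlier in a call can be used later in the same call. The sketch below relies on that to create weight and weight_kg together; the object name cars_multi is an assumption made for illustration.

```r
library(dplyr)
data(mtcars)

cars_multi <- mtcars %>%
  mutate(weight = wt * 1000,          # pounds
         weight_kg = weight * 0.453,  # uses the weight column defined one line above
         mean_weight = mean(weight),  # a single value, repeated in every row
         hp_per_cyl = hp / cyl,
         disp_per_cyl = disp / cyl)

# Inspect only the original wt column and the five new columns.
head(select(cars_multi, wt, weight, weight_kg, mean_weight,
            hp_per_cyl, disp_per_cyl))
```

Note that mean_weight is a summary of the whole column, so every row holds the same number.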

Creating categories for multiple variables in data frame

We can also create categories for variables in R using the mutate() function. To do so, we choose a column in the data set and define categories based on conditions. For instance, to categorize the mileage variable (mpg) by efficiency, where a value below 16 is less efficient and a value of 16 or more is more efficient, we use the following command:

`mtcars <- mtcars %>% mutate(efficiency = ifelse(mpg < 16, "Less", "More"))`

This creates a new variable named efficiency. It holds the category “Less” or “More”, depending on whether mpg is less than 16 or not.
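To see how the rows are split, the categories can be counted after they are created. This is a small sketch; cars_eff is a hypothetical object name used so that mtcars is left unchanged.

```r
library(dplyr)
data(mtcars)

cars_eff <- mtcars %>%
  mutate(efficiency = ifelse(mpg < 16, "Less", "More"))

# Number of cars in each category.
table(cars_eff$efficiency)

# Sanity check: the mpg range inside the "Less" group stays below 16.
range(cars_eff$mpg[cars_eff$efficiency == "Less"])
```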

Similarly, categories for several variables can be created in one mutate() call. Let’s say we want to create categories for weight, where cars weighing 1000 kilograms or less are categorized as “Small”, cars weighing more than 1000 and up to 2000 kilograms are categorized as “Medium”, and all other cars are categorized as “Large”. In the same command, we can also categorize horsepower (hp) as “Low”, “Moderate”, or “High”. Both are done with the following command:

`mtcars <- mtcars %>% mutate(weight_cat = case_when(weight_kg <= 1000 ~ "Small", weight_kg > 1000 & weight_kg <= 2000 ~ "Medium", TRUE ~ "Large"), hp_cat = case_when(hp <= 100 ~ "Low", hp > 100 & hp < 200 ~ "Moderate", hp >= 200 ~ "High"))`

This creates two new variables named weight_cat and hp_cat, holding the categories of weight and horsepower, respectively. Be careful with the syntax when writing commands in R. Even a small mistake, such as a misplaced comma or an extra pair of parentheses, can cause an error or produce a column with an unexpected name.

Notice the difference between this command and the one we used to categorize the mpg variable. For mpg we used the ifelse function, because only two categories were needed. Here we created more than two categories, which is why the case_when() function is used.
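The categorization described above can be written as a self-contained sketch. case_when() checks its conditions in order and uses the first one that is true, so each later condition only needs to state its upper bound. The object name cars_cat is an assumption for illustration, and weight_kg is recomputed here so the block runs on a fresh copy of mtcars.

```r
library(dplyr)
data(mtcars)

cars_cat <- mtcars %>%
  mutate(
    weight_kg = wt * 1000 * 0.453,   # thousands of pounds -> pounds -> kilograms
    weight_cat = case_when(
      weight_kg <= 1000 ~ "Small",
      weight_kg <= 2000 ~ "Medium",  # reached only when weight_kg > 1000
      TRUE ~ "Large"                 # fallback for everything else
    ),
    hp_cat = case_when(
      hp <= 100 ~ "Low",
      hp < 200 ~ "Moderate",         # reached only when hp > 100
      TRUE ~ "High"
    )
  )

# Cross-tabulate the two new categorical variables.
table(cars_cat$weight_cat, cars_cat$hp_cat)
```

Relying on the order of conditions avoids overlapping bounds, such as a value of exactly 100 matching two rules.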

Mutate by Category

We can also compute new variables within categories by combining mutate with group_by. After the data is grouped, the operations inside mutate are carried out separately for each category. For instance, to store the mean of mpg within each efficiency category, we use the mutate function in the following way:

`mtcars <- mtcars %>% group_by(efficiency) %>% mutate(mean_mpg = mean(mpg))`

This creates a new variable named mean_mpg in which each car receives the mean mpg of its efficiency group, so cars in the “Less” group have a lower value than cars in the “More” group.
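A grouped mutate can be sketched end to end as follows. The call to ungroup() at the end is an addition of this sketch, not part of the command above: it removes the grouping so that later operations work on the whole data set again. The object name cars_grouped is hypothetical.

```r
library(dplyr)
data(mtcars)

cars_grouped <- mtcars %>%
  mutate(efficiency = ifelse(mpg < 16, "Less", "More")) %>%
  group_by(efficiency) %>%
  mutate(mean_mpg = mean(mpg)) %>%  # mean is computed within each group
  ungroup()                         # drop the grouping afterwards

# One row per category with its group mean.
distinct(cars_grouped, efficiency, mean_mpg)
```

The result of group_by() is a tibble, so the car names stored as row names in mtcars are not carried over; if they are needed, they can be copied into a regular column before grouping.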
