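To get started with the arrange function and see how it sorts data in R, first install and load the tidyverse package, which includes dplyr, the package that provides arrange. Use the following commands:
install.packages("tidyverse")
library(tidyverse)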
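Once the library is loaded, we use the mtcars data set. It is built into R, so nothing needs to be downloaded. To view its first rows, run the command below:
head(mtcars)
The data set describes 32 cars with 11 variables, including mileage (mpg) and number of forward gears (gear). The data is visualized as below.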
To avoid clutter, we limit our data to the columns we need. The select function picks columns from a data frame (filtering rows is the job of the filter function), so we pass it the columns of interest:
example <- mtcars %>% select(gear, mpg)
Next, we want some missing values so that we can also see how the arrange function handles them. The following command replaces the first two values of the mpg variable with missing values:
example[1:2, "mpg"] <- NA
It selects the first two rows of the mpg column and overwrites the data in those rows with NA.
The resulting NA values in the data are shown below
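A quick way to confirm the setup is to count the missing values in each column. This is a minimal sketch that assumes dplyr is installed (it comes with tidyverse, but loading dplyr on its own is enough here):

```r
library(dplyr)

# Keep only the two columns used in this tutorial
example <- mtcars %>% select(gear, mpg)

# Overwrite the first two mpg values with NA
example[1:2, "mpg"] <- NA

# Missing values per column: gear has none, mpg has two
colSums(is.na(example))

# The first three rows show the NAs in place
head(example, 3)
```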
Arranging the data in ascending order
Now we want to arrange the data in ascending order, so that the arrange function sorts it from lower values to higher values. The following command arranges the data in ascending order:
A1 <- arrange(example, mpg)
This command sorts the rows of the example data set by the mpg variable in ascending order. The other variable, gear, is not sorted because it was not specified in the command. Each gear value simply moves along with its row. The two missing mpg values are placed at the end. This is shown as below
Another way to write this command is to give the name of the data set first and then pass it to the arrange function with the pipe operator:
A1 <- example %>% arrange(mpg)
This command produces the same result as the previous one.
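To check where arrange puts each row, the sketch below rebuilds example and prints both ends of A1. It assumes dplyr is installed. It shows that the smallest mpg values come first, that each car's gear value travels with its row, and that the two missing mpg values end up at the bottom.

```r
library(dplyr)

# Rebuild the example data: two columns, first two mpg values missing
example <- mtcars %>% select(gear, mpg)
example[1:2, "mpg"] <- NA

A1 <- example %>% arrange(mpg)

# Lowest mileage first (two cars tie at 10.4 mpg, both with 3 gears)
head(A1, 3)

# The two missing mpg values are in the last two rows
tail(A1, 3)
```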
Sorting the data in descending order
After arranging the data in ascending order, we need to learn how to arrange it in descending order. What could be the trick? We put a minus sign (-) in front of the variable that we want arranged in descending order. The command becomes:
A1 <- arrange(example, -mpg)
The data will look as follows
Alternatively, we can wrap the variable in the desc function to sort it in descending order. The command with desc is given below:
A1 <- arrange(example, desc(mpg))
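For a numeric column, the minus sign and desc produce the same mpg sequence. In both cases arrange still keeps the missing values at the bottom rather than moving them to the top. The sketch below compares the two and assumes dplyr is installed. The names A_minus and A_desc are only illustrative. desc is the more general option because it also works on character columns, where a minus sign raises an error.

```r
library(dplyr)

example <- mtcars %>% select(gear, mpg)
example[1:2, "mpg"] <- NA

# Illustrative names for the two descending versions
A_minus <- arrange(example, -mpg)
A_desc  <- arrange(example, desc(mpg))

# Same mpg sequence either way
identical(A_minus$mpg, A_desc$mpg)

# Highest mileage first; NAs are still at the end in descending order
head(A_desc, 2)
tail(A_desc, 2)
```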
Only the variables that R is told to sort are used for sorting. In the example data set, which contains the gear and mpg variables, only mpg determines the order, because mpg is the only variable passed to arrange (with or without the minus sign or desc). If we pass both variables to arrange, the rows are sorted by gear first, and rows with the same gear value are then sorted by mpg.
The command is as follows:
A1 <- arrange(example, gear, mpg)
The data is visualized as below
If we want one variable arranged in ascending order and the other in descending order, the command is as follows:
A1 <- arrange(example, gear, -mpg)
The minus sign goes only in front of the variable that should be sorted in descending order. Here gear is ascending, and mpg is descending within each gear value.
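Splitting the mpg column by gear makes the two-level sort easy to inspect. Gear is the primary key, and mpg only decides the order of rows that share a gear value. This is a sketch that assumes dplyr is installed. The names A2 and A3 are illustrative. Both cars with missing mpg have 4 gears, so their NAs sit at the end of the gear 4 group.

```r
library(dplyr)

example <- mtcars %>% select(gear, mpg)
example[1:2, "mpg"] <- NA

# Illustrative names: gear ascending, then mpg ascending within each gear
A2 <- arrange(example, gear, mpg)

# gear ascending, then mpg descending within each gear
A3 <- arrange(example, gear, -mpg)

# mpg values inside each gear group
split(A2$mpg, A2$gear)
split(A3$mpg, A3$gear)
```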
Sort function in R to sort data
Another way to sort data is the sort function in R, which works a bit differently from the arrange function. Instead of a data frame, sort takes a single vector, so we use the $ sign to pull out the variable that we want to sort. For instance, to sort the mpg variable with sort, the command looks like this:
S1 <- sort(example$mpg)
The $ sign extracts a specific column from a data frame. In this case, the values of the mpg variable are sorted and saved as a vector rather than a data frame. By default, sort also drops missing values, so S1 holds only the 30 non-missing mpg values.
To sort values in descending order, we set the decreasing argument of sort to TRUE. For the mpg variable, the command looks like this:
S1 <- sort(example$mpg, decreasing = TRUE)
This sorts the values in the mpg column of the example data frame in descending order (highest to lowest) and assigns the sorted values to the vector S1.
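The sketch below shows the effect of the missing values on sort. It uses base R only. It builds example with mtcars[, c("gear", "mpg")], which gives the same two columns as the select call, so dplyr is not needed. The names S_all and S_desc are illustrative. The na.last argument of sort can keep the NAs instead of dropping them.

```r
# Base R only; same two columns as select(gear, mpg)
example <- mtcars[, c("gear", "mpg")]
example[1:2, "mpg"] <- NA

S1 <- sort(example$mpg)                         # NAs dropped by default
S_all <- sort(example$mpg, na.last = TRUE)      # NAs kept, placed last
S_desc <- sort(example$mpg, decreasing = TRUE)  # highest first, NAs dropped

length(S1)       # 30
length(S_all)    # 32
head(S_desc, 3)  # 33.9 32.4 30.4
```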
Order function in R to sort data set
Similarly, there is another function, order, which returns the positions (indices) of the values in a vector, arranged so that taking the values in that sequence would sort them. For the mpg variable, order gives us the indices of the values rather than the values themselves. To explain this further, let's use order in a command:
O1 <- order(example$mpg)
In this command, example is the data frame and mpg is the variable used for sorting. The order function doesn't sort the actual values. It only returns the positions in which the values should be taken so that they come out sorted. The result of the above command is shown below
The first number in O1 is the position of the smallest mpg value, the second is the position of the next smallest, and so on up to the largest. The positions of the missing values come last. The mpg values themselves are not changed.
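A short vector makes the indices easier to read than the full mpg column. This base R sketch uses the illustrative names x and idx for a made-up four-value vector. It then checks on the tutorial data that the two NA positions (rows 1 and 2) come last in O1.

```r
# Illustrative vector: position 2 holds the smallest value
x <- c(21.4, 10.4, 33.9, 18.7)
idx <- order(x)
idx      # 2 4 1 3
x[idx]   # 10.4 18.7 21.4 33.9

# On the tutorial data, the positions of the two NAs come last
example <- mtcars[, c("gear", "mpg")]
example[1:2, "mpg"] <- NA
O1 <- order(example$mpg)
tail(O1, 2)  # 1 2
```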
Now, to get the order of S1, the vector we sorted earlier from the mpg variable, we use the following command:
O2 <- order(S1)
This gives us the order of S1, as shown below
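The contents of O2 depend on which S1 is in memory. If the decreasing = TRUE command was run last, S1 holds the descending version. The sketch below assumes S1 is the ascending vector from sort(example$mpg). Because that vector is already sorted, its order is simply 1 to 30, since the two NAs were dropped. It uses base R only.

```r
example <- mtcars[, c("gear", "mpg")]
example[1:2, "mpg"] <- NA

# Assumption: S1 is the ascending sort; NAs dropped, 30 values left
S1 <- sort(example$mpg)
O2 <- order(S1)

# An already sorted vector is its own order: 1, 2, ..., 30
identical(O2, seq_along(S1))  # TRUE
```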
We can also use the order function to sort the example data frame by the mpg variable. The following command does that:
ODF1 <- example[order(example$mpg), ]
When we execute this command, R creates a new data frame, ODF1, in which the rows are ordered by ascending mpg values. ODF1 has the same columns as the original example data frame, but its rows are reordered.
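The base R approach and arrange produce the same mpg sequence, with the missing values last in both. The sketch below compares the two and assumes dplyr is installed for the arrange half. Only the mpg column is compared, which keeps the check independent of how each function handles row names.

```r
library(dplyr)

example <- mtcars %>% select(gear, mpg)
example[1:2, "mpg"] <- NA

ODF1 <- example[order(example$mpg), ]  # base R
A1   <- arrange(example, mpg)          # dplyr

# Same mpg sequence, NAs last in both
identical(ODF1$mpg, A1$mpg)  # TRUE
nrow(ODF1)                   # 32
```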
Similarly, to sort by multiple variables with the order function, we pass each variable, extracted with the dollar sign, to order in order of priority. The command becomes:
ODF1 <- example[order(example$gear, example$mpg), ]
In this command, the rows are sorted by gear first, and rows with the same gear value are then sorted by mpg. Executing the command gives the following results
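As with arrange, the second variable only breaks ties in the first. This base R sketch stores the result in ODF2, an illustrative name that avoids overwriting ODF1. It splits mpg by gear so the ascending order inside each group is visible. With the default na.last = TRUE, the two missing mpg values fall at the end of the gear 4 group.

```r
example <- mtcars[, c("gear", "mpg")]
example[1:2, "mpg"] <- NA

# Illustrative name; first key gear, second key mpg (breaks ties in gear)
ODF2 <- example[order(example$gear, example$mpg), ]

# mpg ascending within each gear group; NAs at the end of gear 4
split(ODF2$mpg, ODF2$gear)
```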
To arrange the data in descending order, the rev function can be used together with order. rev reverses the index vector returned by order, so the rows come out in the opposite sequence. The command with rev looks like this:
ODF1 <- example[rev(order(example$mpg)), ]
This command sorts the mpg variable in descending order. Because order places missing values last by default, reversing its result moves them to the top.
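If the missing values should stay at the bottom, order also accepts a decreasing argument, which sorts from high to low while keeping the default na.last = TRUE. This is an alternative to rev, not the approach used above. The base R sketch below compares the two and uses the illustrative names ODF_rev and ODF_dec.

```r
example <- mtcars[, c("gear", "mpg")]
example[1:2, "mpg"] <- NA

# Illustrative names; reversing the ascending order moves NAs to the top
ODF_rev <- example[rev(order(example$mpg)), ]
head(ODF_rev$mpg, 3)  # NA NA 33.9

# decreasing = TRUE sorts high to low but keeps NAs at the end
ODF_dec <- example[order(example$mpg, decreasing = TRUE), ]
head(ODF_dec$mpg, 1)  # 33.9
tail(ODF_dec$mpg, 2)  # NA NA
```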
Remember that we created missing (NA) values in the mpg variable at the very start of the article. To control whether these missing values appear at the start or at the end of the variable, we use the na.last argument of order:
ODF1 <- example[order(example$mpg, na.last = TRUE), ]
In the above command, na.last = TRUE places the missing values at the end of the variable, as shown in the image below
What if we use na.last = FALSE instead? Then the missing values are placed at the very start of the data. In both cases the non-missing values are still sorted in ascending order, because na.last only controls where the missing values go, not the sort direction. The command with FALSE is as follows:
ODF1 <- example[order(example$mpg, na.last = FALSE), ]
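A five-value vector shows that na.last only moves the NAs and leaves the ascending order of the other values untouched. The vector x is an illustrative example in base R.

```r
# Illustrative vector with two missing values
x <- c(21, NA, 10.4, 33.9, NA)

order(x, na.last = TRUE)   # 3 1 4 2 5
order(x, na.last = FALSE)  # 2 5 3 1 4

# Either way the non-missing values stay in ascending order
x[order(x, na.last = TRUE)]   # 10.4 21.0 33.9 NA NA
x[order(x, na.last = FALSE)]  # NA NA 10.4 21.0 33.9
```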
To remove the missing values from the data, we set na.last to NA, which drops the positions of the missing values from the result of order:
ODF1 <- example[order(example$mpg, na.last = NA), ]
This command removes the rows with missing mpg values, so ODF1 has 30 rows instead of 32.
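The sketch below confirms the row count and pairs the base R command with a dplyr equivalent that filters out the NA rows before arranging. It assumes dplyr is installed. ODF_clean and A_clean are illustrative names.

```r
library(dplyr)

example <- mtcars %>% select(gear, mpg)
example[1:2, "mpg"] <- NA

# Base R: na.last = NA drops the indices of the missing values
ODF_clean <- example[order(example$mpg, na.last = NA), ]
nrow(ODF_clean)  # 30

# dplyr equivalent: remove NA rows with filter, then arrange
A_clean <- example %>% filter(!is.na(mpg)) %>% arrange(mpg)
identical(ODF_clean$mpg, A_clean$mpg)  # TRUE
```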