Data frames in R are another data structure used for organizing and manipulating data in tabular form. A data frame is a two-dimensional structure in which rows represent observations and columns represent variables.
To create a data frame, use the data.frame() function, specifying each variable as a named vector (column). For example, to create a data frame holding each individual’s first name, last name, and age, you can use the following command:
full_name <- data.frame(f_name = c("Stephen","Chris","Derrick"), l_name = c("Cury","Paul","Rose"), age = c(32 , 29 , 34))
This command creates a data frame named full_name with three variables and three observations. Printing full_name displays it as a table.
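As a quick check, the sketch below repeats the command and prints the result. The output in the comments assumes R's default print settings.

```r
# Recreate the data frame from the command above
full_name <- data.frame(f_name = c("Stephen", "Chris", "Derrick"),
                        l_name = c("Cury", "Paul", "Rose"),
                        age = c(32, 29, 34))

# Each row is one observation, each column one variable
print(full_name)
#    f_name l_name age
# 1 Stephen   Cury  32
# 2   Chris   Paul  29
# 3 Derrick   Rose  34
```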
The same data frame can be built by first creating the vectors separately and then combining them with data.frame(). The commands are as follows:
f_name <- c("Stephen","Chris","Derrick")
l_name <- c("Cury","Paul","Rose")
age <- c(32, 29, 34)
full_name <- data.frame(f_name, l_name, age)
These commands create a data frame whose three columns take their names from the vectors f_name, l_name, and age.
To inspect the data type of each variable in the full_name data frame, you can use a function such as str().
This reports the details of each variable: the first two are character data and the third is numeric. (In R versions before 4.0, data.frame() converted strings to factors by default.)
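The original command is not shown here, so the following is only one plausible way to inspect column types. It assumes str() for the overview and sapply() with class() for a compact result; col_types is a name chosen for illustration, and stringsAsFactors = FALSE is set explicitly so the result does not depend on the R version.

```r
full_name <- data.frame(f_name = c("Stephen", "Chris", "Derrick"),
                        l_name = c("Cury", "Paul", "Rose"),
                        age = c(32, 29, 34),
                        stringsAsFactors = FALSE)

# Compact overview: dimensions, column names, types, first values
str(full_name)

# Class of each column as a named character vector
col_types <- sapply(full_name, class)
print(col_types)
```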
What if we want to create a data frame from a matrix? This can be done with the as.data.frame() function in the following way:
mat <- matrix(1:10,5)
dt <- as.data.frame(mat)
This generates a data frame with five rows and two columns from the matrix.
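To illustrate the conversion, the sketch below runs the same two commands and prints the result. The column names V1 and V2 are the defaults that as.data.frame() assigns to an unnamed matrix.

```r
mat <- matrix(1:10, 5)     # 5 rows, 2 columns, filled column by column
dt <- as.data.frame(mat)   # columns get default names V1 and V2

print(dt)
#   V1 V2
# 1  1  6
# 2  2  7
# 3  3  8
# 4  4  9
# 5  5 10
```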
Similarly, an overall summary of a data frame can be retrieved with the summary() function, which gives summary statistics (minimum, quartiles, mean, and maximum) for each numeric variable.
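A minimal sketch of such a summary, assuming it is applied to the full_name data frame; full_summary is a name chosen for illustration.

```r
full_name <- data.frame(f_name = c("Stephen", "Chris", "Derrick"),
                        l_name = c("Cury", "Paul", "Rose"),
                        age = c(32, 29, 34))

# One summary column per variable; age gets min, quartiles, mean, and max
full_summary <- summary(full_name)
print(full_summary)

# Summary of the numeric column alone
print(summary(full_name$age))
```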
Because data frames represent tabular data, every variable must have the same number of elements (observations). If one variable has fewer elements than the others, R raises an error unless the shorter vector can be recycled a whole number of times. For instance, the following command fails because it supplies three data points for the first and last names but only two for age:
full_name <- data.frame(f_name = c("Stephen","Chris","Derrick"), l_name = c("Cury","Paul","Rose"), age = c(32 , 29))
R then reports an error stating that the arguments imply a differing number of rows: 3, 2.
You can equalize the number of observations by filling the missing value in the shorter variable with NA, after which the command runs without error:
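To see the error without stopping a script, the call can be wrapped in tryCatch(), as in this sketch. The variable name result is illustrative, and the exact message text may vary with the R locale.

```r
# Capture the error message instead of halting execution
result <- tryCatch(
  data.frame(f_name = c("Stephen", "Chris", "Derrick"),
             l_name = c("Cury", "Paul", "Rose"),
             age = c(32, 29)),
  error = function(e) conditionMessage(e)
)

# In an English locale the message mentions differing numbers of rows
print(result)
```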
full_name <- data.frame(f_name = c("Stephen","Chris","Derrick"), l_name = c("Cury","Paul","Rose"), age = c(32 , 29, NA))
Accessing data from Data Frames in R
You can access data in a data frame by row and column index together, or by row or column index alone. For instance, to get the element in the first row and first column, the element in the third row and second column, the whole third row, or the whole second column, use the respective indices in the following way:
full_name[1,1]
full_name[3,2]
full_name[3,]
full_name[,2]
These return the value "Stephen", the value "Rose", the third row, and the second column, respectively.
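The sketch below runs these four index expressions on a fresh copy of full_name. It sets stringsAsFactors = FALSE so the names come back as plain character values in any R version.

```r
full_name <- data.frame(f_name = c("Stephen", "Chris", "Derrick"),
                        l_name = c("Cury", "Paul", "Rose"),
                        age = c(32, 29, 34),
                        stringsAsFactors = FALSE)

full_name[1, 1]   # single element: "Stephen"
full_name[3, 2]   # single element: "Rose"
full_name[3, ]    # third row, returned as a one-row data frame
full_name[, 2]    # second column, returned as a vector: "Cury" "Paul" "Rose"
```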
Row and column names can also be used in place of, or alongside, index numbers to access data.
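The original command is not shown, so the following is an assumed illustration of name-based access. It relies on the default row names "1", "2", and "3" that data.frame() assigns.

```r
full_name <- data.frame(f_name = c("Stephen", "Chris", "Derrick"),
                        l_name = c("Cury", "Paul", "Rose"),
                        age = c(32, 29, 34),
                        stringsAsFactors = FALSE)

full_name[3, "f_name"]                   # row by index, column by name: "Derrick"
full_name["2", "l_name"]                 # row and column by name: "Paul"
full_name[, "age"]                       # whole column by name: 32 29 34
full_name[c(1, 3), c("f_name", "age")]   # two rows, two named columns
```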
Such an expression retrieves the data from the respective rows and columns. To retrieve a specific element within a specific column, the dollar sign ($) can also be used.
For example, full_name$f_name[3] retrieves the third element of the f_name column.
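A short sketch of dollar-sign access, assuming the original column names of full_name:

```r
full_name <- data.frame(f_name = c("Stephen", "Chris", "Derrick"),
                        l_name = c("Cury", "Paul", "Rose"),
                        age = c(32, 29, 34),
                        stringsAsFactors = FALSE)

full_name$f_name      # the whole f_name column as a vector
full_name$f_name[3]   # its third element: "Derrick"
```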
If you want to fill columns with sequences of numbers, use the following command:
a <- data.frame(c1 = 1:50, c2 = 51:100)
This generates a data set with two columns, c1 and c2, containing the numbers 1 to 50 and 51 to 100, respectively.
If you want to view only the first few rows of the data set created above, use the head() function. Calling head(a) shows the first six rows of the data frame a by default.
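For instance, a minimal sketch using the two-column data frame a with the sequences 1 to 50 and 51 to 100; first_rows is an illustrative name.

```r
a <- data.frame(c1 = 1:50, c2 = 51:100)

# head() returns the first six rows by default
first_rows <- head(a)
print(first_rows)
#   c1 c2
# 1  1 51
# 2  2 52
# 3  3 53
# 4  4 54
# 5  5 55
# 6  6 56
```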
Similarly, to retrieve a specified number of rows, pass that number as the second argument, as in head(a, 10).
This retrieves the first 10 rows of the data.
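As a sketch, the row count can be passed positionally or through the n argument of head(); first_ten is an illustrative name.

```r
a <- data.frame(c1 = 1:50, c2 = 51:100)

# Request exactly ten rows; head(a, 10) is equivalent
first_ten <- head(a, n = 10)
print(first_ten)
```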
Assigning Rows and Column Names
You can also assign names to the rows and columns of a data frame in R. Assigning row and column names is quite simple, as shown below:
colnames(full_name) <- c("First_n","Last_n","Age")
rownames(full_name) <- c("R1","R2","R3")
These commands assign the given names to the columns and rows.
Functions in Data Frames
Many functions can be applied to data frames to perform various operations in R, just as with matrices. The first is is.data.frame(), which checks whether an object is a data frame. For the full_name data, it is called as is.data.frame(full_name).
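A small sketch of this check, with a matrix version of the same data included for contrast (the contrast is an added illustration):

```r
full_name <- data.frame(f_name = c("Stephen", "Chris", "Derrick"),
                        l_name = c("Cury", "Paul", "Rose"),
                        age = c(32, 29, 34))

is.data.frame(full_name)              # TRUE: full_name is a data frame
is.data.frame(as.matrix(full_name))   # FALSE: a matrix is not a data frame
```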
Similarly, like is.data.frame(), there are functions that check for other data structures or test the type of a specific variable within a data structure, i.e. whether it is numeric, character, logical, and so on. These functions can be used in the following way:
is.character(full_name$age)
is.numeric(full_name$age)
is.logical(full_name$f_name)
Because age is numeric, only is.numeric(full_name$age) returns TRUE; the other two calls return FALSE. (These calls assume the original column names; if you renamed the columns as described above, use the new names, e.g. full_name$Age.)
Similarly, we can check the dimensions of the data set, to find the number of rows and columns in it, with the dim() function.
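As a sketch, dim() returns the number of rows followed by the number of columns. The data frame a is used here because its row and column counts differ.

```r
a <- data.frame(c1 = 1:50, c2 = 51:100)

dim(a)   # 50 2, i.e. 50 rows and 2 columns
```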
To get the names of the columns in the data set, you can use a function such as names().
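The original command is not shown; names() is one function that returns the column names of a data frame, as in this sketch.

```r
full_name <- data.frame(f_name = c("Stephen", "Chris", "Derrick"),
                        l_name = c("Cury", "Paul", "Rose"),
                        age = c(32, 29, 34))

names(full_name)   # "f_name" "l_name" "age"
```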
The names of the rows and columns can be retrieved separately with the rownames() and colnames() functions.
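A minimal sketch, assuming a fresh full_name that still has its default row names:

```r
full_name <- data.frame(f_name = c("Stephen", "Chris", "Derrick"),
                        l_name = c("Cury", "Paul", "Rose"),
                        age = c(32, 29, 34))

rownames(full_name)   # "1" "2" "3" (default row names)
colnames(full_name)   # "f_name" "l_name" "age"
```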
Similarly, to get the number of rows and the number of columns individually, you can use nrow() and ncol().
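The command is missing from the original, so nrow() and ncol() are assumed here as the usual way to get each count on its own.

```r
a <- data.frame(c1 = 1:50, c2 = 51:100)

nrow(a)   # 50 rows
ncol(a)   # 2 columns
```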
To check the length of a data frame in R, use the length() function. However, it does not give the number of observations in the data frame; it gives only the number of columns.
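The sketch below shows this on the data frame a, where the row and column counts differ, so it is clear which one length() reports.

```r
a <- data.frame(c1 = 1:50, c2 = 51:100)

length(a)   # 2: the number of columns, not the 50 rows
```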
To count all the values in the data frame instead, convert it to a matrix first, as in length(as.matrix(full_name)).
The as.matrix() function converts the data frame to a matrix, whose length is the number of rows multiplied by the number of columns.
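As a sketch of the difference, the following compares the two lengths for full_name; n_values is an illustrative name.

```r
full_name <- data.frame(f_name = c("Stephen", "Chris", "Derrick"),
                        l_name = c("Cury", "Paul", "Rose"),
                        age = c(32, 29, 34))

length(full_name)                        # 3: number of columns

# Converting to a matrix counts every cell: 3 rows x 3 columns
n_values <- length(as.matrix(full_name))
print(n_values)                          # 9
```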