Data Frames in R

Data frames in R are another data structure used for organizing and manipulating data in tabular form. A data frame is two-dimensional: rows represent observations and columns represent variables. Unlike a matrix, different columns can hold different types of data, such as character and numeric.

A data frame is created with the data.frame() function, where each column is supplied as a named vector. For example, the following command creates a data frame holding individuals' first names, last names and ages:

full_name <- data.frame(f_name = c("Stephen", "Chris", "Derrick"),
                        l_name = c("Cury", "Paul", "Rose"),
                        age = c(32, 29, 34))

This creates a data frame named full_name with three variables: f_name, l_name and age. Typing full_name at the console prints it as a table with one row per person.


The same data frame can also be created by first defining the vectors separately and then combining them with data.frame():

f_name <- c("Stephen", "Chris", "Derrick")
l_name <- c("Cury", "Paul", "Rose")
age <- c(32, 29, 34)
full_name <- data.frame(f_name, l_name, age)

This creates the same data frame. When the arguments are unnamed variables, data.frame() uses the variable names f_name, l_name and age as the column names.
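As a quick check, the two construction methods can be compared with identical(). This is a small sketch, and the names full_name_1 and full_name_2 are only illustrative.

```r
# Approach 1: define the columns inline as named arguments
full_name_1 <- data.frame(f_name = c("Stephen", "Chris", "Derrick"),
                          l_name = c("Cury", "Paul", "Rose"),
                          age    = c(32, 29, 34))

# Approach 2: build the vectors first; data.frame() takes the
# column names from the variable names
f_name <- c("Stephen", "Chris", "Derrick")
l_name <- c("Cury", "Paul", "Rose")
age    <- c(32, 29, 34)
full_name_2 <- data.frame(f_name, l_name, age)

# Both approaches produce the same object
print(identical(full_name_1, full_name_2))  # TRUE
print(names(full_name_2))                   # "f_name" "l_name" "age"
```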

To see the data type of each variable in the full_name data frame, use str():

str(full_name)

str() lists each variable along with its type and first values: the first two variables are character vectors and the third is numeric.


A data frame can also be created from a matrix, using the as.data.frame() function:

mat <- matrix(1:10,5)
dt <- as.data.frame(mat)

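This produces a data frame with five rows and two columns. Because the matrix has no column names, as.data.frame() names the columns V1 and V2; the first holds the numbers 1 to 5 and the second 6 to 10.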
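The sketch below compares as.data.frame() with calling data.frame() directly on the same matrix. The two calls differ only in the default column names they choose; the name dt2 is illustrative.

```r
mat <- matrix(1:10, 5)     # 5 rows x 2 columns, filled column by column

dt  <- as.data.frame(mat)  # default column names V1, V2
dt2 <- data.frame(mat)     # default column names X1, X2

print(dim(dt))     # 5 2
print(names(dt))   # "V1" "V2"
print(names(dt2))  # "X1" "X2"
print(dt$V2)       # 6 7 8 9 10
```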


The summary() function gives an overview of the full_name data frame, including summary statistics for its numeric variable:

summary(full_name)

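For the character columns f_name and l_name, summary() reports only their length, class and mode. For the numeric age column, it reports the minimum, first quartile, median, mean, third quartile and maximum.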

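summary() can also be applied to a single numeric column, and individual statistics can then be extracted by name. This is a minimal sketch. The element names used here ("Median", "Max.", "1st Qu.") are the labels summary() assigns to its result for numeric vectors.

```r
full_name <- data.frame(f_name = c("Stephen", "Chris", "Derrick"),
                        l_name = c("Cury", "Paul", "Rose"),
                        age    = c(32, 29, 34))

# Summary statistics for the numeric column only
age_summary <- summary(full_name$age)
print(age_summary)

# Pull out individual statistics by their labels
print(age_summary[["Median"]])   # 32
print(age_summary[["Max."]])     # 34
print(age_summary[["1st Qu."]])  # 30.5
```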

Because data frames represent tabular data, every variable must have the same number of observations. If one variable has fewer elements, R reports an error. The exception is when the shorter length divides the longer one evenly, in which case R recycles the shorter vector. For instance, the following command fails because there are three values for the first and last names but only two for age:

full_name <- data.frame(f_name = c("Stephen", "Chris", "Derrick"),
                        l_name = c("Cury", "Paul", "Rose"),
                        age = c(32, 29))

R stops with the error "arguments imply differing number of rows: 3, 2".


To make the lengths equal, pad the shorter vector with NA, R's marker for a missing value. NA is written without quotes; "NA" in quotes would be a character string.

full_name <- data.frame(f_name = c("Stephen", "Chris", "Derrick"),
                        l_name = c("Cury", "Paul", "Rose"),
                        age = c(32, 29, NA))
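The sketch below shows three behaviours together: capturing the mismatched-length error with tryCatch() instead of letting it stop the script, working with the NA-padded column, and the recycling exception mentioned above. The names msg and recycled are illustrative.

```r
# Mismatched lengths: capture the error message instead of stopping
msg <- tryCatch(
  data.frame(f_name = c("Stephen", "Chris", "Derrick"),
             age    = c(32, 29)),
  error = function(e) conditionMessage(e)
)
print(msg)  # reports differing numbers of rows (3 and 2)

# Padding with NA gives equal lengths, so the call succeeds
full_name <- data.frame(f_name = c("Stephen", "Chris", "Derrick"),
                        age    = c(32, 29, NA))
print(is.na(full_name$age))               # FALSE FALSE TRUE
print(mean(full_name$age, na.rm = TRUE))  # 30.5, NA ignored

# A shorter vector whose length divides the row count evenly is recycled
recycled <- data.frame(x = 1:4, y = c("a", "b"))
print(recycled$y)  # "a" "b" "a" "b"
```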

Accessing data from Data Frames in R

Data in a data frame can be accessed by row and column indices, written as [row, column]. Leaving one index empty selects the whole row or the whole column. For instance, the following commands return the value in the first row and first column, the value in the third row and second column, the entire third row, and the entire second column:

full_name[1, 1]
full_name[3, 2]
full_name[3, ]
full_name[, 2]

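These return "Stephen", "Rose", the third row as a one-row data frame, and the l_name column as the character vector "Cury" "Paul" "Rose".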


Column and row names can be used in place of index numbers:

full_name[,"f_name"]
full_name[3,"f_name"]

These return the whole f_name column and the first name in the third row. A single column can also be extracted with the dollar sign and then indexed like any vector:

full_name$f_name[3]

The above command retrieves the third element from the “f_name” column.
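The different indexing forms return different kinds of objects, which matters when the result is passed to other functions. The sketch below checks the return type of each one. It assumes R 4.0 or later, where strings are not converted to factors by default.

```r
full_name <- data.frame(f_name = c("Stephen", "Chris", "Derrick"),
                        l_name = c("Cury", "Paul", "Rose"),
                        age    = c(32, 29, 34))

print(full_name[1, 1])  # "Stephen"
print(full_name[3, 2])  # "Rose"

# [row, col] selecting a single column drops it to a plain vector
print(class(full_name[, 2]))  # "character"

# A single index in single brackets keeps the data frame structure
print(class(full_name[2]))    # "data.frame"

# [[ ]] and $ both extract a column as a plain vector
print(identical(full_name[["f_name"]], full_name$f_name))  # TRUE

# A whole row stays a (one-row) data frame
print(class(full_name[3, ]))  # "data.frame"
```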

Columns can also be filled with sequences of numbers, using the colon operator:

a <- data.frame(c1 = 1:50, c2 = 51:100)

This creates a data frame with two columns, c1 and c2, holding the numbers 1 to 50 and 51 to 100 respectively.

To look at the first few rows of a data frame such as a, use head(), which returns six rows by default:

head(a)

To return a specific number of rows, pass it as the second argument:

head(a,10)

This returns the first 10 rows of a.
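The row counts returned by head() can be confirmed with nrow(). The sketch also shows tail(), the base R counterpart of head(), which returns the last rows.

```r
a <- data.frame(c1 = 1:50, c2 = 51:100)

print(nrow(head(a)))      # 6, the default number of rows
print(nrow(head(a, 10)))  # 10

# tail() returns the last rows instead of the first
last3 <- tail(a, 3)
print(last3)              # rows 48 to 50
```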

Assigning Rows and Column Names

Names can also be assigned to the rows and columns of a data frame, using colnames() and rownames():

colnames(full_name) <- c("First_n", "Last_n", "Age")
rownames(full_name) <- c("R1", "R2", "R3")

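These commands rename the columns to First_n, Last_n and Age, and the rows to R1, R2 and R3. After renaming, columns must be referred to by their new names, for example full_name$Age rather than full_name$age.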
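After renaming, the new row and column names can be used for indexing, while the old column names no longer resolve. The sketch below rebuilds full_name so it runs on its own.

```r
full_name <- data.frame(f_name = c("Stephen", "Chris", "Derrick"),
                        l_name = c("Cury", "Paul", "Rose"),
                        age    = c(32, 29, 34))
colnames(full_name) <- c("First_n", "Last_n", "Age")
rownames(full_name) <- c("R1", "R2", "R3")

# Index with the new row and column names
print(full_name["R2", "Last_n"])  # "Paul"
print(full_name$Age)              # 32 29 34

# The old column name no longer exists, so $ returns NULL
print(is.null(full_name$age))     # TRUE
```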

Functions in Data Frames

Many functions can be applied to data frames, much as with matrices. For example, is.data.frame() checks whether an object is a data frame:

is.data.frame(full_name)

Similar functions check for other data types, for instance whether a column is numeric, character or logical. The following examples use the original column names of full_name:

is.character(full_name$age)
is.numeric(full_name$age)
is.logical(full_name$f_name)

Since age is the only numeric column and f_name is character, is.numeric(full_name$age) returns TRUE and the other two checks return FALSE.
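Instead of checking one column at a time, the same check can be applied to every column with sapply(), which calls a function on each column of the data frame. This is a sketch that assumes R 4.0 or later, where strings stay character instead of becoming factors by default.

```r
full_name <- data.frame(f_name = c("Stephen", "Chris", "Derrick"),
                        l_name = c("Cury", "Paul", "Rose"),
                        age    = c(32, 29, 34))

# class() of every column, returned as a named character vector
col_classes <- sapply(full_name, class)
print(col_classes)

# Any is.*() check can be applied column by column in the same way
numeric_cols <- sapply(full_name, is.numeric)
print(numeric_cols)  # FALSE for f_name and l_name, TRUE for age
```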

The dim() function returns the dimensions of a data set, that is, its number of rows followed by its number of columns:

dim(full_name)

The names() function returns the column names:

names(full_name)

Column and row names can also be retrieved separately:

colnames(full_name)
rownames(full_name)
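For a data frame, names() and colnames() return the same thing. If no row names have been assigned, rownames() returns the row numbers as character strings. The sketch below confirms both on a freshly created full_name.

```r
full_name <- data.frame(f_name = c("Stephen", "Chris", "Derrick"),
                        l_name = c("Cury", "Paul", "Rose"),
                        age    = c(32, 29, 34))

print(identical(names(full_name), colnames(full_name)))  # TRUE
print(rownames(full_name))  # "1" "2" "3" (default row names)
```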

Similarly, nrow() and ncol() return the number of rows and the number of columns individually:

nrow(full_name)
ncol(full_name)

The length() function returns the number of columns of a data frame, because a data frame is stored as a list of columns. It does not give the number of observations:

length(full_name)

To count the total number of values (cells) in the data frame, convert it to a matrix first. Use nrow() instead for the number of observations.

length(as.matrix(full_name))

The as.matrix() function converts the data frame to a matrix, whose length is the number of rows multiplied by the number of columns.
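Because full_name has three rows and three columns, these counts are easy to confuse on it. The sketch below uses the 50-row, 2-column data frame a from earlier, where each count comes out different. prod(dim()) is shown as one alternative that gives the cell count without building a matrix.

```r
a <- data.frame(c1 = 1:50, c2 = 51:100)

print(length(a))             # 2: number of columns
print(nrow(a))               # 50: number of observations (rows)
print(dim(a))                # 50 2
print(length(as.matrix(a)))  # 100: rows x columns
print(prod(dim(a)))          # 100: same cell count, no matrix needed
```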
