R provides several types of data structures, which let you store, organize and manipulate data efficiently. They include vectors, matrices, data frames and lists.
We will focus on vectors in this article and leave the other data structures for later articles. Vectors have only one dimension, so you can picture them in either column or row form. Vectors are the most basic data structure in R and hold homogeneous data. By homogeneous we mean that a vector stores values of a single type, such as numeric, character, logical or integer.
Going into detail on the homogeneity of the data type, a numeric vector has numeric values for all of its observations. For example, the data shown below is numeric data. It can be shown in either row form or column form.
Or it can be shown in row form, as below. But the column and row are just visualizations; a vector is neither a row nor a column. The numbers 1, 2, 3 and so on shown above the elements are the index numbers, which identify the corresponding values or elements in the vector. So index 5 identifies the element 33, and index 2 identifies the element 34.
Similarly, a character vector has text, such as words or sentences, for all of its values or observations. An example of a character vector is shown below
If you add a numeric value to character data, as shown below, R will treat it as a character. A vector holds only one type, so the number is stored as text, just as if it were enclosed in quotation marks, and no arithmetic can be performed on it.
Even a single value is a vector in R. There is no minimum number of observations needed for data to count as a vector.
Creating vectors in R:
Having discussed the nature of vectors, let’s move on to creating vectors in R. If we want to store two observations, say 5 and 10, in two vectors, we create them using the following commands
vec1 <- 5
vec2 <- 10
To access the value of a vector, we type its name, as in the following commands.
vec1
vec2
This would provide us with the following result.
Moving on, we can also store several observations in one vector. For example, to store some values in a numeric vector, we use the following command
num <- c(55,22,1,42)
The above numeric vector is created using the c() function, also called the combine function.
Similarly, words or sentences are stored in a character vector. It is also created with the c() function, just like a numeric vector. For that purpose, we use the following command with some character values.
chr <- c("abc","jhon","xyz",9,"USA","He")
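Character vectors are commonly used to represent strings, labels and textual data in R, so no arithmetic can be performed on them. Note that the unquoted 9 in the command above is stored as text, because every element of a character vector is a character.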
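As a minimal sketch of how R treats a number placed among strings, the example below builds a small vector and inspects it. The names `mixed` and `nine` are hypothetical and used only for illustration.

```r
# Mixing numbers and strings in c() yields a character vector:
# R converts every element to a single common type.
mixed <- c("abc", 9, "USA")    # 'mixed' is a hypothetical example vector

class(mixed)                   # "character"
mixed[2]                       # "9" (a string, not the number 9)

# To do arithmetic, convert the element back to a number first
nine <- as.numeric(mixed[2])   # 'nine' is a hypothetical name
nine + 1                       # 10
```

Converting with as.numeric() only works for strings that look like numbers; other strings become NA with a warning.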
Indexing of the vectors
Character vectors are indexed in the same way as numeric vectors, and indexing starts from 1. To access a certain element of a vector, we use its index number. For instance, to access the value 9 in the character vector stored above, we use its index number, which is 4. The command is as follows
chr[4]
The 4th element has the value 9, as shown in the output below
Similarly, to access multiple elements at different indices, pass their index numbers inside c(), as in the command below
chr[c(2,3,6)]
The above command returns the elements at the 2nd, 3rd and 6th positions, which are jhon, xyz and He respectively.
We can also access a range of elements of a vector. For instance, in the above example, if we want to access the elements from 9 to He, we use their index numbers, 4 to 6, to give the range. The command is given below
chr[4:6]
Similarly, if you want to access a range of elements and then a certain element in the same operation, this can be done using the indices of the elements in the following way.
chr[c(3:5,1)]
In the above command, the elements from 3 to 5 and then the 1st element are accessed, using their index numbers.
We can also exclude an element when accessing a vector by putting a minus sign before its index. R then returns all elements except that one. If we want to exclude xyz, we use its index number, which is 3. The command is as follows
chr[-3]
You can also exclude multiple elements from the vector while accessing it. The command for excluding multiple elements will be
chr[-c(2,3,5)]
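The indexing forms above can be tried side by side on a small vector. This is only an illustrative sketch; `letters_vec` is a hypothetical vector, not one used elsewhere in this article.

```r
# A small hypothetical vector to practise indexing on
letters_vec <- c("a", "b", "c", "d", "e", "f")

letters_vec[4]            # single element:        "d"
letters_vec[c(2, 3, 6)]   # several elements:      "b" "c" "f"
letters_vec[4:6]          # a range:               "d" "e" "f"
letters_vec[c(3:5, 1)]    # a range plus one more: "c" "d" "e" "a"
letters_vec[-3]           # all except the 3rd:    "a" "b" "d" "e" "f"
letters_vec[-c(2, 3, 5)]  # all except 2, 3, 5:    "a" "d" "f"
```

None of these commands changes `letters_vec` itself; each one only returns a selection from it.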
Add or replace certain values in the vectors in R
To replace an element in the vector, assign a new value to its index. If we want to replace the second element with Alberto and then view the vector in the console window, we use the following commands
chr[2] <- "Alberto"
chr
Following output will be provided by the above commands
Similarly, to add an element to the vector, we assign a value to the next free index. Here UK is the new element, and printing chr shows the result
chr[7] <- "UK"
chr
The above commands give following output
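A short sketch of replacing and adding elements, assuming a hypothetical vector `cities`. It also shows what happens when a value is assigned well past the end of a vector.

```r
cities <- c("Paris", "Rome", "Oslo")    # hypothetical example vector

cities[2] <- "Lima"                     # replace the 2nd element
cities[length(cities) + 1] <- "Cairo"   # add at the next free index
cities                                  # "Paris" "Lima" "Oslo" "Cairo"

# Assigning beyond the next free index fills the gap with NA
cities[6] <- "Quito"
cities                                  # "Paris" "Lima" "Oslo" "Cairo" NA "Quito"
length(cities)                          # 6
```

Using length(cities) + 1 as the index avoids having to count the elements by hand.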
Check data type of vectors in R
To check whether an object is a vector, we use the following command. It returns TRUE if the object is a vector and FALSE if it is not.
is.vector(chr)
The above command will give us following results
Similarly, to check what kind of vector it is, we use the name of the type in the command. For instance, to check whether chr is numeric, and whether it is character, we use the following commands
is.numeric(chr)
is.character(chr)
The commands give us the following results
To determine the number of elements in a vector, we use the length() function in R. This function tells us the number of elements in an object. For instance, if we use the following command
length(chr)
We get the following result for the above command, which tells us that there are 7 elements in the character vector: the six original values plus UK.
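The checks above can be collected into one short sketch. The vectors `words` and `counts` are hypothetical, and typeof() is an extra base R function, not used elsewhere in this article, that reports the underlying storage type.

```r
words  <- c("abc", "xyz", "USA")   # hypothetical character vector
counts <- c(55, 22, 1, 42)         # hypothetical numeric vector

is.vector(words)      # TRUE
is.character(words)   # TRUE
is.numeric(words)     # FALSE
is.numeric(counts)    # TRUE
typeof(counts)        # "double": plain numbers are stored as doubles
length(words)         # 3
length(counts)        # 4
```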
We can also check whether the value of one vector is greater than that of another. Remember that we created two vectors, vec1 and vec2, at the very start of the article; their values are shown below
To see whether vec1 is greater than vec2, we can use the following command
vec1 > vec2
This gives us the following results, showing that vector 1 isn’t greater than vector 2.
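Comparisons also work on vectors with more than one element, in which case R compares them position by position. The sketch below uses two hypothetical vectors, `a` and `b`.

```r
a <- c(5, 12, 7)    # hypothetical example vectors
b <- c(10, 3, 7)

a > b         # FALSE  TRUE FALSE
a == b        # FALSE FALSE  TRUE

any(a > b)    # TRUE:  at least one element of a is greater
all(a > b)    # FALSE: not every element of a is greater
```

The functions any() and all() reduce the element-by-element result to a single TRUE or FALSE.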
Vectorization
Let’s now understand the concept of vectorization in R. Vectorization is a powerful concept in R that sets it apart from many other programming languages. To understand vectorization, consider two vectors: Vector1, containing the numbers 1 through 6, and Vector2, containing the numbers 7 through 12.
To create the 1st and 2nd vectors, we use the following commands
v1 <- 1:6
v2 <- 7:12
Now, if you want to add the corresponding elements of these vectors together, so that the 1st element of the 1st vector is added to the 1st element of the 2nd vector and so on, R’s vectorization allows you to perform the operation directly on the entire vectors.
v3 <- v1+v2
So a new vector v3 will be created which has the sum of 1st and 2nd vector’s corresponding values. The vectorization takes place by the following process as shown below
In the above image, when adding these vectors element-wise, R takes the first element of Vector1 (which is 1) and adds it to the first element of Vector2 (which is 7), resulting in 8. It then moves on to the second element of both vectors, adding 2 and 8 to yield 10, and so on. This process continues until the last elements are reached.
In the absence of vectorization, you would need to loop through the elements of the vectors manually, performing the addition one pair at a time. This is longer to write and usually less efficient.
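To make the contrast concrete, here is a sketch of the same addition written as an explicit loop and as a single vectorized expression. The names `x`, `y`, `total_loop` and `total_vec` are hypothetical.

```r
x <- 1:6     # hypothetical vectors with the same values as in the text
y <- 7:12

# Without vectorization: add one pair of elements at a time
total_loop <- numeric(length(x))
for (i in seq_along(x)) {
  total_loop[i] <- x[i] + y[i]
}

# With vectorization: one expression over the whole vectors
total_vec <- x + y

total_vec                      # 8 10 12 14 16 18
all(total_loop == total_vec)   # TRUE
```

Both versions give the same values, but the vectorized form is shorter and leaves the looping to R.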
One important thing to remember is that both vectors should have the same length. If one of the vectors has fewer elements than the other, R does not stop at the end of the shorter vector; it reuses the shorter vector's values from the start, which may not be what you intended. For example, above, we created two vectors, v1 and v2. v1 has elements from 1 to 6 and v2 has elements from 7 to 12, so both vectors have 6 elements. If we reduce vector 2 to the values 7 to 10 instead of 7 to 12, what happens?
First, reduce the number of elements in vector 2 by using the following command
v2 <- 7:10
Now if we again add vector 1 and vector 2, using the following command
v3 <- v1+v2
A warning message will be shown, notifying us that the length of the longer vector is not a multiple of the length of the shorter one. The warning is shown below
The above message is a warning rather than an error. It means that the command for adding both vectors was executed, but the lengths did not match up evenly. If we access the newly created vector 3 with the following command
v3
it shows the following results.
R repeated the shorter vector until it had the same length as the longer one. v2 started to repeat itself, as shown in the picture below.
Moving on, we have been saying repeatedly that arithmetic operations cannot be performed on character vectors. We can also verify this. Let’s say we add v1, a numeric vector, to the character vector chr, using the following command
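The recycling behaviour can be sketched as follows. The names `long_v`, `short_v`, `res` and `even` are hypothetical, and suppressWarnings() is used here only to keep the output clean.

```r
long_v  <- 1:6     # hypothetical vectors: six elements
short_v <- 7:10    # and four elements

# short_v is reused from its start: 7 8 9 10 7 8
# Without suppressWarnings(), R warns that the longer length
# is not a multiple of the shorter length.
res <- suppressWarnings(long_v + short_v)
res                # 8 10 12 14 12 14

# When the longer length is an exact multiple of the shorter one,
# recycling happens with no warning
even <- 1:6 + c(10, 20)
even               # 11 22 13 24 15 26
```

Because recycling can happen silently, it is worth checking lengths with length() before combining vectors.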
aa <- v1+chr
Because one of the vectors is non-numeric, the vectors cannot be added, and R gives the following error
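If you want a script to keep running after such an error, one option is to catch it with tryCatch(), a base R function. The sketch below uses hypothetical vectors `nums` and `words` and stores the error message as text in `msg`.

```r
nums  <- 1:3                 # hypothetical numeric vector
words <- c("a", "b", "c")    # hypothetical character vector

# tryCatch() runs the expression and, if it fails,
# returns the error message instead of stopping the script
msg <- tryCatch(
  nums + words,
  error = function(e) conditionMessage(e)
)

msg   # in an English locale: "non-numeric argument to binary operator"
```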
Sequence and Repetition of Vectors
To create a vector that contains a sequence of numbers, like from 1 to 5, you can use the seq() function or simply the colon operator. For example, seq(1, 5) and 1:5 both give you the same result: a vector with the numbers from 1 to 5, in sequence.
To create numbers from 1 to 5 in a sequence in vectors in R, use the following command
seq(1,5)
Or the following command, which will give the same results
1:5
The seq() function also allows you to specify an increment. For instance, seq(1, 17, by = 3) generates a sequence that starts from 1, increases by 3 each time, and stops at the last value that does not exceed 17. The command for this will be
seq(from =1, to =17, by = 3)
This generates the following results, showing that the vector starts from 1 and ends at 16, increasing by 3 each time.
Another way to generate the same results, is by using the simpler command below
seq(1,17,3)
The command generates same results as the previous command did.
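A few more variations on seq() are sketched below. The arguments `by` and `length.out` and the helper seq_len() are part of base R; the result names `s1` to `s4` are hypothetical.

```r
s1 <- seq(1, 17, by = 3)          # fixed step
s1                                # 1 4 7 10 13 16

s2 <- seq(1, 17, length.out = 5)  # fixed number of values, step worked out by R
s2                                # 1 5 9 13 17

s3 <- seq(10, 1, by = -3)         # a negative step counts down
s3                                # 10 7 4 1

s4 <- seq_len(5)                  # the integers from 1 to 5
s4                                # 1 2 3 4 5
```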
For repetition, we can use the rep() function. To repeat a single value, pass the value to rep() followed by the number of times you want it to be repeated.
rep("ThedataHall", 4)
The above command will give you a vector containing “ThedataHall” repeated four times, as shown below
If we want a certain numeric value to be repeated, the following command should be used
rep(1, times=4)
This will repeat 1 four times in the output.
Similarly, we can repeat a sequence of values, such as “car” and “bus”, as many times as we want. For instance, suppose we want car and bus to be repeated in sequence. First, we create the vector containing car and bus, using the following command
vec <- c("car","bus")
Once the vector has been created, use the following command to repeat car and bus, alternating, five times.
rep(vec,5)
The following result will be generated,
Another way to generate the same result is by using the following command
rep(c("car","bus"),5)
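The rep() function also accepts the `times` and `each` arguments, which control how the repetition is arranged. The sketch below uses a hypothetical vector `vehicles` and hypothetical result names `r1` to `r3`.

```r
vehicles <- c("car", "bus")   # hypothetical example vector

r1 <- rep(vehicles, times = 3)        # repeat the whole vector
r1   # "car" "bus" "car" "bus" "car" "bus"

r2 <- rep(vehicles, each = 3)         # repeat each element in turn
r2   # "car" "car" "car" "bus" "bus" "bus"

r3 <- rep(vehicles, times = c(2, 1))  # a separate count per element
r3   # "car" "car" "bus"
```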
Generating Random Numbers
Generating random numbers in vectors in R can be done using the runif() function. For example, if you want 44 random numbers to be generated, you can use the following command
runif(44)
This will generate 44 random numbers, as shown below
Similarly, if we want a certain number of random values between a given minimum and maximum, this can also be done using the runif() function. For instance, if we want 44 random numbers with a minimum value of 22 and a maximum value of 77, the following command should be used.
runif(n=44, min=22, max=77)
This generates 44 random numbers between 22 and 77 as shown below
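Because runif() returns different values on every run, it can help to fix the random seed with set.seed() when you need repeatable results. The sketch below is illustrative only: the seed 123 is arbitrary, and `draws` and `again` are hypothetical names.

```r
set.seed(123)                               # arbitrary seed, makes the draws repeatable
draws <- runif(n = 5, min = 22, max = 77)   # five uniform random numbers

length(draws)                               # 5
all(draws >= 22 & draws <= 77)              # TRUE: every value lies in the range

set.seed(123)                               # same seed again...
again <- runif(n = 5, min = 22, max = 77)
identical(draws, again)                     # TRUE: ...gives the same numbers
```

Note that runif() returns decimal numbers drawn from a uniform distribution, not whole numbers.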