
Performing data operations
The following are the different data operations available in R:
- Arithmetic operations
- String operations
- Aggregation operations
Arithmetic operations on the data
In this section, we look at arithmetic operations on data: addition, subtraction, multiplication, division, exponentiation, and modulus. Let's first declare two numeric vectors and then apply the four basic operations to them:
a1 <- c(1, 2, 3, 4, 5)
b1 <- c(6, 7, 8, 9, 10)
c1 <- a1 + b1
c1
[1]  7  9 11 13 15
c1 <- b1 - a1
c1
[1] 5 5 5 5 5
c1 <- b1 * a1
c1
[1]  6 14 24 36 50
c1 <- b1 / a1
c1
[1] 6.000000 3.500000 2.666667 2.250000 2.000000
Apart from the operations shown above, the other arithmetic operations are exponentiation and modulus, which can be performed as follows, respectively:
c1 <- b1 ^ a1
c1 <- b1 %% a1
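Note that these arithmetic operations work element by element on two or more numeric vectors of the same length. If the lengths differ, R recycles the shorter vector and issues a warning when the longer length is not a multiple of the shorter one.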
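As a brief illustration of exponentiation, modulus, and vector lengths, the following sketch reuses the two vectors declared earlier. The names pow, rem, quo, and recycled are introduced here only for the example, and the integer division operator %/% is shown as a companion to %%; neither appears in the original listing.

```r
a1 <- c(1, 2, 3, 4, 5)
b1 <- c(6, 7, 8, 9, 10)

# Exponentiation: each element of b1 raised to the matching element of a1
pow <- b1 ^ a1      # 6 49 512 6561 100000

# Modulus: remainder after dividing each element of b1 by a1
rem <- b1 %% a1     # 0 1 2 1 0

# Integer division, the companion of %%
quo <- b1 %/% a1    # 6 3 2 2 2

# Unequal lengths: the shorter vector is recycled (10, 20, 10, 20)
recycled <- c(1, 2, 3, 4) + c(10, 20)   # 11 22 13 24
```

Because 4 is a multiple of 2, the last line runs without a warning; adding a vector of length 2 to one of length 5 would still return a result but would warn.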
We can also perform logical operations. In the following code, we assign the values 1 to 10 to a vector and then use a condition to subset it. The condition is evaluated for every element and returns a logical value: TRUE when the condition is satisfied, and FALSE otherwise. Only the elements for which the condition is TRUE are kept, so here the values 6 and 7 are excluded.
x <- c(1:10)
x[(x>=8) | (x<=5)]
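To make the intermediate logical values visible, the sketch below stores the result of the condition before using it as an index. The variable name keep is an assumption introduced for this example.

```r
x <- 1:10

# The condition alone yields a logical vector, one value per element
keep <- (x >= 8) | (x <= 5)
keep            # FALSE only at positions 6 and 7

# Indexing with the logical vector keeps the TRUE positions
kept <- x[keep]          # 1 2 3 4 5 8 9 10

# which() reports the positions where a logical vector is TRUE
dropped <- which(!keep)  # 6 7
```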
Having seen the various operations on vectors, we will now look at arithmetic operations on matrix data. In the following code, we define two identical matrices and multiply them element by element with the * operator. The resultant matrix is stored in newmat:
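# rnames and cnames are not defined in the original listing;
# the labels below are illustrative assumptions
rnames <- paste0("R", 1:5)
cnames <- paste0("C", 1:5)
matdata1 <- matrix(1:25, nrow = 5, ncol = 5, dimnames = list(rnames, cnames))
matdata2 <- matrix(1:25, nrow = 5, ncol = 5, dimnames = list(rnames, cnames))
newmat <- matdata1 * matdata2
newmat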
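The * operator multiplies matrices entry by entry, which is different from the algebraic matrix product computed by %*%. The following sketch contrasts the two on a small 2 x 2 matrix; the matrix m and the result names are assumptions chosen to keep the arithmetic easy to check by hand.

```r
# Filled column-wise, so the rows are (1, 3) and (2, 4)
m <- matrix(1:4, nrow = 2, ncol = 2)

# Element-wise product: every entry is squared
elementwise <- m * m     # rows: (1, 9) and (4, 16)

# Matrix product: rows of the first matrix times columns of the second
matprod <- m %*% m       # rows: (7, 15) and (10, 22)
```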
String operations on the data
R supports a number of string operations. Many of these string operations are useful in data manipulation such as subsetting a string, replacing a string, changing the case, and splitting the string into characters. Now we will try each one of them in R.
The following code is used to get a part of the original string using the substr function; we need to pass the original string along with the start and end positions of the substring:
x <- "The Shawshank Redemption"
substr(x, 5, 13)
[1] "Shawshank"
The following code searches for a pattern in a character vector using the grep function, which returns the indices of the matching elements. We first pass the pattern to be found and then the vector to search in; in this case, a character vector. The fixed argument specifies whether the pattern is a literal string or a regular expression. When fixed=TRUE, the pattern is treated as a literal string, whereas it is treated as a regular expression when set to FALSE:
grep("Shawshank", c("The","Shawshank","Redemption"), fixed=TRUE)
[1] 2
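The sketch below contrasts a literal match with a regular expression match on the same vector. It also uses the value argument and the related grepl function, neither of which appears in the original listing; the variable names are assumptions for this example.

```r
words <- c("The", "Shawshank", "Redemption")

# Literal match: index of the element that contains the string
literal_idx <- grep("Shawshank", words, fixed = TRUE)   # 2

# Regular expression: elements that end in "n"
regex_idx <- grep("n$", words)                          # 3

# value = TRUE returns the matching strings instead of their indices
matches <- grep("e", words, value = TRUE)               # "The" "Redemption"

# grepl() returns one logical value per element
flags <- grepl("e", words)                              # TRUE FALSE TRUE
```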
Now, we will see how to replace one character with another. To substitute a character, we use the sub function, which replaces only the first match. In the following code, we replace the space with a comma. We pass three parameters to the function: the first specifies the pattern to be replaced, the second is the replacement string, and the third is the string to operate on:
sub("\\s", ",", "Hello There")
[1] "Hello,There"
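When a string contains more than one match, sub changes only the first one, while the related gsub function changes all of them. The following sketch shows the difference; the three-word input string is an assumption chosen so that it contains two spaces.

```r
s <- "Hello There World"

# sub() replaces the first whitespace character only
first_only <- sub("\\s", ",", s)    # "Hello,There World"

# gsub() replaces every whitespace character
all_matches <- gsub("\\s", ",", s)  # "Hello,There,World"
```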
We can also split a string into characters using the strsplit function. The result is returned as a list with one element per input string. The following code splits the string into individual characters:
strsplit("Redemption", "")
[[1]]
 [1] "R" "e" "d" "e" "m" "p" "t" "i" "o" "n"
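Since strsplit returns a list, a common pattern is to extract the first element with [[1]] to obtain a plain character vector. The sketch below does this and also splits on a space rather than on the empty string; the variable names are assumptions for this example.

```r
# Split into single characters and unwrap the list
chars <- strsplit("Redemption", "")[[1]]
n_chars <- length(chars)    # 10

# Split on a space to obtain the individual words
words <- strsplit("The Shawshank Redemption", " ")[[1]]
# "The" "Shawshank" "Redemption"

# A vector of inputs gives a list with one element per input string
parts <- strsplit(c("a-b", "c-d-e"), "-")
lengths(parts)              # 2 3
```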
R has a paste function that concatenates multiple strings or character variables. It is very useful when building a string dynamically. This can be achieved using the following code:
paste("Today is", date())
[1] "Today is Fri Jun 26 01:39:26 2015"
In the preceding output, paste inserts a space between the two strings. We can avoid this by using the similar paste0 function, which performs the same operation but joins the strings without any separator, much like plain concatenation.
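The following sketch compares paste and paste0 side by side and shows the sep and collapse arguments of paste, which are not covered in the text. The input strings and variable names are assumptions chosen for illustration.

```r
# Default separator is a single space
with_space <- paste("data", "frame")                    # "data frame"

# paste0() joins with no separator
no_space <- paste0("data", "frame")                     # "dataframe"

# sep sets a custom separator between the arguments
dashed <- paste("2015", "06", "26", sep = "-")          # "2015-06-26"

# collapse joins the elements of one vector into a single string
joined <- paste(c("a", "b", "c"), collapse = "+")       # "a+b+c"

# Both functions are vectorized over their arguments
labels <- paste0("col", 1:3)                            # "col1" "col2" "col3"
```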
We can convert a string to uppercase or lowercase using the toupper and tolower functions.
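A minimal sketch of both case conversions, reusing the string from the substr example; the variable names are assumptions introduced here.

```r
title <- "The Shawshank Redemption"

upper <- toupper(title)   # "THE SHAWSHANK REDEMPTION"
lower <- tolower(title)   # "the shawshank redemption"
```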
Aggregation operations on the data
We explored many of the arithmetic and string operations in R. Now, let's also have a look at the aggregation operations.
Mean
For this exercise, let's consider the mtcars dataset that ships with R. Assign the dataset to a variable and then use the following code to calculate the mean of a numeric column:
data <- mtcars
mean(data$mpg)
[1] 20.09062
Median
The median can be obtained using the following code:
med <- median(data$mpg)
paste("Median MPG:", med)
[1] "Median MPG: 19.2"
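The mtcars columns contain no missing values, but real data often does. Both mean and median return NA when the input contains an NA unless na.rm = TRUE is supplied. The sketch below uses a small made-up vector, values, purely to illustrate this behavior.

```r
# A hypothetical vector with one missing value
values <- c(4, 8, NA, 15)

mean_default <- mean(values)                 # NA, because of the missing value
mean_clean   <- mean(values, na.rm = TRUE)   # 9
median_clean <- median(values, na.rm = TRUE) # 8
```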
Sum
The mtcars dataset has details about various cars. Let's find the total horsepower of all the cars in this dataset. We can calculate the sum using the following code:
hp <- sum(data$hp)
paste("Total HP:", hp)
[1] "Total HP: 4694"
Maximum and minimum
The maximum value or minimum value can be found using the max and min functions. Look at the following code for reference:
# Use distinct names so the built-in max and min functions are not shadowed
max_mpg <- max(data$mpg)
min_mpg <- min(data$mpg)
paste("Maximum MPG:", max_mpg, "and Minimum MPG:", min_mpg)
[1] "Maximum MPG: 33.9 and Minimum MPG: 10.4"
Standard deviation
We can calculate the standard deviation using the sd function. Look at the following code to get the standard deviation:
# Use a distinct name so the built-in sd function is not shadowed
sd_mpg <- sd(data$mpg)
paste("Std Deviation of MPG:", sd_mpg)
[1] "Std Deviation of MPG: 6.0269480520891"
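The functions above each summarize a whole column. As a sketch of how they can be combined, the following code collects several statistics into one named vector and then computes a mean per group. Grouping by the cyl column with tapply and aggregate is an illustration that goes beyond the original text, and the names stats and by_cyl are assumptions.

```r
mpg <- mtcars$mpg

# Several summary statistics collected into one named vector
stats <- c(mean = mean(mpg), median = median(mpg),
           min = min(mpg), max = max(mpg), sd = sd(mpg))
round(stats, 2)

# Mean mpg within each cylinder group (4, 6, and 8 cylinders)
by_cyl <- tapply(mtcars$mpg, mtcars$cyl, mean)
by_cyl

# The same grouping for two columns at once, returned as a data frame
grouped <- aggregate(cbind(mpg, hp) ~ cyl, data = mtcars, FUN = mean)
grouped
```

tapply returns a named vector, which is convenient for a single column, while aggregate returns a data frame and scales more naturally to several columns.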