A quick GNU R tutorial to basic operations, functions and data structures

Introduction

In the previous two articles we learned how to install and run GNU R on the Linux operating system. This article is a quick reference tutorial that introduces the main objects of the R programming language. We will cover basic operations, functions and variables, and then introduce R data structures, objects and classes.

Basic Operations in R

Let us start with a simple mathematical example. Type the sum of seven and three into your R console and press Enter. The result is:

> 7+3
[1] 10

In R terminology, the interpreter printed the object returned by the expression entered at the console. R treats a single number as a vector of length one, so the “[1]” in front of the result indicates that the first value displayed on that row has index one. This becomes clearer with a longer vector. The colon operator generates a sequence of integers, and the c() function combines its arguments into a vector (here it wraps a sequence that is already a vector, so it is optional). For example:

> c(1:100)
  [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
 [19]  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
 [37]  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
 [55]  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
 [73]  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
 [91]  91  92  93  94  95  96  97  98  99 100
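
As a small sketch of the point above, we can check that a single number really is a vector of length one, and that the bracketed prefix is simply the index of the first value on a printed row. The variable names single and long are chosen here for illustration only.

```r
# Illustrative names; they do not appear elsewhere in this tutorial
single <- 7 + 3
is.vector(single)   # a single number is a vector ...
length(single)      # ... of length one

long <- 1:100       # the colon operator alone already builds the vector
length(long)
long[19]            # the value that opens the row labelled [19] in the output above
```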

It is also possible to perform operations on vectors. For instance, we can add two vectors as follows:

> c(1,5,67,0)+c(0,1.5,6.7,3)
[1]  1.0  6.5 73.7  3.0

Note that the addition is performed element by element. If the vectors differ in length, the shorter one is recycled (repeated from its start) until it matches the length of the longer one. If the longer length is not a multiple of the shorter length, R still returns a result but prints a warning:

> c(1,5,8,9)+c(0, 1.4)
[1]  1.0  6.4  8.0 10.4
> c(1,5,8,9)+c(0, 1.4,7)
[1]  1.0  6.4 15.0  9.0
Warning message:
In c(1, 5, 8, 9) + c(0, 1.4, 7) :
  longer object length is not a multiple of shorter object length
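
The recycling behaviour can be written out by hand with the rep() function, which may help to see what R does implicitly. This is only a sketch; the names long_vec, short_vec and recycled are made up for the example.

```r
# Illustrative names; not part of the original examples
long_vec  <- c(1, 5, 8, 9)
short_vec <- c(0, 1.4)

# Repeat the shorter vector until it is as long as the longer one
recycled <- rep(short_vec, length.out = length(long_vec))

implicit <- long_vec + short_vec   # recycling done by R
explicit <- long_vec + recycled    # recycling written out by hand

identical(implicit, explicit)      # both approaches give the same vector
```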

Moreover, we can define character vectors in R as:

> c("linuxcareer.com", "R tutorial")
[1] "linuxcareer.com" "R tutorial"

Finally, comments in R code start with “#”. For example:

> # This is a comment in R code

Functions and Variables

We can define our own functions or use predefined ones. Most function calls in R take the following form:

f(argument1, argument2,...)

Here “f” is the name of the function and “argument1, argument2,…” is the list of arguments to the function. For example, calling two predefined functions, the sine and the natural logarithm, we obtain:

> sin(pi/2)
[1] 1
> log(3)
[1] 1.098612

In contrast to the above example, some functions in R take the form of operators, such as addition, exponentiation and equality. For instance, the equality operator returns a logical value (TRUE or FALSE):

> 4==4
[1] TRUE
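
To illustrate that operators are themselves functions, the sketch below calls them in the f(argument1, argument2) form by quoting their names with backticks. The variable names are chosen here for illustration.

```r
# Operators are ordinary functions and can be called by their quoted name
sum_result <- `+`(7, 3)     # same as 7 + 3
eq_result  <- `==`(4, 4)    # same as 4 == 4

# Comparisons work element by element and return a logical vector
cmp <- c(1, 4, 7) < 7

sum_result
eq_result
cmp
```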

As in other programming languages, R uses variables. The assignment operator is “<-” (“=” also works), for instance:

> x<-c(1,4,7)
> x+x
[1]  2  8 14

We can now refer to the third value of the vector “x” by

> x[3]
[1] 7

or fetch only members less than seven:

> x[x<7]
[1] 1 4

We can also, for instance, fetch items one and three as

> x[c(1,3)]
[1] 1 7
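
A few further indexing forms follow the same pattern. The sketch below, which redefines x so that it runs on its own, shows negative indices, the which() function and a combined condition; the result names are illustrative.

```r
x <- c(1, 4, 7)

without_first <- x[-1]             # a negative index drops that element
positions     <- which(x < 7)      # positions (not values) of the elements below seven
middle        <- x[x > 1 & x < 7]  # conditions can be combined with &

without_first
positions
middle
```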

Finally, you can define your own functions by assigning a function definition to a name, and then call them by that name just like the built-in R functions. For example:

> myfunction<-function(x,y){x+y}
> myfunction(4,5)
[1] 9

To see the code of a given function, simply type its name without parentheses:

> myfunction
function(x,y){x+y}
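
User-defined functions can also give their arguments default values and be called with named arguments. The following is a minimal sketch; scaled_sum and its factor argument are hypothetical names invented for this example.

```r
# 'scaled_sum' and 'factor' are hypothetical, for illustration only
scaled_sum <- function(x, y, factor = 1) {
  # the value of the last evaluated expression is returned
  (x + y) * factor
}

a1 <- scaled_sum(4, 5)                 # factor falls back to its default of 1
a2 <- scaled_sum(4, 5, factor = 2)     # argument passed by name
a3 <- scaled_sum(c(1, 2), c(10, 20))   # works element by element on vectors
```

Because arithmetic in R is vectorized, the same function body handles single numbers and longer vectors without any change.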

Data Structures

As a first example of a data structure we illustrate how to define arrays, that is, multidimensional vectors. A matrix is an array with two dimensions.

We can, for instance, define an array explicitly as follows

> a<-array(c(1:24),dim=c(6,4))
> a
     [,1] [,2] [,3] [,4]
[1,]    1    7   13   19
[2,]    2    8   14   20
[3,]    3    9   15   21
[4,]    4   10   16   22
[5,]    5   11   17   23
[6,]    6   12   18   24

Or we can first create a vector and use the matrix() function, that is

> v<-c(1:24)
> m<-matrix(data=v,nrow=6,ncol=4)
> m
     [,1] [,2] [,3] [,4]
[1,]    1    7   13   19
[2,]    2    8   14   20
[3,]    3    9   15   21
[4,]    4   10   16   22
[5,]    5   11   17   23
[6,]    6   12   18   24
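
As the output shows, matrix() fills the matrix column by column. A short sketch of how to inspect the dimensions and how to fill row by row instead, using the byrow argument of matrix(); the names m_col and m_row are illustrative.

```r
v <- 1:24

m_col <- matrix(v, nrow = 6, ncol = 4)                 # filled column by column (default)
m_row <- matrix(v, nrow = 6, ncol = 4, byrow = TRUE)   # filled row by row

dim(m_col)     # number of rows, then number of columns
m_col[2, 3]    # row 2, column 3 of the column-wise matrix: 14
m_row[2, 3]    # the same position in the row-wise matrix: 7
```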

It is also possible to define an array with more than two dimensions:

> w<-array(v,dim=c(3,2,4))
> w
, , 1

     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6

, , 2

     [,1] [,2]
[1,]    7   10
[2,]    8   11
[3,]    9   12

, , 3

     [,1] [,2]
[1,]   13   16
[2,]   14   17
[3,]   15   18

, , 4

     [,1] [,2]
[1,]   19   22
[2,]   20   23
[3,]   21   24

Referring to a single value or to a part of an array is again simple, for instance:

> w[1,1,1]
[1] 1
> w[1:2,1:2,1]
     [,1] [,2]
[1,]    1    4
[2,]    2    5

By omitting an index we obtain all elements along that dimension, such as:

> w[,1,1]
[1] 1 2 3
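
Omitting more than one index works the same way. The sketch below rebuilds the array so that it runs on its own, and also shows the drop argument of the subsetting operator, which controls whether dimensions of length one are discarded. The result names are illustrative.

```r
w <- array(1:24, dim = c(3, 2, 4))

slice   <- w[, , 2]                  # a 3 x 2 matrix: the second "layer" of the array
col_vec <- w[, 1, 1]                 # dimensions of length one are dropped, leaving a vector
col_mat <- w[, 1, 1, drop = FALSE]   # drop = FALSE keeps all three dimensions

dim(slice)
dim(col_vec)   # NULL, because a plain vector has no dim attribute
dim(col_mat)
```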

Let us now look at a more complicated data structure that can hold more than one underlying data type. Such structures are called lists. A list in R may contain objects of different data types. We can name each component of a list and later refer to that component by its name or position. For example,

> l<-list(name="linuxcareer.com",visitors="10,000")
> l
$name
[1] "linuxcareer.com"

$visitors
[1] "10,000"

We can now refer to the components of the list by name or by position. Single brackets return a list containing the selected component, while double brackets return the component itself:

> l$visitors
[1] "10,000"
> l[1]
$name
[1] "linuxcareer.com"

> l[[1]]
[1] "linuxcareer.com"
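
The difference between the two bracket forms can be checked with class(). The sketch below uses a list that mixes a character string, a number and a vector; the list site and its values are made up for illustration.

```r
# Hypothetical list with components of different types
site <- list(name = "linuxcareer.com", visitors = 10000, tags = c("R", "Linux"))

class(site[1])      # "list": single brackets return a sub-list
class(site[[1]])    # "character": double brackets return the component itself

site$visitors <- site$visitors + 1   # components can be modified by name
names(site)                          # names of all components
```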

A data frame is a list of named vectors that all have the same length. Its structure is similar to a database table. Let us now construct a data frame that contains some exchange rates (other currency per USD):

> currency<-c("Kroner", "Canadian $", "Hong Kong $", "Rupees")
> date_090812<-c(6.0611,0.9923,7.7556,55.17)
> date_100812<-c(6.0514,0.9917,7.7569,55.18)
> exchangerate<-data.frame(currency,date_090812,date_100812)
> exchangerate
     currency date_090812 date_100812
1      Kroner      6.0611      6.0514
2  Canadian $      0.9923      0.9917
3 Hong Kong $      7.7556      7.7569
4      Rupees     55.1700     55.1800

We can now refer to a particular element of a data frame by name. For instance, suppose we need the Hong Kong $/USD exchange rate stored in the column date_090812. We can obtain it in the following way:

> exchangerate$date_090812[exchangerate$currency=="Hong Kong $"]
[1] 7.7556
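
Data frames can also be indexed like matrices, with a row and a column inside single brackets. The following sketch rebuilds the data frame so that it runs on its own; the names hk_rate, below_ten and the added column change are assumptions introduced for this example.

```r
currency    <- c("Kroner", "Canadian $", "Hong Kong $", "Rupees")
date_090812 <- c(6.0611, 0.9923, 7.7556, 55.17)
date_100812 <- c(6.0514, 0.9917, 7.7569, 55.18)
exchangerate <- data.frame(currency, date_090812, date_100812)

# row 3, column named "date_090812"
hk_rate <- exchangerate[3, "date_090812"]

# all columns of the rows that satisfy a condition
below_ten <- exchangerate[exchangerate$date_090812 < 10, ]

# a new column computed from existing ones: change between the two dates
exchangerate$change <- exchangerate$date_100812 - exchangerate$date_090812
```

Leaving the column position empty, as in the second selection, keeps every column, in the same way as omitting an index of an array.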

Objects and Classes

R is an object-oriented programming language. This means that every object in R has a type and is a member of a class. To identify the class of a given object we use the function class(), as in the following example:

> class(exchangerate)
[1] "data.frame"
> class(myfunction)
[1] "function"
> class(1.07)
[1] "numeric"

Unlike in many other object-oriented programming languages, not all functions in R are associated with a particular class. However, some functions are closely linked with a specific class. These are called methods. Methods for different classes are reached through a generic function that shares a single name across classes, so the same generic function can be applied to objects of different types. For instance, “-” is a generic function for subtraction. You can subtract one number from another, but you can also subtract a number from a date, as below:

> 4-2
[1] 2
> as.Date("2012-09-08")-2
[1] "2012-09-06"
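
To see how a generic function picks a method, we can write a method of our own for the generic print(). This is only a sketch of R's S3 mechanism: the class name rate and the object r are hypothetical and exist only in this example.

```r
# 'rate' is a hypothetical class invented for this sketch
r <- structure(list(pair = "HKD/USD", value = 7.7556), class = "rate")

# A method for the generic print(); R selects it for objects of class "rate"
print.rate <- function(x, ...) {
  cat(x$pair, "=", x$value, "\n")
  invisible(x)
}

print(r)              # dispatches to print.rate
class(r)              # the class attribute set above
inherits(r, "rate")   # TRUE when the object belongs to the class
```

The method is found by its name, which joins the name of the generic function and the name of the class with a dot.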

Conclusion

The aim of this basic R tutorial was to introduce the R programming language to beginners who have never used R before. It may also serve as a reference for those who go on to more advanced applications of the R statistical software. In the next article we will describe how to define statistical models and perform basic statistical analysis with R, and illustrate the graphical capabilities of the R software.


GNU R tutorial series:

Part I: GNU R Introductory Tutorials:

  1. Introduction to GNU R on Linux Operating System
  2. Running GNU R on Linux Operating System
  3. A quick GNU R tutorial to basic operations, functions and data structures
  4. A quick GNU R tutorial to statistical models and graphics
  5. How to install and use packages in GNU R
  6. Building basic packages in GNU R

Part II: GNU R Language:

  1. An overview of GNU R programming language