This article deals mainly with installing R on Linux, and also provides a simple example of how to use R for plotting. It is the first in a series of articles about R, so subscribe to our RSS feed for regular updates. Anyone who wants to use R for their work, or is simply curious about this software, is invited to follow the series. The main objective of these articles is to provide a quick reference to R with illustrative examples.
R is an open source programming language and software environment used mainly for statistical data analysis. It is licensed under the GNU General Public License (GPL). R is an expressive language: a few lines of R code can accomplish a lot, mainly because a large number of packages, and therefore a large number of ready-made functions, are available for it. You can get R packages through the Comprehensive R Archive Network (CRAN).
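As a rough illustration of how much a few lines can do, the sketch below computes basic statistics with functions that ship with R. The sample values are invented, and the package name "zoo" is only an example; the install step needs network access, so it is left commented out.

```r
# Made-up sample values, used only for illustration.
x <- c(2.1, 3.4, 1.9, 5.6, 4.2)

mean(x)      # arithmetic mean
sd(x)        # sample standard deviation
summary(x)   # minimum, quartiles, mean and maximum

# Add-on packages come from CRAN. "zoo" is only an example name.
# install.packages("zoo")   # download and install the package
# library(zoo)              # load it into the current session
```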
R's strengths are: graphical visualization of data such as plots, data analysis, and statistical model fitting.
R's weaknesses are: storing complex structured data, querying data, and dealing with large data sets that do not fit in the computer's memory.
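The memory limitation follows from R keeping the objects it works on in memory. A minimal sketch for checking how much memory a single object occupies is shown below; the vector length is an arbitrary choice for illustration.

```r
# One million double-precision values (arbitrary size for illustration).
x <- numeric(1e6)

# Report the memory used by this single vector, in megabytes.
print(object.size(x), units = "Mb")
```

Each double takes 8 bytes, so this vector alone needs about 8 million bytes; a data set must fit in memory in this form before R can work on it directly.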
On Debian-like Linux systems such as Debian, Ubuntu or Linux Mint you can install R from the standard repositories. This is the preferred way of getting R installed on your system. The command below will download and install R along with all its prerequisites:
$ sudo apt-get install r-base
If the sudo command is not available on your system, log in as the root user first and then install R with:
# apt-get install r-base
As on Debian-like systems, you can install R on Red Hat Linux and other Red Hat-like distributions using the yum command. The installation with yum is fully automatic; the only requirement is that the EPEL repository is enabled. The command below will install R along with all its prerequisites:
$ sudo yum install R
Installing R from source should be chosen as a last resort. Normally you install from source code only if you have specific environment requirements, you cannot install from the standard package repositories, you do not have root privileges to install new software on the system (for example on a Linux/Unix cluster), or you urgently need the latest R version for your work.
To get R installed on your system, first download the latest GNU R source code. Depending on the version number, you will end up with a single gzipped tar archive; in our case it is called R-2.15.2.tar.gz. Second, decompress it with the tar command:
$ tar xzf R-2.15.2.tar.gz
Based on the R version this will create a new directory. In our case the directory name will be R-2.15.2. Navigate to this directory and execute the pre-compilation script "configure":
$ cd R-2.15.2
$ ./configure
You can pass various flags to the "configure" script to adjust the compilation to your environment. If you do not have any special requirements, you can start the compilation with:
$ make
This will compile R inside the source directory (R-2.15.2 in our case), from where you can also start using it. The following step is optional, as it requires superuser privileges. If you have superuser privileges, you can install the new software system-wide with:
# make install
For the simple example below, download the gnu-r-example.csv file and save it in your working directory.
Let us now run R on your Linux/Unix platform. First, go to your working directory using the cd command and then type the following:
$ R
R version 2.15.1 (2012-06-22) -- "Roasted Marshmallows"
Copyright (C) 2012 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-pc-linux-gnu (64-bit)
....
This starts an interactive R session on your Linux system; R commands are typed at the > prompt.
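Once the session is running, a few built-in calls can confirm which R you started and whether it can see the data file. This is only a sketch of a quick sanity check, typed without the > prompt:

```r
R.version.string                  # version of the running R
getwd()                           # directory R was started from
list.files(pattern = "\\.csv$")   # CSV files visible in that directory

# q(save = "no")                  # leave R without saving the workspace
```

If the CSV file does not show up in the listing, quit R, change to the directory where the file was saved, and start R again.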
Let us now walk through a simple R example, which consists of two steps: first, reading data from a comma-separated file, and second, plotting the data as a time series and as a histogram.
To read a .csv file in R, we use the read.csv function. For example:
> data <- read.csv('gnu-r-example.csv', header=FALSE)
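This function reads the numerical data stored in the gnu-r-example.csv file and assigns it to the variable called "data". Setting header to FALSE (abbreviated F) tells R that the first line of the file contains data rather than column names. Now "data" is a data frame with a single column. Therefore, to access the values in the first column of "data" we write data[,1].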
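To see what read.csv returns without downloading anything, the sketch below writes a small stand-in file to a temporary directory and reads it back. The file name sample.csv and its values are invented for illustration and are not the contents of gnu-r-example.csv.

```r
# Write a small stand-in file so the example runs on its own.
# The values are invented for illustration only.
csv_file <- file.path(tempdir(), "sample.csv")
writeLines(c("1.5", "2.0", "1.8", "2.7", "3.1"), csv_file)

data <- read.csv(csv_file, header = FALSE)

class(data)       # "data.frame"
dim(data)         # number of rows and columns
names(data)       # "V1", the default column name when header = FALSE
head(data[, 1])   # first column as a numeric vector
```

Inspecting the result with class, dim and names right after reading a file is a cheap way to catch a wrong separator or an unexpected header line.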
To plot the values stored in "data[,1]" we use the plot function as follows:
> plot(data[,1], type='l')
The 'type' option of the plot function selects what type of plot is drawn. Setting type='l' produces a line plot, in which consecutive values are connected with a line.
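The 'type' option accepts other values as well, for instance 'p' for points only and 'b' for both points and lines. The sketch below draws the same invented values in three ways and saves the result to a PDF file in a temporary directory; the file name and the axis labels are assumptions made for this illustration.

```r
y <- c(1.5, 2.0, 1.8, 2.7, 3.1)   # invented values for illustration

# Send the plots to a PDF file instead of a window on screen.
out <- file.path(tempdir(), "plot-types.pdf")
pdf(out, width = 8, height = 4)

par(mfrow = c(1, 3))                       # three plots side by side
plot(y, type = "p", main = "type = 'p'")   # points only
plot(y, type = "l", main = "type = 'l'")   # connected line
plot(y, type = "b", main = "type = 'b'",   # both points and lines
     xlab = "Index", ylab = "Value")

dev.off()                                  # close the file
```

Without the pdf and dev.off calls, the same plot commands draw to a window in an interactive session.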
The figure above displays the output of the plot function. Additionally, let us display a histogram of the values in "data[,1]". This can be obtained as follows:
> hist(data[,1])
The output of this function is illustrated in the figure below.
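The hist function can also return the bins it computed instead of drawing them, which helps when checking how the data was grouped. The sketch below uses simulated values rather than the article's data file, and the number of breaks is an arbitrary suggestion that R may adjust to round bin edges.

```r
set.seed(42)       # make the simulated values reproducible
y <- rnorm(200)    # 200 simulated values, not the article's data

# Compute the bins without drawing anything.
h <- hist(y, breaks = 20, plot = FALSE)
h$breaks           # bin boundaries
h$counts           # number of values in each bin
sum(h$counts)      # every value falls into exactly one bin

# Draw the histogram with a title and an axis label.
hist(y, breaks = 20, main = "Histogram of simulated values", xlab = "Value")
```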
In summary, we have described how to obtain and install R on the Linux/Unix platform. A simple example of the read.csv, plot and hist functions was also provided. As you can see, installing R under Linux from the standard repositories requires only a one-line command, which is extremely convenient. This article is the first in a series of articles about R. If you would like to continue learning about R, please subscribe to our RSS feed or simply visit linuxcareer.com regularly.