In this quick GNU R tutorial on statistical models and graphics we will work through a simple linear regression example and learn how to perform this kind of basic statistical analysis of data. The analysis will be accompanied by graphical examples, which will take us closer to producing plots and charts with GNU R. If you are not familiar with using R at all, please have a look at the prerequisite tutorial: A quick GNU R tutorial to basic operations, functions and data structures.
Models and Formulas in R
We understand a model in statistics as a concise description of data, usually expressed as a mathematical formula. R has its own way to represent relationships between variables. For instance, the relationship y = c_0 + c_1 x_1 + c_2 x_2 + ... + c_n x_n + r is written in R as y ~ x1 + x2 + ... + xn, which is a formula object.
Linear regression example
Let us now provide a linear regression example for GNU R, which consists of two parts. In the first part of this example we will study the relationship between the returns of a financial index denominated in US dollars and the returns of the same index denominated in Canadian dollars. In the second part we add one more variable to our analysis: the returns of the index denominated in euros.
In the last two articles we have learned how to install and run GNU R on the Linux operating system. The purpose of this article is to provide a quick reference tutorial to GNU R that contains an introduction to the main objects of the R programming language. We will learn about basic operations in R, functions and variables. Moreover, we will introduce R data structures, objects and classes.
Basic Operations in R
Let us start with a simple mathematical example. Enter, for instance, the addition of seven and three into your R console and press Enter. As a result we obtain:
> 7+3
[1] 10
To explain in more detail what just happened, and the terminology we use when running R: the R interpreter printed an object returned by an expression entered into the R console. We should also mention that R interprets any number as a vector. Therefore, the "[1]" near our result means that the index of the first value displayed in the given row is one. This can be further clarified by defining a longer vector using the c() function. For example:
GNU R can be run on the Linux operating system in a number of ways. In this article we will describe running R from the command line, in an application window, in batch mode and from a bash script. You will see that each of these options suits a specific task. Some of them are more suitable for simple statistical analysis that can be done in one line of code, others for more sophisticated programs that require the execution of a larger number of R expressions. Finally, we may want to run a program that will take a day or two to finish on a Linux cluster. In this case we will run R in the background, which allows us to log out from the cluster.
Running R from the Linux command line
Probably the simplest way to run R under Linux is to start it from the Linux command line, that is, to type R at the shell prompt.
As a result of this command the following appears:
R version 2.15.1 (2012-06-22) -- "Roasted Marshmallows"
Copyright (C) 2012 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for an HTML browser interface to help. Type 'q()' to quit R.
This article will deal mainly with the installation of R on Linux, but will also provide a simple example of how to use R for plotting. This is the first article in a series of R articles, so subscribe to our RSS feed for regular updates. Everyone who is interested in using R for their work, or is simply interested in this software, is invited to follow this series. The main objective of these articles is to provide a quick reference to R with illustrative examples.
What is GNU R?
R is an open source programming language (software package) and environment used mainly for statistical data analysis. It is licensed under the GNU General Public License (GPL). R is a very intuitive programming language. You can do a lot in a few lines of R code, mainly because there is a large number of packages available for R, which means a large number of preprogrammed functions for you to use. You can get R packages through the Comprehensive R Archive Network (CRAN).
R's strengths are: graphical visualization of data such as plots, data analysis, statistical data fits.
R's weaknesses are: complex structured data storage, querying data, and dealing with large data sets that do not fit in the computer's memory.
Installing GNU R on Linux/Unix.
Package Management System
Debian / Ubuntu / Mint
On Debian-like Linux systems such as Debian, Ubuntu or Linux Mint you can install R from the standard repositories. This is the preferred way of getting R installed on your system. The command below will download and install R along with all its prerequisites:
First and foremost, a word of warning: while the previous articles were centered around the beginner, this article is for more advanced users who already "speak" a programming language or two and want to customize their editor to make it ideal for the task. So you are expected to be more or less proficient in the ways of emacs, to be able to use it for day-to-day tasks, and to have the ability and desire to learn something new. After all, it will be for your own reward, and your programming tasks will become more efficient. Our approach will be to take a few popular languages, show you how to configure emacs for the perfect development environment, then move on to the next language. Emacs configuration is written in a Lisp dialect called Elisp, but don't worry if you don't know it yet, we'll tell you what you need.
First, some background. This article is about emacs, not about any derivative like mg or jed that might or might not offer the desired functionality. That's because many derivatives were born from the need to create a smaller emacs, since the original is admittedly pretty big. In the process of trimming it down, some functionality that we need here may well have been removed. In short, emacs-only. Second, the files. In our examples, besides customizing the ~/.emacs file, we will create a directory named ~/.emacs.d/ where we will place our modes. Just as emacs knows, via modes, what kind of syntax highlighting, indentation, etc. to use for several types of text, like HTML, TeX, C source code, and others, we can add or modify modes to our liking, so this is what we'll do. Practically speaking, a mode is a file with a .el extension (from Elisp) that will be dropped in ~/.emacs.d, then ~/.emacs will be altered for the editor to "know" about the new extension. You'll see in a few moments, depending on how fast you read.
Of course, we wouldn't have had it any other way: we wanted to be fair, as pledged, so here is the vim article, which is a counterpart of our last one on how to make your editor the perfect programming environment. So you must have the following profile for this article to be really useful to you: you know your way around programming, so you know what you would like in an editor, and you also know your way around vim, preferably more than what we talked about in the article dedicated to it. If you read the customizing emacs article, you already have a good idea of how this article is going to be structured. If you were directed here from somewhere else, here's what we're gonna do: we'll take some popular programming languages (space permitting) and show you how to tweak vim so it will become more fit for coding in each language.
Although vim is written entirely in C, there is something named vimscript that makes creating and editing settings possible, sort of like Elisp in emacs, although this is a loose comparison. Please remember that whatever is talked about here is only about vim. Not BSD vi, not some vi extension for another editor, just vim. That is because although you can learn the basics on, say, nvi, the things that interest us (since you already know the basics) will only work on vim. Of course, that means some recent version, not older than 7.3.x. Many things will probably work on 7.x or maybe even 6.x, but there's no guarantee.
Just as before, a little advice: although this is influenced by personal preference, experience says it works. Namely, install scripts, addons and color schemes directly from the source, regardless of whether your distro offers them as well. That's because many maintainers tend to package stuff according to their personal preference, which might or might not agree with yours. Installing such addons is as simple as copying a file to a location, nothing more. And, for your convenience, we'll tell you how to install via your package manager anyway.
The distributions I have available to me at this point are Debian, Fedora, Gentoo and Arch. I will do a search for the 'vim' keyword on each of them and give you some tips and pointers on what you can install, then we'll go language-specific.
In part one we introduced you to Linux editors and gave a crash course on vim. It's now time to dismantle the rumors that we're subjective and talk about the other side, emacs. In some ways, the two editors are opposites of one another, mainly for historical reasons, as you will see. We hope you will enjoy this tour and that we'll help you make up your mind.
Introduction and using emacs
I remember writing somewhere in one of my articles that I won't under any circumstances reveal my editor/WM/DE/Hollywood actor of choice. Not because I consider myself important, but because I want to avoid any flame material. The true reason why emacs has an entire article's space, while vim has only half (or less), lies in the differences between them, and this is what we'll talk about right now.
vim, through its predecessor, vi, is very much linked to Unix in terms of evolution, just like emacs is with the GNU movement. Here's a crucial difference that influenced the design of the two editors. When Bill Joy developed vi in 1976, hardware resources were scarce, and every character sent to the terminal mattered. To get an idea of what we're talking about, imagine that vi version 2.0 was still (almost) too big to fit inside the memory of a PDP-11/70. So this is the reason why vi(m)'s commands are short and perhaps cryptic for a beginner, and maybe that's why it has its well-known simplicity. emacs is a wholly different story. It has over 2000 (yes, two thousand) built-in commands, and many critics accuse it of being too large and of having overly complex commands. The name stands for "Editing MACroS", but it's said that it also has to do with a certain ice cream store in Cambridge, MA. Why Cambridge? Because the man responsible for emacs is none other than Richard Stallman, aka RMS, who was working at MIT at the time. That leads to one conclusion: working at MIT meant Richard had access to more powerful hardware, where characters or buffer space weren't an issue, at least not to the extent Bill Joy had to deal with. So although the first year of existence is the same - 1976 - access to hardware made a difference. Not the only one, but an important one, for sure.
As we're nearing the end of the C series it becomes more and more obvious that we need to learn more about the tools, about the practical side of programming. And one essential aspect is the editor. Of course, that's not to say that the editor is only needed when programming. While Linux has more and more advanced GUIs, given its heritage you will sooner or later have to use the command line and an editor you are comfortable with to edit some config file. So choosing and knowing at least one editor that's available for Linux is more than important. That is what our article is here to help with, and the only thing expected from the reader is patience and some free time.
A theoretical background
Don't worry: while the subtitle might seem a little demanding, don't expect fancy and hard-to-read terminology. But we felt the need to have a little introduction from a more technical point of view.
Linux offers a choice of editors that is too wide at times. How so? Well, if you are a beginner, you will need an editor and start searching the net with terms like "Linux editor". In a matter of seconds you will find blog posts, forum posts, mailing list posts, articles and tutorials on the matter, each and every one telling you how editor X is the best and the other editors are no good. Confusion will ensue in a matter of minutes. This is where what you're reading right now (hopefully) helps. We want to give you a short classification of available Linux editors, then give you a blitz tutorial on the most popular: vim and emacs.
GUI or CLI?
Although we don't really appreciate giving advice and prefer respecting everyone's taste, here's a piece of advice: forget about "use that editor, it's more 31337 than the others! You will be so k3w1, d00d!".
After all that theory and talking, let's start by building the code written through the last nine parts of this series. This part of our series might actually serve you even if you learned C someplace else, or if you think your practical side of C development needs a little strength. We will see how to install necessary software, what said software does and, most important, how to transform your code into zeros and ones. Before we begin, you might want to take a look at our most recent articles about how to customize your development environment:
You may wonder what is meant by the title. Code is code, right? It's important to be bug-free and that's that, what else? Development is more than writing code and testing/debugging it. Imagine you have to read someone else's work, and I suppose you have already done that, and all the variables are named foo, bar, baz, var, etc. And the code is neither commented nor documented. You will probably feel the sudden urge to invoke unknown gods, then go to the local pub and drown your sorrows. They say that you should not do unto others what you don't want done unto you, so this part will focus on general coding guidelines, plus GNU-specific ideas that will help you have your code accepted. You are supposed to have read and understood the previous parts of this series, as well as solved all the exercises and, preferably, read and written as much code as possible.
Before starting, please take note of the actual meaning of the word above. I don't, in any way, want to tell you how to write your code, nor am I inventing these recommendations. These are the result of years of work by experienced programmers, and many will not just apply to C, but to other languages, interpreted or compiled.
With this part of our C development on Linux series we are getting ready to leave the theoretical zone and enter the real-life one. If you followed the series until this point and tried to solve all the exercises, you will now have some idea of what C is about, so you need to get out in the wild and do some practical stuff, without which theory doesn't have much value. Some of the concepts you'll see below are already known, but they are extremely important for any C program on any Unix-like OS. Yes, the information is valid regardless of the OS, as long as it's some kind of Unix, and if you stumble onto something Linux-specific, we will point it out. We will treat concepts like standard input, output and error, in-depth printf() and file access, among others.
We will continue in this part of our tutorial with the complex data types in C, and we will talk about structures. Many modern programming languages offer them, in one shape or another, and so does C. As you will see later, structures allow you to manipulate data more easily, by letting you store different variables of (possibly) different types under one single "roof".
Although I wanted to postpone the definition part for this sub-chapter, it seems like I couldn't wait and included it in the introduction. Yes, folks, that's what a structure is, and you will see in a moment how useful it is when I show you some examples. One interesting parallel is the one referring to a database table: if you have a table called users (the unique name), then you will put in that table the exact data which pertains directly to the users: age, gender, name, address, and so on. But these are different types! No problem, you can do that with a table, just as you can do it with a struct: age will be an integer, gender will be a char, name will be a string and so on. Then you will be able to access the members of the table easily, by referring to the name of the table/member. But this is not a database course, so let's move on. But before that, let's take a short look at a logical aspect: you are invited to create structs with members that have something in common from a logical point of view, like the example above. Make it easier for yourself and for the people that will later look at your code. So, let's see how our users database table would translate into a C struct:
We have come to a crucial point in our series of articles regarding C development. It's also, not coincidentally, that part of C that gives lots of headaches to beginners. This is where we come in, and this article's purpose (one of them, anyway), is to debunk the myths about pointers and about C as a language hard/impossible to learn and read. Nonetheless, we recommend increased attention and a wee bit of patience and you'll see that pointers are not as mind-boggling as the legends say.
Definitions and warnings
It seems natural and common sense that we should start with the warnings, and we heartily recommend you remember them: while pointers make your life as a C developer easier, they also can introduce hard-to-find bugs and incomprehensible code. You will see, if you continue reading, what we're talking about and the seriousness of said bugs, but the bottom line is, as said before, be extra careful.
A simple definition of a pointer would be "a variable whose value is the address of another variable". You probably know that operating systems deal with addresses when storing values, just as you would label things inside a warehouse so you have an easy way of finding them when needed. An array, on the other hand, can be defined as a collection of items identified by indexes. You will see later why pointers and arrays are usually presented together, and how to become efficient in C using them. If you have a background in other, higher-level languages, you are familiar with the string datatype. C has no dedicated string type: a string is simply an array of characters terminated by a null byte ('\0'), and it is argued that this approach is more efficient.