C development on Linux – Pointers and Arrays – VI.

Introduction

We have come to a crucial point in our series of articles regarding C development. It’s also, not coincidentally, that part of C that gives lots of headaches to beginners. This is where we come in, and this article’s purpose (one of them, anyway), is to debunk the myths about pointers and about C as a language hard/impossible to learn and read. Nonetheless, we recommend increased attention and a wee bit of patience and you’ll see that pointers are not as mind-boggling as the legends say.

Definitions and warnings

It seems natural and common sense that we should start with the warnings, and we heartily recommend you remember them: while pointers make your life as a C developer easier, they also can introduce hard-to-find bugs and incomprehensible code. You will see, if you continue reading, what we’re talking about and the seriousness of said bugs, but the bottom line is, as said before, be extra careful.

A simple definition of a pointer would be “a variable whose value is the address of another variable”. You probably know that every value a program stores lives at some address in memory, just as you would label things inside a warehouse so you have an easy way of finding them when needed. On the other hand, an array can be defined as a collection of items of the same type identified by indexes. You will see later why pointers and arrays are usually presented together, and how to become efficient in C using them. If you have a background in other, higher-level languages, you are familiar with the string datatype. C has no separate string type: a string is simply an array of characters terminated by a null character, and it is argued that this approach is more efficient.



Pointers

You have seen the definition of a pointer, now let’s start with some in-depth explanations and, of course, examples. A first question you may ask yourself is “why should I use pointers?”. Although I might get flamed over this comparison, I’ll take my chances: do you use symlinks in your Linux system? Even if you haven’t created some yourself, your system uses them and it makes work more efficient. I’ve heard some horror stories about senior C developers who swear they never used pointers because they’re “tricky”, but that only means the developer is incompetent, nothing more. Plus, there are situations where you’ll have to use pointers, so they are not to be treated as optional, because they’re not. As before, I believe in learning by example, so here goes:

int x, y, z;
x = 1;
y = 2;

int *ptoi; /* ptoi is, and stands for, pointer to integer*/
ptoi = &x; /* ptoi points to x */
z = *ptoi; /* z is now 1, x's value, towards which ptoi points */
ptoi = &y; /*ptoi now points to y */

If you’re scratching your head in confusion, don’t run away: it only hurts the first time, you know. Let’s go line by line and see what we did here. We first declared three integers, x, y and z, and gave x and y the values 1 and 2, respectively. This is the simple part. The new element comes along with the declaration of the variable ptoi, which is a pointer to an integer, so it points towards an integer. This is accomplished by putting an asterisk before the name of the variable in the declaration; used in an expression, the same asterisk is the indirection (or dereference) operator, which gives you the object the pointer points to. The line ‘ptoi = &x;’ uses the address-of operator & and means “ptoi now points towards x, which must be an integer, as per ptoi’s declaration above”. You can now work with *ptoi like you would with x (well, almost). Knowing this, the next line is the equivalent of ‘z = x;’. Next, we reassign ptoi, meaning we say “stop pointing to x and start pointing to y”; x and z keep their values. One important observation is necessary here: the & operator can only be used on memory-resident objects, those being variables (except register[1]) and array elements.

[1] register-type variables are one of the elements of C that exist, but the majority of the programmers shun them. A variable with this keyword attached suggests to the compiler that it will be used often and it should be stored in a processor register for faster access. Most modern compilers ignore this hint and decide for themselves anyway, so if you’re not sure you need register, you don’t.

We said that ptoi must point to an integer. How should we proceed if we wanted a generic pointer, so we won’t have to worry about data types? Enter the pointer to void. This is all we’ll tell you, and the first assignment is to find out what uses the pointer to void can have and what its limitations are.



Arrays

You will see in this sub-chapter why we insisted on presenting pointers and arrays in one article, despite the risk of overloading the reader’s brain. It’s good to know that, when working with arrays, you don’t have to use pointers, but it’s nice to know how, because pointer code was traditionally considered faster (modern optimizing compilers usually generate the same code either way), with the downside of being less comprehensible. An array declaration has the result of declaring a number of consecutive elements available through indexes, like so:

int a[5];
int x; 

a[2] = 2;
x = a[2];

a is a 5-element array, with the third element being 2 (index numbering starts with zero!), and x is defined as also being 2. Many bugs and errors when first dealing with arrays come from forgetting that indexes start at 0, so the last valid index here is 4, not 5. When we said “consecutive elements” we meant that it’s guaranteed that the array’s elements have consecutive locations in memory, not that if a[2] is 2, then a[3] is 3. There is a type in C called an enum whose named constants do get consecutive values by default, but we won’t deal with it just yet. I found some old program I wrote while learning C, with some help from my friend Google, that reverses the characters in a string. Here it is:

#include <stdio.h>
#include <string.h>

int main(void)
{
  char stringy[30];
  size_t len, i;

  printf("Type a string.\n");
  if (fgets(stringy, sizeof stringy, stdin) == NULL)
    return 1;                             /* no input at all */
  stringy[strcspn(stringy, "\n")] = '\0'; /* cut off the newline kept by fgets */
  len = strlen(stringy);
  printf("\n");

  for (i = 0; i < len; i++)               /* original order */
    printf("%c", stringy[i]);
  printf("\n");
  for (i = len; i > 0; i--)               /* reverse order, last character first */
    printf("%c", stringy[i - 1]);
  printf("\n");

  return 0;
}

This is one way of doing this without using pointers. It is far from perfect, but it illustrates the relation between strings and arrays. stringy is a 30-character array that will be used to hold user input, len holds the string’s length and i is the array index. So we ask for a string, save it to the array using fgets (which keeps the trailing newline, so we cut it off), and print the original string by starting from stringy[0] and going on, one character per loop iteration, until the string ends. The reverse operation gives the desired result: we start at the last character, stringy[len - 1], and count down to stringy[0], again printing character by character. Another important aspect is that any string in C ends with the null character, written ‘\0’, which is why a 30-element array holds at most 29 visible characters.

How would we do all this using pointers? Don’t be tempted to simply replace the array with a pointer to char; that won’t work, because an uninitialized pointer has no storage behind it for fgets() to write into. Instead, use the right tool for the job. For interactive programs like the one above, use arrays of characters of fixed length, combined with bounded functions like fgets(), so you won’t be bitten by buffer overflows. For string constants, though, you can use

const char *myname = "David"; /* const: a string literal must not be modified */

and then, using the functions provided to you in string.h, manipulate data as you see fit. Speaking of which, what function would you choose to add myname to strings that address the user? For example, instead of “please enter a number” you should have “David, please enter a number”.



Pointers and arrays

You can, and are encouraged to, use arrays in conjunction with pointers, although at first you might be startled because of the syntax. Generally speaking, you can do anything array-related with pointers, and historically this came with the advantage of speed. You might think that with today’s hardware and optimizing compilers, using pointers with arrays just to gain some speed isn’t worth it, and for simple loops the generated code is often identical. However, pointers remain the way arrays are passed to functions, and if you ever think of porting your application to some embedded platform with a less capable compiler, you will congratulate yourself for knowing them. Actually, if you understood what was said up to this point, you won’t have reasons to get startled. Let’s say we have an array of integers and we want to declare a pointer to one of the array’s elements. The code would look like this:

int myarray[10];
int *myptr;
int x;
myptr = &myarray[0];
x = *myptr;

So, we have an array named myarray, consisting of ten integers, a pointer to an integer that gets the address of the first element of the array, and x, which gets the value of said first element via the pointer. Now you can do all sorts of nifty tricks to move around through the array, like

*(myptr + 1);

which yields the value of the next element of myarray, namely myarray[1]; the expression myptr + 1 on its own is the pointer to that element.

Pointer to array

One important thing to know, and at the same time one that illustrates perfectly the relationship between pointers and arrays, is that when an array’s name is used in an expression, its value is the address of its first (zero) element, so myptr = &myarray[0] can be written simply as myptr = myarray. As somewhat of an exercise, we invite you to study this relationship a bit and code some situations where you think it will/could be useful. Adding to or subtracting from such a pointer to reach other elements is what you will encounter as pointer arithmetic.

Considerations on strings in C and calls

We have seen before that you can do either

char *mystring;
mystring = "This is a string.";

or you can do the same by using

char mystring[] = "This is a string.";

In the second case, as you might have inferred, mystring is an array just big enough to hold the data assigned to it, terminating null character included. The difference is that with the array you own a writable copy of the characters and can change them one by one, while the pointer refers to a string literal, which you may read but must not modify; trying to do so is undefined behavior and typically crashes the program. It is a very important thing to remember, one that will save you from the compiler sending large men to your house to do terrible things to your grandma.

Going a little further, another issue you should be aware of is that calls in C are always made by value. So when a function needs something from a variable, a local copy is made and work is done on that. If the function alters the variable, changes are not reflected in the caller, because the original stays intact. By passing a pointer to the variable instead, you get the effect of calling by reference, as you will see in our example below. Also, calling by value might become resource-intensive if the objects being worked on are big. Strictly speaking, the pointer itself is still passed by value, which is why some people call this technique call by pointer, but let’s keep it simple for now.

Let’s say we want to write a function that takes an integer as an argument and increments it with some value. You will probably be tempted to write something like this:

void incr(int a)
{
  a+=20;
}

Now if you try this, you will see that the integer will not be incremented, because only the local copy will be. If you had written

void incr(int *a)
{
  *a += 20;
}

and called it as incr(&x), your integer x would be incremented by twenty, which is what you want. So if you still had some doubts about the usefulness of pointers, here’s one simple yet significant example.



Somewhat advanced topics

We thought about putting these topics in a special section because they are a little harder to understand for beginners, but are useful, must-know parts of C programming. So…

Pointers to pointers

Yes, pointers are variables just like any other, so they can have other variables point to them. While simple pointers as seen above have one level of “pointing”, pointers to pointers have two, so such a variable points to another that points to another. You think this is maddening? You can have pointers to pointers to pointers to pointers to… ad infinitum, but you have already crossed the threshold of sanity and usefulness if you write such declarations. We recommend using cdecl, a small program available in most Linux distros that “translates” C and C++ declarations into English and the other way around. So, a pointer to a pointer can be declared as

int **ptrtoptr;

Now, as for how multiple-level pointers are of use: there are situations when a function has to hand a pointer back to its caller through one of its arguments, and for that it must receive a pointer to that pointer. You also might want an array of strings, which is a very useful feature, as you will see in a while.

Multi-dimensional arrays

The arrays you have seen so far are unidimensional, but that doesn’t mean you are limited to that. For example, a bi-dimensional array can be pictured as an array of arrays. My advice would be to use multi-dimensional arrays if you feel the need, but if you’re good with a simple, good ol’ unidimensional one, use that so your life as a coder will be simpler. To declare a bi-dimensional array (we use two dimensions here, but you’re not limited to that number), you will do

int bidimarray[4][2];

which will have the effect of declaring an integer array with 4 rows and 2 columns. To access the element in the second row (vertically; think of a crossword puzzle if that helps!) and the first column (horizontally), remembering that indexes start at zero, you can do

bidimarray[1][0];

Remember that these dimensions are mostly for our eyes: the compiler lays the rows out one after another in a single block of memory, so if you don’t see the utility of this, don’t use it. Ergo, our array above can be declared as

int bidimarray[8]; /* 4 by 2, as said */

in which case the element at row r and column c lives at index r * 2 + c.


Command line arguments

In our previous installment of the series we talked about main and how it can be used with or without arguments. When your program needs arguments, they are int argc and char *argv[]. Now that you know what arrays and pointers are, things start to make way more sense. However, we thought about getting into a bit of detail here. char *argv[] can be written as char **argv as well. As some food for thought, why do you think that is possible? Please remember that argc is the argument count, while argv stands for “argument vector” and is an array of strings. You can normally rely on the fact that argv[0] is the name of the program itself, while argv[1] is the first argument and so on. So a short program to see its name and the arguments would look like this:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
        while(argc--)
                printf("%s\n", *argv++);
        return 0;
}

Conclusion

We chose the parts that looked the most essential for the understanding of pointers and arrays, and intentionally left out some subjects like pointers to functions. Nonetheless, if you work with the information presented here and solve the exercises, you’ll have a pretty good start on that part of C that’s considered as the primary source of complicated and incomprehensible code.

Here is an excellent reference regarding C++ pointers. Although it’s not C, the languages are related, so the article will help you better understand pointers.



