Pointers in C - The basics

A quick overview of the basics of pointers in C

Part I - The basics
Part II - Use of pointers

The concept of pointers is considered one of the trickiest basic concepts in C. Pointer syntax is visually awkward and its usage can be confusing. Explanations of pointers typically rely on analogies with street addresses, lockers or road signs, which can leave you even more confused if they don't click with you.


Definition and syntax

I maintain that the simplest and most straightforward way to understand pointers is the definition given in "The C Programming Language" by Kernighan and Ritchie:

A pointer is a variable that contains the address of a variable.

Basically, a pointer is a variable like any other. Just as a data variable can only store a specific type of data, a pointer can only store a memory address.

Every variable has a name, an address in memory and a value that it stores. How much memory a variable occupies depends on its type and on the architecture we are working on.

As an example:

int main(void) {
    int a = 16;
    int *pa; // create a pointer named pa
    pa = &a; // pa stores the address of a

    return 0;
}

The asterisk in int *pa declares pa as a pointer to an int. The ampersand in &a is the address-of operator, which yields the memory address of a, so in this example we assign the address of a to the pointer named pa.

Confusingly enough, you can declare a pointer with either int* p or int *p. The first reads as "p is a variable of integer pointer type" and the second reads as "p is a pointer variable of type integer".

The notation int* p seems less visually ambiguous, as it follows the logic "<data type> <variable name>" and does not look like the dereference expression *p. However, int* p, q; does not declare two integer pointers: only p is a pointer, while q is a plain int. In this case, int *p, q; is less ambiguous, because it clearly shows that the * is associated with p only.

Addresses and pointers

Let's break down this example:

int main(void) {
    int a = 16;
    int *pa = &a;

    return 0;
}

Variable a is an integer with the value 16, stored in memory at address 0x1004 (a fictional address, of course).

Variable
name                  a
value                 16
address in memory     0x1004
size of int (x64)     4 bytes

The pointer pa holds the address of a, 0x1004 in this case. The pointer pa itself also occupies space in memory; in this example, it is stored at address 0x2004.

If we dereference the pointer, that is, look at the value held at the address stored in pa, we get the value 16. So, by dereferencing a pointer, we can access the value stored at the address the pointer points to. In this case, the variable a lives at memory address 0x1004 and holds the value 16.

Pointer
name                  pa
value                 0x1004
dereferenced value    16
address in memory     0x2004
size of pointer (x64) 8 bytes

(addresses 0x1004 and 0x2004 are, of course, for illustration purposes)

To see this in action, we can extend our code like this:

#include <stdio.h>

int main(void) {
    int a = 16;
    int *pa = &a;

    // %zu is the conversion for size_t (the type of sizeof); %p expects a void *
    printf("Value a: %d, address a: %p, sizeof(a): %zu\n", a, (void *)&a, sizeof(a));
    printf("Value pa: %p, address pa: %p, deref pa: %d, sizeof(pa): %zu\n",
           (void *)pa, (void *)&pa, *pa, sizeof(pa));

    return 0;
}

This will print something like the following (the exact addresses vary between systems and runs):

Value a: 16, address a: 0x7ffee26b56dc, sizeof(a): 4
Value pa: 0x7ffee26b56dc, address pa: 0x7ffee26b56d0, deref pa: 16, sizeof(pa): 8

The null pointer

When we declare a local variable without initialising it, its value is indeterminate, and reading it is undefined behaviour (global and static variables are the exception: they are initialised to zero). Depending on the compiler, target system architecture and optimisation flags used, int a; printf("%d", a); can print 0, print a random garbage value, or even crash.

In a similar manner, using an uninitialised pointer is undefined behaviour. An uninitialised pointer (also known as a wild pointer) holds a garbage value and points to an unpredictable location in memory. Dereferencing it can cause segmentation faults, unexpected results, crashes or even security vulnerabilities.

This can be avoided by initialising pointers to the null pointer. In C, converting the constant 0 (a null pointer constant) to a pointer type produces a null pointer, which is guaranteed not to point to any valid object or function. Unlike a garbage value, a null pointer can be checked for before use (p == NULL). On most platforms the null pointer is represented by the address 0x0, which the operating system keeps unmapped so that user-level processes cannot access it. Dereferencing a null pointer is undefined behaviour, and on such systems it typically ends in a segmentation fault.

#include <stddef.h> // defines NULL

int main(void) {
    int *p = NULL; // p is a null pointer: it does not point to any object

    return 0;
}

To write a null pointer, the C standard library provides the NULL macro. NULL is defined in several headers, among them stddef.h, stdio.h, stdlib.h and string.h. The macro definition in GCC's stddef.h for C is:

#define NULL ((void *)0)

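while for C++, which has stronger type checking than C and does not implicitly convert a void * to other pointer types, the same header defines NULL as plain 0 (when compiling with g++, it uses the compiler built-in __null instead).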

We could simply initialise pointers to 0 and get the same result. Using the NULL macro makes it visually clear that we are working with a pointer and not assigning an integer or a specific address. Generally, it's good practice to initialise our pointers with the NULL pointer constant rather than leaving them uninitialised.

The void pointer

The void pointer (void *) is a pointer without any associated data type. It replaces char *, which was used as a generic pointer in earlier versions of C.

I like the definition of the void pointer in the GNU C Language Manual:

It (void *) represents a pointer to we-don’t-say-what

The void pointer can point to an address that holds data of any data type. So, for example, if we have:

int a = 16;
float f = 3.14;
char c = 'y';

a void pointer, declared as void *p;, can point to any of the above variables.

void *p;

p = &a; // correct - points to the address of int a
p = &f; // correct - now points to the address of float f
p = &c; // correct - now points to the address of char c

However, a void pointer cannot be dereferenced directly. It has to be cast to a specific pointer type first; otherwise the compiler does not know what type of data it points to or how many bytes it needs to read.

#include <stdio.h>

int main(void) {
    int a = 16;
    void *p = &a;

    // printf("%d\n", *p);    // does not compile: a void * cannot be dereferenced
    printf("%d\n", *(int *)p); // cast to int * first, then dereference: prints 16

    return 0;
}

Void pointers are also used by the dynamic memory allocation functions, such as malloc and free, as we will see in an upcoming article.


In an upcoming article we will have a look at the uses of pointers.