References and Pointers

Overview

Reading the article "Choosing Between References and Pointers in C++" and its comments has led me to write this article in response. Briefly stated: (i) references are not pointers and (ii) I wouldn't use that article to form opinions or learn about C++'s pointers and/or references.

UPDATE (Aug. 19, 2012): I changed "know" to "assume" in some places concerning the compiler (related to Mikko Sysikaski's comment and my reply). I would also like to add, as my opinion only, that when concepts are finally added to C++, code written using references may well become preferred over some uses of pointers, if concepts permit the definition of type properties akin to those of group theory (e.g., associativity, transitivity, distributivity). If that happens, compilers should be able to rearrange expressions aggressively to transform and optimize them. Since the compiler is not required to use storage for a reference, it can forgo allocating storage for references in the transformed expressions when they do not need to be stored.


Declarations

Reference Declarations

The C++11 standard states this about references:

It is unspecified whether or not a reference requires storage. (§8.3.2; para. 4)

This implies that the C++ compiler does not need to represent a reference as a pointer internally, i.e., the compiler has total freedom to choose whatever representation it wants. That said, if the programmer explicitly takes the address of a reference, then it is likely the compiler will have no choice but to represent it internally using a pointer. If one does not use the address-of operator explicitly, then the compiler:

  • knows the reference is an alias to a specific value,
  • knows whether or not the value is an instance variable; a constant value/literal; an lvalue or rvalue; or, can internally flag it if the referent is some weird pointer to memory address hack / complicated expression (i.e., treat it internally as a pointer); and,
  • can optimize code using that knowledge however it sees fit, as it need not even allocate storage for the reference.

It is therefore foolish to assume anything about a reference other than that it refers to a specific, valid value. It is reasonable to assume that if one "converts" the reference to a pointer to its referenced value (i.e., by using the address-of operator), then one is likely inducing the compiler to implement the reference as a pointer and perhaps also forcing the compiler to declare a temporary variable to hold the specific value. If this occurs, it will likely affect the compiler's ability to optimize the code.
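As a minimal sketch of the aliasing behaviour described above (the function and variable names are illustrative, not from the standard or the original article): a reference is another name for its referent, and applying the address-of operator to the reference yields the address of the referent.

```cpp
#include <cassert>

// Illustrative helper: checks that a reference and the variable it was
// bound to denote the same object.
bool alias_names_same_object() {
    int value = 10;
    int& alias = value;      // alias is another name for value
    alias = 20;              // writes through to value
    const int* p = &alias;   // yields the address of value
    assert(p == &value);
    return value == 20;
}
```

How a given compiler represents `alias` internally is unspecified; the sketch shows only the behaviour the language guarantees.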

On the other hand, if one avoids the address-of operator, then one is allowing the compiler to retain the reference as a reference and to use whatever form the compiler decides is best. So if the reference is to a constant literal value, the compiler can keep it as such when and where required while optimizing the code.

These restrictions:

There shall be no references to references, no arrays of references, and no pointers to references. […] (§8.3.2; para. 5)

combined with the fact that what a reference refers to is assumed to be valid, allow the optimizing compiler to optimize code more intelligently. When C++ adds concepts, this should prove especially powerful for optimization engines.
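The quoted restrictions can be sketched as follows. The ill-formed declarations appear only as comments, and all names are hypothetical. The `static_assert` checks that taking the address through a reference produces a plain `int*`, so no "pointer to reference" type is involved.

```cpp
#include <cassert>
#include <type_traits>

// Illustrative helper: shows what the restrictions rule out and what
// remains well-formed.
bool restrictions_demo() {
    int target = 1;
    int& ref = target;

    // Ill-formed, shown only as comments:
    // int& & rr = ref;                  // no references to references
    // int& arr[2] = {target, target};   // no arrays of references
    // int&* pr = nullptr;               // no pointers to references

    // Well-formed: the address taken through ref is the address of target.
    int* through_ref = &ref;
    static_assert(std::is_same<decltype(&ref), int*>::value,
                  "address-of through a reference yields int*");
    assert(through_ref == &target);

    *through_ref = 2;
    return target == 2;
}
```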

Pointer Declarations

Unlike references, pointers can easily refer to any memory address, valid or invalid. References cannot:

[…] A reference shall be initialized to refer to a valid object or function. [Note: in particular, a null reference cannot exist in a well-defined program […]] (§8.3.2; para. 5)

Thus, the code in this comment by AHelps on the aforementioned article violates the C++ standard:

References can be used to refer to NULL. It looks like this:

int &x = *((int *) NULL);

You should never do this, but I have it on good authority that there exist large commercial code bases that do this. It IS well defined behavior, as references are exactly the same as pointers

This is not okay. It is also not well-defined behaviour. If such code is used in a program, the C++ program is not well-defined according to the standard. Further, the C++ standard does not say anywhere that references are exactly the same as pointers: they are not!
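For contrast, here is a hedged sketch of the well-defined alternative: check the pointer first, and bind a reference only once it is known to be non-null. The helpers `find_value` and `double_if_present` are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: returns a pointer to the first element equal to key,
// or nullptr when no such element exists.
int* find_value(int* first, std::size_t count, int key) {
    assert(first != nullptr || count == 0);
    for (std::size_t i = 0; i < count; ++i) {
        if (first[i] == key) return first + i;
    }
    return nullptr;
}

// Doubles the matching element; returns false when there is no object
// to bind a reference to.
bool double_if_present(int* first, std::size_t count, int key) {
    int* p = find_value(first, count, key);
    if (p == nullptr) return false;  // never dereference a null pointer
    int& r = *p;                     // p is non-null here, so r refers to a real int
    r *= 2;
    return true;
}
```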

Since pointers can point anywhere, including invalid memory locations, the compiler has less information about a pointer and its referent than it does about a reference, since a reference's referent is at least assumed to be valid.

Further complicating the compiler's ability to handle pointers are the aliasing problems that arise, since there can be pointers to pointers and, frankly, pointers can point to anything except references and bit fields (§8.3.1; para. 4). This implies that the compiler might even have to "follow" pointer chains to perform some types of optimizations, provided it knows what those pointers point to. Of course, in general, the compiler doesn't even know whether a pointer points to a valid value, so "following" pointer chains is less useful than it might seem. With references, the compiler assumes (i.e., it is assumed from the creation of the reference) that what it refers to is valid. Most code using references binds them to literal values or variables, not to some expression involving memory address tricks, so the compiler can likely use the reference information in the optimization process, as it typically knows which variables, arguments, and/or literals a reference refers to.
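A small, hypothetical illustration of the aliasing concern: when two pointers may refer to the same object, a store through one can change what the other reads. The function name `add_twice` is an assumption made for this sketch, which shows only the aliasing effect itself, not any particular compiler's behaviour.

```cpp
#include <cassert>

// Adds *b to *a twice. Because a and b may point at the same int, the
// value of *b cannot be assumed unchanged after the first store to *a.
void add_twice(int* a, const int* b) {
    assert(a != nullptr && b != nullptr);
    *a += *b;
    *a += *b;
}
```

With distinct objects `x = 1` and `y = 2`, `add_twice(&x, &y)` leaves `x` equal to 5. With a single object `z = 1`, `add_twice(&z, &z)` leaves `z` equal to 4, because the first addition changes the value read by the second.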


Closing Comments

Unless you are doing dynamic memory allocation, exchanging pointers with external libraries / other programming languages, or truly need to do pointer arithmetic, use references instead of pointers. In doing so you are permitting the compiler to make decisions on how it represents the reference with the added benefit that the compiler might even be able to avoid allocating any storage to implement the reference. The same cannot be done for pointers.
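As a sketch of this advice in practice (the names `scale_ref` and `scale_ptr` are hypothetical): a reference parameter requires the caller to supply an existing object, whereas a pointer parameter makes "no object" a representable state that the function has to handle.

```cpp
#include <cassert>
#include <vector>

// Reference parameter: the caller must pass an existing vector.
void scale_ref(std::vector<double>& values, double factor) {
    for (double& v : values) v *= factor;
}

// Pointer parameter: a null argument is possible, so it must be checked.
bool scale_ptr(std::vector<double>* values, double factor) {
    if (values == nullptr) return false;
    scale_ref(*values, factor);
    return true;
}

// Usage sketch.
void scale_demo() {
    std::vector<double> v{1.0, 2.0};
    scale_ref(v, 2.0);
    assert(v[1] == 4.0);
    assert(!scale_ptr(nullptr, 2.0));
}
```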

5 Replies to “References and Pointers”

  1. You are claiming that a reference always points to a valid value. Does the standard necessarily require this?

    The standard says this: "A reference shall be initialized to refer to a valid object or function."

    However, I don't see where it says anything about the lifetime of the referenced object. Couldn't it be that the referenced object is destroyed before the reference, leaving the reference pointing to garbage, so the compiler can't assume that it points to a valid location?

    1. Not quite (with respect to the claim). My read/understanding of the standard is that the compiler/language will assume that a reference will always point to a valid location should its referent ever be used since it was initialized with a valid referent and the referent referred to can never be changed. This assumption is key when the compiler performs code optimizations.

      I did some searching, and in §3.8 Object Lifetime the C++11 standard states that an object's lifetime ends when a destructor call starts or "the storage which the object occupies is reused or released." If the lifetime of a reference's referent ends after the initialization of the reference, that implies any subsequent use of the reference to access the referent is invalid.

      I also came across §12.2 paragraph 5 talking about when references are bound to temporaries. One of its code examples shows a dangling reference being created. Thus, one can conclude that it is not intended to be illegal to have a properly initialized reference become invalid, but, to refer to such after it has become invalid would definitely result in undefined behaviour.

    2. Great question, BTW! (I missed dealing with such in what I wrote.)

      Also I made some fixes to the article, e.g., changing "knows" to "assumes", when referring to the compiler. It was not my intention to use "know".

  2. "Unless you are doing dynamic memory allocation, exchanging pointers with external libraries / other programming languages, or truly need to do pointer arithmetic, use references instead of pointers."

    I believe there's another situation where you have to use pointers. In a non-ownership situation, you can have references as data members – they get initialized in the ctor, and when they exit scope, no management is necessary. However, if you later decide to add move ctor/assignment to your class, you can't nullify those references. The easier solution (and the advice I ended up following) is converting those references to pointers.

    1. I missed that case. I agree completely: internal member variable references should, in numerous cases, really be pointers for the reasons you give, i.e., when one needs copy and/or move semantics with respect to the referred-to type.
