Using The C++ Placement New Operator

Overview

Using C++11's rvalue references, it is an interesting exercise to see how such can be used with placement new and the new alignas() functionality. This post implements a class that has the ability to conditionally store an object without allocating space for the object from the heap. To accomplish this, the object holds a character array capable of holding an instance of its template argument type T and uses placement new to assign the array to the object.

Aside: Unfortunately the GCC version 4.7.0 snapshot 20111029 C++ compiler does not yet appear to support alignas(). I have left an alignas() commented out so it can be revisited when such is supported. With C++98 it is up to the programmer to ensure that a non-heap allocated character array is properly aligned in memory to store some type T. Although such is beyond the scope of this post, it should soon be moot when compilers support such! 🙂

For those familiar with Boost, the code presented herein is conceptually similar to the functionality provided by boost::optional.


Conditionally Storing An Object Without Using The Heap

In order to conditionally store an object without using the heap, one will need a bool to indicate whether or not the object exists and sufficient memory to hold (a properly aligned object):

template <typename T>
class optional
{
  private:
    /*alignas(T)*/ char t_[sizeof(T)];
    bool created_;
};

where t_ is a character array capable of holding a T. Now only if there was a way to invoke new using &t_[0]. There is! It is a C++98 feature called placement new. Placement new allows one to write an operator new that takes customized function arguments. Conveniently, the C++98 standard defines an operator new overload that simply accepts an address where the object will be instantiated in RAM. When this is done, one must ensure enough space exists at that address, that the data is properly aligned, and manually call the destructor when the object is destroyed. Since there is no corresponding placement delete operator, the programmer is responsible for explicitly calling the destructor and performing any necessary clean-up operations.

Given the t_ array above one could create an default object of type T where the t_ array is located in RAM by invoking:

T* p = new(t_) T();

and later the object can be destroyed by explicitly invoking the object's destructor:

p->~T();

Cool, eh?!!!

NOTE: The destructor for a type can only be called if a placement new operator was used to create the instance. Although it can be done, never, ever explicitly call the destructor for an object that was not created with placement new!

So the concept this post implements is this: if created_ is true then an instance of T would have been previously instantiated at t_ and if created_ is false no instance of T exists at T.


Handling Instantiations

Let's start by writing the member functions needed to properly instantiate objects of type T located at t_. To do this, the following cases need to be handled:

  • default construction of T
  • copy construction of T
  • move construction of T
  • copy assignment of T
  • move assignment of T
  • destruction of T

Note that these concern T. Since the class template is optional<T> these operations must be defined to support the above:

  • default construction of optional<T>
  • copy construction of optional<T>
  • move construction of optional<T>
  • copy assignment of optional<T>
  • move assignment of optional<T>
  • destruction of optional<T>

Fortunately, the number of cases to handle can be simplified by writing some protected member functions by first noting:

  • Should the object not previously exist for any constructor or assignment invocation, then it must be created with the default value or by using the provided value.
  • Should the object previously exist for any assignment invocation, then it is sufficient to assign the provided value to the object.

Thus, the following member functions can handle all of the aforementioned cases:

protected:
  // Create T using the default constructor or assign default
  // value if already created...
  T& create_or_default();

  // Create T using value or assign value if already created...
  T& create_or_assign(T const& value);

  // Create T using value or assign value if already created...
  T& create_or_assign(T&& value);

  // Destroy T (invokes destructor and sets created_ to false)...
  void destroy();

  // Properly invokes a create_or_*() or destroy() function...
  void create(optional<T> const& o);

  // Properly invokes a create_or_*() or destroy() function...
  void create(optional<T>&& o);

With such the traditional constructors are easily written (note the use of std::move):

public:
  struct none { };

  optional()
    : created_(false)
  {
  }

  optional(none)
    : created_(false)
  {
  }

  optional(optional<T> const& o)
    : created_(false)
  {
    create(o);
  }

  optional(optional<T>&& o)
    : created_(false)
  {
    create(std::move(o));
  }

  optional(T const& t)
    : created_(false)
  {
    create_or_assign(t);
  }

  optional(T&& t)
    : created_(false)
  {
    create_or_assign(std::move(t));
  }

  optional<T>& operator =(none)
  {
    destroy();
    return *this;
  }

  optional<T>& operator =(T const& t)
  {
    create_or_assign(t);
    return *this;
  }

  optional<T>& operator =(T&& t)
  {
    create_or_assign(std::move(t));
    return *this;
  }

  ~optional()
  {
    destroy();
  }

To simplify the implementation of the protected helper functions above, let's define the following functions which all assume that an object of type T has been created:

public:
  T& get()
  {
    return *static_cast<T*>(static_cast<void*>(t_));
  }

  // Until g++ supports this rvalues...
  T&& get_as_rvalue()
  {
    return std::move(*static_cast<T*>(static_cast<void*>(t_)));
  }

  T const& get() const
  {
    return *static_cast<T const*>(static_cast<void const*>(t_));
  }

  T* ptr()
  {
    return static_cast<T*>(static_cast<void*>(t_));
  }

  T const* ptr() const
  {
    return static_cast<T const*>(static_cast<void const*>(t_));
  }

It is necessary to use static_cast twice to properly coerce the t_ pointer into a pointer to a different type by converting it to a void* first. By doing these casts in these functions, all other functions will not need to use any casts! 🙂


Implementing create_or_default()

If the object has already been created, then assign the default value to it. If not, then create it by invoking its default constructor using placement new:

T& create_or_default()
{
  if (created_)
  {
    T& t = get();
    t = T();
    return t;
  }
  else
  {
    T& retval = *(new(t_) T());
    created_ = true;
    return retval;
  }
}


Implementing create_or_assign()

If the object has already been created, then assign the value pass in to it. If not, then create it by invoking its copy/move constructor using placement new:

T& create_or_assign(T const& value)
{
  if (created_)
  {
    T& t = get();
    t = value;
    return t;
  }
  else
  {
    T& retval = *(new(t_) T(value));
    created_ = true;
    return retval;
  }
}

T& create_or_assign(T&& value)
{
  if (created_)
  {
    T& t = get();
    t = std::move(value);
    return t;
  }
  else
  {
    T& retval = *(new(t_) T(std::move(value)));
    created_ = true;
    return retval;
  }
}


Implementing destroy()

If the object has already been created, call its destructor and set created_ to false. Otherwise do nothing:

void destroy()
{
  if (created_)
  {
    T* t = ptr();
    t->~T();
    created_ = false;
  }
}


Implementing create()

Both copy and move versions of create() have four cases to consider corresponding to the values of this->created_ and o.created_ with o being passed in to the function. The four cases are converted into a two-bit number so they can be efficiently handled using a switch statement:

void create(optional<T> const& o)
{
  switch ((created_ << 1) | o.created_)
  {
    case 1:
    case 3:
      create_or_assign(o.get());
      break;

    case 2:
      destroy();
      break;

    default: // i.e., case 0:
      break;
  }
}

void create(optional<T>&& o)
{
  switch ((created_ << 1) | o.created_)
  {
    case 1:
    case 3:
      create_or_assign(o.get_as_rvalue());
      break;

    case 2:
      destroy();
      break;

    default: // i.e., case 0:
      break;
  }
}


Adding Some Convenience Member Functions

To facilitate the use of class types and the ability to test optional<T> instances in boolean statements to determine if it holds an instance of type T, the following public member functions are added:

operator bool() const
{
  return created_;
}

bool operator !() const
{
  return !created_;
}

T& operator *()
{
  return get();
}

T const& operator *() const
{
  return get();
}

T* operator ->()
{
  return ptr();
}

T const* operator ->() const
{
  return ptr();
}


Using optional<T>

Now optional can be used as follows:

#include <iostream>

int main()
{
  using namespace std;

  optional<int> a;
  cout << (a ? "a is set" : "a is not set") << endl;
  a = 5;
  cout << (a ? "a is set" : "a is not set") << endl;
  a = optional<int>::none();
  cout << (a ? "a is set" : "a is not set") << endl;
}

which outputs:

$ ./a.out
a is not set
a is set
a is not set
$


Closing Comments

Fortunately issues of alignment are (thank)fully addressed in C++11 and so we are only waiting for compilers to add support for those C++11 features. This post demonstrates how one can conditionally create/destroy objects not located on the heap at will while fully leveraging full copy/move semantics. This is very powerful as it allows instances of a class to defer member object instantiation without using the heap. Avoiding the use of the heap enables more efficient code to be written especially when using custom memory allocators or objects that must lazily construct their member variables.

Far too often, code is written to use the heap only to store pointers or other small objects requiring an additional pointer indirection just to access them. Using C++98's placement new operator, C++11's alignment features (when finally implemented), and C++11's move semantics, one can choose to elide a layer of indirection by reserving enough space to hold an object and instantiating it when needed in place. The trade-off is between a small amount of additional adjacent RAM than otherwise required by an object versus the space overhead of allocating object(s) on the heap and using an additional pointer located elsewhere in RAM to access the data. It is reasonable to suggest that the former is likely to run much faster (i.e., with respect to CPU instructions, cache performance, the avoidance of slow heap functions) than the latter especially when small objects are used.

Using The C++ Literal Operator

Overview

The C++11 standard has added a new operator that can be overloaded. Literals in programming languages are hard-coded constants in programs. For example, writing 1.2L, "Hello World!", 4096, etc. are all literals (i.e., the first is a long double value, the second is a const char[13] value, and the third is an int). The C++11 standard allows one to define custom literal types that can be transformed at compile-time or run-time into appropriate values. This post explores such using g++ v4.7 snapshot 20111029.

Update (Dec. 16, 2011): A small update was made to the general case definition of binary_literal_impl to provide more user-friendly compiler error messages if incorrect digits are used.


The Goal

I don't know about you, but I've always wanted to be able to write binary numbers into my program code and to have it store such optimally. Using C++11, we'll be able to do exactly this using code like this:

int main()
{
  using namespace std;

  const unsigned long long bits =
    11011110101011011011111011101111_binary;

  cout << "The number is: " << hex << bits << endl;
}

provided we write a literal operator function whose suffix is _binary.

I believe I read somewhere that literal names not starting with an underscore are reserved by the C++ standard. If you know definitively or otherwise, kindly let me know. 🙂

Update: In a reddit post's comment, zootsuit notes, "From the N3291 draft/17.6.4.3.5: Literal suffix identifiers that do not start with an underscore are reserved for future standardization." –which is likely where I read it: in one of the draft standards.

I am going to introduce an added twist: the conversion must be done at compile-time. Why? Efficiency! For any binary number hard-coded into the program, it must be encoded as a single integer value in the executable (to ensure minimum space usage and maximum efficiency) converted properly at compile time with an error if it is not proper. Certainly, no programmer wants the binary number to be stored in the executable as a string that is converted at run-time into an integer! Ack! The latter is both a waste of space and time.


Let's Do A Simple Example First

Before messing around with template metaprogramming (which is probably bewildering until you know how to read/understand it), let's write a literal whose suffix is _square that computes the square of a long double number it is associated with and returns the result:

#include <iostream>

// Insert literal operator _square definition here. (See below.)

int main()
{
  using namespace std;

  const long double num = 25.5_square;

  cout << num << endl;
}

which would output 650.25. To do this, the following function needs to be written:

constexpr long double operator "" _square(long double num)
{
  return num*num;
}

The literal operator's name is operator"" and its suffix is given after it (i.e., _square). The return type can be anything but it is set to long double because that is the computed value's type here. The constexpr keyword implies and requires that the compiler must be able to compute the result as a compile-time constant. If this is not possible, then it will fail to compile. In general, if a literal operator overload is not written as a template function and does not use constexpr, the compiler will invoke the literal operator at run-time.

That's it! Simply compile the above code, it will store 650.25 in the executable binary as a hard-coded long double value!


Literal Operator Function Parameters

Be aware that the literal operator only allows a fixed set of function arguments:

  • const char*
  • unsigned long long int
  • long double
  • char
  • wchar_t
  • char16_t
  • char32_t
  • const char*, std::size_t
  • const wchar_t*, std::size_t
  • const char16_t*, std::size_t
  • const char32_t*, std::size_t

or if there are no arguments at all, then the literal operator must be defined as a template function whose template arguments are a char template parameter pack, i.e.,

template <char... CS>
some_return_type operator "" _some_suffix_name();

Also notice that all of the function argument types, except for the character types, are the largest-range types of their kind (i.e., unsigned long long is the largest-range integer type, long double is the largest-range floating-point type) as the compiler can easily cast any value to a smaller type at compile-time. Since there are numerous character and string literal types (including the new Unicode and raw literals in C++11) the remaining parameters listed handle these special types of literals.


Implementing The _binary Literal Operator

Recall the earlier code that permits one to write a binary number in the C++:

#include <iostream>

// Insert definition of _binary literal and associated code here.

int main()
{
  using namespace std;

  const unsigned long long bits =
    11011110101011011011111011101111_binary;
  cout << "The number is: " << hex << bits << endl;
}

i.e., the number is 0xDEADBEEF, which is what the program will output.

To ensure that the conversion occurs at compile-time and to be able to easily implement it (as it is a non-trivial function), the implementation of the _binary literal will use a class template with partial template specialization. To understand this better, let's first start by defining the _binary literal:

template <char... Digits>
constexpr unsigned long long operator "" _binary()
{
  return binary_literal_impl<Digits...>::to_ulonglong();
}

Notice that the _binary literal operation has no arguments. This is because the char values in the string before _binary are being passed as a char template parameter pack.

Template parameter packs represent a sequence of template arguments. They are not types and to extract them they must be expanded with the template parameter pack expansion operator, ....

Within the _binary literal operator definition, the char template parameter pack needs to be expanded and processed into an unsigned long long. To accomplish this, the work will be delegated to a static function inside the class template binary_literal_impl as this will allow writing clean, recursively defined code that processes the char template parameter pack which (should be only!) composed of '0' and '1' characters.


Implementing The binary_literal_impl Class Template

The binary_literal_impl class template allows code to be written that recognizes the following properties about its template arguments:

  • when a '0' appears first, possibly followed by more characters,
  • when a '1' appears first, possibly followed by more characters, and,
  • when there are no characters.

Additionally, if any other (invalid) char values occur, then a compile-time error will be generated (as there will be no definition that exists for binary_literal_impl for such arguments. To accomplish this, partial template specialization is needed so the general case needs to be (forward) declared and not defined (as we want errors if there are no matches!) first:

template <char... Digits>
struct binary_literal_impl;

This is needed first so the compiler knows template arguments for the binary_literal_impl class template must be a parameter pack of char values. It is very important that there are no braces used here: this avoids defining what is associated with binary_literal_impl if there are no partial matches with the code that is written below. (If the compiler cannot find a matching definition, a compile-time error will occur.)

If you are used to functional programming in Miranda or Haskell, C++ requires the reverse order of what would be done in those languages when using partial specialization: the general case is written first, then the specialized cases follow.

Even better one can write the above general case of binary_literal_impl to use static_assert to trigger a very nice compiler error messages when an incorrect digit is used. (If you are new to this style of programming, I encourage you to write also try out the above definition to see the differences in compiler output.)

// Alternative user-friendly general case
// (i.e., any digits other than '0' or '1')...
template <char... Digits>
struct binary_literal_impl
{
  static constexpr unsigned long long to_ulonglong()
  {
    static_assert(false, "Digit characters must either be '0' or '1'.");
    return 0;
  }
};

If the first char (template argument) value is '0', then there is no one bit to shift and the result is simply to return the integer value computed on the rest of the characters in the char parameter pack:

// If the next digit is zero, then compute the rest...
template <char... Digits>
struct binary_literal_impl<'0', Digits...>
{
  static constexpr unsigned long long to_ulonglong()
  {
    return binary_literal_impl<Digits...>::to_ulonglong();
  }
};

Notice that the template argument, however, is now one shorter than what it was. If this is not obvious, then know when one is using partial template specialization, what is inside < and > after the class template name is what is being matched. Thus, since '0', Digits... appears, that is what is being matched: a '0' character followed by a parameter pack called Digits with Digits defined to be char... in template <char... Digits>. Thus, Digits... represents the expansion of all char template argument values after the first one! 🙂

If the first char (template argument) value is '1', then there is a one bit to shift left which must be bitwise-OR'd with the result computed on the remaining arguments. Since the binary digits are being processed from left to right, the one bit should be shifted left by the number of digits that remain to be processed. The C++11 sizeof... operator allows one to know the size of a parameter pack at compile time, so the definition of this case becomes:

// If the next digit is one, then shift 1 and compute the rest...
template <char... Digits>
struct binary_literal_impl<'1', Digits...>
{
  static constexpr unsigned long long to_ulonglong()
  {
    return (1ULL << sizeof...(Digits))
      | binary_literal_impl<Digits...>::to_ulonglong();
  }
};

Again notice that the number of characters remaining to process becomes one shorter when recursively calling to_ulonglong().

Finally, at some point there will be no digits left to process in the recursively defined code above. When this occurs, the computed answer should be zero:

// Base case: No digits, so return 0...
template <>
struct binary_literal_impl<>
{
  static constexpr unsigned long long to_ulonglong()
  {
    return 0;
  }
};

i.e., notice binary_literal_impl<> has no contained values.

That's it!

Importantly, since we did not write any code to handle characters other than '0' or '1' using any other values (e.g., try putting a 2 or a in the number) will cause compilation to fail. This is a good thing because a binary number should only contain '0's and '1's!


Closing Comments

As with many template metaprogramming techniques in C++, no matter how complicated the metaprogramming code is, the use of it is often very straight-forward. With C++11 supporting literals, code will be easier to read and write since more meaningful values can now appear as literals in code instead of equivalent hard-to-understand, machine-specific, hard-coded character or integer arrays. Nicely, any literal definitions/prototypes can be hidden away in header files: the end user does not need to know the details. Why? The end user only needs to know how to use the literal operator, i.e., what is written in the documentation about it! In this instance understanding how to use the _binary literal is easy: it must be preceded by a valid binary number –one doesn't need to see its definition at all to be able to use it: he/she only needs to see its documentation. 🙂

For your convenience, this is the entire program presented above:

//===============================================================

//
// binary_literal_impl represents a compile-time function that
// computes the unsigned long long int from a list of characters
// Digits that MUST be composed of '0' or '1'...
//
template <char... Digits>
struct binary_literal_impl;

// If the next digit is zero, then compute the rest...
template <char... Digits>
struct binary_literal_impl<'0', Digits...>
{
  static constexpr unsigned long long to_ulonglong()
  {
    return binary_literal_impl<Digits...>::to_ulonglong();
  }
};

// If the next digit is one, then shift 1 and compute the rest...
template <char... Digits>
struct binary_literal_impl<'1', Digits...>
{
  static constexpr unsigned long long to_ulonglong()
  {
    return (1UL << sizeof...(Digits))
      | binary_literal_impl<Digits...>::to_ulonglong();
  }
};

// Base case: No digits, so return 0...
template <>
struct binary_literal_impl<>
{
  static constexpr unsigned long long to_ulonglong()
  {
    return 0;
  }
};

//===============================================================

template <char... Digits>
constexpr unsigned long long operator "" _binary()
{
  return binary_literal_impl<Digits...>::to_ulonglong();
}

//===============================================================

#include <iostream>

int main()
{
  using namespace std;

  const unsigned long long bits =
    11011110101011011011111011101111_binary;
  cout << "The number is: " << hex << bits << endl;
}

//===============================================================

Testing If std::tuple Is Being Applied To Functors Efficiently

Overview

In computer science it is never good enough to make a claim when the claim can be verified through testing. Fortunately C++ is a rich language which significantly and often aids the writing of test code.

In my previous blog article, Applying std::tuple To Functors Efficiently and Bearded Code Warrior's (BCW) most recent blog entry, Applying a function to a tuple – once again, claims are made that the C++ compiler will pass the tuple arguments efficiently to the desired function.

In my previous post, I had a long preamble explaining how the solution would work before presenting the solution near the end. In this post, I will start with the end results and explain how the code was written to obtain those results. This is done to emphasize why verifying claims/properties of code you've written is important.

Please note that BCW's apply() code is called apply_tuple() within my code. To avoid confusion, within this post all names of any BCW apply() functions are renamed to apply_tuple().

Update Feb. 3, 2012: I have changed Operand's operator ()(T…) function overloads to return Operand*. This ensures no copies/move are made with the returned value from the functor (as such is not relevant to this article series). All code in this series has been updated accordingly.

Article Source Code: cpp_code-apply_tuple-v3.zip.


What Is The Optimal Solution?

Before seeing any test results or writing any test code, first consider what is the optimal solution given a function applied to some arguments:

apply(foo, arg1, arg2, ..., argN);
apply_tuple(foo, make_tuple(arg1, arg2, ..., argN);

Assuming arg1, arg2, …, argN (ARGS) are all C++ class types; assuming all arguments were created prior to the call; and all arguments are valid rvalue references, then the optimal solutions are:

Table 1: Theoretical Optimal Apply Function Results
Call foo Function All N arguments
apply 0 misc
0 copies
0 moves
0 assign
0 misc
0 copies
0 moves
0 assign
apply_tuple 0 misc
0 copies
0 moves
0 assign
0 misc
0 copies
0 moves
0 assign

where:

Table 2: Definitions Of Terms In Table 1
Term Definition
misc Refers to the invocation of a default or any other constructor excluding any copy or move constructors.
copies Refers to the invocation of any copy constructor.
moves Refers to the invocation of any move constructor.
assign Refers to the invocation of any copy/move assignment operator.

For all tests all arguments (except foo) will be created as unnamed or compiler-generated temporary variables. Using only such temporary variables implies that the compiler should be able to reuse them to avoid doing any unnecessary copies/moves, etc. Provided the code being tested is properly written, optimal code should be generated as a result. If such is seen, then this test ought to also be a nice way to verify and test the claim that C++11 allows perfect forwarding!

Table 1 needs to be adjusted since temporary variables will be used throughout for all arguments except the function call argument. Specifically, the following items need to be added to Table 1:

  • the creation of the required (unnamed) initial variables, and,
  • any necessary copies, moves, and/or assignments that must be done for those variables.

These adjustments are reflected in Table 3:

Table 3: Theoretical Optimal Apply Function Results Adjusted For Required Initial Rvalue Reference Variables
Call foo Function All N arguments
apply 1 misc
0 copies
0 moves
0 assign
N misc
0 copies
0 moves
0 assign
apply_tuple 1 misc
0 copies
0 moves
0 assign
N misc
0 copies
N moves
0 assign

Clearly to populate a std::tuple with N rvalue references, a total of N moves, not copies, must be done. Additionally the actual original variables/functor must be created and this accounts for the misc constructor invocations listed in Table 3. Finally, in the case when the foo operation has no arguments, since zero-argument functions often do return a value, a value will be returned in those tests. Thus, there will be "1 misc" foo constructor invocation for any zero-argument cases in the results.

Finally note that within all tests performed within this article, all calls to apply_tuple will use std::make_tuple. Now that the theory looks good, let's check out reality!


The Test Results

Results For My Version Of apply() And apply_tuple()

Table 4 summarizes the results of using my previous blog entry's apply() and apply_tuple():

Table 4: Test Results Using Paul Preney's Apply Code
Call Functor Overhead N Argument Overhead
0 1 2 3
apply 1 misc
0 copies
0 moves
0 assign
1 misc
0 copies
0 moves
0 assign
1 misc
0 copies
0 moves
0 assign
2 misc
0 copies
0 moves
0 assign
3 misc
0 copies
0 moves
0 assign
apply_tuple 1 misc
0 copies
0 moves
0 assign
1 misc
0 copies
0 moves
0 assign
1 misc
0 copies
1 moves
0 assign
2 misc
0 copies
2 moves
0 assign
3 misc
0 copies
3 moves
0 assign

which matches the optimal results in Table 3. (Recall the zero-argument case creates one Operand as its return value hence the "1 misc" in those cases.) The actual test program output for apply() was:

Op() = 1.

Op LOG ============
default_constructor[this=0x7fff7c331c48]
destructor[this=0x7fff7c331c48]
Op HIST ===========
default_constructor 1
destructor 1
Op END ============
Operand LOG ============
other_constructor[this=0x7fff7c331c50]
destructor[this=0x7fff7c331c50]
Operand HIST ===========
other_constructor 1
destructor 1
Operand END ============

Op()(Operand(-3)) = -3.

Op LOG ============
default_constructor[this=0x7fff7c331c49]
destructor[this=0x7fff7c331c49]
Op HIST ===========
default_constructor 1
destructor 1
Op END ============
Operand LOG ============
other_constructor[this=0x7fff7c331c60]
destructor[this=0x7fff7c331c60]
Operand HIST ===========
other_constructor 1
destructor 1
Operand END ============

Op()(Operand(-3),Operand(-10)) = -13.

Op LOG ============
default_constructor[this=0x7fff7c331c4a]
destructor[this=0x7fff7c331c4a]
Op HIST ===========
default_constructor 1
destructor 1
Op END ============
Operand LOG ============
other_constructor[this=0x7fff7c331c80]
other_constructor[this=0x7fff7c331c70]
destructor[this=0x7fff7c331c70]
destructor[this=0x7fff7c331c80]
Operand HIST ===========
other_constructor 2
destructor 2
Operand END ============

Op()(Operand(-3),Operand(-10),Operand(-100)) = -113.

Op LOG ============
default_constructor[this=0x7fff7c331c4b]
destructor[this=0x7fff7c331c4b]
Op HIST ===========
default_constructor 1
destructor 1
Op END ============
Operand LOG ============
other_constructor[this=0x7fff7c331cb0]
other_constructor[this=0x7fff7c331ca0]
other_constructor[this=0x7fff7c331c90]
destructor[this=0x7fff7c331c90]
destructor[this=0x7fff7c331ca0]
destructor[this=0x7fff7c331cb0]
Operand HIST ===========
other_constructor 3
destructor 3
Operand END ============

and the output for apply_tuple() was:

Op() = 1.

Op LOG ============
default_constructor[this=0x7fff7c331c4c]
destructor[this=0x7fff7c331c4c]
Op HIST ===========
default_constructor 1
destructor 1
Op END ============
Operand LOG ============
other_constructor[this=0x7fff7c331cc0]
destructor[this=0x7fff7c331cc0]
Operand HIST ===========
other_constructor 1
destructor 1
Operand END ============

Op()(Operand(-3)) = -3.

Op LOG ============
default_constructor[this=0x7fff7c331c4d]
destructor[this=0x7fff7c331c4d]
Op HIST ===========
default_constructor 1
destructor 1
Op END ============
Operand LOG ============
other_constructor[this=0x7fff7c331cd0]
move_constructor[this=0x7fff7c331ce0,src=0x7fff7c331cd0]
destructor[this=0x7fff7c331ce0]
destructor[this=0x7fff7c331cd0]
Operand HIST ===========
move_constructor 1
other_constructor 1
destructor 2
Operand END ============

Op()(Operand(-3),Operand(-10)) = -13.

Op LOG ============
default_constructor[this=0x7fff7c331c4e]
destructor[this=0x7fff7c331c4e]
Op HIST ===========
default_constructor 1
destructor 1
Op END ============
Operand LOG ============
other_constructor[this=0x7fff7c331d00]
other_constructor[this=0x7fff7c331cf0]
move_constructor[this=0x7fff7c331d40,src=0x7fff7c331d00]
move_constructor[this=0x7fff7c331d44,src=0x7fff7c331cf0]
destructor[this=0x7fff7c331d44]
destructor[this=0x7fff7c331d40]
destructor[this=0x7fff7c331cf0]
destructor[this=0x7fff7c331d00]
Operand HIST ===========
move_constructor 2
other_constructor 2
destructor 4
Operand END ============

Op()(Operand(-3),Operand(-10),Operand(-100)) = -113.

Op LOG ============
default_constructor[this=0x7fff7c331c4f]
destructor[this=0x7fff7c331c4f]
Op HIST ===========
default_constructor 1
destructor 1
Op END ============
Operand LOG ============
other_constructor[this=0x7fff7c331d30]
other_constructor[this=0x7fff7c331d20]
other_constructor[this=0x7fff7c331d10]
move_constructor[this=0x7fff7c331d50,src=0x7fff7c331d30]
move_constructor[this=0x7fff7c331d54,src=0x7fff7c331d20]
move_constructor[this=0x7fff7c331d58,src=0x7fff7c331d10]
destructor[this=0x7fff7c331d58]
destructor[this=0x7fff7c331d54]
destructor[this=0x7fff7c331d50]
destructor[this=0x7fff7c331d10]
destructor[this=0x7fff7c331d20]
destructor[this=0x7fff7c331d30]
Operand HIST ===========
move_constructor 3
other_constructor 3
destructor 6
Operand END ============

where:

  • Op os foo,
  • Op() is the creation of a functor,
  • Op()(...) is the invocation of the functor with the … arguments passed in,
  • Operand is the type representing all arguments and return values,
  • Operand(value) is the creation of an argument with the value of value,
  • Op_LOG and Operand_LOG are the constructor, assignment, and destructor actions performed during the test in chronological order for all Op and Operand types respectively, and,
  • Op_HIST and Operand_HIST are the frequency histograms of the actions performed during the test for all Op and Operand types respectively.

These outputs were produced by the following code:

// TEST apply()...
APPLY_FULL0(Op);
APPLY_FULL1(Op, -3);
APPLY_FULL2(Op, -3, -10);
APPLY_FULL3(Op, -3, -10, -100);

// TEST apply_tuple()...
APPLY_TUPLE_FULL0(Op);
APPLY_TUPLE_FULL1(Op, -3);
APPLY_TUPLE_FULL2(Op, -3, -10);
APPLY_TUPLE_FULL3(Op, -3, -10, -100);

where APPLY_FULLn and APPLY_TUPLE_FULLn are macros that will be defined below. The reason macros were used was to ensure no side-effects were incurred through the use of function calls.

This test demonstrates:

  • my apply() and apply_tuple() implementations are optimal, and,
  • perfect forwarding of function arguments is a real achievement of C++11. 🙂

The code enabling these tests to be performed will be discussed after discussing the results of BCW's apply_tuple().


Results For BCW's Version Of apply_tuple()

BCW's apply_tuple() does not use any rvalue references and instead relies on the compiler optimizing the use of all recursive calls. Consequently, my hypothesis was that there would be more overhead in the general case using objects as the compiler would need to make copies of class instances. This is understandable since C++ compilers cannot use the multitude of assumptions it uses for any primitive types such as int, double, etc. That said, I was shocked to see just how bad the results were using BCW's code (after adding appropriate Operand operations supporting lvalues so BCW's code would compile):

Table 5: Test Results Using BCW's Apply Code's Function Call Overhead
Call Functor Overhead With N Arguments
0 1 2 3
apply N/A
apply_tuple 1 misc
0 copies
0 moves
0 assign
1 misc
1 copies
0 moves
0 assign
1 misc
2 copies
0 moves
0 assign
1 misc
3 copies
0 moves
0 assign

which shows how BCW's code unnecessarily copies the function object with each recursive call and:

Table 6: Test Results Using BCW's Apply Code's Argument Overhead
Call N Argument Overhead
0 1 2 3
apply N/A
apply_tuple 1 misc
0 copies
0 moves
0 assign
1 misc
3 copies
2 moves
0 assign
2 misc
8 copies
3 moves
0 assign
3 misc
16 copies
4 moves
0 assign

which shows how quickly the number of copies explodes as the argument size increases as well that the number of moves is equal to the number of arguments + 1, except in the zero-argument case as a special function overload is used to handle tuples with no elements.

Thus, it appears that BCW's code requires:

  • N copies of its function object, and,
  • some 2^(N+1) copies with N+1 moves of its arguments

for a single N-argument function call instead of the optimal:

  • zero copies of its function object, and,
  • zero copies with N moves of the function arguments.

So for N arguments:

  • N extra copies of the function argument are created,
  • it appears some O(2^(N+1)) copies of the arguments are created, and,
  • N extra moves of the arguments are performed.

all with GCC v4.7 (snapshot 2011-10-22) compiled using -std=c++0x and -O3 options! Wow!

This clearly shows just how relevant and practically useful rvalue references are in C++11 or any language that supports objects. Moreover, given how common functions with zero, one, two, and three arguments are, certainly BCW's code shouldn't be used unmodified with class/struct/union types. Thus, although BCW's code may compile and optimize well with the built-in C++ types, it fails to be anywhere near optimal for general class types.

Interestingly, one should note that BCW's solution, as posted in the blog article, performs, for N arguments, N recursive calls each passing increasing numbers of arguments by value. Such is clearly reflected in the values summarized in Table 5 and Table 6 whose values which were obtained from this test program output (whose computed output is also not completely correct due to some serious compiler warnings about references to temporaries in the recursive step of BCW's solution):

Op() = 1.

Op LOG ============
default_constructor[this=0x7fffede986cc]
destructor[this=0x7fffede986cc]
Op HIST ===========
default_constructor 1
destructor 1
Op END ============
Operand LOG ============
other_constructor[this=0x7fffede986d0]
destructor[this=0x7fffede986d0]
Operand HIST ===========
other_constructor 1
destructor 1
Operand END ============

Op()(Operand(-3)) = 20.

Op LOG ============
default_constructor[this=0x7fffede986cd]
copy_constructor[this=0x7fffede985bf,src=0x7fffede986cd]
destructor[this=0x7fffede985bf]
destructor[this=0x7fffede986cd]
Op HIST ===========
default_constructor 1
copy_constructor 1
destructor 2
Op END ============
Operand LOG ============
other_constructor[this=0x7fffede986e0]
move_constructor[this=0x7fffede986f0,src=0x7fffede986e0]
copy_constructor[this=0x7fffede985d0,src=0x7fffede986f0]
copy_constructor[this=0x7fffede985c0,src=0x7fffede986f0]
copy_constructor[this=0x7fffede985f0,src=0x7fffede985d0]
move_constructor[this=0x7fffede985e0,src=0x7fffede985f0]
destructor[this=0x7fffede985f0]
destructor[this=0x7fffede985e0]
destructor[this=0x7fffede985c0]
destructor[this=0x7fffede985d0]
destructor[this=0x7fffede986f0]
destructor[this=0x7fffede986e0]
Operand HIST ===========
copy_constructor 3
move_constructor 2
other_constructor 1
destructor 6
Operand END ============

Op()(Operand(-3),Operand(-10)) = -13.

Op LOG ============
default_constructor[this=0x7fffede986ce]
copy_constructor[this=0x7fffede984ce,src=0x7fffede986ce]
copy_constructor[this=0x7fffede984cf,src=0x7fffede984ce]
destructor[this=0x7fffede984cf]
destructor[this=0x7fffede984ce]
destructor[this=0x7fffede986ce]
Op HIST ===========
default_constructor 1
copy_constructor 2
destructor 3
Op END ============
Operand LOG ============
other_constructor[this=0x7fffede98710]
other_constructor[this=0x7fffede98700]
move_constructor[this=0x7fffede98750,src=0x7fffede98710]
move_constructor[this=0x7fffede98754,src=0x7fffede98700]
copy_constructor[this=0x7fffede984d0,src=0x7fffede98754]
copy_constructor[this=0x7fffede98520,src=0x7fffede98750]
copy_constructor[this=0x7fffede98524,src=0x7fffede98754]
copy_constructor[this=0x7fffede984f0,src=0x7fffede98520]
copy_constructor[this=0x7fffede984e0,src=0x7fffede984d0]
copy_constructor[this=0x7fffede98530,src=0x7fffede98520]
copy_constructor[this=0x7fffede98534,src=0x7fffede98524]
copy_constructor[this=0x7fffede98510,src=0x7fffede984e0]
move_constructor[this=0x7fffede98500,src=0x7fffede98510]
destructor[this=0x7fffede98510]
destructor[this=0x7fffede98500]
destructor[this=0x7fffede98534]
destructor[this=0x7fffede98530]
destructor[this=0x7fffede984e0]
destructor[this=0x7fffede984f0]
destructor[this=0x7fffede98524]
destructor[this=0x7fffede98520]
destructor[this=0x7fffede984d0]
destructor[this=0x7fffede98754]
destructor[this=0x7fffede98750]
destructor[this=0x7fffede98700]
destructor[this=0x7fffede98710]
Operand HIST ===========
copy_constructor 8
move_constructor 3
other_constructor 2
destructor 13
Operand END ============

Op()(Operand(-3),Operand(-10),Operand(-100)) = -113.

Op LOG ============
default_constructor[this=0x7fffede986cf]
copy_constructor[this=0x7fffede984de,src=0x7fffede986cf]
copy_constructor[this=0x7fffede984df,src=0x7fffede984de]
copy_constructor[this=0x7fffede9831f,src=0x7fffede984df]
destructor[this=0x7fffede9831f]
destructor[this=0x7fffede984df]
destructor[this=0x7fffede984de]
destructor[this=0x7fffede986cf]
Op HIST ===========
default_constructor 1
copy_constructor 3
destructor 4
Op END ============
Operand LOG ============
other_constructor[this=0x7fffede98740]
other_constructor[this=0x7fffede98730]
other_constructor[this=0x7fffede98720]
move_constructor[this=0x7fffede98760,src=0x7fffede98740]
move_constructor[this=0x7fffede98764,src=0x7fffede98730]
move_constructor[this=0x7fffede98768,src=0x7fffede98720]
copy_constructor[this=0x7fffede984e0,src=0x7fffede98768]
copy_constructor[this=0x7fffede98510,src=0x7fffede98760]
copy_constructor[this=0x7fffede98514,src=0x7fffede98764]
copy_constructor[this=0x7fffede98518,src=0x7fffede98768]
copy_constructor[this=0x7fffede98500,src=0x7fffede98514]
copy_constructor[this=0x7fffede984f0,src=0x7fffede984e0]
copy_constructor[this=0x7fffede98520,src=0x7fffede98510]
copy_constructor[this=0x7fffede98524,src=0x7fffede98514]
copy_constructor[this=0x7fffede98528,src=0x7fffede98518]
copy_constructor[this=0x7fffede98340,src=0x7fffede98520]
copy_constructor[this=0x7fffede98330,src=0x7fffede98500]
copy_constructor[this=0x7fffede98320,src=0x7fffede984f0]
copy_constructor[this=0x7fffede98370,src=0x7fffede98520]
copy_constructor[this=0x7fffede98374,src=0x7fffede98524]
copy_constructor[this=0x7fffede98378,src=0x7fffede98528]
copy_constructor[this=0x7fffede98360,src=0x7fffede98320]
move_constructor[this=0x7fffede98350,src=0x7fffede98360]
destructor[this=0x7fffede98360]
destructor[this=0x7fffede98350]
destructor[this=0x7fffede98378]
destructor[this=0x7fffede98374]
destructor[this=0x7fffede98370]
destructor[this=0x7fffede98320]
destructor[this=0x7fffede98330]
destructor[this=0x7fffede98340]
destructor[this=0x7fffede98528]
destructor[this=0x7fffede98524]
destructor[this=0x7fffede98520]
destructor[this=0x7fffede984f0]
destructor[this=0x7fffede98500]
destructor[this=0x7fffede98518]
destructor[this=0x7fffede98514]
destructor[this=0x7fffede98510]
destructor[this=0x7fffede984e0]
destructor[this=0x7fffede98768]
destructor[this=0x7fffede98764]
destructor[this=0x7fffede98760]
destructor[this=0x7fffede98720]
destructor[this=0x7fffede98730]
destructor[this=0x7fffede98740]
Operand HIST ===========
copy_constructor 16
move_constructor 4
other_constructor 3
destructor 23
Operand END ============

using the same macro code as the previous section's apply_tuple with my apply and apply_tuple code was deleted and replaced with BCW's code instead.

BCW's work is typically thorough but this instance shows that optimizations aren't magical, the programmer must use rvalue references where they ought to be used, and one should always double-check efficiency claims for code written. While in this instance BCW's code was found to have issues, we're all human and all coders all been guilty on various occassions of the assuming code efficiencies that weren't true!

I tried tweaking BCW's code while remaining faithful to the style of his solution by adding rvalue references. Unfortuantely, I ran into a snag: the snapshot of GCC v4.7 I am using gives "Sorry, unimplemented" messages when one tries to expand the TS… parameter pack inside a template argument. The latter is needed to obtain the type of each tuple argument using std::tuple_element so it can be passed to std::forward. Without such, one must use a class template as per my solution.


Tracking All Constructors, Assignments, and Destructors

Defining The Tracking Class

So how would one go about easily tracking and logging all constructor invocations, copy/move assignments, and when destructors are called? First one starts by writing a class template that does exactly this (say in a file called tracker.hxx):

#ifndef tracker_hxx_
#define tracker_hxx_

#include <list>
#include <map>
#include <tuple>
#include <ostream>

template <typename T>
struct tracker
{
  public:
    // Handle default constructor tracking...
    tracker()
    {
      update(DefaultConstr, this);
    }

    // Invoke for tracking any other constructors...
    explicit tracker(int)
    {
      update(OtherConstr, this);
    }

    // Handle copy constructor tracking...
    tracker(tracker const& src)
    {
      update(
        CopyConstr,
        this,
        &const_cast<tracker&>(src)
      );
    }

    // Handle move constructor tracking...
    tracker(tracker&& src)
    {
      update(MoveConstr, this, &src);
    }

    // Handle copy-assignment operator tracking...
    tracker& operator =(tracker const& src)
    {
      update(
        CopyAssign,
        this,
        &const_cast<tracker&>(src)
      );
      return *this;
    }

    // Handle move-assignment operator tracking...
    tracker& operator =(tracker&& src)
    {
      update(MoveAssign, this, &src);
      return *this;
    }

    // Handle destructor tracking...
    ~tracker()
    {
      update(Destruct, this);
    }

    // Clear out all log and histogram data...
    static void clear()
    {
      log_.clear();
      hist_.clear();
    }

    // Output the log to a stream...
    static std::ostream& output_log(std::ostream& os)
    {
      for (log_entry_type& le : log_)
      {
        output_log_entry(os, le);
        os << '\n';
      }
      return os;
    }

    // Output the histogram to a stream...
    static std::ostream& output_hist(std::ostream& os)
    {
      for (typename hist_type::value_type& he : hist_)
      {
        output_hist_entry(os, he);
        os << '\n';
      }
      return os;
    }

  private:
    // Enumeration to identify the various operations being logged
    // and tracked in the frequency histogram...
    enum op
    {
      DefaultConstr, CopyConstr, MoveConstr, OtherConstr,
      CopyAssign, MoveAssign,
      Destruct
    };

    // Define log_entry_type as a std::tuple such that:
    //   <0> is an instance of op
    //   <1> indicates the this object
    //   <2> indicates the source object (e.g., for copies/moves)
    typedef std::tuple<op, tracker*, tracker*> log_entry_type;

    // Define log_type as a list of log_entry_types...
    typedef std::list<log_entry_type> log_type;

    // Define hist_type as a map from op to size_t...
    typedef std::map<op, std::size_t> hist_type;

    // Update tracked information in the log and histogram...
    static void update(op o, tracker* this_v, tracker* src = nullptr)
    {
      auto log_entry = log_entry_type(o, this_v, src);

      // Append to log...
      log_.push_back(log_entry);

      // Update histogram...
      ++hist_[o];
    }

    // Output a log entry...
    static std::ostream& output_log_entry(std::ostream& os,
      log_entry_type const& le)
    {
      switch (std::get<0>(le))
      {
        case DefaultConstr:
          os
            << "default_constructor[this="
            << std::get<1>(le)
            << ']'
          ;
          break;

        case CopyConstr:
          os
            << "copy_constructor[this="
            << std::get<1>(le)
            << ",src="
            << std::get<2>(le)
            << ']'
          ;
          break;

        case MoveConstr:
          os
            << "move_constructor[this="
            << std::get<1>(le)
            << ",src="
            << std::get<2>(le)
            << ']'
          ;
          break;

        case OtherConstr:
          os
            << "other_constructor[this="
            << std::get<1>(le)
            << ']'
          ;
          break;

        case CopyAssign:
          os
            << "copy_assignment[this="
            << std::get<1>(le)
            << ",src="
            << std::get<2>(le)
            << ']'
          ;
          break;

        case MoveAssign:
          os
            << "move_assignment[this="
            << std::get<1>(le)
            << ",src="
            << std::get<2>(le)
            << ']'
          ;
          break;

        case Destruct:
          os
            << "destructor[this="
            << std::get<1>(le)
            << ']'
          ;
          break;
      }
      return os;
    }

    // Output a hist entry...
    static std::ostream& output_hist_entry(std::ostream& os,
      typename hist_type::value_type const& he)
    {
      switch (he.first)
      {
        case DefaultConstr:
          os << "default_constructor " << he.second;
          break;

        case CopyConstr:
          os << "copy_constructor " << he.second;
          break;

        case MoveConstr:
          os << "move_constructor " << he.second;
          break;

        case OtherConstr:
          os << "other_constructor " << he.second;
          break;

        case CopyAssign:
          os << "copy_assignment " << he.second;
          break;

        case MoveAssign:
          os << "move_assignment " << he.second;
          break;

        case Destruct:
          os << "destructor " << he.second;
          break;
      }
      return os;
    }

    // Define the log_ and hist_ variables statically...
    static log_type log_;
    static hist_type hist_;
};

// All class static variables must be instantiated outside the class...
template <typename T>
typename tracker<T>::log_type tracker<T>::log_;

template <typename T>
typename tracker<T>::hist_type tracker<T>::hist_;

#endif // #ifndef tracker_hxx_

Noteworthy is how almost everything is defined as private in order to not pollute the derived class with symbols. Even the static functions could have been defined outside the tracker class as friends to further avoid any pollution.

The idea behind tracker<T> is to:

  • log all constructor, copy/move assignment, and destructor invocations as they occur for some type T,
  • to log all object names (i.e., their addresses) upon which these invocations occur,
  • to log any source objects for all copying and move operations, and,
  • to keep an up-to-date frequency histogram of how many times each type of invocation occurred.

Using The Tracker Class With The Curiously Recurring Template Pattern (CRTP)

With such, one need only inherit from tracker passing in the name of the class doing the inheriting:

class Op : tracker<Op>
{
  public:
    using tracker<Op>::tracker<Op>;
};

Unfortunately, it appears the version of GCC I am using does not implement using syntax with inheriting constructors yet. As a work-around, one needs to define each constructor, assignment operator, and destructor in the derived class and invoke the same in the parent:

struct Op : tracker<Op>
{
  Op()
    : tracker()
  {
  }

  Op(Op const& a)
    : tracker(a)
  {
  }

  Op(Op&& a)
    : tracker(std::move(a))
  {
  }

  Op& operator =(Op const& a)
  {
    tracker::operator =(a);
    return *this;
  }

  Op& operator =(Op&& a)
  {
    tracker::operator =(std::move(a));
    return *this;
  }

  ~Op() = default;
};

Passing the class one is defining as a template argument to its base class is permitted and extremely useful in C++. It is a C++ programming pattern is called the Curiously Recurring Template Pattern (CRTP) and it works because all template instantiations are lazy (i.e., they only occur when they are required) and C++ permits forward declarations of types. When the compiler encounters class Op it now knows that Op is a class, so when it is told to inherit from tracker<Op> it is not an error provided that the use of Op inside of tracker does not need to know anything inside Op's definition outside of any member function's definition (not prototype)! If one tries to access members of Op from within tracker (but not within its member functions) the C++ compiler will give an error message about Op being an "incomplete type". Since tracker does no such thing, the compiler inherits from tracker<Op> and then processes the definition of Op!

The CRTP design pattern is extremely useful in C++ programs especially to perform the following:

  • to automatically perform tracking/logging functions including automatically adding/removing all instances of a type to data structures, and,
  • to eliminate the overhead of virtual function calls when run-time object polymorphism is not needed.

Since we want to track the function object instances and the function argument instances, two classes will need to be defined using CRTP:

  • Op representing the function object type, and,
  • Operand representing the function arguments' type.

Such would be defined as follows (say in a file called op_operand.hxx):

#ifndef op_operand_hxx_
#define op_operand_hxx_

#include "tracker.hxx"
#include <ostream>
#include <iostream>

//====================================================================

// Define the function argument type. It will hold an int value...
struct Operand : tracker<Operand>
{
  Operand()
    : tracker(), value(0) { }
  Operand(int v)
    : tracker(0), value(v) { }
  Operand(Operand const& a)
    : tracker(a), value(a.value) { }
  Operand(Operand&& a)
    : tracker(std::move(a)), value(a.value) { a.value = 0; }
  Operand& operator =(Operand const& a)
    { tracker::operator =(a); value = a.value; return *this; }
  Operand& operator =(Operand&& a)
  {
    tracker::operator =(std::move(a));
    value = a.value;
    a.value = 0;
    return *this;
  }
  ~Operand() = default;

  int value;
};

// Define the ability for Operands to be written to a stream...
std::ostream& operator <<(std::ostream& os, Operand const& o)
{
  os << o.value;
  return os;
}

//====================================================================

// Define the functor type...
struct Op : tracker<Op>
{
  Op() : tracker() { }
  Op(Op const& a) : tracker(a) { }
  Op(Op&& a) : tracker(std::move(a)) { }
  Op& operator =(Op const& a)
    { tracker::operator =(a); return *this; }
  Op& operator =(Op&& a)
    { tracker::operator =(std::move(a)); return *this; }
  ~Op() = default;

  // Define calling a function with no arguments...
  Operand* operator ()() const
    { return nullptr; }

  // Define calling a function with one argument...
  Operand* operator ()(Operand&& x) const
    { return nullptr; }
#ifdef OPERAND_LVALUE_REF_SUPPORT
  Operand* operator ()(Operand const& x) const
    { return nullptr; }
#endif

  // Define calling a function with two arguments...
  Operand* operator ()(Operand&& x, Operand&& y) const
    { x.value += y.value; return std::move(x); }
#ifdef OPERAND_LVALUE_REF_SUPPORT
  Operand* operator ()(Operand const& x, Operand const& y) const
  {
    Operand retval(x);
    retval.value += y.value;
    return nullptr;
  }
#endif

  // Define calling a function with three arguments...
  Operand* operator ()(Operand&& x, Operand&& y,
    Operand&& z) const
    { x.value += y.value + z.value; return nullptr; }
#ifdef OPERAND_LVALUE_REF_SUPPORT
  Operand* operator ()(Operand const& x, Operand const& y,
    Operand const& z) const
  {
    Operand retval(x);
    retval.value += y.value + z.value;
    return nullptr;
  }
#endif
};

//====================================================================

// Define all testing macros...

#define APPLY_TEST0(i) \
  std::cout << #i " = " << apply(i) << "." << std::endl

#define APPLY_TEST1(i,a) \
  std::cout << #i "(" #a ") = " << apply(i,a) << "." << std::endl

#define APPLY_TEST2(i,a,b) \
  std::cout << #i "(" #a "," #b ") = " << apply(i,a,b) \
  << "." << std::endl

#define APPLY_TEST3(i,a,b,c) \
  std::cout << #i "(" #a "," #b "," #c ") = " \
    << apply(i,a,b,c) << "." << std::endl


#define APPLY_TUPLE_TEST0(i) \
  std::cout << #i " = " << apply_tuple(i,std::tuple<>()) \
  << "." << std::endl

#define APPLY_TUPLE_TEST1(i,a) \
  std::cout << #i "(" #a ") = " \
    << apply_tuple(i,std::make_tuple(a)) << "." << std::endl

#define APPLY_TUPLE_TEST2(i,a,b) \
  std::cout << #i "(" #a "," #b ") = " \
    << apply_tuple(i,std::make_tuple(a,b)) << "." << std::endl

#define APPLY_TUPLE_TEST3(i,a,b,c) \
  std::cout << #i "(" #a "," #b "," #c ") = " \
    << apply_tuple(i,std::make_tuple(a,b,c)) << "." << std::endl


#define OUTPUT(type) \
  std::cout \
    << #type " LOG ============\n" \
    << tracker<type>::output_log \
    << #type " HIST ===========\n" \
    << tracker<type>::output_hist \
    << #type " END ============\n" \
  ;


#define CLEAR(type) tracker<type>::clear();

#define APPLY_FULL0(type) \
  CLEAR(type); \
  CLEAR(Operand); \
  { APPLY_TEST0(type()); } \
  std::endl(std::cout); \
  OUTPUT(type); \
  OUTPUT(Operand); \
  std::endl(std::cout);


#define APPLY_FULL1(type,a) \
  CLEAR(type); \
  CLEAR(Operand); \
  { APPLY_TEST1(type(),Operand(a)); } \
  std::endl(std::cout); \
  OUTPUT(type); \
  OUTPUT(Operand); \
  std::endl(std::cout);


#define APPLY_FULL2(type,a,b) \
  CLEAR(type); \
  CLEAR(Operand); \
  { APPLY_TEST2(type(),Operand(a),Operand(b)); } \
  std::endl(std::cout); \
  OUTPUT(type); \
  OUTPUT(Operand); \
  std::endl(std::cout);


#define APPLY_FULL3(type,a,b,c) \
  CLEAR(type); \
  CLEAR(Operand); \
  { APPLY_TEST3(type(),Operand(a),Operand(b),Operand(c)); } \
  std::endl(std::cout); \
  OUTPUT(type); \
  OUTPUT(Operand); \
  std::endl(std::cout);


#define APPLY_TUPLE_FULL0(type) \
  CLEAR(type); \
  CLEAR(Operand); \
  { APPLY_TUPLE_TEST0(type()); } \
  std::endl(std::cout); \
  OUTPUT(type); \
  OUTPUT(Operand); \
  std::endl(std::cout);


#define APPLY_TUPLE_FULL1(type,a) \
  CLEAR(type); \
  CLEAR(Operand); \
  { APPLY_TUPLE_TEST1(type(),Operand(a)); } \
  std::endl(std::cout); \
  OUTPUT(type); \
  OUTPUT(Operand); \
  std::endl(std::cout);


#define APPLY_TUPLE_FULL2(type,a,b) \
  CLEAR(type); \
  CLEAR(Operand); \
  { APPLY_TUPLE_TEST2(type(),Operand(a),Operand(b)); } \
  std::endl(std::cout); \
  OUTPUT(type); \
  OUTPUT(Operand); \
  std::endl(std::cout);


#define APPLY_TUPLE_FULL3(type,a,b,c) \
  CLEAR(type); \
  CLEAR(Operand); \
  { APPLY_TUPLE_TEST3(type(),Operand(a),Operand(b),Operand(c)); } \
  std::endl(std::cout); \
  OUTPUT(type); \
  OUTPUT(Operand); \
  std::endl(std::cout);


//====================================================================

#endif // #ifndef op_operand_hxx_


Running The Tests

The C++ code to run tests on my code is:

#include <cstddef>

//====================================================================

template <std::size_t...>
struct indices;

template <std::size_t N, typename Indices, typename... Types>
struct make_indices_impl;

template <
  std::size_t N,
  std::size_t... Indices,
  typename Type, typename... Types
>
struct make_indices_impl<N, indices<Indices...>, Type, Types...>
{
  typedef
    typename make_indices_impl<
      N+1,
      indices<Indices...,N>,
      Types...>::type
    type
  ;
};

template <std::size_t N, std::size_t... Indices>
struct make_indices_impl<N, indices<Indices...>>
{
  typedef indices<Indices...> type;
};

template <std::size_t N, typename... Types>
struct make_indices
{
  typedef
    typename make_indices_impl<
      0,
      indices<>,
      Types...
    >::type
    type
  ;
};

//====================================================================

#include <tuple>
#include <functional>

//====================================================================

template <typename Op, typename... Args>
auto apply(Op&& op, Args&&... args)
  -> typename std::result_of<Op(Args...)>::type
{
  return op( std::forward<Args>(args)... );
}

//====================================================================

template <
  typename Indices
>
struct apply_tuple_impl;

template <
  template <std::size_t...> class I,
  std::size_t... Indices
>
struct apply_tuple_impl<I<Indices...>>
{
  template <
    typename Op,
    typename... OpArgs,
    template <typename...> class T = std::tuple
  >
  static auto apply_tuple(Op&& op, T<OpArgs...>&& t)
    -> typename std::result_of<Op(OpArgs...)>::type
  {
    return op( std::forward<OpArgs>(std::get<Indices>(t))... );
  }
};

template <
  typename Op,
  typename... OpArgs,
  typename Indices = typename make_indices<0, OpArgs...>::type,
  template <typename...> class T = std::tuple
>
auto apply_tuple(Op&& op, T<OpArgs...>&& t)
  -> typename std::result_of<Op(OpArgs...)>::type
{
  return apply_tuple_impl<Indices>::apply_tuple(
    std::forward<Op>(op),
    std::forward<T<OpArgs...>>(t)
  );
}

//====================================================================

#include "tracker.hxx"
#include "op_operand.hxx"

int main()
{
  // TEST apply()...
  APPLY_FULL0(Op);
  APPLY_FULL1(Op, -3);
  APPLY_FULL2(Op, -3, -10);
  APPLY_FULL3(Op, -3, -10, -100);

  std::cout
    << "====================================="
    << std::endl
    << std::endl
  ;

  // TEST apply_tuple()...
  APPLY_TUPLE_FULL0(Op);
  APPLY_TUPLE_FULL1(Op, -3);
  APPLY_TUPLE_FULL2(Op, -3, -10);
  APPLY_TUPLE_FULL3(Op, -3, -10, -100);

  return 0;
}

//====================================================================

and the C++ code to run tests on BCW's code is:

#include <cstddef>
#include <tuple>
#include <functional>

//====================================================================

// BCW's apply() renamed to apply_tuple()...

template <typename Func, typename... TupleTypes,
          typename... UnwrappedParams>
inline typename std::result_of<Func(TupleTypes...)>::type
  apply_tuple(Func f, std::tuple<TupleTypes...> ts,
      UnwrappedParams... params)
{
    return apply_tuple(f, ts, params...,
        std::get<sizeof...(UnwrappedParams)>(ts));
}

template <typename Func, typename... TupleTypes>
inline typename std::result_of<Func(TupleTypes...)>::type
  apply_tuple(Func f, std::tuple<TupleTypes...> ts,
      TupleTypes... params)
{
    return f(params...);
}

template <typename Func>
inline auto apply_tuple(Func f, std::tuple<> ts) -> decltype(f())
{
    return f();
}

//====================================================================

#include "tracker.hxx"

// BCW's code requires Op::operator ()() to support lvalues to compile.
// Thus, it is necessary to define the next line to have BCW's code
// to compile...
#define OPERAND_LVALUE_REF_SUPPORT
#include "op_operand.hxx"

int main()
{
  // TEST apply_tuple()...
  APPLY_TUPLE_FULL0(Op);
  APPLY_TUPLE_FULL1(Op, -3);
  APPLY_TUPLE_FULL2(Op, -3, -10);
  APPLY_TUPLE_FULL3(Op, -3, -10, -100);

  return 0;
}

//====================================================================


Closing Comments

Success has been achieved and tested! The apply functions were each tested to see whether or not they would perform optimally using C++11's rvalue references and ability to perfectly forward function arguments. Also demonstrated was how recursively defined code can have very significant overheads but such can be elided if code is written to exploit perfect argument forwarding with only a small amount of C++ template metaprogramming code. Once done, one can use such codes without being concerned about their overheads!

Another important lesson for myself and everyone that reads this post is just how important testing code for efficiency concerns –not just correctness– is before adding such code to a real program or into a library that is used in real programs. Ideally code once tested, debugged, etc. can be used as a blackbox with its documented guarantees. For this ideal to be reached, the properties and claims made in the documentation about the code need to be confirmed through testing –not just code study.

Should any any errors or omissions be found within this post, kindly advise me of such!

Happy Coding!

Applying std::tuple To Functors Efficiently

Overview

The new C++11 standard nicely adds tuples (i.e., std::tuple found in <tuple>) and while it is easy to invoke std::make_tuple and pass a number of arguments to create a new std::tuple there does not appear to be any function that efficiently does the reverse (i.e., apply all tuple elements efficiently to a function or a functor). This post details how to do such.

Update (Dec. 18, 2011): I have added some comments concerning the use of std::move in some return statements. When returning values from a function, one should simply write return somevalue; to enable the compiler to choose between moving and return value optimization (RVO). Because this article focuses on discussing move operations, I did not write the code to exploit RVO –it explicitly moves everythng– in order to avoid any confusion, but, please be aware that normally one would not use std::move on a variable being returned in a return statement.

Article Source Code: See the next article.


Using std::make_tuple

Using std::make_tuple to construct a tuple is very straight-forward:

#include <string>
#include <tuple>

int main()
{
  auto x = std::make_tuple(1, 4.3, std::string{"Hello!"});
}

would induce the variable x to have the type:

std::tuple<int, double, std::string>

The std::make_tuple function is a convenience function that allows one to avoid having to specify both the types and the values for a specific tuple. For example, the above code could have been written:

#include <string>
#include <tuple>

int main()
{
  std::tuple<int,double,std::string> x{1, 4.3, std::string{"Hello!"}};
}

It is also particularly handy when one needs to create and pass a tuple to a function as a temporary variable. By simply invoking std::make_tuple in a way that ensures its returned tuple is passed to the function, avoids having to formally declare tuple variables.

Accessing std::tuple's Elements

Given some std::tuple instance, t, the C++11 standard allows us to retrieve a specific element of a std::tuple by using get<N>(t) where N is the zero-based index of the desired element. It is also possible to retrieve the number of elements as well as their types, but, these will not be used in the solution presented in this article.

Requirements Of The Solution

The final solution should concern using a function called apply which should take these parameters:

  • the first argument must be a function (pointer) or a functor (i.e., a struct/class with an overloaded function call operator), and,
  • the second parameter must be a tuple whose sequence of types and values are the same as the first argument's function/functor arguments.

so when apply is called all of the std::tuple's elements are efficiently passed to the function/functor. For example this code's cout lines:

struct add1
{
  int operator ()(int x, int y) const
  {
    return x + y;
  }
};

int add2(int x, int y)
{
  return x + y;
}

// ...

std::cout << apply(add1(), std::make_tuple(3,4)) << std::endl;
std::cout << apply(add2, std::make_tuple(8,2)) << std::endl;

are equivalent to:

std::cout << add1()(3,4) << std::endl;
std::cout << add2(8,2) << std::endl;

and would output:

$ ./a.out
7
10
$

Before considering how to manipulate std::tuple's one should consider how to write the code that does the same without std::tuples:

std::cout << apply(add1(), 3, 4) << std::endl;
std::cout << apply(add2, 8, 2) << std::endl;

Ideally such should be a simpler version of a solution that works with std::tuple and, in this instance, this is what occurs.

The other requirement of the solution is maximum possible efficiency. To achieve this will require careful attention to ensure that no unnecessary copying or moving of any data is done. Ideally, the compiler should be able to use the get function to retrieve all tuple elements and place them in the function arguments all at compile time.

NOTE: This means that std::bind and std::function should not be used in the solution.

Rvalue References And Perfect Forwarding

What Is An Rvalue?

To meet the above requirements two new C++11 features will need to be exploited: r-value references and perfect forwarding of function arguments. Prior to C++11 only these special operations, with or without using templates, could be defined with user-defined types:

  • the ability to create a default-initialized object by defining a constructor that takes no arguments,
  • the ability to create a copy of another object of the same type by defining a constructor that has a single argument that receives the other object,
  • the ability to define how one object is assigned to another (of the same type) by defining a copy assignment operator, and,
  • the ability to define how an object is destroyed by writing a destructor.

In addition to the above C++11 adds the following:

  • the ability to move the contents of an object that is an rvalue into a newly created object because the former object will no longer be used (e.g., it will be destroyed), and,
  • the ability to move the contents of an object that is an rvalue into an existing object via assignment because the former object will no longer be used (e.g., it will be destroyed).

The first ability is called the move constructor and the second is called move assignment. Both operations take rvalue references for their arguments. Know that rvalue references are references and cannot be const. Since they are references, the following code is erroneous:

struct Example2 { };

Example2&& test()
{
  Example2 e;
  // ...
  return e;
}

It is illegal for the same reasons that this code is erroneous:

struct Example2 { };

Example2& test()
{
  Example2 e;
  // ...
  return e;
}

and that this code is erroneous:

struct Example3 { };

Example3* test()
{
  Example3 e;
  // ...
  return &e;
}

as all three return the address (implicitly for Example1 and Example2 and explicitly for Example3) of a local variable that will be destroyed once the test() returns.

What exactly is an rvalue? The name rvalue means right-hand-side-value and stands in constrast to an lvalue (which means left-hand-side-value). The usage of right and left refers to where it is allowed relative to an assignment operator: an rvalue can only appear on the right-hand side whereas an lvalue can appear on the left or right hand sides.

In terms of C++11 coding what is an rvalue and how does it differ from an lvalue? Loosely, an rvalue is:

  • a compiler-created temporary variable, or
  • any unnamed variable.

Hence an rvalue is a value that one cannot explicitly obtain its address via the address-of operator as there is no name to refer to it. (Because of nuances, the C++11 standard redefines lvalues and rvalues more precisely than this using a taxonomy. This level of detail will not be discussed here.)

Once a variable somehow acquires a name then it can be explicitly referenced and manipulated as an lvalue and so C++ no longer treats as an rvalue (with some caveats for explicitly named rvalue-reference variables) unless std::move(variable_name) is invoked on it.

Warning: Once std::move(variable_name) is invoked on a variable and its owned contents has been moved, that variable should only used as as a "zeroed-out" or "empty" object (whatever that means for that type). Effectively, when you use std::move() know that the ownership of that variable's state may be transferred (i.e., moved) to another variable since std::move() makes that variable become an rvalue.

One of the requirements of this article is to be able to have maximum efficiency: one does not want to copy the contents of a tuple into function arguments. In general, for proper program semantics with arbitrary functions copies cannot be made. Consider:

double test(int& i, double const& y)
{
  double retval = y * 3.141592;
  i = retval;
  return retval;
}

where i is passed by reference. If a copy of a variable was passed as an argument then the original variable wouldn't be modified. Clearly, one needs a way to perfectly forward the types in a tuple into each function argument in a compatible way that preserves the semantics of the program code. Prior to C++11 this was not possible. In C++11, it is provided rvalue references with their associated semantics and templates are used.

Lvalue and Rvalue References

Rvalue references are defined similarly to C++98 references. C++98 references are lvalues that are declared using a single ampersand and must be assigned to an lvalue reference at that point (with one exception). The ampersand is used because it is the address-of operator and a C++98 reference implicitly takes the address-of of the expression it is initialized with. For example, this C/C++ code:

int i = 4;
int * const ref = &i;
// ...
*ref = 46; // i will now be set to 46

is semantically equivalent (in terms of its underlying implementation) to this C++98 code:

int i = 4;
int& ref = i;
// ...
ref = 46; // i will now be set to 46

i.e., any C++98 reference is equivalent (in terms of its underlying implementation) to a constant pointer to whatever its type is. This implies that once a reference's (address) value is set, it cannot be changed –as it is a constant pointer internally.

C++98 references allowed the ugliness of C's pointers to disappear as it allowed one to implicitly pass variables by reference:

void set(int &i, int value)
{
  i = value;
}
// ...
int myvar;
set(myvar, 9);

instead of writing this C equivalent:

void set(int *i, int value)
{
  *i = value;
}
// ...
int myvar;
set(&myvar, 9);

which also required a lot more attention to always correctly using pointers, etc. to avoid misuing them –the likely result of which was a program crash. Thus C++98 references allowed the programmer to:

  • avoid having to explicitly use the address-of operator for any function call arguments that were to be implicitly passed by reference, and,
  • avoid having to explicitly dereference pointer variables in functions being passed as references –as they are typically only manipulated as values within those functions.

Thus in C++98 it is no longer a burden to efficiently and easily use pass-by-reference. No doubt this encourages programmers to use pass-by-reference when it is more efficient and appropriate to do so.

C++98 references are lvalues and can only refer to rvalues with one exception: constant references referring to compiler-created temporary variables (e.g., passed as function arguments). Consider converting the value function parameter in the above example to be a constant lvalue reference:

void set(int &i, int const&value)
{
  i = value;
}
// ...
int myvar;
set(myvar, 9); // NOTE: 9 must become a temporary --an rvalue!!

Perhaps to one's horror the above code correctly compiles and runs! For the compiler to call set(myvar, 9) it needs to create a temporary variable of type int so its address can be taken and passed to the function! The issue here is the temporary variable is an rvalue –not an lvalue. The C++98 ISO committee reviewed this specific issue carefully and resolved that since the reference refers to constant data that such should be legal. Why? In part: it is constant, i.e., its value won't be changing, and, casting away const is bad and can easily have implementation-defined results. Thus, this "violation" is a non-issue. The result of permitting such:

  • it allows more efficient variable passing to functions,
  • it enables compiler to better optimize code, and,
  • it completely removes the burden of writing overloaded versions of functions just so that literals and other rvalues can be passed to arguments taking constant references.

So all was well and good: plenty of cleaner and efficient C++ code was subsequently written and used successfully. However, being a language of absolute efficiency and robustness with high- and low-level styles of coding, there were still problems that caused programmers to twist their code to be efficient. Consider this example code that critics of object-oriented programming (OOP) often cite as an argument against OOP:

class Matrix
{
  public:
    // NOTE: All code below assumes that nrows_ and ncols_
    //       is hard-coded to be 1000 by 1000 for simplicity
    //       here since how to properly write a general matrix
    //       class is not the purpose of this blog post! :-)

    Matrix()
      : nrows_(1000), ncols_(1000), data_(nrows_*ncols_) { }

    // ...

    // Permit access to the sizes...
    std::size_t nrows() const;
    std::size_t ncols() const;

    // Permit access using the function call operator...
    double& operator ()(std::size_t row, std::size_t col);
    double const& operator ()(std::size_t row, std::size_t col) const;

  private:
    std::size_t const nrows_;   // Hard-coded to be 1000
    std::size_t const ncols_;   // Hard-coded to be 1000
    std::vector<double> data_;
};

// Define addition...
Matrix& operator +=(Matrix& a, Matrix const& b)
{
  for (std::size_t i=0; i != b.nrows(); ++i)
    for (std::size_t j=0; j != b.ncols(); ++j)
      a(i,j) += b(i,j);   // No copies: just += each element.

  return a;   // Return a by reference as it was passed!
}

Matrix operator +(Matrix const& a, Matrix const& b)
{
  Matrix retval(a);   // Copy a into retval
  retval += b;        // Add b to retval
  return retval;      // Return retval by value
}

// ...
Matrix a, b, c;
// ...
Matrix d = a + b + c; // Can be very inefficient!

i.e., given three matrices add them (non-destructively) together and place their result in a new matrix. The addition code be written written using two for loops in a procedural manner (assuming the memory was previously allocated, all four matrices are the same size, and we are not trying to optimize anything else):

std::size_t const nrows = a.nrows();
std::size_t ncols = a.ncols();

for (std::size_t i=0; i != nrows; ++i)
  for (std::size_t j=0; j != ncols; ++j)
    d(i,j) = a(i,j) + b(i,j) + c(i,j);

Unfortunately, in OOP languages (except in C++ when using a library that optimizes such using the expression templates technique) this is not the code that is generated by the compiler. Instead the compiler generates code based on the operations being applied to the objects reflect in the expression:

Matrix d = a + b + c;

which does the following (assuming the expression templates code is not written induce optimization):

  1. First a + b is evaluated. This requires calling operator +(a,b) and returns a compiler-created temporary Matrix value. Let's call this value t1.
  2. Now t1 + c is evaluated. This also requires calling operator +(t1,b) again and it too returns a compiler-created temporary Matrix value. Let's call this value t2.
  3. Having evaluated the right-hand size of the initialization (as this is not the assignment operator!) the constructor will invoke the copy constructor for d passing in t2 so its value can be copied.
  4. Finally the compiler destroys all temporary values: t1 and t2.

Clearly the generated code, although it will produce the correct answer, allocates more RAM and will take more time to execute (as there will be more cache misses than hits especially with large matrices). Clearly this is unacceptable. On the other hand, one does want the code to execute according to the functions defined for the classes being used. What should one do? The expression templates technique is a form of C++ template metaprogramming that is considered to be advanced, is there an easier way? The short answer is with C++11, "Yes!" Before doing that, with C++98, many programmers would write:

Matrix d = a;
d += b;
d += c;

instead of Matrix d = a * b * c;. The former is more efficient, the latter is easier to read and understand –not to mention commonly written in actual code. Ideally a programming language should not force the programmer to make such trade-offs.

Important: Using rvalue references does not eliminate the need for the expression templates technique –there are still many things that won't be optimized. The expression templates technique allows one to write code that recognizes how types are used and transform them (to presumably equivalent, more efficient forms). With matrices, such can be used to determine at compile-time an optimal way to perform matrix multiplication (in expressions with varying matrix sizes) such as:

Matrix d = a * b * c;

Moving Stuff Around

In the above example the temporary variables, t1 and t2, were rvalues and the compiler should have somehow reused t1 instead of also making t2. But, the compiler is not human and does not know the programmer's intention or anything about matrices! This means the compiler cannot simply reuse t1 and call operator +=(t1,c) instead: it was told to only call operator +!!.

So how can rvalue references help with expressions like Matrix d = a + b + c;? Through the definition of move constructors and move assignment operators, they allow the programmer to implicitly tell the compiler how to efficient move the contents/state of one rvalue variable to another. This is particularly handy since the compiler actively does create temporary variables when evaluating expressions that the programmer otherwise has no control over! Re-examining Matrix d = a + b + c;:

  1. First the compiler must make a temporary to perform a + b. Let's again call this temporary t1. So there are no changes here.
  2. Next the compiler will perform t1 + c this too will return a copy. Hey, there are no improvements!! Let's stop.

Yes, there are no improvements. To get them, the Matrix class (and the types it internally uses) need to define move constructors and assignment operators. Nicely a C++11 compiler will define default ones –which are of benefit if the underlying types define such (e.g., Matrix' use of std::vector). So, I lied: there were improvements, but, unless one understands the move constructor and move assignment operator, it is hard-to-comprehend what benefits are obtained through the use of rvalues.

Consider this simple array-like class:

#include <algorithm>

class MyArray
{
  public:
    // Default constructor...
    MyArray()
      : size_(s), data_(nullptr)
    {
    }

    // Allow one to specify a size...
    MyArray(std::size_t s)
      : size_(s), data_(new int[s])
    {
    }

    // Define copy constructor...
    // Notice that it accepts an lvalue reference to a constant
    //   of the same type as this class (i.e., MyArray).
    MyArray(MyArray const& a)
      : size_(a.size_), data_(new int[a.size_])
    {
      std::copy(a.data_, a.data_+a.size_, data_);
    }

    // Define move constructor...
    // Notice that it accepts an rvalue reference (&&) to
    //   an instance of the same type as this class (i.e., MyArray).
    // Notice that the data_ in a is simply assigned to this object
    //   and then a.data_ is set to be null. Why? The data is being
    //   MOVED from a to this. a will need to still be properly
    //   destroyed so a.data_ needs to be set to null.
    MyArray(MyArray&& a)
      : size_(a.size_), data_(a.data_)
    {
      a.size = 0;           // Reset size
      a.data_ = nullptr;    // Reset pointer
    }

    // Define copy assignment operator...
    // Notice that exception-safe code is presented here.
    // Notice that making a copy is necessary.
    MyArray& operator =(MyArray const& a)
    {
      MyArray temp(a);               // Make copy of a
      std::swap(size_, temp.size_);  // Swap sizes
      std::swap(data_, temp.data_);  // Swap data pointers

      // Destroy temp as *this is now a copy of a! :-)
      return *this;
    }

    // Define move assignment operator...
    // One could write this without temp to be slightly
    // more efficient. This code reuses the above definitions
    // and keeps the actual movement code in the move
    // constructor only.
    MyArray&& operator =(MyArray&& a)
    {
      MyArray temp(std::move(a));    // Move a into temp
      std::swap(size_, temp.size_);  // Swap sizes
      std::swap(data_, temp.data_);  // Swap data pointers
      return *this;
    }

    // ...

  private:
    std::size_t size_;
    int* data_;
};

Perhaps it is much easier to see that moving is actually moving the data from one object to another! It can only be done with rvalues as rvalues are not to be the receivers of assignment operations: they are intended to only be used as values even though we can also acquire their addresses as they are located somewhere in the computer's RAM.

Notice that std::move(a) was used. Isn't a declared to be an rvalue reference? Yes it is, but, once it acquires a name it has to be treated as an lvalue. The std::move function is C++11's way for a programmer to explicitly make a lvalue an rvalue. Once this is done the novice programmer should not use the variable whose content was moved –as its state has been reset so that its destructor can be executed without error and a minimum of overhead.

Note: The purpose of a named rvalue reference declaration is to ensure that it can only be initialized with an rvalue. It does not mean it is itself an rvalue –although it refers to one!

Let's re-examine:

Matrix d = a + b + c;

which clearly computes a rvalued result which is used to initialize d. Provided the std::vector class has a move constructor (it does) and one is in Matrix then the compiler need not use the copy constructor: it can use the move constructor instead. This implies the data in t2 will be moved/transferred to d directly –not copied into d. This implies one fewer Matrix has to be dynamically allocated in the expression. Awesome!

Concerning the expression:

a + b + c;

all we've told the compiler is how to handle this function:

Matrix operator +(Matrix const& a, Matrix const& b);

which states given two Matrix instances return a new one. Returning any value by value from a function implies that its value can be treated as an rvalue. Thus, instead of making a copy of the Matrix returned, copying the contents over, and then destroying the local variable, the compiler is free to simply move it from the local variable to another one that is available –since the local variable will be destroyed.

One can do even better by telling the compiler how to add values that are rvalues. Only looking at operator + and making simple changes:

Matrix operator +(Matrix const& a, Matrix const& b); // as before

// [2]
inline Matrix operator +(Matrix&& a, Matrix const& b)
{
  Matrix retval(std::move(a));   // Move a into retval
  retval += b;                   // Add b to retval
  return retval;                 // Returns as an rvalue
  // Some may prefer: return std::move(a += b);

  // In general don't use std::move() in a return statement
  // that returns a variable. Simply write: return retval;
  // This will allow compilers to choose between RVO or moving.
}

// [3]
inline Matrix operator +(Matrix const& a, Matrix&& b)
{
  // This works since matrix addition is associative...
  return operator +(std::move(b), a); // call [2] above
}

// [4]
inline Matrix operator +(Matrix&& a, Matrix&& b)
{
  return operator +(std::move(a), b); // call [2] above
}

With these defined when rvalues are being added the cost of creating a new temporary matrix state (i.e., the RAM to hold the 1000 by 1000 matrix!) is now completely avoided. The compiler does still create the temporary variables and needs to (as they are allocated on the stack), but, the overheads have been sliced to a bare minimum.

Revisiting what happens when Matrix d = a + b + c; is run with the above additions and changes (assuming std::vector behaves like MyArray: i.e., moves null out the internal pointers):

  1. First a + b is calculated and a full temporary matrix is returned as temporary t1. No change here.
  2. Now t1 + c is performed. Since t1 is an rvalue as it is a compiler-created temporary, it invokes:
    Matrix operator +(Matrix&& a, Matrix const& b)

    which avoids creating a new temporary with its own separate storage. Instead it creates a new temporary, t2, by move constructing it from the stored result in retval which was moved from t1 (given the above code). The savings here are huge: no unnecessary additional RAM was allocated on the heap!

  3. Finally, t2 is moved into d's constructor without allocating any additional RAM on the heap!

So, if you're counting temporaries one might argue that there is not much improvement. But if one is counting the amount of RAM allocated and copying there are major improvements. Essentially, only one Matrix is dynamically allocated in RAM (to hold the initial temporary) and that winds up ultimately being moved into d.

Note: C++ can elide constructors and perform other operations to further optimize the above code. The programmer can do additional things too. If one will not be using the a, b, and c variables again, you can help the compiler out by having all variables become rvalues (which would move everything –no copies):

Matrix d = std::move(a) + std::move(b) + std::move(c);

Perfect Forwarding

If the type of a function parameter is an rvalue reference to a template type parameter for a function template then the type parameter is deduced to be: an lvalue reference if an lvalue is passed, or, a value type (which is an rvalue and, thus, can be treated as an rvalue reference by the compiler) assuming the overall call is valid.

Important: Recall that only with function templates, the compiler will deduce the (template) types of its arguments. This specific type of deduction mentioned here only works for function templates when an rvalue reference to a type that is a template parameter is used. If the argument is an rvalue to a non-template type this specific deduction will not occur.

Consider:

struct Silly { };

template <typename T>
void func(T&& t);

int main()
{
  Silly s;
  func(s);             //    Lvalue. Argument type: Silly&
  func(Silly());       // Temporary. Argument type: Silly&&
  func(std::move(s));  //  Rvalue forced. Arg type: Silly&&
}

By using this deductive property, C++ can preserve the argument types while forwarding them to another function. To do this, one needs to use std::forward<arg_type>(arg_variable_name) which takes care of lvalue/rvalue issues when doing such. All that is needed for one to exploit such is to simply use it.

Enough sidebar issues! Let's figure out how to efficiently pass a std::tuple to regular functions and then functors!


Writing apply() To Not Use std::tuple

Using variadic templates solving this problem is straight-forward:

#include <tuple>
#include <functional>
  // NOTE: g++ v4.5 needs <functional> for std::result_of
  //       g++ v4.7 does not

template <typename Op, typename... Args>
auto apply(Op&& op, Args&&... args)
  -> typename std::result_of<Op(Args...)>::type
{
  return op(args...);
}

It is straight-forward for this reasson: every argument from the second passed to apply is in the Args parameter pack in the order and form it needs to be. This allowed the arguments to be expanded in the return statement as-is. The only remaining item to address here is to use std::forward on those arguments:

#include <tuple>
#include <functional>
  // NOTE: g++ v4.5 needs <functional> for std::result_of
  //       g++ v4.7 does not

template <typename Op, typename... Args>
auto apply(Op&& op, Args&&... args)
  -> typename std::result_of<Op(Args...)>::type
{
  return op( std::forward<Args>(args)... );
}

to ensure perfect forwarding.


Writing apply_tuple() To Use std::tuple

A Problem

Unfortunately, the above solution cannot be trivially applied to std::tuple. When template parameter packs are expanded they don't have the ability to perform the arithmetic (e.g., N+1) needed to unroll the std::tuple elements for each function argument. The latter is needed since the std::get template function requires each index as a compile-time constant template argument, if parameter pack expansion is to be used then a way must be found to increment the value passed to std::get as each tuple element is extracted.

Although parameter pack expansions cannot do arithmetic, one can pre-compute the indices it will need. Why is this useful? If parameter pack expansions are done on expressions involving parameter pack types and variables, then they will all be expanded together as a sequence. For example, given the tuple:

std::tuple{1,2.2,'a'}

whose type is std::tuple<int,double,char> and this:

template <std::size_t...>
struct indices;
// ...
indices<0,1,2>

then it will be possible to expand both (i.e., indices and std::tuple types's template arguments, i.e., the type of each element in the tuple) at the same time using parameter pack expansion. Hopefully by solving this problem a solution similar to the non-tuple version will be possible!

Generating A List Of Indices

By defining a compile-time only container to hold the list of indices:

template <std::size_t...>
struct indices;

struct templates make_indices, make_indices, and make_indices_impl can be defined. What should they contain? For each type, a corresponding index needs to be generated that is zero-based. This implies that when processing each tuple element type from left to right the next highest index needs to be appended to the indices instance's template argument std::size_t constants. This will need to be defined recursively using make_indices as an easy-to-use type to start the indices' generation and hold the result in a typedef called type:

#include <cstddef> // for std::size_t

// Define holder for indices...
template <std::size_t...>
struct indices;

// Define adding the Nth index...
// Notice one argument, Type, is discarded each recursive call.
// Notice N is added to the end of Indices... .
// Notice N is incremented with the recursion.
template <std::size_t N, typename Indices, typename... Types>
struct make_indices_impl;

template <
  std::size_t N,
  std::size_t... Indices,
  typename Type,
  typename... Types
>
struct make_indices_impl<N, indices<Indices...>, Type, Types...>
{
  typedef
    typename make_indices_impl<
      N+1,
      indices<Indices...,N>,
      Types...
    >::type
    type
  ;
};

// Define adding the last index...
// Notice no Type or Types... are left.
// Notice the full solution is emitted into the container.
template <std::size_t N, std::size_t... Indices>
struct make_indices_impl<N, indices<Indices...>>
{
  typedef indices<Indices...> type;
};

// Compute the indices starting from zero...
// Notice indices container is empty.
// Notice Types... will be all of the tuple element types.
// Notice this refers to the full solution (i.e., via ::type).
template <std::size_t N, typename... Types>
struct make_indices
{
  typedef
    typename make_indices_impl<0, indices<>, Types...>::type
    type
  ;
};

Define apply_tuple To Work With std::tuple

Finally apply can be written but we need a helper function. Start by writing apply to do everything it can do:

template <
  typename Op,
  typename... OpArgs,
  typename Indices = typename make_indices<0, OpArgs...>::type,
  template <typename...> class T = std::tuple
>
auto apply_tuple(Op&& op, T<OpArgs...>&& t)
  -> typename std::result_of<Op(OpArgs...)>::type
{
  return apply_tuple_impl<Indices>::apply_tuple(
    std::forward<Op>(op),
    std::forward<T<OpArgs...>>(t)
  );
}

Unfortunately, Indices is not a parameter pack and the indices inside of Indices and OpArgs must both be parameter packs so that they can be expanded together:

[snip]
{
  return op( std::forward<OpArgs>(std::get<Indices>(t))... );
}

It is straight-forward to use partial template specialization with a helper class to lift the std::size_t… parameter pack out of its container and this is why apply_tuple_impl has been used above. Notice that all function arguments are simply forwarded. The "prototype" case for apply_tuple_impl must be mentioned before using the partial template specialization technique:

template <typename Indices>
struct apply_tuple_impl;

Now Indices can be lifted out of the template template parameter I<Indices...> which will be inferred to be indices –the container type:

template <
  template <std::size_t...> class I,
  std::size_t... Indices
>
struct apply_tuple_impl<I<Indices...>>
{
  template <
    typename Op,
    typename... OpArgs,
    template <typename...> class T = std::tuple
  >
  static auto apply_tuple(Op&& op, T<OpArgs...>&& t)
    -> typename std::result_of<Op(OpArgs...)>::type
  {
    return op( std::forward<OpArgs>(std::get<Indices>(t))... );
  }
};

which is the desired solution.


Testing The Solution

The following code will nicely (partially) test the solution:

//=================================================================

#include <cstddef>
#include <tuple>
#include <functional> // Only if needed

// Definition of indices
// Definition of make_indices_impl
// Definition of make_indices
// Definition of apply
// Definition of apply_tuple_impl
// Definition of apply_tuple

//=================================================================

#include <iostream>

struct add { int operator ()(int x, int y) { return x + y; } };
int add2(int x, int y) { return x + y; }

struct A1 { int operator ()() { return -1; } };
int A2() { return -2; }

struct B1 { int operator ()(int x) const { return x; } };
int B2(int x) { return x; }

int main()
{
  using namespace std;

  // Test apply() with no arguments...
  cout
    << "A1()() = "
    << apply(A1())
    << "."
    << endl
  ;
  cout
    << "A2() = "
    << apply(A2)
    << "."
    << endl
  ;

  // Test apply() with one argument...
  cout
    << "B1()(-3) = "
    << apply(B1(),-3)
    << "."
    << endl
  ;
  cout
    << "B2(-4) = "
    << apply(B2,-4)
    << "."
    << endl
  ;

/*
  // Define C1 and C2 elsewhere for two arguments and
  // test apply() with such...
  cout
    << "C1()(-3,-10) = "
    << apply(C1(),-3,-10)
    << "."
    << endl
  ;
  cout
    << "C2(-4,-10) = "
    << apply(C2,-4,-10)
    << "."
    << endl
  ;

  // etc.
*/



  // Test apply_tuple() with no arguments...
  cout
    << "A1()() = "
    << apply_tuple(A1(),std::tuple<>())
    << "."
    << endl
  ;
  cout
    << "A2() = "
    << apply_tuple(A2,std::tuple<>())
    << "."
    << endl
  ;

  // Test apply_tuple() with one argument...
  cout
    << "B1()(-3) = "
    << apply_tuple(B1(),make_tuple(-3))
    << "."
    << endl
  ;
  cout << "B2(-3) = "
    << apply_tuple(B2,make_tuple(-4))
    << "."
    << endl
  ;

/*
  // Define C1 and C2 elsewhere for two arguments and
  // test apply_tuple() with such...
  cout
    << "C1()(-3,-10) = "
    << apply_tuple(C1(),make_tuple(-3,-10))
    << "."
    << endl
  ;
  cout
    << "C2(-4,-10) = "
    << apply_tuple(C2,make_tuple(-4,-10))
    << "."
    << endl
  ;

  // etc.
*/


  return 0;
}

It is wise to use different names for apply and apply_tuple so that tuples themselves can be directly applied to functions/functors.


Closing Comments

As far as I can tell the C++11 standard did not include a make_indices utility. Fortunately, it is not hard to define such to provide a straight-forward way to extract tuple values while performing a parameter pack expansion. That said, the solution is not obvious given the norms of coding and therefore can be hard to figure out. For those of you thinking, "There just has to be a way," I wrote this post! 🙂

Should you notice any errors or issues, kindly let me know.

A Template Argument Extender

Overview

When writing C++ code, one may come across the need to have a set of N arguments of all identical types for a variadic template type. In other words, it would be nicer to write:

typedef type_extend<0, std::tuple, std::string>::type t0;
typedef type_extend<1, std::tuple, std::string>::type t1;
typedef type_extend<2, std::tuple, std::string>::type t2;
typedef type_extend<3, std::tuple, std::string>::type t3;
typedef type_extend<4, std::tuple, std::string>::type t4;

instead of:

typedef std::tuple<> t0;
typedef std::tuple<std::string> t1;
typedef std::tuple<std::string, std::string> t2;
typedef std::tuple<std::string, std::string, std::string> t3;
typedef std::tuple<std::string, std::string, std::string, std::string> t4;

In other words, the goal is to have a positive integer (i.e., the first parameter to type_extend) that determines how many times the last argument passed to type_extend will be duplicated in an argument list that will be applied as the template arguments to the type denoted by the second parameter passed to type_extend.

This article explores how to do this using variadic templates which will be part of the upcoming C++ standard.


Template Template Parameters

In the following code, notice the use of std::tuple appears without any template arguments even though std::tuple requires a list of types as its template arguments:

typedef type_extend<2, std::tuple, std::string>::type t2;

C++ allows the programmer to specify that template arguments are placeholders for class templates. Such placeholders are called template template parameters and are declared the same as class templates except that struct and union cannot be used:

template <template<typename X> class T> // OK
void some_func(T<float>*);

template <template<typename X> typename T> // ERROR
void some_func(T<double>*);

template <template<typename X> struct T> // ERROR
void some_func(T<short>*);

template <template<typename X> union T> // ERROR
void some_func(T<long>*);

Notice that the above code allows the user to specify a class that has only one template argument without specifying the template argument:

template <typename N>
struct Number
{
  N num;
};

template <template <typename> class T>
struct SomeClass
{
  // This class sets the argument to T as float...
  T<float> something;
};

int main()
{
  // Notice Number has no arguments...
  typedef SomeClass<Number> N;

  N n;
  n.something.num = 3.14F; // NOTE: num is a float
}

The usefulness of template template parameters is to allow one to omit the template arguments when passing a class template type as a template argument. The downside in using template template parameters are that (a) any use must fully specify all template parameters to the template template placeholder name and (b) the number of template arguments must exactly match the type being passed in. The restriction that (b) places is important to note: the STL containers all have default arguments and therefore they all have at least two arguments –not one!


Implementing type_extend

In order to build up a variable argument list of types, two class templates will be defined: type_extend and type_extend_impl. The type_extend class templates have the three template arguments previously discussed. The type_extend_impl class templates will need to build up the list of template arguments up to the number specified. The purpose of type_extend is to handle the N-th case and the 0-th (i.e., base) case and its definition is straight-forward:

template <unsigned N, template <typename...> class T, typename Arg>
struct type_extend
{
  typedef typename type_extend_impl<N-1,T,Arg>::type type;
};

template <template <typename...> class T, typename Arg>
struct type_extend<0,T,Arg>
{
  typedef T<> type;
};

In the base case (i.e., in the template specialization type_extend<0,T,Arg>), the computed type is T without any template arguments. In all other cases, the computed type is determined by the definition of type in type_extend_impl.


Implementing type_extend_impl

Since the only way to go from some value of an integer N as a template argument to 0 is to use recursion, the first step is to determine what is the base case's type result that needs to be computed. Since the type_extend class template handled the case where there were no arguments, then the base case of type_extend_impl will need to be T having only one template argument. As for all other cases, the number of arguments needs to be increased by one per recursive iteration. To following code accomplishes this:

template <
  unsigned N,
  template <typename...> class T,
  typename Arg,
  typename... Args
>
struct type_extend_impl
{
  // This class was invoked with Arg, Args... and to increase the
  // Arg one more time we need to add it again which is why Arg
  // appears twice below...
  typedef
    typename type_extend_impl<N-1,T,Arg,Arg,Args...>::type
    type
  ;
};

template <
  template <typename...> class T,
  typename Arg,
  typename... Args
>
struct type_extend_impl<0,T,Arg,Args...>
{
  // Base case: Stop the recursion and expose Arg.
  typedef T<Arg,Args...> type;
};

Certainly do take time to digest the above, but, the above code works since each recursive call to type_extend_impl except the last adds one Arg to the template argument list and reduces N by one.


Testing type_extend

All that is left is to test that this works. This test will assume the use of one of the GNU C++ version 4.5 or higher compilers. Suitable test code would be:

#include <tuple>

// Place definition of type_extend_impl here.
// Place definition of type_extend here.

#include <iostream>
#include <typeinfo>
#include <string>
#include <cstdlib>

#include <cxxabi.h>   // GCC-specific

template <typename T>
std::string demangled_name_of_type()
{
  // NOTE: See G++ Manual Chapter 27.
  //       http://gcc.gnu.org/onlinedocs/libstdc++/
  int status;
  char *name = abi::__cxa_demangle(typeid(T).name(), 0, 0, &status);
  std::string realname(name);
  free(name);
  return realname;
}

int main()
{
  typedef type_extend<0,std::tuple,unsigned>::type test0;
  typedef type_extend<1,std::tuple,unsigned>::type test1;
  typedef type_extend<2,std::tuple,unsigned>::type test2;
  typedef type_extend<3,std::tuple,unsigned>::type test3;
  typedef type_extend<4,std::tuple,unsigned>::type test4;

  std::cout
    << "0: " << demangled_name_of_type<test0>() << std::endl
    << "1: " << demangled_name_of_type<test1>() << std::endl
    << "2: " << demangled_name_of_type<test2>() << std::endl
    << "3: " << demangled_name_of_type<test3>() << std::endl
    << "4: " << demangled_name_of_type<test4>() << std::endl
  ;
}

Save it in a file (e.g., called program.cxx and compile it:

g++ -std=c++0x program.cxx

When it is run, the following output will appear:

0: std::tuple<>
1: std::tuple<unsigned int>
2: std::tuple<unsigned int, unsigned int>
3: std::tuple<unsigned int, unsigned int, unsigned int>
4: std::tuple<unsigned int, unsigned int, unsigned int, unsigned int>

Success!


Template Aliases

Another feature, not yet implemented by GCC, in the next standard is the ability to alias template types:

template <unsigned N, typename Arg>
using n_tuple = typename type_extend<N,std::tuple,Arg>::type;

Using template aliases would allow one to easily eliminate the need to write ::type and to partially substitute template arguments. For example, the template alias n_tuple could subsequently be used as follows:

n_tuple<3,std::string>

which would be equivalent to the t3 typedef at the very top of this file in the Overview section.

Using CMake to Build C++ Boost Python Libraries

Overview

The Boost Project has a Python library that makes it easy to seamlessly integrate C++ and Python code. Unfortunately, the documentation recommends using bjam. If you build Boost from source, then this is largely a non-issue. If, however, you are using Boost from preinstalled binaries or from a Gentoo ebuild, then you don't want to use bjam at all!

CMake is an excellent, cross-platform build system tool (e.g., like make but much better). CMake has the ability to determine platform-specific options required to build code. This saves one from having to specify each and every compiler, linker, etc. option including specific system paths, etc. In short, CMake makes is a lot easier to compile and run programs from source.

This post details how to build a minimal C++ library with its code compiled as a module capable of being loaded by Python (version 2.6 or higher) using CMake (version 2.8) and Boost (version 1.45.0). The text below assumes:

  • you are using a BASH shell on a Linux / Unix system,
  • you are using GCC's g++,
  • you already have CMake installed,
  • you already have Boost installed, and,
  • you already have Python installed.

Steps

Step 1: Create Your Development Directory & Set Your PATH

You'll want to create a new directory (e.g., "src") with a "build" subdirectory in it:

$ mkdir -p src/build

and change your current working directory to the new directory:

$ cd src

In order to run the Python code later, also ensure that "." appears in your PATH environment variable:

$ PATH=.:$PATH
$ export PATH

This will permit Python to load the module that is built. You may want to add these two lines to your ~/.bashrc script to avoid having to always set this in any new shells.

Step 2: Create CMakeLists.txt

Now, you can write the file that CMake will use to build the C++ code into a Python module. This module will be compiled as a Shared Object Library file (i.e., a .so file).

NOTE: In Windows, shared objects are Dynamically Linked Libraries (DLLs). Please note that under Linux / Unix, libraries are typically named with a "lib" prefix and a ".so" suffix. Under Windows, libraries simply have a ".dll" suffix. The instructions below are for Linux / Unix and so the "yay" library that is built will be called "libyay.so", whereas, under Windows it would be called "yay.dll". Thus, if you are using Windows, you may have to make some minor adjustments to the files below.

Your CMakeLists.txt file should contain:

CMAKE_MINIMUM_REQUIRED(VERSION 2.8)
IF(NOT CMAKE_BUILD_TYPE)
  SET(CMAKE_BUILD_TYPE "DEBUG")
  #SET(CMAKE_BUILD_TYPE "RELEASE")
  #SET(CMAKE_BUILD_TYPE "RELWITHDEBINFO")
  #SET(CMAKE_BUILD_TYPE "MINSIZEREL")
ENDIF()

FIND_PACKAGE(Boost 1.45.0)
IF(Boost_FOUND)
  INCLUDE_DIRECTORIES("${Boost_INCLUDE_DIRS}" "/usr/include/python2.6")
  SET(Boost_USE_STATIC_LIBS OFF)
  SET(Boost_USE_MULTITHREADED ON)
  SET(Boost_USE_STATIC_RUNTIME OFF)
  FIND_PACKAGE(Boost 1.45.0 COMPONENTS python)

  ADD_LIBRARY(yay SHARED yay.cxx)
  TARGET_LINK_LIBRARIES(yay ${Boost_LIBRARIES})
ELSEIF(NOT Boost_FOUND)
  MESSAGE(FATAL_ERROR "Unable to find correct Boost version. Did you set BOOST_ROOT?")
ENDIF()

IF(CMAKE_COMPILER_IS_GNUCXX)
  ADD_DEFINITIONS("-Wall")
ELSE()
  MESSAGE(FATAL_ERROR "CMakeLists.txt has not been tested/written for your compiler.")
ENDIF()

Notice that the output will be a shared library file called "yay". It is important to specify both the Boost include path as well as the Python include path with INCLUDE_DIRECTORIES in order for the code to compile properly.

Step 3: Create yay.cxx

Now create yay.cxx:

#include <boost/python.hpp>

char const* yay()
{
  return "Yay!";
}

BOOST_PYTHON_MODULE(libyay)
{
  using namespace boost::python;
  def("yay", yay);
}

It is very important that "libyay" in BOOST_PYTHON_MODULE(libyay) matches the name of the library you're generating in CMakeLists.txt (without the extension).

NOTE: On Linux / Unix systems you will need to prefix the name with "lib" as CMake defaults to prepending the file with "lib" as per convention on Linux / Unix systems.

Essentially, the BOOST_PYTHON_MODULE clause exports the "yay" function as the Python symbol name "yay". This will allow you to call the "yay" function from Python later.

Step 4: Build The Python Module

The best way to invoke CMake is to invoke it in a special build directory. This is why Step 1 mentioned to create a build directory as well. Before you can make the library, you need to have cmake generate the neccessary files to perform the build:

$ cd build
$ cmake ..

(You only need to run "cmake .." after changes are made to CMakeLists.txt.) Now, you can build the program by running make:

$ make

If all went well, you will see a libyay.so file in the current directory! If you need to clean up files, then run "make clean". If the build directory is a complete mess and you want to start over, then remove all of its contents, run "cmake ..", and then "make".

Step 5: Create A Python Script To Call yay()

Now you can test your efforts by writing the following python script:

#!/usr/bin/python
import libyay
print libyay.yay()

When you run it, it should output "Yay" (assuming the library file is somewhere in PATH, or, in the standard location for your system's Python libraries).

Step 6: Have Fun!

That's it! More information on how to export more symbols and call functions (either way) is available in the Boost Python library documentation.