Using The C++ Placement New Operator

Overview

With C++11's rvalue references now available, it is an interesting exercise to see how they can be combined with placement new and the new alignas() functionality. This post implements a class that can conditionally store an object without allocating space for that object on the heap. To accomplish this, the class holds a character array capable of holding an instance of its template argument type T and uses placement new to construct the object inside that array.

Aside: Unfortunately the GCC version 4.7.0 snapshot 20111029 C++ compiler does not yet appear to support alignas(). I have left an alignas() commented out so it can be revisited when such support arrives. With C++98 it is up to the programmer to ensure that a non-heap-allocated character array is properly aligned in memory to store some type T. Although that is beyond the scope of this post, the issue should soon be moot once compilers support alignas()! 🙂

For those familiar with Boost, the code presented herein is conceptually similar to the functionality provided by boost::optional.
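
For reference, a minimal (hypothetical) usage sketch with boost::optional, assuming Boost is available, looks like this:

#include <boost/optional.hpp>

int main()
{
  boost::optional<int> a;   // empty; no heap allocation is performed for the int
  a = 5;                    // the int is constructed in boost::optional's internal storage
  a = boost::none;          // the int is destroyed again
}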


Conditionally Storing An Object Without Using The Heap

In order to conditionally store an object without using the heap, one will need a bool to indicate whether or not the object exists and sufficient, properly aligned memory to hold the object:

template <typename T>
class optional
{
  private:
    /*alignas(T)*/ char t_[sizeof(T)];
    bool created_;
};

where t_ is a character array capable of holding a T. Now if only there were a way to invoke new using &t_[0]. There is! It is a C++98 feature called placement new. Placement new allows one to write an operator new overload that takes additional, user-supplied arguments. Conveniently, the C++98 standard already provides an operator new overload that simply accepts the address in RAM where the object will be constructed. When this is done, one must ensure enough space exists at that address, that the storage is properly aligned, and that the destructor is called manually when the object is to be destroyed. Since there is no corresponding placement delete expression, the programmer is responsible for explicitly calling the destructor and performing any necessary clean-up operations.

Given the t_ array above, one could default-construct an object of type T in the storage occupied by t_ by invoking:

T* p = new(t_) T();

and later the object can be destroyed by explicitly invoking the object's destructor:

p->~T();

Cool, eh?!!!

NOTE: Explicitly calling a destructor like this should only be done for an object that was created with placement new. Although the language lets you do it, never, ever explicitly call the destructor of an object that was not created with placement new!

So the concept this post implements is this: if created_ is true then an instance of T has previously been constructed at t_, and if created_ is false then no instance of T exists at t_.


Handling Instantiations

Let's start by writing the member functions needed to properly instantiate objects of type T located at t_. To do this, the following cases need to be handled:

  • default construction of T
  • copy construction of T
  • move construction of T
  • copy assignment of T
  • move assignment of T
  • destruction of T

Note that these operations concern T. Since the class template is optional<T>, the corresponding operations must also be defined for it in order to support the above:

  • default construction of optional<T>
  • copy construction of optional<T>
  • move construction of optional<T>
  • copy assignment of optional<T>
  • move assignment of optional<T>
  • destruction of optional<T>

Fortunately, the number of cases to handle can be reduced by writing a few protected member functions, after first noting:

  • Should the object not previously exist for any constructor or assignment invocation, then it must be created with the default value or by using the provided value.
  • Should the object previously exist for any assignment invocation, then it is sufficient to assign the provided value to the object.

Thus, the following member functions can handle all of the aforementioned cases:

protected:
  // Create T using the default constructor or assign default
  // value if already created...
  T& create_or_default();

  // Create T using value or assign value if already created...
  T& create_or_assign(T const& value);

  // Create T using value or assign value if already created...
  T& create_or_assign(T&& value);

  // Destroy T (invokes destructor and sets created_ to false)...
  void destroy();

  // Properly invokes a create_or_*() or destroy() function...
  void create(optional<T> const& o);

  // Properly invokes a create_or_*() or destroy() function...
  void create(optional<T>&& o);

With these helpers, the traditional constructors and assignment operators are easily written (note the use of std::move):

public:
  struct none { };

  optional()
    : created_(false)
  {
  }

  optional(none)
    : created_(false)
  {
  }

  optional(optional<T> const& o)
    : created_(false)
  {
    create(o);
  }

  optional(optional<T>&& o)
    : created_(false)
  {
    create(std::move(o));
  }

  optional(T const& t)
    : created_(false)
  {
    create_or_assign(t);
  }

  optional(T&& t)
    : created_(false)
  {
    create_or_assign(std::move(t));
  }

  optional<T>& operator =(none)
  {
    destroy();
    return *this;
  }

  optional<T>& operator =(T const& t)
  {
    create_or_assign(t);
    return *this;
  }

  optional<T>& operator =(T&& t)
  {
    create_or_assign(std::move(t));
    return *this;
  }

  ~optional()
  {
    destroy();
  }
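
The listing above omits the copy and move assignment operators of optional<T> itself (the fourth and fifth items in the earlier list). A minimal sketch, assuming the create() helpers behave as described, would be:

  // Copy/move assignment of optional<T> itself; create() already handles
  // all four combinations of created_ and o.created_...
  optional<T>& operator =(optional<T> const& o)
  {
    create(o);
    return *this;
  }

  optional<T>& operator =(optional<T>&& o)
  {
    create(std::move(o));
    return *this;
  }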

To simplify the implementation of the protected helper functions above, let's define the following functions which all assume that an object of type T has been created:

public:
  T& get()
  {
    return *static_cast<T*>(static_cast<void*>(t_));
  }

  // Work-around until g++ supports rvalue references for *this (ref-qualifiers)...
  T&& get_as_rvalue()
  {
    return std::move(*static_cast<T*>(static_cast<void*>(t_)));
  }

  T const& get() const
  {
    return *static_cast<T const*>(static_cast<void const*>(t_));
  }

  T* ptr()
  {
    return static_cast<T*>(static_cast<void*>(t_));
  }

  T const* ptr() const
  {
    return static_cast<T const*>(static_cast<void const*>(t_));
  }

It is necessary to use static_cast twice to properly coerce t_ into a pointer to a different type: it is first converted to void* and then to the desired pointer type. By doing these casts in these functions, no other function needs to use any casts! 🙂


Implementing create_or_default()

If the object has already been created, then assign the default value to it. If not, then create it by invoking its default constructor using placement new:

T& create_or_default()
{
  if (created_)
  {
    T& t = get();
    t = T();
    return t;
  }
  else
  {
    T& retval = *(new(t_) T());
    created_ = true;
    return retval;
  }
}


Implementing create_or_assign()

If the object has already been created, then assign the value passed in to it. If not, then create it by invoking its copy or move constructor using placement new:

T& create_or_assign(T const& value)
{
  if (created_)
  {
    T& t = get();
    t = value;
    return t;
  }
  else
  {
    T& retval = *(new(t_) T(value));
    created_ = true;
    return retval;
  }
}

T& create_or_assign(T&& value)
{
  if (created_)
  {
    T& t = get();
    t = std::move(value);
    return t;
  }
  else
  {
    T& retval = *(new(t_) T(std::move(value)));
    created_ = true;
    return retval;
  }
}


Implementing destroy()

If the object has already been created, call its destructor and set created_ to false. Otherwise do nothing:

void destroy()
{
  if (created_)
  {
    T* t = ptr();
    t->~T();
    created_ = false;
  }
}


Implementing create()

Both copy and move versions of create() have four cases to consider corresponding to the values of this->created_ and o.created_ with o being passed in to the function. The four cases are converted into a two-bit number so they can be efficiently handled using a switch statement:

void create(optional<T> const& o)
{
  switch ((created_ << 1) | o.created_)
  {
    case 1:
    case 3:
      create_or_assign(o.get());
      break;

    case 2:
      destroy();
      break;

    default: // i.e., case 0:
      break;
  }
}

void create(optional<T>&& o)
{
  switch ((created_ << 1) | o.created_)
  {
    case 1:
    case 3:
      create_or_assign(o.get_as_rvalue());
      break;

    case 2:
      destroy();
      break;

    default: // i.e., case 0:
      break;
  }
}


Adding Some Convenience Member Functions

To make optional<T> easy to use with class types, and to allow an optional<T> instance to be tested in a boolean context to determine whether it holds an instance of type T, the following public member functions are added:

operator bool() const
{
  return created_;
}

bool operator !() const
{
  return !created_;
}

T& operator *()
{
  return get();
}

T const& operator *() const
{
  return get();
}

T* operator ->()
{
  return ptr();
}

T const* operator ->() const
{
  return ptr();
}


Using optional<T>

Now optional can be used as follows:

#include <iostream>

int main()
{
  using namespace std;

  optional<int> a;
  cout << (a ? "a is set" : "a is not set") << endl;
  a = 5;
  cout << (a ? "a is set" : "a is not set") << endl;
  a = optional<int>::none();
  cout << (a ? "a is set" : "a is not set") << endl;
}

which outputs:

$ ./a.out
a is not set
a is set
a is not set
$


Closing Comments

Fortunately, issues of alignment are (thank)fully addressed in C++11, so we are only waiting for compilers to add support for those C++11 features. This post demonstrates how one can conditionally create and destroy objects at will without placing them on the heap, while fully leveraging copy/move semantics. This is very powerful as it allows instances of a class to defer member object instantiation without using the heap. Avoiding the heap enables more efficient code to be written, especially when using custom memory allocators or objects that must lazily construct their member variables.

Far too often, code uses the heap to store pointers or other small objects, requiring an additional pointer indirection just to access them. Using C++98's placement new operator, C++11's alignment features (when finally implemented), and C++11's move semantics, one can choose to elide that layer of indirection by reserving enough space inside an object and constructing the member in place when needed. The trade-off is a small amount of additional RAM adjacent to the object versus the space overhead of allocating the object(s) on the heap plus an additional pointer located elsewhere in RAM to access the data. It is reasonable to expect that the former will run much faster (i.e., fewer CPU instructions, better cache performance, no calls into slow heap functions) than the latter, especially when small objects are used.
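
As a concrete (hypothetical) illustration of deferring a member's construction without the heap, consider a sketch like the following, which reuses the optional<T> developed above; the large_buffer and session names are made up for this example:

// A buffer large enough that one would not want to construct it eagerly...
struct large_buffer
{
  char data[4096];
};

struct session
{
  void start() { buf_ = large_buffer(); }                   // construct in place on demand
  void stop()  { buf_ = optional<large_buffer>::none(); }   // destroy in place

  optional<large_buffer> buf_;   // storage lives inside session: no heap, no extra pointer
};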

Using The C++ Literal Operator

Overview

The C++11 standard adds a new operator that can be overloaded: the literal operator. Literals are hard-coded constants in programs. For example, 1.2L, "Hello World!", 4096, etc. are all literals (i.e., the first is a long double value, the second is a const char[13] value, and the third is an int). The C++11 standard allows one to define custom literal suffixes whose arguments are transformed at compile time or run time into appropriate values. This post explores the feature using g++ v4.7 snapshot 20111029.

Update (Dec. 16, 2011): A small update was made to the general case definition of binary_literal_impl to provide more user-friendly compiler error messages if incorrect digits are used.


The Goal

I don't know about you, but I've always wanted to be able to write binary numbers directly into my program code and have them stored optimally. Using C++11, we'll be able to do exactly that using code like this:

int main()
{
  using namespace std;

  const unsigned long long bits =
    11011110101011011011111011101111_binary;

  cout << "The number is: " << hex << bits << endl;
}

provided we write a literal operator function whose suffix is _binary.

I believe I read somewhere that literal suffixes not starting with an underscore are reserved by the C++ standard. If you know definitively or otherwise, kindly let me know. 🙂

Update: In a reddit post's comment, zootsuit notes, "From the N3291 draft/17.6.4.3.5: Literal suffix identifiers that do not start with an underscore are reserved for future standardization." –which is likely where I read it: in one of the draft standards.

I am going to introduce an added twist: the conversion must be done at compile time. Why? Efficiency! Any binary number hard-coded into the program must be encoded as a single integer value in the executable (to ensure minimum space usage and maximum efficiency), converted properly at compile time, with an error if it is malformed. Certainly, no programmer wants the binary number to be stored in the executable as a string that is converted into an integer at run time! Ack! The latter wastes both space and time.


Let's Do A Simple Example First

Before messing around with template metaprogramming (which is probably bewildering until you know how to read/understand it), let's write a literal whose suffix is _square that computes the square of a long double number it is associated with and returns the result:

#include <iostream>

// Insert literal operator _square definition here. (See below.)

int main()
{
  using namespace std;

  const long double num = 25.5_square;

  cout << num << endl;
}

which would output 650.25. To do this, the following function needs to be written:

constexpr long double operator "" _square(long double num)
{
  return num*num;
}

The literal operator's name is operator"" and its suffix is given after it (i.e., _square). The return type can be anything, but it is set to long double here because that is the computed value's type. The constexpr keyword permits the compiler to evaluate the call at compile time (and requires it to when the result is used where a constant expression is needed); the function body must also satisfy constexpr's restrictions or the code will fail to compile. In general, if a literal operator overload is neither a template nor constexpr, the compiler will invoke it at run time.

That's it! Simply compile the above code and 650.25 will be stored in the executable binary as a hard-coded long double value!
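
For contrast, here is a hypothetical literal operator that is neither a template nor constexpr, so the conversion necessarily happens at run time (the _str suffix is made up for this example):

#include <cstddef>
#include <string>

// Builds a std::string from the literal's characters at run time...
std::string operator "" _str(char const* s, std::size_t n)
{
  return std::string(s, n);
}

// e.g., auto greeting = "Hello, world!"_str;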


Literal Operator Function Parameters

Be aware that the literal operator only allows a fixed set of function arguments:

  • const char*
  • unsigned long long int
  • long double
  • char
  • wchar_t
  • char16_t
  • char32_t
  • const char*, std::size_t
  • const wchar_t*, std::size_t
  • const char16_t*, std::size_t
  • const char32_t*, std::size_t

or if there are no arguments at all, then the literal operator must be defined as a template function whose template arguments are a char template parameter pack, i.e.,

template <char... CS>
some_return_type operator "" _some_suffix_name();

Also notice that all of the function argument types, except for the character types, are the largest-range types of their kind (i.e., unsigned long long is the largest-range integer type, long double is the largest-range floating-point type) as the compiler can easily cast any value to a smaller type at compile-time. Since there are numerous character and string literal types (including the new Unicode and raw literals in C++11) the remaining parameters listed handle these special types of literals.
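
As a small, hypothetical example of the (const char*, std::size_t) form (the _len suffix is made up here), a literal that yields the length of a string literal at compile time could look like:

#include <cstddef>

constexpr std::size_t operator "" _len(char const*, std::size_t n)
{
  return n;   // the compiler supplies the literal's length
}

static_assert("hello"_len == 5u, "unexpected literal length");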


Implementing The _binary Literal Operator

Recall the earlier code that permits one to write a binary number in C++ source code:

#include <iostream>

// Insert definition of _binary literal and associated code here.

int main()
{
  using namespace std;

  const unsigned long long bits =
    11011110101011011011111011101111_binary;
  cout << "The number is: " << hex << bits << endl;
}

i.e., the number is 0xDEADBEEF, which is what the program will output.

To ensure that the conversion occurs at compile-time and to be able to easily implement it (as it is a non-trivial function), the implementation of the _binary literal will use a class template with partial template specialization. To understand this better, let's first start by defining the _binary literal:

template <char... Digits>
constexpr unsigned long long operator "" _binary()
{
  return binary_literal_impl<Digits...>::to_ulonglong();
}

Notice that the _binary literal operator has no arguments. This is because the characters of the number preceding _binary are passed as a char template parameter pack.

Template parameter packs represent a sequence of template arguments. A pack is not a single type; to use its elements, it must be expanded with the pack expansion operator, ....
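
As a small generic aside (not part of the _binary implementation), the following sketch shows a pack being counted with sizeof... and expanded with ... into another template:

#include <cstddef>
#include <tuple>

template <typename... Ts>
struct pack_demo
{
  // sizeof... counts the pack; Ts... expands it element by element...
  static constexpr std::size_t count = sizeof...(Ts);
  typedef std::tuple<Ts...> as_tuple;
};

static_assert(pack_demo<int, char, double>::count == 3, "three types expected");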

Within the _binary literal operator definition, the char template parameter pack needs to be expanded and processed into an unsigned long long. To accomplish this, the work is delegated to a static function inside the class template binary_literal_impl, as this allows writing clean, recursively defined code that processes the char template parameter pack, which should be composed only of '0' and '1' characters.


Implementing The binary_literal_impl Class Template

The binary_literal_impl class template allows code to be written that recognizes the following properties about its template arguments:

  • when a '0' appears first, possibly followed by more characters,
  • when a '1' appears first, possibly followed by more characters, and,
  • when there are no characters.

Additionally, if any other (invalid) char values occur, a compile-time error will be generated, as no definition of binary_literal_impl will exist for such arguments. To accomplish this, partial template specialization is needed, so the general case is first (forward) declared but not defined (we want errors when nothing matches!):

template <char... Digits>
struct binary_literal_impl;

This declaration is needed first so the compiler knows that the template arguments of the binary_literal_impl class template must be a parameter pack of char values. It is very important that no braces are used here: leaving the general case undefined means nothing is associated with binary_literal_impl unless one of the partial specializations written below matches. (If the compiler cannot find a matching definition, a compile-time error will occur.)

If you are used to functional programming in Miranda or Haskell, note that C++ partial specialization requires the reverse order of what would be done in those languages: the general case is written first, then the specialized cases follow.

Even better, one can define the above general case of binary_literal_impl to use static_assert so that a much friendlier compiler error message is produced when an incorrect digit is used. (If you are new to this style of programming, I encourage you to try both the declaration-only version above and the definition below to see the differences in compiler output.)

// Alternative user-friendly general case
// (i.e., any digits other than '0' or '1')...
template <char... Digits>
struct binary_literal_impl
{
  static constexpr unsigned long long to_ulonglong()
  {
    // The condition depends on Digits so the assertion fires only when
    // this (invalid-digit) general case is actually instantiated...
    static_assert(sizeof...(Digits) < 0,
      "Digit characters must either be '0' or '1'.");
    return 0;
  }
};

If the first char (template argument) value is '0', then there is no bit to set and the result is simply the integer value computed from the rest of the characters in the char parameter pack:

// If the next digit is zero, then compute the rest...
template <char... Digits>
struct binary_literal_impl<'0', Digits...>
{
  static constexpr unsigned long long to_ulonglong()
  {
    return binary_literal_impl<Digits...>::to_ulonglong();
  }
};

Notice, however, that the parameter pack passed on is now one element shorter than it was. If this is not obvious, recall that with partial template specialization, what appears between < and > after the class template name is what is being matched. Since '0', Digits... appears there, that is what is matched: a '0' character followed by a parameter pack called Digits (declared as char... Digits in template <char... Digits>). Thus, Digits... represents the expansion of all char template argument values after the first one! 🙂

If the first char (template argument) value is '1', then there is a one bit to shift left which must be bitwise-OR'd with the result computed on the remaining arguments. Since the binary digits are being processed from left to right, the one bit should be shifted left by the number of digits that remain to be processed. The C++11 sizeof... operator allows one to know the size of a parameter pack at compile time, so the definition of this case becomes:

// If the next digit is one, then shift 1 and compute the rest...
template <char... Digits>
struct binary_literal_impl<'1', Digits...>
{
  static constexpr unsigned long long to_ulonglong()
  {
    return (1ULL << sizeof...(Digits))
      | binary_literal_impl<Digits...>::to_ulonglong();
  }
};

Again notice that the number of characters remaining to process becomes one shorter when recursively calling to_ulonglong().

Finally, at some point there will be no digits left to process in the recursively defined code above. When this occurs, the computed answer should be zero:

// Base case: No digits, so return 0...
template <>
struct binary_literal_impl<>
{
  static constexpr unsigned long long to_ulonglong()
  {
    return 0;
  }
};

i.e., notice binary_literal_impl<> has no contained values.

That's it!
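
To make the recursion concrete, here is a hypothetical trace of a small literal (not from the original post):

// 101_binary instantiates binary_literal_impl<'1','0','1'>:
//   '1' case: (1ULL << 2) | binary_literal_impl<'0','1'>::to_ulonglong()
//   '0' case:              binary_literal_impl<'1'>::to_ulonglong()
//   '1' case: (1ULL << 0) | binary_literal_impl<>::to_ulonglong()
//   base case:             0
// giving 4 | 1 | 0 == 5.
static_assert(101_binary == 5, "trace sanity check");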

Importantly, since no code was written to handle characters other than '0' or '1', using any other character (e.g., try putting a 2 or an a in the number) will cause compilation to fail. This is a good thing because a binary number should only contain '0's and '1's!


Closing Comments

As with many template metaprogramming techniques in C++, no matter how complicated the metaprogramming code is, using it is often very straightforward. With C++11 supporting user-defined literals, code will be easier to read and write since more meaningful values can now appear as literals in code instead of equivalent hard-to-understand, machine-specific, hard-coded character or integer arrays. Nicely, any literal definitions/prototypes can be hidden away in header files: the end user does not need to know the details. Why? The end user only needs to know how to use the literal operator, i.e., what is written in its documentation! In this instance, understanding how to use the _binary literal is easy: it must be preceded by a valid binary number. One doesn't need to see its definition at all to use it: one only needs to see its documentation. 🙂

For your convenience, this is the entire program presented above:

//===============================================================

//
// binary_literal_impl represents a compile-time function that
// computes the unsigned long long int from a list of characters
// Digits that MUST be composed of '0' or '1'...
//
template <char... Digits>
struct binary_literal_impl;

// If the next digit is zero, then compute the rest...
template <char... Digits>
struct binary_literal_impl<'0', Digits...>
{
  static constexpr unsigned long long to_ulonglong()
  {
    return binary_literal_impl<Digits...>::to_ulonglong();
  }
};

// If the next digit is one, then shift 1 and compute the rest...
template <char... Digits>
struct binary_literal_impl<'1', Digits...>
{
  static constexpr unsigned long long to_ulonglong()
  {
    return (1ULL << sizeof...(Digits))
      | binary_literal_impl<Digits...>::to_ulonglong();
  }
};

// Base case: No digits, so return 0...
template <>
struct binary_literal_impl<>
{
  static constexpr unsigned long long to_ulonglong()
  {
    return 0;
  }
};

//===============================================================

template <char... Digits>
constexpr unsigned long long operator "" _binary()
{
  return binary_literal_impl<Digits...>::to_ulonglong();
}

//===============================================================

#include <iostream>

int main()
{
  using namespace std;

  const unsigned long long bits =
    11011110101011011011111011101111_binary;
  cout << "The number is: " << hex << bits << endl;
}

//===============================================================

Testing If std::tuple Is Being Applied To Functors Efficiently

Overview

In computer science it is never good enough to simply make a claim when that claim can be verified through testing. Fortunately, C++ is a rich language that significantly aids the writing of such test code.

In my previous blog article, Applying std::tuple To Functors Efficiently and Bearded Code Warrior's (BCW) most recent blog entry, Applying a function to a tuple – once again, claims are made that the C++ compiler will pass the tuple arguments efficiently to the desired function.

In my previous post, I had a long preamble explaining how the solution would work before presenting the solution near the end. In this post, I will start with the end results and explain how the code was written to obtain those results. This is done to emphasize why verifying claims/properties of code you've written is important.

Please note that BCW's apply() code is called apply_tuple() within my code. To avoid confusion, within this post all names of any BCW apply() functions are renamed to apply_tuple().

Update Feb. 3, 2012: I have changed Operand's operator ()(T…) function overloads to return Operand*. This ensures no copies/moves are made with the value returned from the functor (as such is not relevant to this article series). All code in this series has been updated accordingly.

Article Source Code: cpp_code-apply_tuple-v3.zip.


What Is The Optimal Solution?

Before seeing any test results or writing any test code, first consider what is the optimal solution given a function applied to some arguments:

apply(foo, arg1, arg2, ..., argN);
apply_tuple(foo, make_tuple(arg1, arg2, ..., argN));

Assuming arg1, arg2, …, argN (ARGS) are all of C++ class types, that all arguments were created prior to the call, and that all arguments are valid rvalue references, the optimal results are:

Table 1: Theoretical Optimal Apply Function Results
(each cell lists misc / copies / moves / assign counts)

Call          foo Function     All N Arguments
apply         0 / 0 / 0 / 0    0 / 0 / 0 / 0
apply_tuple   0 / 0 / 0 / 0    0 / 0 / 0 / 0

where:

Table 2: Definitions Of Terms In Table 1

Term     Definition
misc     The invocation of a default or any other constructor, excluding copy and move constructors.
copies   The invocation of any copy constructor.
moves    The invocation of any move constructor.
assign   The invocation of any copy or move assignment operator.

For all tests, all arguments (except foo) will be created as unnamed or compiler-generated temporary variables. Using only such temporaries implies that the compiler should be able to reuse them and avoid doing any unnecessary copies, moves, etc. Provided the code being tested is properly written, optimal code should be generated as a result. If that is seen, then this test also serves as a nice way to verify the claim that C++11 allows perfect forwarding!

Table 1 needs to be adjusted since temporary variables will be used throughout for all arguments except the function call argument. Specifically, the following items need to be added to Table 1:

  • the creation of the required (unnamed) initial variables, and,
  • any necessary copies, moves, and/or assignments that must be done for those variables.

These adjustments are reflected in Table 3:

Table 3: Theoretical Optimal Apply Function Results Adjusted For Required Initial Rvalue Reference Variables
(each cell lists misc / copies / moves / assign counts)

Call          foo Function     All N Arguments
apply         1 / 0 / 0 / 0    N / 0 / 0 / 0
apply_tuple   1 / 0 / 0 / 0    N / 0 / N / 0

Clearly, to populate a std::tuple with N rvalue references, a total of N moves, not copies, must be done. Additionally, the original variables and the functor must actually be created, and this accounts for the misc constructor invocations listed in Table 3. Finally, since zero-argument functions often do return a value, a value will be returned in the zero-argument tests as well; thus there will be one extra "misc" Operand constructor invocation (for the returned value) in the zero-argument cases in the results.
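
To see why those N moves are unavoidable when the tuple is built from temporaries, consider this small standalone sketch (the probe type is made up for this illustration and is not the article's tracker):

#include <iostream>
#include <tuple>

struct probe
{
  static int moves;
  probe() {}
  probe(probe const&) {}
  probe(probe&&) { ++moves; }
};
int probe::moves = 0;

int main()
{
  // Each temporary must be moved into the tuple's internal storage...
  auto t = std::make_tuple(probe(), probe(), probe());
  (void)t;
  std::cout << probe::moves << " moves\n";   // expected: 3 (assuming the returned tuple is elided)
}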

Finally note that within all tests performed within this article, all calls to apply_tuple will use std::make_tuple. Now that the theory looks good, let's check out reality!


The Test Results

Results For My Version Of apply() And apply_tuple()

Table 4 summarizes the results of using my previous blog entry's apply() and apply_tuple():

Table 4: Test Results Using Paul Preney's Apply Code
(each cell lists misc / copies / moves / assign counts)

Call          Functor Overhead   Args (N=0)      Args (N=1)      Args (N=2)      Args (N=3)
apply         1 / 0 / 0 / 0      1 / 0 / 0 / 0   1 / 0 / 0 / 0   2 / 0 / 0 / 0   3 / 0 / 0 / 0
apply_tuple   1 / 0 / 0 / 0      1 / 0 / 0 / 0   1 / 0 / 1 / 0   2 / 0 / 2 / 0   3 / 0 / 3 / 0

which matches the optimal results in Table 3. (Recall the zero-argument case creates one Operand as its return value hence the "1 misc" in those cases.) The actual test program output for apply() was:

Op() = 1.

Op LOG ============
default_constructor[this=0x7fff7c331c48]
destructor[this=0x7fff7c331c48]
Op HIST ===========
default_constructor 1
destructor 1
Op END ============
Operand LOG ============
other_constructor[this=0x7fff7c331c50]
destructor[this=0x7fff7c331c50]
Operand HIST ===========
other_constructor 1
destructor 1
Operand END ============

Op()(Operand(-3)) = -3.

Op LOG ============
default_constructor[this=0x7fff7c331c49]
destructor[this=0x7fff7c331c49]
Op HIST ===========
default_constructor 1
destructor 1
Op END ============
Operand LOG ============
other_constructor[this=0x7fff7c331c60]
destructor[this=0x7fff7c331c60]
Operand HIST ===========
other_constructor 1
destructor 1
Operand END ============

Op()(Operand(-3),Operand(-10)) = -13.

Op LOG ============
default_constructor[this=0x7fff7c331c4a]
destructor[this=0x7fff7c331c4a]
Op HIST ===========
default_constructor 1
destructor 1
Op END ============
Operand LOG ============
other_constructor[this=0x7fff7c331c80]
other_constructor[this=0x7fff7c331c70]
destructor[this=0x7fff7c331c70]
destructor[this=0x7fff7c331c80]
Operand HIST ===========
other_constructor 2
destructor 2
Operand END ============

Op()(Operand(-3),Operand(-10),Operand(-100)) = -113.

Op LOG ============
default_constructor[this=0x7fff7c331c4b]
destructor[this=0x7fff7c331c4b]
Op HIST ===========
default_constructor 1
destructor 1
Op END ============
Operand LOG ============
other_constructor[this=0x7fff7c331cb0]
other_constructor[this=0x7fff7c331ca0]
other_constructor[this=0x7fff7c331c90]
destructor[this=0x7fff7c331c90]
destructor[this=0x7fff7c331ca0]
destructor[this=0x7fff7c331cb0]
Operand HIST ===========
other_constructor 3
destructor 3
Operand END ============

and the output for apply_tuple() was:

Op() = 1.

Op LOG ============
default_constructor[this=0x7fff7c331c4c]
destructor[this=0x7fff7c331c4c]
Op HIST ===========
default_constructor 1
destructor 1
Op END ============
Operand LOG ============
other_constructor[this=0x7fff7c331cc0]
destructor[this=0x7fff7c331cc0]
Operand HIST ===========
other_constructor 1
destructor 1
Operand END ============

Op()(Operand(-3)) = -3.

Op LOG ============
default_constructor[this=0x7fff7c331c4d]
destructor[this=0x7fff7c331c4d]
Op HIST ===========
default_constructor 1
destructor 1
Op END ============
Operand LOG ============
other_constructor[this=0x7fff7c331cd0]
move_constructor[this=0x7fff7c331ce0,src=0x7fff7c331cd0]
destructor[this=0x7fff7c331ce0]
destructor[this=0x7fff7c331cd0]
Operand HIST ===========
move_constructor 1
other_constructor 1
destructor 2
Operand END ============

Op()(Operand(-3),Operand(-10)) = -13.

Op LOG ============
default_constructor[this=0x7fff7c331c4e]
destructor[this=0x7fff7c331c4e]
Op HIST ===========
default_constructor 1
destructor 1
Op END ============
Operand LOG ============
other_constructor[this=0x7fff7c331d00]
other_constructor[this=0x7fff7c331cf0]
move_constructor[this=0x7fff7c331d40,src=0x7fff7c331d00]
move_constructor[this=0x7fff7c331d44,src=0x7fff7c331cf0]
destructor[this=0x7fff7c331d44]
destructor[this=0x7fff7c331d40]
destructor[this=0x7fff7c331cf0]
destructor[this=0x7fff7c331d00]
Operand HIST ===========
move_constructor 2
other_constructor 2
destructor 4
Operand END ============

Op()(Operand(-3),Operand(-10),Operand(-100)) = -113.

Op LOG ============
default_constructor[this=0x7fff7c331c4f]
destructor[this=0x7fff7c331c4f]
Op HIST ===========
default_constructor 1
destructor 1
Op END ============
Operand LOG ============
other_constructor[this=0x7fff7c331d30]
other_constructor[this=0x7fff7c331d20]
other_constructor[this=0x7fff7c331d10]
move_constructor[this=0x7fff7c331d50,src=0x7fff7c331d30]
move_constructor[this=0x7fff7c331d54,src=0x7fff7c331d20]
move_constructor[this=0x7fff7c331d58,src=0x7fff7c331d10]
destructor[this=0x7fff7c331d58]
destructor[this=0x7fff7c331d54]
destructor[this=0x7fff7c331d50]
destructor[this=0x7fff7c331d10]
destructor[this=0x7fff7c331d20]
destructor[this=0x7fff7c331d30]
Operand HIST ===========
move_constructor 3
other_constructor 3
destructor 6
Operand END ============

where:

  • Op is foo,
  • Op() is the creation of a functor,
  • Op()(...) is the invocation of the functor with the … arguments passed in,
  • Operand is the type representing all arguments and return values,
  • Operand(value) is the creation of an argument with the value of value,
  • Op_LOG and Operand_LOG are the constructor, assignment, and destructor actions performed during the test in chronological order for all Op and Operand types respectively, and,
  • Op_HIST and Operand_HIST are the frequency histograms of the actions performed during the test for all Op and Operand types respectively.

These outputs were produced by the following code:

// TEST apply()...
APPLY_FULL0(Op);
APPLY_FULL1(Op, -3);
APPLY_FULL2(Op, -3, -10);
APPLY_FULL3(Op, -3, -10, -100);

// TEST apply_tuple()...
APPLY_TUPLE_FULL0(Op);
APPLY_TUPLE_FULL1(Op, -3);
APPLY_TUPLE_FULL2(Op, -3, -10);
APPLY_TUPLE_FULL3(Op, -3, -10, -100);

where APPLY_FULLn and APPLY_TUPLE_FULLn are macros that will be defined below. Macros were used to ensure that no side-effects (e.g., extra copies or moves) were incurred through the use of function calls.

This test demonstrates:

  • my apply() and apply_tuple() implementations are optimal, and,
  • perfect forwarding of function arguments is a real achievement of C++11. 🙂

The code enabling these tests to be performed will be discussed after discussing the results of BCW's apply_tuple().


Results For BCW's Version Of apply_tuple()

BCW's apply_tuple() does not use any rvalue references and instead relies on the compiler optimizing away the recursive calls. Consequently, my hypothesis was that there would be more overhead in the general case using objects, as the compiler would need to make copies of class instances. This is understandable since C++ compilers cannot make the multitude of assumptions for class types that they can for primitive types such as int, double, etc. That said, I was shocked to see just how bad the results were using BCW's code (after adding appropriate Operand operations supporting lvalues so BCW's code would compile):

Table 5: Test Results Using BCW's Apply Code: Functor Overhead With N Arguments
(each cell lists misc / copies / moves / assign counts)

Call          N=0             N=1             N=2             N=3
apply         N/A
apply_tuple   1 / 0 / 0 / 0   1 / 1 / 0 / 0   1 / 2 / 0 / 0   1 / 3 / 0 / 0

which shows how BCW's code unnecessarily copies the function object with each recursive call and:

Table 6: Test Results Using BCW's Apply Code: N Argument Overhead
(each cell lists misc / copies / moves / assign counts)

Call          N=0             N=1             N=2             N=3
apply         N/A
apply_tuple   1 / 0 / 0 / 0   1 / 3 / 2 / 0   2 / 8 / 3 / 0   3 / 16 / 4 / 0

which shows how quickly the number of copies explodes as the number of arguments increases, as well as that the number of moves equals the number of arguments + 1 (except in the zero-argument case, where a special function overload is used to handle tuples with no elements).

Thus, it appears that BCW's code requires:

  • N copies of its function object, and,
  • some 2^(N+1) copies with N+1 moves of its arguments

for a single N-argument function call instead of the optimal:

  • zero copies of its function object, and,
  • zero copies with N moves of the function arguments.

So for N arguments:

  • N extra copies of the function object are created,
  • it appears some O(2^(N+1)) copies of the arguments are created, and,
  • N extra moves of the arguments are performed.

all with GCC v4.7 (snapshot 2011-10-22) compiled using -std=c++0x and -O3 options! Wow!

This clearly shows just how relevant and practically useful rvalue references are in C++11 or any language that supports objects. Moreover, given how common functions with zero, one, two, and three arguments are, certainly BCW's code shouldn't be used unmodified with class/struct/union types. Thus, although BCW's code may compile and optimize well with the built-in C++ types, it fails to be anywhere near optimal for general class types.

Interestingly, one should note that BCW's solution, as posted in the blog article, performs, for N arguments, N recursive calls, each passing an increasing number of arguments by value. This is clearly reflected in the values summarized in Table 5 and Table 6, which were obtained from the following test program output (whose computed output is also not completely correct due to some serious compiler warnings about references to temporaries in the recursive step of BCW's solution):

Op() = 1.

Op LOG ============
default_constructor[this=0x7fffede986cc]
destructor[this=0x7fffede986cc]
Op HIST ===========
default_constructor 1
destructor 1
Op END ============
Operand LOG ============
other_constructor[this=0x7fffede986d0]
destructor[this=0x7fffede986d0]
Operand HIST ===========
other_constructor 1
destructor 1
Operand END ============

Op()(Operand(-3)) = 20.

Op LOG ============
default_constructor[this=0x7fffede986cd]
copy_constructor[this=0x7fffede985bf,src=0x7fffede986cd]
destructor[this=0x7fffede985bf]
destructor[this=0x7fffede986cd]
Op HIST ===========
default_constructor 1
copy_constructor 1
destructor 2
Op END ============
Operand LOG ============
other_constructor[this=0x7fffede986e0]
move_constructor[this=0x7fffede986f0,src=0x7fffede986e0]
copy_constructor[this=0x7fffede985d0,src=0x7fffede986f0]
copy_constructor[this=0x7fffede985c0,src=0x7fffede986f0]
copy_constructor[this=0x7fffede985f0,src=0x7fffede985d0]
move_constructor[this=0x7fffede985e0,src=0x7fffede985f0]
destructor[this=0x7fffede985f0]
destructor[this=0x7fffede985e0]
destructor[this=0x7fffede985c0]
destructor[this=0x7fffede985d0]
destructor[this=0x7fffede986f0]
destructor[this=0x7fffede986e0]
Operand HIST ===========
copy_constructor 3
move_constructor 2
other_constructor 1
destructor 6
Operand END ============

Op()(Operand(-3),Operand(-10)) = -13.

Op LOG ============
default_constructor[this=0x7fffede986ce]
copy_constructor[this=0x7fffede984ce,src=0x7fffede986ce]
copy_constructor[this=0x7fffede984cf,src=0x7fffede984ce]
destructor[this=0x7fffede984cf]
destructor[this=0x7fffede984ce]
destructor[this=0x7fffede986ce]
Op HIST ===========
default_constructor 1
copy_constructor 2
destructor 3
Op END ============
Operand LOG ============
other_constructor[this=0x7fffede98710]
other_constructor[this=0x7fffede98700]
move_constructor[this=0x7fffede98750,src=0x7fffede98710]
move_constructor[this=0x7fffede98754,src=0x7fffede98700]
copy_constructor[this=0x7fffede984d0,src=0x7fffede98754]
copy_constructor[this=0x7fffede98520,src=0x7fffede98750]
copy_constructor[this=0x7fffede98524,src=0x7fffede98754]
copy_constructor[this=0x7fffede984f0,src=0x7fffede98520]
copy_constructor[this=0x7fffede984e0,src=0x7fffede984d0]
copy_constructor[this=0x7fffede98530,src=0x7fffede98520]
copy_constructor[this=0x7fffede98534,src=0x7fffede98524]
copy_constructor[this=0x7fffede98510,src=0x7fffede984e0]
move_constructor[this=0x7fffede98500,src=0x7fffede98510]
destructor[this=0x7fffede98510]
destructor[this=0x7fffede98500]
destructor[this=0x7fffede98534]
destructor[this=0x7fffede98530]
destructor[this=0x7fffede984e0]
destructor[this=0x7fffede984f0]
destructor[this=0x7fffede98524]
destructor[this=0x7fffede98520]
destructor[this=0x7fffede984d0]
destructor[this=0x7fffede98754]
destructor[this=0x7fffede98750]
destructor[this=0x7fffede98700]
destructor[this=0x7fffede98710]
Operand HIST ===========
copy_constructor 8
move_constructor 3
other_constructor 2
destructor 13
Operand END ============

Op()(Operand(-3),Operand(-10),Operand(-100)) = -113.

Op LOG ============
default_constructor[this=0x7fffede986cf]
copy_constructor[this=0x7fffede984de,src=0x7fffede986cf]
copy_constructor[this=0x7fffede984df,src=0x7fffede984de]
copy_constructor[this=0x7fffede9831f,src=0x7fffede984df]
destructor[this=0x7fffede9831f]
destructor[this=0x7fffede984df]
destructor[this=0x7fffede984de]
destructor[this=0x7fffede986cf]
Op HIST ===========
default_constructor 1
copy_constructor 3
destructor 4
Op END ============
Operand LOG ============
other_constructor[this=0x7fffede98740]
other_constructor[this=0x7fffede98730]
other_constructor[this=0x7fffede98720]
move_constructor[this=0x7fffede98760,src=0x7fffede98740]
move_constructor[this=0x7fffede98764,src=0x7fffede98730]
move_constructor[this=0x7fffede98768,src=0x7fffede98720]
copy_constructor[this=0x7fffede984e0,src=0x7fffede98768]
copy_constructor[this=0x7fffede98510,src=0x7fffede98760]
copy_constructor[this=0x7fffede98514,src=0x7fffede98764]
copy_constructor[this=0x7fffede98518,src=0x7fffede98768]
copy_constructor[this=0x7fffede98500,src=0x7fffede98514]
copy_constructor[this=0x7fffede984f0,src=0x7fffede984e0]
copy_constructor[this=0x7fffede98520,src=0x7fffede98510]
copy_constructor[this=0x7fffede98524,src=0x7fffede98514]
copy_constructor[this=0x7fffede98528,src=0x7fffede98518]
copy_constructor[this=0x7fffede98340,src=0x7fffede98520]
copy_constructor[this=0x7fffede98330,src=0x7fffede98500]
copy_constructor[this=0x7fffede98320,src=0x7fffede984f0]
copy_constructor[this=0x7fffede98370,src=0x7fffede98520]
copy_constructor[this=0x7fffede98374,src=0x7fffede98524]
copy_constructor[this=0x7fffede98378,src=0x7fffede98528]
copy_constructor[this=0x7fffede98360,src=0x7fffede98320]
move_constructor[this=0x7fffede98350,src=0x7fffede98360]
destructor[this=0x7fffede98360]
destructor[this=0x7fffede98350]
destructor[this=0x7fffede98378]
destructor[this=0x7fffede98374]
destructor[this=0x7fffede98370]
destructor[this=0x7fffede98320]
destructor[this=0x7fffede98330]
destructor[this=0x7fffede98340]
destructor[this=0x7fffede98528]
destructor[this=0x7fffede98524]
destructor[this=0x7fffede98520]
destructor[this=0x7fffede984f0]
destructor[this=0x7fffede98500]
destructor[this=0x7fffede98518]
destructor[this=0x7fffede98514]
destructor[this=0x7fffede98510]
destructor[this=0x7fffede984e0]
destructor[this=0x7fffede98768]
destructor[this=0x7fffede98764]
destructor[this=0x7fffede98760]
destructor[this=0x7fffede98720]
destructor[this=0x7fffede98730]
destructor[this=0x7fffede98740]
Operand HIST ===========
copy_constructor 16
move_constructor 4
other_constructor 3
destructor 23
Operand END ============

using the same macro code as in the previous section, except that my apply() and apply_tuple() code was deleted and replaced with BCW's code instead.

BCW's work is typically thorough, but this instance shows that optimizations aren't magical: the programmer must use rvalue references where they ought to be used, and one should always double-check efficiency claims for the code one writes. While in this instance BCW's code was found to have issues, we're all human and all of us coders have been guilty on various occasions of assuming code efficiencies that weren't true!

I tried tweaking BCW's code while remaining faithful to the style of his solution by adding rvalue references. Unfortunately, I ran into a snag: the snapshot of GCC v4.7 I am using gives "sorry, unimplemented" messages when one tries to expand the TS… parameter pack inside a template argument. The latter is needed to obtain the type of each tuple element using std::tuple_element so it can be passed to std::forward. Without that, one must use a class template as per my solution.


Tracking All Constructors, Assignments, and Destructors

Defining The Tracking Class

So how would one go about easily tracking and logging all constructor invocations, copy/move assignments, and destructor calls? One starts by writing a class template that does exactly this (say, in a file called tracker.hxx):

#ifndef tracker_hxx_
#define tracker_hxx_

#include <cstddef>
#include <list>
#include <map>
#include <tuple>
#include <ostream>

template <typename T>
struct tracker
{
  public:
    // Handle default constructor tracking...
    tracker()
    {
      update(DefaultConstr, this);
    }

    // Invoke for tracking any other constructors...
    explicit tracker(int)
    {
      update(OtherConstr, this);
    }

    // Handle copy constructor tracking...
    tracker(tracker const& src)
    {
      update(
        CopyConstr,
        this,
        &const_cast<tracker&>(src)
      );
    }

    // Handle move constructor tracking...
    tracker(tracker&& src)
    {
      update(MoveConstr, this, &src);
    }

    // Handle copy-assignment operator tracking...
    tracker& operator =(tracker const& src)
    {
      update(
        CopyAssign,
        this,
        &const_cast<tracker&>(src)
      );
      return *this;
    }

    // Handle move-assignment operator tracking...
    tracker& operator =(tracker&& src)
    {
      update(MoveAssign, this, &src);
      return *this;
    }

    // Handle destructor tracking...
    ~tracker()
    {
      update(Destruct, this);
    }

    // Clear out all log and histogram data...
    static void clear()
    {
      log_.clear();
      hist_.clear();
    }

    // Output the log to a stream...
    static std::ostream& output_log(std::ostream& os)
    {
      for (log_entry_type& le : log_)
      {
        output_log_entry(os, le);
        os << '\n';
      }
      return os;
    }

    // Output the histogram to a stream...
    static std::ostream& output_hist(std::ostream& os)
    {
      for (typename hist_type::value_type& he : hist_)
      {
        output_hist_entry(os, he);
        os << '\n';
      }
      return os;
    }

  private:
    // Enumeration to identify the various operations being logged
    // and tracked in the frequency histogram...
    enum op
    {
      DefaultConstr, CopyConstr, MoveConstr, OtherConstr,
      CopyAssign, MoveAssign,
      Destruct
    };

    // Define log_entry_type as a std::tuple such that:
    //   <0> is an instance of op
    //   <1> indicates the this object
    //   <2> indicates the source object (e.g., for copies/moves)
    typedef std::tuple<op, tracker*, tracker*> log_entry_type;

    // Define log_type as a list of log_entry_types...
    typedef std::list<log_entry_type> log_type;

    // Define hist_type as a map from op to size_t...
    typedef std::map<op, std::size_t> hist_type;

    // Update tracked information in the log and histogram...
    static void update(op o, tracker* this_v, tracker* src = nullptr)
    {
      auto log_entry = log_entry_type(o, this_v, src);

      // Append to log...
      log_.push_back(log_entry);

      // Update histogram...
      ++hist_[o];
    }

    // Output a log entry...
    static std::ostream& output_log_entry(std::ostream& os,
      log_entry_type const& le)
    {
      switch (std::get<0>(le))
      {
        case DefaultConstr:
          os
            << "default_constructor[this="
            << std::get<1>(le)
            << ']'
          ;
          break;

        case CopyConstr:
          os
            << "copy_constructor[this="
            << std::get<1>(le)
            << ",src="
            << std::get<2>(le)
            << ']'
          ;
          break;

        case MoveConstr:
          os
            << "move_constructor[this="
            << std::get<1>(le)
            << ",src="
            << std::get<2>(le)
            << ']'
          ;
          break;

        case OtherConstr:
          os
            << "other_constructor[this="
            << std::get<1>(le)
            << ']'
          ;
          break;

        case CopyAssign:
          os
            << "copy_assignment[this="
            << std::get<1>(le)
            << ",src="
            << std::get<2>(le)
            << ']'
          ;
          break;

        case MoveAssign:
          os
            << "move_assignment[this="
            << std::get<1>(le)
            << ",src="
            << std::get<2>(le)
            << ']'
          ;
          break;

        case Destruct:
          os
            << "destructor[this="
            << std::get<1>(le)
            << ']'
          ;
          break;
      }
      return os;
    }

    // Output a hist entry...
    static std::ostream& output_hist_entry(std::ostream& os,
      typename hist_type::value_type const& he)
    {
      switch (he.first)
      {
        case DefaultConstr:
          os << "default_constructor " << he.second;
          break;

        case CopyConstr:
          os << "copy_constructor " << he.second;
          break;

        case MoveConstr:
          os << "move_constructor " << he.second;
          break;

        case OtherConstr:
          os << "other_constructor " << he.second;
          break;

        case CopyAssign:
          os << "copy_assignment " << he.second;
          break;

        case MoveAssign:
          os << "move_assignment " << he.second;
          break;

        case Destruct:
          os << "destructor " << he.second;
          break;
      }
      return os;
    }

    // Define the log_ and hist_ variables statically...
    static log_type log_;
    static hist_type hist_;
};

// All class static variables must be instantiated outside the class...
template <typename T>
typename tracker<T>::log_type tracker<T>::log_;

template <typename T>
typename tracker<T>::hist_type tracker<T>::hist_;

#endif // #ifndef tracker_hxx_

Noteworthy is how almost everything is defined as private in order to not pollute the derived class with symbols. Even the static functions could have been defined outside the tracker class as friends to further avoid any pollution.

The idea behind tracker<T> is to:

  • log all constructor, copy/move assignment, and destructor invocations as they occur for some type T,
  • to log all object names (i.e., their addresses) upon which these invocations occur,
  • to log any source objects for all copying and move operations, and,
  • to keep an up-to-date frequency histogram of how many times each type of invocation occurred.

Using The Tracker Class With The Curiously Recurring Template Pattern (CRTP)

With this in place, one need only inherit from tracker, passing in the name of the class doing the inheriting:

class Op : tracker<Op>
{
  public:
    using tracker<Op>::tracker;
};

Unfortunately, it appears the version of GCC I am using does not yet implement the using syntax for inheriting constructors. As a work-around, one needs to define each constructor, assignment operator, and destructor in the derived class and invoke the corresponding one in the parent:

struct Op : tracker<Op>
{
  Op()
    : tracker()
  {
  }

  Op(Op const& a)
    : tracker(a)
  {
  }

  Op(Op&& a)
    : tracker(std::move(a))
  {
  }

  Op& operator =(Op const& a)
  {
    tracker::operator =(a);
    return *this;
  }

  Op& operator =(Op&& a)
  {
    tracker::operator =(std::move(a));
    return *this;
  }

  ~Op() = default;
};

Passing the class one is defining as a template argument to its own base class is permitted and extremely useful in C++. This programming pattern is called the Curiously Recurring Template Pattern (CRTP), and it works because template instantiations are lazy (i.e., they only occur when they are required) and because C++ permits forward declarations of types. When the compiler encounters class Op it knows that Op is a class, so inheriting from tracker<Op> is not an error provided that the use of Op inside tracker does not require anything from Op's definition outside of tracker's member function definitions (not their prototypes). If one tries to access members of Op from within tracker (but outside its member function bodies), the C++ compiler will give an error message about Op being an "incomplete type". Since tracker does no such thing, the compiler instantiates tracker<Op> as a base and then processes the definition of Op!

The CRTP design pattern is extremely useful in C++ programs especially to perform the following:

  • to automatically perform tracking/logging functions including automatically adding/removing all instances of a type to data structures, and,
  • to eliminate the overhead of virtual function calls when run-time object polymorphism is not needed (see the sketch just below).
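
To illustrate the second point, here is a small generic CRTP sketch (not part of the test code; shape and square are made-up names) showing static dispatch without virtual functions:

template <typename Derived>
struct shape
{
  double area() const
  {
    // Statically dispatch to the derived class: no vtable, no virtual call...
    return static_cast<Derived const&>(*this).area_impl();
  }
};

struct square : shape<square>
{
  double side;
  double area_impl() const { return side * side; }
};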

Since we want to track the function object instances and the function argument instances, two classes will need to be defined using CRTP:

  • Op representing the function object type, and,
  • Operand representing the function arguments' type.

Such would be defined as follows (say in a file called op_operand.hxx):

#ifndef op_operand_hxx_
#define op_operand_hxx_

#include "tracker.hxx"
#include <ostream>
#include <iostream>

//====================================================================

// Define the function argument type. It will hold an int value...
struct Operand : tracker<Operand>
{
  Operand()
    : tracker(), value(0) { }
  Operand(int v)
    : tracker(0), value(v) { }
  Operand(Operand const& a)
    : tracker(a), value(a.value) { }
  Operand(Operand&& a)
    : tracker(std::move(a)), value(a.value) { a.value = 0; }
  Operand& operator =(Operand const& a)
    { tracker::operator =(a); value = a.value; return *this; }
  Operand& operator =(Operand&& a)
  {
    tracker::operator =(std::move(a));
    value = a.value;
    a.value = 0;
    return *this;
  }
  ~Operand() = default;

  int value;
};

// Define the ability for Operands to be written to a stream...
std::ostream& operator <<(std::ostream& os, Operand const& o)
{
  os << o.value;
  return os;
}

//====================================================================

// Define the functor type...
struct Op : tracker<Op>
{
  Op() : tracker() { }
  Op(Op const& a) : tracker(a) { }
  Op(Op&& a) : tracker(std::move(a)) { }
  Op& operator =(Op const& a)
    { tracker::operator =(a); return *this; }
  Op& operator =(Op&& a)
    { tracker::operator =(std::move(a)); return *this; }
  ~Op() = default;

  // Define calling a function with no arguments...
  Operand* operator ()() const
    { return nullptr; }

  // Define calling a function with one argument...
  Operand* operator ()(Operand&& x) const
    { return nullptr; }
#ifdef OPERAND_LVALUE_REF_SUPPORT
  Operand* operator ()(Operand const& x) const
    { return nullptr; }
#endif

  // Define calling a function with two arguments...
  Operand* operator ()(Operand&& x, Operand&& y) const
    { x.value += y.value; return nullptr; }
#ifdef OPERAND_LVALUE_REF_SUPPORT
  Operand* operator ()(Operand const& x, Operand const& y) const
  {
    Operand retval(x);
    retval.value += y.value;
    return nullptr;
  }
#endif

  // Define calling a function with three arguments...
  Operand* operator ()(Operand&& x, Operand&& y,
    Operand&& z) const
    { x.value += y.value + z.value; return nullptr; }
#ifdef OPERAND_LVALUE_REF_SUPPORT
  Operand* operator ()(Operand const& x, Operand const& y,
    Operand const& z) const
  {
    Operand retval(x);
    retval.value += y.value + z.value;
    return nullptr;
  }
#endif
};

//====================================================================

// Define all testing macros...

#define APPLY_TEST0(i) \
  std::cout << #i " = " << apply(i) << "." << std::endl

#define APPLY_TEST1(i,a) \
  std::cout << #i "(" #a ") = " << apply(i,a) << "." << std::endl

#define APPLY_TEST2(i,a,b) \
  std::cout << #i "(" #a "," #b ") = " << apply(i,a,b) \
  << "." << std::endl

#define APPLY_TEST3(i,a,b,c) \
  std::cout << #i "(" #a "," #b "," #c ") = " \
    << apply(i,a,b,c) << "." << std::endl


#define APPLY_TUPLE_TEST0(i) \
  std::cout << #i " = " << apply_tuple(i,std::tuple<>()) \
  << "." << std::endl

#define APPLY_TUPLE_TEST1(i,a) \
  std::cout << #i "(" #a ") = " \
    << apply_tuple(i,std::make_tuple(a)) << "." << std::endl

#define APPLY_TUPLE_TEST2(i,a,b) \
  std::cout << #i "(" #a "," #b ") = " \
    << apply_tuple(i,std::make_tuple(a,b)) << "." << std::endl

#define APPLY_TUPLE_TEST3(i,a,b,c) \
  std::cout << #i "(" #a "," #b "," #c ") = " \
    << apply_tuple(i,std::make_tuple(a,b,c)) << "." << std::endl


#define OUTPUT(type) \
  std::cout \
    << #type " LOG ============\n" \
    << tracker<type>::output_log \
    << #type " HIST ===========\n" \
    << tracker<type>::output_hist \
    << #type " END ============\n" \
  ;


#define CLEAR(type) tracker<type>::clear();

#define APPLY_FULL0(type) \
  CLEAR(type); \
  CLEAR(Operand); \
  { APPLY_TEST0(type()); } \
  std::endl(std::cout); \
  OUTPUT(type); \
  OUTPUT(Operand); \
  std::endl(std::cout);


#define APPLY_FULL1(type,a) \
  CLEAR(type); \
  CLEAR(Operand); \
  { APPLY_TEST1(type(),Operand(a)); } \
  std::endl(std::cout); \
  OUTPUT(type); \
  OUTPUT(Operand); \
  std::endl(std::cout);


#define APPLY_FULL2(type,a,b) \
  CLEAR(type); \
  CLEAR(Operand); \
  { APPLY_TEST2(type(),Operand(a),Operand(b)); } \
  std::endl(std::cout); \
  OUTPUT(type); \
  OUTPUT(Operand); \
  std::endl(std::cout);


#define APPLY_FULL3(type,a,b,c) \
  CLEAR(type); \
  CLEAR(Operand); \
  { APPLY_TEST3(type(),Operand(a),Operand(b),Operand(c)); } \
  std::endl(std::cout); \
  OUTPUT(type); \
  OUTPUT(Operand); \
  std::endl(std::cout);


#define APPLY_TUPLE_FULL0(type) \
  CLEAR(type); \
  CLEAR(Operand); \
  { APPLY_TUPLE_TEST0(type()); } \
  std::endl(std::cout); \
  OUTPUT(type); \
  OUTPUT(Operand); \
  std::endl(std::cout);


#define APPLY_TUPLE_FULL1(type,a) \
  CLEAR(type); \
  CLEAR(Operand); \
  { APPLY_TUPLE_TEST1(type(),Operand(a)); } \
  std::endl(std::cout); \
  OUTPUT(type); \
  OUTPUT(Operand); \
  std::endl(std::cout);


#define APPLY_TUPLE_FULL2(type,a,b) \
  CLEAR(type); \
  CLEAR(Operand); \
  { APPLY_TUPLE_TEST2(type(),Operand(a),Operand(b)); } \
  std::endl(std::cout); \
  OUTPUT(type); \
  OUTPUT(Operand); \
  std::endl(std::cout);


#define APPLY_TUPLE_FULL3(type,a,b,c) \
  CLEAR(type); \
  CLEAR(Operand); \
  { APPLY_TUPLE_TEST3(type(),Operand(a),Operand(b),Operand(c)); } \
  std::endl(std::cout); \
  OUTPUT(type); \
  OUTPUT(Operand); \
  std::endl(std::cout);


//====================================================================

#endif // #ifndef op_operand_hxx_


Running The Tests

The C++ code to run tests on my code is:

#include <cstddef>

//====================================================================

template <std::size_t...>
struct indices;

template <std::size_t N, typename Indices, typename... Types>
struct make_indices_impl;

template <
  std::size_t N,
  std::size_t... Indices,
  typename Type, typename... Types
>
struct make_indices_impl<N, indices<Indices...>, Type, Types...>
{
  typedef
    typename make_indices_impl<
      N+1,
      indices<Indices...,N>,
      Types...>::type
    type
  ;
};

template <std::size_t N, std::size_t... Indices>
struct make_indices_impl<N, indices<Indices...>>
{
  typedef indices<Indices...> type;
};

template <std::size_t N, typename... Types>
struct make_indices
{
  typedef
    typename make_indices_impl<
      0,
      indices<>,
      Types...
    >::type
    type
  ;
};

//====================================================================

#include <tuple>
#include <functional>

//====================================================================

template <typename Op, typename... Args>
auto apply(Op&& op, Args&&... args)
  -> typename std::result_of<Op(Args...)>::type
{
  return op( std::forward<Args>(args)... );
}

//====================================================================

template <
  typename Indices
>
struct apply_tuple_impl;

template <
  template <std::size_t...> class I,
  std::size_t... Indices
>
struct apply_tuple_impl<I<Indices...>>
{
  template <
    typename Op,
    typename... OpArgs,
    template <typename...> class T = std::tuple
  >
  static auto apply_tuple(Op&& op, T<OpArgs...>&& t)
    -> typename std::result_of<Op(OpArgs...)>::type
  {
    return op( std::forward<OpArgs>(std::get<Indices>(t))... );
  }
};

template <
  typename Op,
  typename... OpArgs,
  typename Indices = typename make_indices<0, OpArgs...>::type,
  template <typename...> class T = std::tuple
>
auto apply_tuple(Op&& op, T<OpArgs...>&& t)
  -> typename std::result_of<Op(OpArgs...)>::type
{
  return apply_tuple_impl<Indices>::apply_tuple(
    std::forward<Op>(op),
    std::forward<T<OpArgs...>>(t)
  );
}

//====================================================================

#include "tracker.hxx"
#include "op_operand.hxx"

int main()
{
  // TEST apply()...
  APPLY_FULL0(Op);
  APPLY_FULL1(Op, -3);
  APPLY_FULL2(Op, -3, -10);
  APPLY_FULL3(Op, -3, -10, -100);

  std::cout
    << "====================================="
    << std::endl
    << std::endl
  ;

  // TEST apply_tuple()...
  APPLY_TUPLE_FULL0(Op);
  APPLY_TUPLE_FULL1(Op, -3);
  APPLY_TUPLE_FULL2(Op, -3, -10);
  APPLY_TUPLE_FULL3(Op, -3, -10, -100);

  return 0;
}

//====================================================================

and the C++ code to run tests on BCW's code is:

#include <cstddef>
#include <tuple>
#include <functional>

//====================================================================

// BCW's apply() renamed to apply_tuple()...

template <typename Func, typename... TupleTypes,
          typename... UnwrappedParams>
inline typename std::result_of<Func(TupleTypes...)>::type
  apply_tuple(Func f, std::tuple<TupleTypes...> ts,
      UnwrappedParams... params)
{
    return apply_tuple(f, ts, params...,
        std::get<sizeof...(UnwrappedParams)>(ts));
}

template <typename Func, typename... TupleTypes>
inline typename std::result_of<Func(TupleTypes...)>::type
  apply_tuple(Func f, std::tuple<TupleTypes...> ts,
      TupleTypes... params)
{
    return f(params...);
}

template <typename Func>
inline auto apply_tuple(Func f, std::tuple<> ts) -> decltype(f())
{
    return f();
}

//====================================================================

#include "tracker.hxx"

// BCW's code requires Op's operator () overloads to support lvalues in
// order to compile. Thus, the next line must be defined for BCW's code
// to compile...
#define OPERAND_LVALUE_REF_SUPPORT
#include "op_operand.hxx"

int main()
{
  // TEST apply_tuple()...
  APPLY_TUPLE_FULL0(Op);
  APPLY_TUPLE_FULL1(Op, -3);
  APPLY_TUPLE_FULL2(Op, -3, -10);
  APPLY_TUPLE_FULL3(Op, -3, -10, -100);

  return 0;
}

//====================================================================


Closing Comments

Success has been achieved and tested! The apply functions were each tested to see whether or not they perform optimally using C++11's rvalue references and the ability to perfectly forward function arguments. Also demonstrated was how recursively defined code can incur very significant overheads, and how such overheads can be elided if the code is written to exploit perfect argument forwarding with only a small amount of C++ template metaprogramming. Once done, one can use such code without being concerned about its overheads!

Another important lesson for myself and everyone who reads this post is just how important it is to test code for efficiency concerns, not just correctness, before adding such code to a real program or into a library that is used in real programs. Ideally, code once tested, debugged, etc. can be used as a black box with its documented guarantees. For this ideal to be reached, the properties and claims made in the documentation about the code need to be confirmed through testing, not just code study.

Should any errors or omissions be found within this post, kindly advise me of such!

Happy Coding!

Applying std::tuple To Functors Efficiently

Overview

The new C++11 standard nicely adds tuples (i.e., std::tuple found in <tuple>) and while it is easy to invoke std::make_tuple and pass a number of arguments to create a new std::tuple there does not appear to be any function that efficiently does the reverse (i.e., apply all tuple elements efficiently to a function or a functor). This post details how to do such.

Update (Dec. 18, 2011): I have added some comments concerning the use of std::move in some return statements. When returning values from a function, one should simply write return somevalue; to enable the compiler to choose between moving and return value optimization (RVO). Because this article focuses on discussing move operations, I did not write the code to exploit RVO (it explicitly moves everything) in order to avoid any confusion, but, please be aware that normally one would not use std::move on a variable being returned in a return statement.

Article Source Code: See the next article.


Using std::make_tuple

Using std::make_tuple to construct a tuple is very straight-forward:

#include <string>
#include <tuple>

int main()
{
  auto x = std::make_tuple(1, 4.3, std::string{"Hello!"});
}

would induce the variable x to have the type:

std::tuple<int, double, std::string>

The std::make_tuple function is a convenience function that allows one to avoid having to specify both the types and the values for a specific tuple. For example, the above code could have been written:

#include <string>
#include <tuple>

int main()
{
  std::tuple<int,double,std::string> x{1, 4.3, std::string{"Hello!"}};
}

It is also particularly handy when one needs to create and pass a tuple to a function as a temporary variable. Simply invoking std::make_tuple in a way that ensures its returned tuple is passed to the function avoids having to formally declare tuple variables.
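
For instance, a minimal sketch (log_record is a hypothetical function used only for illustration):

#include <string>
#include <tuple>

// Hypothetical function that consumes a tuple "record"...
void log_record(std::tuple<int, double, std::string> const& rec)
{
  // ... use std::get<0>(rec), etc. ...
}

int main()
{
  // The temporary returned by std::make_tuple is passed directly:
  // no named tuple variable needs to be declared.
  log_record(std::make_tuple(1, 4.3, std::string{"Hello!"}));
}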

Accessing std::tuple's Elements

Given some std::tuple instance, t, the C++11 standard allows us to retrieve a specific element of a std::tuple by using std::get<N>(t), where N is the zero-based index of the desired element. It is also possible to retrieve the number of elements as well as their types, but these will not be used in the solution presented in this article.
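
For example, a small sketch:

#include <iostream>
#include <string>
#include <tuple>

int main()
{
  auto t = std::make_tuple(1, 4.3, std::string{"Hello!"});

  // N must be a compile-time constant; indices are zero-based...
  std::cout << std::get<0>(t) << std::endl;   // outputs: 1
  std::cout << std::get<2>(t) << std::endl;   // outputs: Hello!

  // std::get returns a reference so elements can also be modified...
  std::get<0>(t) = 42;
}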

Requirements Of The Solution

The final solution should provide a function called apply which should take these parameters:

  • the first argument must be a function (pointer) or a functor (i.e., a struct/class with an overloaded function call operator), and,
  • the second parameter must be a tuple whose sequence of types and values are the same as the first argument's function/functor arguments.

so when apply is called all of the std::tuple's elements are efficiently passed to the function/functor. For example this code's cout lines:

struct add1
{
  int operator ()(int x, int y) const
  {
    return x + y;
  }
};

int add2(int x, int y)
{
  return x + y;
}

// ...

std::cout << apply(add1(), std::make_tuple(3,4)) << std::endl;
std::cout << apply(add2, std::make_tuple(8,2)) << std::endl;

are equivalent to:

std::cout << add1()(3,4) << std::endl;
std::cout << add2(8,2) << std::endl;

and would output:

$ ./a.out
7
10
$

Before considering how to manipulate std::tuples one should consider how to write the code that does the same without std::tuples:

std::cout << apply(add1(), 3, 4) << std::endl;
std::cout << apply(add2, 8, 2) << std::endl;

Ideally such should be a simpler version of a solution that works with std::tuple and, in this instance, this is what occurs.

The other requirement of the solution is maximum possible efficiency. To achieve this will require careful attention to ensure that no unnecessary copying or moving of any data is done. Ideally, the compiler should be able to use the get function to retrieve all tuple elements and place them in the function arguments all at compile time.

NOTE: This means that std::bind and std::function should not be used in the solution.

Rvalue References And Perfect Forwarding

What Is An Rvalue?

To meet the above requirements two new C++11 features will need to be exploited: rvalue references and perfect forwarding of function arguments. Prior to C++11 only these special operations, with or without using templates, could be defined for user-defined types:

  • the ability to create a default-initialized object by defining a constructor that takes no arguments,
  • the ability to create a copy of another object of the same type by defining a constructor that has a single argument that receives the other object,
  • the ability to define how one object is assigned to another (of the same type) by defining a copy assignment operator, and,
  • the ability to define how an object is destroyed by writing a destructor.

In addition to the above C++11 adds the following:

  • the ability to move the contents of an object that is an rvalue into a newly created object because the former object will no longer be used (e.g., it will be destroyed), and,
  • the ability to move the contents of an object that is an rvalue into an existing object via assignment because the former object will no longer be used (e.g., it will be destroyed).

The first ability is called the move constructor and the second is called move assignment. Both operations take rvalue references for their arguments. Know that rvalue references are references and, for moves to be possible, they must refer to non-const objects (the source's contents must be modifiable so they can be moved out). Since they are references, the following code is erroneous:

struct Example1 { };

Example1&& test()
{
  Example1 e;
  // ...
  return e;
}

It is illegal for the same reasons that this code is erroneous:

struct Example2 { };

Example2& test()
{
  Example2 e;
  // ...
  return e;
}

and that this code is erroneous:

struct Example3 { };

Example3* test()
{
  Example3 e;
  // ...
  return &e;
}

as all three return the address (implicitly for Example1 and Example2 and explicitly for Example3) of a local variable that will be destroyed once the test() returns.

What exactly is an rvalue? The name rvalue means right-hand-side value and stands in contrast to an lvalue (which means left-hand-side value). The usage of right and left refers to where each is allowed relative to an assignment operator: an rvalue can only appear on the right-hand side whereas an lvalue can appear on either the left- or right-hand side.

In terms of C++11 coding what is an rvalue and how does it differ from an lvalue? Loosely, an rvalue is:

  • a compiler-created temporary variable, or
  • any unnamed variable.

Hence an rvalue is a value whose address one cannot explicitly obtain via the address-of operator, as there is no name with which to refer to it. (Because of nuances, the C++11 standard redefines lvalues and rvalues more precisely than this using a taxonomy. This level of detail will not be discussed here.)
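
For example (a minimal sketch; next_value is a hypothetical helper), the commented-out lines below fail to compile because their operands are rvalues:

int next_value(int i) { return i + 1; }

int main()
{
  int i = 4;              // i is an lvalue: it has a name
  int* p = &i;            // OK: an lvalue's address can be taken

  // int* q = &(i + 1);          // Error: i + 1 is a compiler temporary
  // int* r = &next_value(i);    // Error: the returned value is an rvalue

  return *p - 4;          // returns 0
}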

Once a variable somehow acquires a name then it can be explicitly referenced and manipulated as an lvalue, and so C++ no longer treats it as an rvalue (with some caveats for explicitly named rvalue-reference variables) unless std::move(variable_name) is invoked on it.

Warning: Once std::move(variable_name) is invoked on a variable and its owned contents have been moved, that variable should only be used as a "zeroed-out" or "empty" object (whatever that means for that type). Effectively, when you use std::move() know that the ownership of that variable's state may be transferred (i.e., moved) to another variable since std::move() makes that variable become an rvalue.
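
A small sketch of this warning using std::string:

#include <iostream>
#include <string>
#include <utility>

int main()
{
  std::string a{"a sufficiently long string"};
  std::string b = std::move(a);   // a's contents may be moved into b

  std::cout << b << std::endl;    // outputs the original string

  // After the move, a is valid but "emptied": only assign to it,
  // destroy it, etc. Do not rely on its old value.
  a = "safe to use again after being given a new value";
}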

One of the requirements of this article is to be able to have maximum efficiency: one does not want to copy the contents of a tuple into function arguments. In general, for proper program semantics with arbitrary functions copies cannot be made. Consider:

double test(int& i, double const& y)
{
  double retval = y * 3.141592;
  i = retval;
  return retval;
}

where i is passed by reference. If a copy of a variable was passed as an argument then the original variable wouldn't be modified. Clearly, one needs a way to perfectly forward the types in a tuple into each function argument in a compatible way that preserves the semantics of the program code. Prior to C++11 this was not possible. In C++11, it is possible, provided rvalue references (with their associated semantics) and templates are used.

Lvalue and Rvalue References

Rvalue references are defined similarly to C++98 references. A C++98 reference is an lvalue that is declared using a single ampersand and must be bound to an lvalue at the point of declaration (with one exception). The ampersand is used because it is the address-of operator and a C++98 reference implicitly takes the address of the expression it is initialized with. For example, this C/C++ code:

int i = 4;
int * const ref = &i;
// ...
*ref = 46; // i will now be set to 46

is semantically equivalent (in terms of its underlying implementation) to this C++98 code:

int i = 4;
int& ref = i;
// ...
ref = 46; // i will now be set to 46

i.e., any C++98 reference is equivalent (in terms of its underlying implementation) to a constant pointer to whatever its type is. This implies that once a reference's (address) value is set, it cannot be changed, as it is a constant pointer internally.

C++98 references allowed the ugliness of C's pointers to disappear as they allowed one to implicitly pass variables by reference:

void set(int &i, int value)
{
  i = value;
}
// ...
int myvar;
set(myvar, 9);

instead of writing this C equivalent:

void set(int *i, int value)
{
  *i = value;
}
// ...
int myvar;
set(&myvar, 9);

which also required a lot more attention to always correctly use pointers, etc. to avoid misusing them, the likely result of which was a program crash. Thus C++98 references allowed the programmer to:

  • avoid having to explicitly use the address-of operator for any function call arguments that were to be implicitly passed by reference, and,
  • avoid having to explicitly dereference pointer variables within functions whose parameters are passed by reference, as such parameters are typically only manipulated as values within those functions.

Thus in C++98 it is no longer a burden to efficiently and easily use pass-by-reference. No doubt this encourages programmers to use pass-by-reference when it is more efficient and appropriate to do so.

C++98 references are lvalues and can only refer to lvalues, with one exception: constant references may refer to rvalues, i.e., compiler-created temporary variables (e.g., ones created when passing function arguments). Consider converting the value function parameter in the above example to be a constant lvalue reference:

void set(int &i, int const&value)
{
  i = value;
}
// ...
int myvar;
set(myvar, 9); // NOTE: 9 must become a temporary --an rvalue!!

Perhaps to one's horror the above code correctly compiles and runs! For the compiler to call set(myvar, 9) it needs to create a temporary variable of type int so its address can be taken and passed to the function! The issue here is that the temporary variable is an rvalue, not an lvalue. The C++98 ISO committee reviewed this specific issue carefully and resolved that, since the reference refers to constant data, such should be legal. Why? In part: it is constant, i.e., its value won't be changing, and casting away const is bad and can easily have implementation-defined results. Thus, this "violation" is a non-issue. The results of permitting such (see the small example following the list below):

  • it allows more efficient variable passing to functions,
  • it enables the compiler to better optimize code, and,
  • it completely removes the burden of writing overloaded versions of functions just so that literals and other rvalues can be passed to arguments taking constant references.
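
A minimal sketch of this rule (inc and show are hypothetical functions used only for illustration): a reference to non-const cannot bind to a temporary, but a reference to const can.

void inc(int& i)         { ++i; }
void show(int const& i)  { /* read-only use of i */ }

int main()
{
  int myvar = 0;

  inc(myvar);    // OK: myvar is an lvalue
  // inc(9);     // Error: 9 requires a compiler temporary (an rvalue)

  show(9);       // OK: the temporary binds to the reference to const
  show(myvar);   // OK: lvalues bind as well
}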

So all was well and good: plenty of cleaner and more efficient C++ code was subsequently written and used successfully. However, C++ being a language that strives for absolute efficiency and robustness with both high- and low-level styles of coding, there were still problems that caused programmers to twist their code to be efficient. Consider this example code that critics of object-oriented programming (OOP) often cite as an argument against OOP:

class Matrix
{
  public:
    // NOTE: All code below assumes that nrows_ and ncols_
    //       is hard-coded to be 1000 by 1000 for simplicity
    //       here since how to properly write a general matrix
    //       class is not the purpose of this blog post! :-)

    Matrix()
      : nrows_(1000), ncols_(1000), data_(nrows_*ncols_) { }

    // ...

    // Permit access to the sizes...
    std::size_t nrows() const;
    std::size_t ncols() const;

    // Permit access using the function call operator...
    double& operator ()(std::size_t row, std::size_t col);
    double const& operator ()(std::size_t row, std::size_t col) const;

  private:
    std::size_t const nrows_;   // Hard-coded to be 1000
    std::size_t const ncols_;   // Hard-coded to be 1000
    std::vector<double> data_;
};

// Define addition...
Matrix& operator +=(Matrix& a, Matrix const& b)
{
  for (std::size_t i=0; i != b.nrows(); ++i)
    for (std::size_t j=0; j != b.ncols(); ++j)
      a(i,j) += b(i,j);   // No copies: just += each element.

  return a;   // Return a by reference as it was passed!
}

Matrix operator +(Matrix const& a, Matrix const& b)
{
  Matrix retval(a);   // Copy a into retval
  retval += b;        // Add b to retval
  return retval;      // Return retval by value
}

// ...
Matrix a, b, c;
// ...
Matrix d = a + b + c; // Can be very inefficient!

i.e., given three matrices add them (non-destructively) together and place the result in a new matrix. The addition could be written using two for loops in a procedural manner (assuming the memory was previously allocated, all four matrices are the same size, and we are not trying to optimize anything else):

std::size_t const nrows = a.nrows();
std::size_t const ncols = a.ncols();

for (std::size_t i=0; i != nrows; ++i)
  for (std::size_t j=0; j != ncols; ++j)
    d(i,j) = a(i,j) + b(i,j) + c(i,j);

Unfortunately, in OOP languages (except in C++ when using a library that optimizes such using the expression templates technique) this is not the code that is generated by the compiler. Instead the compiler generates code based on the operations being applied to the objects reflected in the expression:

Matrix d = a + b + c;

which does the following (assuming expression templates code has not been written to induce such optimization):

  1. First a + b is evaluated. This requires calling operator +(a,b) and returns a compiler-created temporary Matrix value. Let's call this value t1.
  2. Now t1 + c is evaluated. This also requires calling operator +(t1,c) and it too returns a compiler-created temporary Matrix value. Let's call this value t2.
  3. Having evaluated the right-hand side of the initialization (as this is not the assignment operator!) the compiler will invoke the copy constructor for d, passing in t2, so its value can be copied.
  4. Finally the compiler destroys all temporary values: t1 and t2.

Clearly the generated code, although it will produce the correct answer, allocates more RAM and takes more time to execute (as there will be more cache misses than hits, especially with large matrices). This is unacceptable. On the other hand, one does want the code to execute according to the functions defined for the classes being used. What should one do? The expression templates technique is a form of C++ template metaprogramming that is considered to be advanced; is there an easier way? The short answer, with C++11, is "Yes!" Before getting to that, note that with C++98 many programmers would write:

Matrix d = a;
d += b;
d += c;

instead of Matrix d = a + b + c;. The former is more efficient, the latter is easier to read and understand, not to mention more commonly written in actual code. Ideally a programming language should not force the programmer to make such trade-offs.

Important: Using rvalue references does not eliminate the need for the expression templates technique; there are still many things that won't be optimized. The expression templates technique allows one to write code that recognizes how types are used and transforms them (into presumably equivalent, more efficient forms). With matrices, such can be used to determine at compile-time an optimal way to perform matrix multiplication (in expressions with varying matrix sizes) such as:

Matrix d = a * b * c;

Moving Stuff Around

In the above example the temporary variables, t1 and t2, were rvalues and the compiler should have somehow reused t1 instead of also making t2. But the compiler is not human and does not know the programmer's intention or anything about matrices! This means the compiler cannot simply reuse t1 and call operator +=(t1,c) instead: it was told to only call operator +!

So how can rvalue references help with expressions like Matrix d = a + b + c;? Through the definition of move constructors and move assignment operators, they allow the programmer to implicitly tell the compiler how to efficiently move the contents/state of one rvalue variable to another. This is particularly handy since the compiler actively creates temporary variables, which the programmer otherwise has no control over, when evaluating expressions! Re-examining Matrix d = a + b + c;:

  1. First the compiler must make a temporary to perform a + b. Let's again call this temporary t1. So there are no changes here.
  2. Next the compiler will perform t1 + c; this too will return a copy. Hey, there are no improvements!! Let's stop.

Yes, there are no improvements. To get them, the Matrix class (and the types it internally uses) needs to define move constructors and move assignment operators. Nicely, a C++11 compiler will define default ones, which are of benefit if the underlying types define such (e.g., Matrix's use of std::vector). So, I lied: there were improvements, but, unless one understands the move constructor and move assignment operator, it is hard to comprehend what benefits are obtained through the use of rvalues.

Consider this simple array-like class:

#include <algorithm>

class MyArray
{
  public:
    // Default constructor...
    MyArray()
      : size_(0), data_(nullptr)
    {
    }

    // Allow one to specify a size...
    MyArray(std::size_t s)
      : size_(s), data_(new int[s])
    {
    }

    // Define copy constructor...
    // Notice that it accepts an lvalue reference to a constant
    //   of the same type as this class (i.e., MyArray).
    MyArray(MyArray const& a)
      : size_(a.size_), data_(new int[a.size_])
    {
      std::copy(a.data_, a.data_+a.size_, data_);
    }

    // Define move constructor...
    // Notice that it accepts an rvalue reference (&&) to
    //   an instance of the same type as this class (i.e., MyArray).
    // Notice that the data_ in a is simply assigned to this object
    //   and then a.data_ is set to be null. Why? The data is being
    //   MOVED from a to this. a will need to still be properly
    //   destroyed so a.data_ needs to be set to null.
    MyArray(MyArray&& a)
      : size_(a.size_), data_(a.data_)
    {
      a.size_ = 0;          // Reset size
      a.data_ = nullptr;    // Reset pointer
    }

    // Define copy assignment operator...
    // Notice that exception-safe code is presented here.
    // Notice that making a copy is necessary.
    MyArray& operator =(MyArray const& a)
    {
      MyArray temp(a);               // Make copy of a
      std::swap(size_, temp.size_);  // Swap sizes
      std::swap(data_, temp.data_);  // Swap data pointers

      // Destroy temp as *this is now a copy of a! :-)
      return *this;
    }

    // Define move assignment operator...
    // One could write this without temp to be slightly
    // more efficient. This code reuses the above definitions
    // and keeps the actual movement code in the move
    // constructor only.
    MyArray& operator =(MyArray&& a)
    {
      MyArray temp(std::move(a));    // Move a into temp
      std::swap(size_, temp.size_);  // Swap sizes
      std::swap(data_, temp.data_);  // Swap data pointers
      return *this;
    }

    // ...

  private:
    std::size_t size_;
    int* data_;
};

Perhaps it is much easier to see that moving is actually moving the data from one object to another! It can only be done with rvalues as rvalues are not to be the receivers of assignment operations: they are intended to only be used as values, even though they do occupy storage somewhere in the computer's RAM.

Notice that std::move(a) was used. Isn't a declared to be an rvalue reference? Yes it is, but, once it acquires a name it has to be treated as an lvalue. The std::move function is C++11's way for a programmer to explicitly make an lvalue an rvalue. Once this is done the programmer should not use the variable whose content was moved, as its state has been reset so that its destructor can be executed without error and with a minimum of overhead.

Note: The purpose of a named rvalue reference declaration is to ensure that it can only be initialized with an rvalue. It does not mean it is itself an rvalue –although it refers to one!
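
For example, a minimal sketch (sink and forward_to_sink are hypothetical names used only for illustration):

#include <utility>
#include <vector>

// Hypothetical consumer of an rvalue...
void sink(std::vector<int>&& v) { v.clear(); }

void forward_to_sink(std::vector<int>&& v)
{
  // v is declared as an rvalue reference but, having a name, it is
  // treated as an lvalue inside this function...
  // sink(v);            // Error: an lvalue will not bind to &&
  sink(std::move(v));    // OK: std::move makes it an rvalue again
}

int main()
{
  forward_to_sink(std::vector<int>{1, 2, 3});
}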

Let's re-examine:

Matrix d = a + b + c;

which clearly computes an rvalue result which is used to initialize d. Provided the std::vector class has a move constructor (it does) and Matrix has one as well, the compiler need not use the copy constructor: it can use the move constructor instead. This implies the data in t2 will be moved/transferred to d directly, not copied into d. This implies one fewer Matrix has to be dynamically allocated in the expression. Awesome!

Concerning the expression:

a + b + c;

all we've told the compiler is how to handle this function:

Matrix operator +(Matrix const& a, Matrix const& b);

which states given two Matrix instances return a new one. Returning any value by value from a function implies that its value can be treated as an rvalue. Thus, instead of making a copy of the Matrix returned, copying the contents over, and then destroying the local variable, the compiler is free to simply move it from the local variable to another one that is available, since the local variable will be destroyed.

One can do even better by telling the compiler how to add values that are rvalues. Only looking at operator + and making simple changes:

Matrix operator +(Matrix const& a, Matrix const& b); // as before

// [2]
inline Matrix operator +(Matrix&& a, Matrix const& b)
{
  Matrix retval(std::move(a));   // Move a into retval
  retval += b;                   // Add b to retval
  return retval;                 // Returns as an rvalue
  // Some may prefer: return std::move(a += b);

  // In general don't use std::move() in a return statement
  // that returns a variable. Simply write: return retval;
  // This will allow compilers to choose between RVO or moving.
}

// [3]
inline Matrix operator +(Matrix const& a, Matrix&& b)
{
  // This works since matrix addition is commutative...
  return operator +(std::move(b), a); // call [2] above
}

// [4]
inline Matrix operator +(Matrix&& a, Matrix&& b)
{
  return operator +(std::move(a), b); // call [2] above
}

With these defined, when rvalues are being added the cost of creating new temporary matrix state (i.e., the RAM to hold the 1000 by 1000 matrix!) is now completely avoided. The compiler does still create the temporary variables and needs to (as they are allocated on the stack), but the overheads have been sliced to a bare minimum.

Revisiting what happens when Matrix d = a + b + c; is run with the above additions and changes (assuming std::vector behaves like MyArray, i.e., a move nulls out the internal pointers):

  1. First a + b is calculated and a full temporary matrix is returned as temporary t1. No change here.
  2. Now t1 + c is performed. Since t1 is an rvalue as it is a compiler-created temporary, it invokes:
    Matrix operator +(Matrix&& a, Matrix const& b)

    which avoids creating a new temporary with its own separate storage. Instead it creates a new temporary, t2, by move constructing it from the stored result in retval which was moved from t1 (given the above code). The savings here are huge: no unnecessary additional RAM was allocated on the heap!

  3. Finally, t2 is moved into d's constructor without allocating any additional RAM on the heap!

So, if you're counting temporaries one might argue that there is not much improvement. But if one is counting the amount of RAM allocated and the copying performed, there are major improvements. Essentially, only one Matrix is dynamically allocated in RAM (to hold the initial temporary) and that winds up ultimately being moved into d.

Note: C++ compilers can elide constructors and perform other operations to further optimize the above code. The programmer can do additional things too. If the a, b, and c variables will not be used again, one can help the compiler out by having all variables become rvalues (which would move everything, with no copies):

Matrix d = std::move(a) + std::move(b) + std::move(c);

Perfect Forwarding

If the type of a function parameter is an rvalue reference to a template type parameter of a function template, then the type parameter is deduced to be: an lvalue reference if an lvalue is passed, or, a plain value type if an rvalue is passed (in which case the parameter itself is an rvalue reference), assuming the overall call is valid.

Important: Recall that only with function templates will the compiler deduce the (template) types of its arguments. The specific type of deduction mentioned here only works for function templates when an rvalue reference to a type that is a template parameter is used. If the parameter is an rvalue reference to a non-template type, this specific deduction will not occur (see the sketch after the example below).

Consider:

struct Silly { };

template <typename T>
void func(T&& t);

int main()
{
  Silly s;
  func(s);             //    Lvalue. Argument type: Silly&
  func(Silly());       // Temporary. Argument type: Silly&&
  func(std::move(s));  //  Rvalue forced. Arg type: Silly&&
}
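
By contrast, a minimal standalone sketch (func_nt and func_t are hypothetical names) showing that when the && parameter names a concrete, non-template type, no such deduction occurs and lvalues are rejected:

struct Silly { };

void func_nt(Silly&&) { }    // Non-template: parameter is exactly Silly&&

template <typename T>
void func_t(T&&) { }         // Function template: the special deduction applies

int main()
{
  Silly s;
  // func_nt(s);      // Error: an lvalue will not bind to Silly&&
  func_nt(Silly());   // OK: a temporary is an rvalue
  func_t(s);          // OK: T is deduced as Silly&, so T&& is Silly&
}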

By using this deductive property, C++ can preserve the argument types while forwarding them to another function. To do this, one uses std::forward<arg_type>(arg_variable_name), which takes care of the lvalue/rvalue issues when doing such. All that is needed to exploit this is to simply use it.
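
A minimal sketch (probe and wrapper are hypothetical names used only for illustration) showing that std::forward preserves the caller's value category:

#include <iostream>
#include <utility>

void probe(int&)  { std::cout << "lvalue" << std::endl; }
void probe(int&&) { std::cout << "rvalue" << std::endl; }

template <typename T>
void wrapper(T&& t)
{
  probe(std::forward<T>(t));   // forwards t with its original value category
}

int main()
{
  int i = 0;
  wrapper(i);    // outputs: lvalue
  wrapper(42);   // outputs: rvalue
}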

Enough sidebar issues! Let's figure out how to efficiently pass a std::tuple to regular functions and then functors!


Writing apply() To Not Use std::tuple

Using variadic templates, solving this problem is straight-forward:

#include <tuple>
#include <functional>
  // NOTE: g++ v4.5 needs <functional> for std::result_of
  //       g++ v4.7 does not

template <typename Op, typename... Args>
auto apply(Op&& op, Args&&... args)
  -> typename std::result_of<Op(Args...)>::type
{
  return op(args...);
}

It is straight-forward for this reason: every argument after the first passed to apply is in the Args parameter pack in the order and form it needs to be. This allows the arguments to be expanded in the return statement as-is. The only remaining item to address here is to use std::forward on those arguments:

#include <tuple>
#include <functional>
  // NOTE: g++ v4.5 needs <functional> for std::result_of
  //       g++ v4.7 does not

template <typename Op, typename... Args>
auto apply(Op&& op, Args&&... args)
  -> typename std::result_of<Op(Args...)>::type
{
  return op( std::forward<Args>(args)... );
}

to ensure perfect forwarding.


Writing apply_tuple() To Use std::tuple

A Problem

Unfortunately, the above solution cannot be trivially applied to std::tuple. When template parameter packs are expanded they do not have the ability to perform the arithmetic (e.g., N+1) needed to unroll the std::tuple elements for each function argument. The latter is needed since the std::get template function requires each index as a compile-time constant template argument: if parameter pack expansion is to be used, then a way must be found to increment the value passed to std::get as each tuple element is extracted.
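
The following sketch illustrates the constraint: the index given to std::get must be a compile-time constant, so a run-time loop cannot be used to extract the elements:

#include <tuple>

int main()
{
  auto t = std::make_tuple(1, 2.2, 'a');

  auto x = std::get<1>(t);        // OK: 1 is a compile-time constant

  // for (std::size_t i = 0; i != 3; ++i)
  //   std::get<i>(t);            // Error: i is not a compile-time constant

  return static_cast<int>(x) - 2; // returns 0
}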

Although parameter pack expansions cannot do arithmetic, one can pre-compute the indices they will need. Why is this useful? If parameter pack expansions are done on expressions involving parameter pack types and variables, then they will all be expanded together as a sequence. For example, given the tuple:

std::make_tuple(1, 2.2, 'a')

whose type is std::tuple<int,double,char> and this:

template <std::size_t...>
struct indices;
// ...
indices<0,1,2>

then it will be possible to expand both (i.e., the indices' and the std::tuple type's template arguments, the latter being the type of each element in the tuple) at the same time using parameter pack expansion. Hopefully by solving this problem a solution similar to the non-tuple version will be possible!
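
To make the goal concrete, here is a hand-unrolled sketch (print3 is a hypothetical functor used only for illustration) of the call that the combined expansion should ultimately produce for the example tuple:

#include <iostream>
#include <tuple>
#include <utility>

// Hypothetical three-argument functor...
struct print3
{
  void operator ()(int i, double d, char c) const
    { std::cout << i << ' ' << d << ' ' << c << std::endl; }
};

int main()
{
  auto t = std::make_tuple(1, 2.2, 'a');

  // What expanding indices<0,1,2> together with <int,double,char>
  // should unroll to:
  print3()(
    std::forward<int>(std::get<0>(t)),
    std::forward<double>(std::get<1>(t)),
    std::forward<char>(std::get<2>(t))
  );
}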

Generating A List Of Indices

By defining a compile-time only container to hold the list of indices:

template <std::size_t...>
struct indices;

the struct templates make_indices and make_indices_impl can be defined. What should they contain? For each type, a corresponding zero-based index needs to be generated. This implies that, when processing each tuple element type from left to right, the next highest index needs to be appended to the indices instance's std::size_t template arguments. This will need to be defined recursively, using make_indices as an easy-to-use type to start the indices' generation and to hold the result in a typedef called type:

#include <cstddef> // for std::size_t

// Define holder for indices...
template <std::size_t...>
struct indices;

// Define adding the Nth index...
// Notice one argument, Type, is discarded each recursive call.
// Notice N is added to the end of Indices... .
// Notice N is incremented with the recursion.
template <std::size_t N, typename Indices, typename... Types>
struct make_indices_impl;

template <
  std::size_t N,
  std::size_t... Indices,
  typename Type,
  typename... Types
>
struct make_indices_impl<N, indices<Indices...>, Type, Types...>
{
  typedef
    typename make_indices_impl<
      N+1,
      indices<Indices...,N>,
      Types...
    >::type
    type
  ;
};

// Define adding the last index...
// Notice no Type or Types... are left.
// Notice the full solution is emitted into the container.
template <std::size_t N, std::size_t... Indices>
struct make_indices_impl<N, indices<Indices...>>
{
  typedef indices<Indices...> type;
};

// Compute the indices starting from zero...
// Notice indices container is empty.
// Notice Types... will be all of the tuple element types.
// Notice this refers to the full solution (i.e., via ::type).
template <std::size_t N, typename... Types>
struct make_indices
{
  typedef
    typename make_indices_impl<0, indices<>, Types...>::type
    type
  ;
};
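
As a quick compile-time sanity check (not part of the original code, and assuming the definitions above are in scope), one might verify the generated index list with a static_assert:

#include <type_traits>

// Three element types should yield the zero-based indices 0, 1, 2...
static_assert(
  std::is_same<
    make_indices<0, int, double, char>::type,
    indices<0, 1, 2>
  >::value,
  "make_indices must produce indices<0,1,2> for three types"
);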

Define apply_tuple To Work With std::tuple

Finally apply_tuple can be written, but we will need a helper. Start by writing apply_tuple to do everything it can do:

template <
  typename Op,
  typename... OpArgs,
  typename Indices = typename make_indices<0, OpArgs...>::type,
  template <typename...> class T = std::tuple
>
auto apply_tuple(Op&& op, T<OpArgs...>&& t)
  -> typename std::result_of<Op(OpArgs...)>::type
{
  return apply_tuple_impl<Indices>::apply_tuple(
    std::forward<Op>(op),
    std::forward<T<OpArgs...>>(t)
  );
}

Unfortunately, Indices is not a parameter pack and the indices inside of Indices and OpArgs must both be parameter packs so that they can be expanded together:

[snip]
{
  return op( std::forward<OpArgs>(std::get<Indices>(t))... );
}

It is straight-forward to use partial template specialization with a helper class to lift the std::size_t... parameter pack out of its container, and this is why apply_tuple_impl has been used above. Notice that all function arguments are simply forwarded. The "prototype" case for apply_tuple_impl must be declared before using the partial template specialization technique:

template <typename Indices>
struct apply_tuple_impl;

Now Indices can be lifted out of the template template parameter I<Indices...>, which will be inferred to be indices, the container type:

template <
  template <std::size_t...> class I,
  std::size_t... Indices
>
struct apply_tuple_impl<I<Indices...>>
{
  template <
    typename Op,
    typename... OpArgs,
    template <typename...> class T = std::tuple
  >
  static auto apply_tuple(Op&& op, T<OpArgs...>&& t)
    -> typename std::result_of<Op(OpArgs...)>::type
  {
    return op( std::forward<OpArgs>(std::get<Indices>(t))... );
  }
};

which is the desired solution.


Testing The Solution

The following code will nicely (partially) test the solution:

//=================================================================

#include <cstddef>
#include <tuple>
#include <functional> // Only if needed

// Definition of indices
// Definition of make_indices_impl
// Definition of make_indices
// Definition of apply
// Definition of apply_tuple_impl
// Definition of apply_tuple

//=================================================================

#include <iostream>

struct add { int operator ()(int x, int y) { return x + y; } };
int add2(int x, int y) { return x + y; }

struct A1 { int operator ()() { return -1; } };
int A2() { return -2; }

struct B1 { int operator ()(int x) const { return x; } };
int B2(int x) { return x; }

int main()
{
  using namespace std;

  // Test apply() with no arguments...
  cout
    << "A1()() = "
    << apply(A1())
    << "."
    << endl
  ;
  cout
    << "A2() = "
    << apply(A2)
    << "."
    << endl
  ;

  // Test apply() with one argument...
  cout
    << "B1()(-3) = "
    << apply(B1(),-3)
    << "."
    << endl
  ;
  cout
    << "B2(-4) = "
    << apply(B2,-4)
    << "."
    << endl
  ;

/*
  // Define C1 and C2 elsewhere for two arguments and
  // test apply() with such...
  cout
    << "C1()(-3,-10) = "
    << apply(C1(),-3,-10)
    << "."
    << endl
  ;
  cout
    << "C2(-4,-10) = "
    << apply(C2,-4,-10)
    << "."
    << endl
  ;

  // etc.
*/



  // Test apply_tuple() with no arguments...
  cout
    << "A1()() = "
    << apply_tuple(A1(),std::tuple<>())
    << "."
    << endl
  ;
  cout
    << "A2() = "
    << apply_tuple(A2,std::tuple<>())
    << "."
    << endl
  ;

  // Test apply_tuple() with one argument...
  cout
    << "B1()(-3) = "
    << apply_tuple(B1(),make_tuple(-3))
    << "."
    << endl
  ;
  cout << "B2(-4) = "
    << apply_tuple(B2,make_tuple(-4))
    << "."
    << endl
  ;

/*
  // Define C1 and C2 elsewhere for two arguments and
  // test apply_tuple() with such...
  cout
    << "C1()(-3,-10) = "
    << apply_tuple(C1(),make_tuple(-3,-10))
    << "."
    << endl
  ;
  cout
    << "C2(-4,-10) = "
    << apply_tuple(C2,make_tuple(-4,-10))
    << "."
    << endl
  ;

  // etc.
*/


  return 0;
}

It is wise to use different names for apply and apply_tuple so that a tuple can itself be passed directly as an argument to a function/functor (via apply) without ambiguity.


Closing Comments

As far as I can tell the C++11 standard did not include a make_indices utility. Fortunately, it is not hard to define one to provide a straight-forward way to extract tuple values while performing a parameter pack expansion. That said, the solution is not obvious given the norms of coding and therefore can be hard to figure out. For those of you thinking, "There just has to be a way," I wrote this post! :-)

Should you notice any errors or issues, kindly let me know.

Installing Gentoo On An Encrypted Root Partition

Overview

There are times when one needs to know that, should a computer's hard drive fall into the wrong hands, the information on it cannot be harvested. This is increasingly needed for laptops, desktops, and servers whose hard drives may be storing required-to-be confidential material. Of course, it need not only be such, but, one may wish to have the peace-of-mind that his/her contact information and other personal data is also not accessible (e.g., should their laptop be stolen). The purpose of this post is to detail how to configure the Gentoo Linux operating system so that it boots from an encrypted root partition.

DISCLAIMER: Use of this post in any manner is completely and totally at your own risk. You are responsible for what you do with your own equipment/installations. Always read all documentation before using any commands.

Installation Media

In order to install the operating system, one first needs to be able to:

  • boot from a CD/DVD capable of installing Gentoo,
  • be able to partition the hard drive using software on the CD/DVD,
  • be able to run cryptsetup from the CD in order to format the hard drive with encryption,
  • be able to run LVM (Logical Volume Management) software from the CD/DVD, and,
  • be able to format the logical volumes created so the operating system can be installed.

Unlike a normal operating system installation, one cannot reboot the computer and boot into the operating system until enough software is installed that the operating system can actually mount the root partition. Further, until such is made to work, the only recourse one has is reliable boot CDs/DVDs with the software programs needed to mount the partitions so any issues can be fixed.

IMPORTANT: Ensure you have working bootable media on hand before you start. This is especially important if you need to stop and resume things later: make sure you have the necessary bootable media!

The simplest way to ease the burden of an installation is to use the latest Gentoo LiveDVD as it boots a fully working copy of Gentoo Linux with KDE, etc. It also has all of the tools, including cryptsetup, as well as the necessary kernel modules for encryption compiled into the Linux kernel. Without the proper kernel modules, it will not be possible to format the root partition with encryption.

NOTE: The minimal CD for Gentoo is very minimal and nearly everything has to be done manually. Unless you are used to this, I would not recommend using it. The LiveDVD, despite its large size, is much nicer as it configures the network (if it uses DHCP), has firmware for various network cards, can be used with KDE or simply as a text console, has all of the tools needed to encrypt, format, and mount hard drive partitions, etc. This post assumes the use of the LiveDVD. Briefly, the LiveDVD is nice and convenient whereas the minimal CD is tedious.

Assumptions And Definitions

It is assumed herein:

  • the installation media is the Gentoo Linux LiveDVD (or equivalent),
  • the hard drive is to only have Linux installed on it,
  • there is only one hard drive,
  • Internet access is present, and,
  • a USB memory stick (or equivalent) is on-hand to hold encryption keys needed to boot the system.

You must have a USB memory stick! Do not store any keys on the hard drive.

The following variables are used below to represent a number of things that are system dependent:

Variable Definition
BOOT_DEV Device name of the (unencrypted) boot partition.
ENC_HDD A name given to the cryptsetup-created encrypted device made after calling luksOpen.
INST_HDD Installation hard drive's device name, e.g., /dev/sda, /dev/hdc, etc.
KEY_DEV USB memory stick device name.
KEY_FILE Full path to encryption key.
KEY_MOUNT Mounted directory path for $KEY_DEV.
ROOT_DEV Device name of the (encrypted) root partition.
ROOT_NAME The name of the root logical volume, e.g., root.
SWAP_NAME The name of the swap logical volume, e.g., swap.
VG_NAME The name of the volume group used with LVM, e.g., vg_hostos.

If You Get Stuck…

Since this post is very long, this helpful section is given before the start of any instructions so that it will be noticed and used if needed. If you get stuck after the encrypted partitions have been created and had to shutdown or reboot, then these are the steps to get back up-and-running:

  1. Boot the LiveDVD.
  2. Open a Konsole/Terminal window.
  3. If the network is not already set up, configure it.
  4. Insert and mount your USB key that holds the encryption key.
  5. Mount the encrypted partition so it can be used:
    cryptsetup --key-file $KEY_FILE luksOpen $ROOT_DEV $ENC_HDD
  6. Activate the volume group:
    vgchange -a y $VG_NAME
  7. Activate the swap partition:
    swapon /dev/$VG_NAME/$SWAP_NAME
  8. Run:
    mkdir -p /mnt/gentoo
  9. Mount the (encrypted) root partition:
    mount /dev/$VG_NAME/$ROOT_NAME /mnt/gentoo
  10. Run:
    mkdir -p /mnt/gentoo/boot
  11. Mount the (unencrypted) boot partition:
    mount $BOOT_DEV /mnt/gentoo/boot
  12. Copy resolv.conf over from the LiveDVD set up:
    cp -L /etc/resolv.conf /mnt/gentoo/etc/
  13. Mount /proc so it will be accessible from the chroot jail:
    mount -t proc none /mnt/gentoo/proc
  14. Mount /dev so it will be accessible from the chroot jail:
    mount --rbind /dev /mnt/gentoo/dev
  15. Enter a chroot jail on the encrypted drive:
    chroot /mnt/gentoo /bin/bash
  16. Use the environment from the hard drive:
    env-update
  17. Source /etc/profile:
    source /etc/profile
  18. Fix the prompt to indicate that the chroot jail is active:
    export PS1="(chroot) $PS1"

and then you can perform any required work. Obviously, if you did not create the swap partition or other details, some of the above commands given will not work.

Installation Steps

Partition The Hard Drive

After booting the LiveDVD, use any partition manager to partition the hard drive (e.g., fdisk, gparted, etc.). If you want to partition the drive with GPT then you must install it first:

emerge -av gptfdisk

Create two partitions:

Partition Notes
$BOOT_DEV FDISK type = 83, GPT type = 8300.
Minimum size = 16 MiB. Will remain unencrypted and needs to be large enough to hold contents of /boot.
$ROOT_DEV FDISK type = 7, GPT type = 0700.
Fill rest of hard drive.

Modern hard drives are faster if these partitions are aligned on 2048-sector (or megabyte, in gparted) boundaries. This can be set in the expert menu in fdisk and gdisk.

Populate $ROOT_DEV With Random Data

This step is optional but one would likely want to have any previous hard drive content to appear no different than random data. To do this run the following command:

dd if=/dev/urandom of=$ROOT_DEV bs=2048

Create The Encryption Key

In order to encrypt the hard drive one needs to use one or more of:

  • a (long) passphrase, and/or,
  • a random file of binary data which can be encrypted with a passphrase using GPG later.

Ideally, one wants to have a very strong key and would normally choose to create a file with random data. The size of this file depends on the encryption method used. Herein we'll use the AES (i.e., the aes-xts-plain kernel module) with 512 bytes of random data. This key should be generated somewhere other than the installation hard drive (i.e., use the USB memory stick). One can do this with the following steps:

  • Insert and mount the USB memory stick.
  • Change the current directory to the USB memory stick.
  • Create a key file using /dev/random (which requires activity on the computer for entropy):
    head -c 512 /dev/random >keyfile.bin
  • Note that $KEY_FILE is $KEY_MOUNT/keyfile.bin below.

Seriously consider giving the keyfile a meaningful name. I would also suggest incorporating the hard drive's make, model, serial number, and partition number into its name. One can obtain such information without opening the computer by installing and running lshw.

If a passphrase is preferred, then don't specify any --key-file options to cryptsetup below. You can have up to eight (8) key files and/or passphrases set on any partition. If using passphrases, ensure that they are very long so as to have enough entropy. Rather than using plain passphrases, seriously consider using key files encrypted with GnuPG with a passphrase, as is done in this post.

Create And Mount The Encrypted Partition

Before any disk partitions and data can be put onto the encrypted partition, it must first be formatted using cryptsetup:

cryptsetup --cipher aes-xts-plain -s 512 \
  --key-file $KEY_FILE luksFormat $ROOT_DEV

NOTE: The Cryptsetup FAQ in response to the question, "Can I resize a dm-crypt or LUKS partition?" notes that aes-xts-plain can only be used for partitions less than 2 TiB. If the encrypted partition is to be greater than or equal to 2 TiB in size then use aes-xts-plain64.

Once the partition has been formatted (with respect to its encryption only), it can then be mounted to provide a device that appears to not be encrypted. Of course, the device is actually encrypted, but, for any programs to make use of it, it needs to have an unencrypted "view". The mount is accomplished using the following command:

cryptsetup --key-file $KEY_FILE luksOpen $ROOT_DEV $ENC_HDD

NOTE: Set $ENC_HDD to something meaningful. For example, if $ROOT_DEV is /dev/sda2 then perhaps set $ENC_HDD to hdd_a2.

The $ENC_HDD representing the unencrypted device is now online. If logical volume management (LVM) is used on it then $ENC_HDD should be thought of as a virtual encrypted hard drive. The full path of the device is: /dev/mapper/$ENC_HDD

Use LVM To Create The Partitions To Install The Operating System

Instead of treating $ENC_HDD as a single partition, logical volume management (LVM) will be used to allow (i) the creation of a swap device and (ii) the creation of the root partition that the operating system will be installed on. One may not want to have Linux installed into one partition. If so, then instead of creating a single root partition, simply create all of the partitions, sized as appropriate for your circumstances.

Currently, there is only one "physical" (encrypted) volume, /dev/mapper/$ENC_HDD, and the LVM system must be told this (i.e., to create a physical volume):

pvcreate /dev/mapper/$ENC_HDD

With LVM, physical volumes are placed into (various) volume groups within which logical volumes (i.e., "partitions") are created. Since no previous volume groups have been created yet, one needs to be created:

vgcreate $VG_NAME /dev/mapper/$ENC_HDD

In order to do anything with a volume group, it must be activated:

vgchange -a y $VG_NAME

The swap partition should be on $ENC_HDD so it is encrypted. Should anything be swapped out to the hard drive, which will contain data from running programs, it will also be encrypted. Additionally, the swap partition should be contiguously stored, thus, the command used to create the swap partition is slightly different:

lvcreate -L$SWAP_SIZE --contiguous y -n$SWAP_NAME $VG_NAME

NOTE: $SWAP_SIZE will be a value such as 24G (which would be 24 GiB). See the lvcreate man page for further details.

The swap partition's device name is now /dev/$VG_NAME/$SWAP_NAME.

In order to create the root partition in a manner that uses all remaining space available in the volume group, the number of physical extents available needs to be obtained:

PESIZE=$( \
  vgdisplay | \
  egrep 'Free[[:space:]]+PE / Size' | \
  awk '{ print $5 }' \
)

Now the root partition can be created:

lvcreate -l $PESIZE -n$ROOT_NAME $VG_NAME

The root partition's device name is /dev/$VG_NAME/$ROOT_NAME.

Formatting And Mounting Required Partitions

In order to install and use the operating system a chroot jail will be needed and this will require (re-)mounting /dev and /proc in addition to the root and boot partitions so they are accessible from inside the chroot jail:

  1. Format the swap partition:
    mkswap /dev/$VG_NAME/$SWAP_NAME
  2. Turn on the swap partition:
    swapon /dev/$VG_NAME/$SWAP_NAME
  3. Format the root partition:
    mkfs.ext4 /dev/$VG_NAME/$ROOT_NAME
  4. Mount the root partition:
    mkdir -p /mnt/gentoo
    mount /dev/$VG_NAME/$ROOT_NAME /mnt/gentoo
  5. Format the boot partition:
    mkfs.ext4 $BOOT_DEV
  6. Mount the boot partition:
    mkdir -p /mnt/gentoo/boot
    mount $BOOT_DEV /mnt/gentoo/boot
  7. Copy resolv.conf over from the LiveDVD set up:
    cp -L /etc/resolv.conf /mnt/gentoo/etc/
  8. Mount /proc so it will be accessible from the chroot jail:
    mount -t proc none /mnt/gentoo/proc
  9. Mount /dev so it will be accessible from the chroot jail:
    mount --rbind /dev /mnt/gentoo/dev
  10. Enter a chroot jail on the encrypted drive:
    chroot /mnt/gentoo /bin/bash
  11. Use the environment from the hard drive:
    env-update
  12. Source /etc/profile:
    source /etc/profile
  13. Fix the prompt to indicate that the chroot jail is active:
    export PS1="(chroot) $PS1"

Install The Operating System

At this point one should follow the normal installation procedure for Gentoo Linux starting immediately after the steps of formatting and mounting all partitions. Ensure that these packages are also installed:

  • sys-fs/cryptsetup - Must be the same version or higher as what was on the LiveDVD.
  • sys-apps/busybox - If you want a decent shell within initramfs.
  • eix - An available package query tool.
  • sys-kernel/genkernel - Needed to easily build the kernel.
  • sys-kernel/gentoo-sources or sys-kernel/hardened-sources - Linux kernel sources.
  • app-crypt/gnupg - Needed for security (i.e., to have a passphrase on the keyfile).
  • sys-kernel/module-rebuild - Tool to build out-of-kernel kernel modules.
  • app-portage/portage-utils - Additional portage package management tools.
  • sys-apps/util-linux - For /sbin/blkid

Ensuring cryptsetup Version Is Correct

It is possible for the version of cryptsetup on the LiveDVD to be different from the version that is considered "stable" in the operating system. If a version of cryptsetup older than the one on the LiveDVD is installed, then the system will likely not be able to boot properly. In order to determine whether the installed version of cryptsetup is the same as or newer than the one on the LiveDVD, do the following:

  1. Open another console window (i.e., don't use any chrooted terminal session) and run this command as root to obtain the version of cryptsetup on the LiveDVD:
    cryptsetup --version
  2. In the chroot jail console, run:
    emerge cryptsetup
    cryptsetup --version

If the version of cryptsetup in the chroot jail console is older than the one in the LiveDVD console, then Gentoo must be told to use the newer version:

  1. Edit /etc/portage/package.accept_keywords/crypto to contain this line:
    sys-fs/cryptsetup ~amd64
  2. Re-emerge cryptsetup:
    emerge -av --autounmask=y --autounmask-write=y cryptsetup

Configuring And Building The Linux Kernel

After running emerge -av gentoo-sources or emerge -av hardened-sources, the Linux kernel sources will have been installed into /usr/src/linux. The steps to configure a Linux kernel are:

  • Run make menuconfig.
  • Set any and all settings.
  • Save and exit the tool.
  • Build the kernel, its modules, and initramfs.
  • Install such to /boot.
  • Ensure the boot loader (e.g., grub) is able to boot the kernel.

While one can roll their own kernel configuration, this post focuses on using genkernel with the gentoo-sources or hardened-sources packages for these reasons:

  • the genkernel utility completely automates everything, except this post will intentionally run make menuconfig manually,
  • booting using an encrypted root partition as described in this post will require the following to be built and installed in the initramfs:
    • statically compiling sys-fs/cryptsetup,
    • statically compiling app-crypt/gnupg,
    • statically compiling (optionally) sys-apps/busybox,
    • statically compiling in disklabel (UUID) support, and,
    • writing a script/program to prompt the user for a key-file or passphrase in order to be able to mount the encrypted system.

In short, genkernel automates a lot of extra work that most won't need to be concerned with on a practical level, especially if time-constrained. In order to have genkernel automate the kernel building process, do the following:

  1. Change the current working directory to /usr/src/linux:
    cd /usr/src/linux
  2. Copy the LiveDVD's kernel configuration to the current directory:
    zcat /proc/config.gz >.config
  3. Configure the kernel by running make menuconfig and ensure the following options are set to Yes to avoid issues when booting:
    • Cryptographic API –> CBC support
    • Cryptographic API –> ECB support
    • Cryptographic API –> XTS support
    • Cryptographic API –> HMAC support
    • Cryptographic API –> all CRC32c options
    • Cryptographic API –> MD5 digest algorithm
    • Cryptographic API –> Michael MIC keyed digest algorithm
    • Cryptographic API –> SHA1 digest algorithm
    • Cryptographic API –> SHA224 and SHA256 digest algorithm
    • Cryptographic API –> SHA384 and SHA512 digest algorithms
    • Cryptographic API –> all AES options
  4. Before running genkernel, ensure these settings are set in /etc/genkernel.conf:
    OLDCONFIG="yes"
    MENUCONFIG="no"
    CLEAN="yes"
    MRPROPER="no"
    MOUNTBOOT="yes"
    SAVE_CONFIG="yes"
    POSTCLEAR="1"
    LVM="yes"
    LUKS="yes"
    GPG="yes"
    BUSYBOX="yes"
    DISKLABEL="yes"
  5. If you use MAKEOPTS in /etc/make.conf, then ensure you set the same for genkernel (e.g., in /etc/genkernel.conf) as genkernel does not use /etc/make.conf at all. (See this post on how to set MAKEOPTS; a sample setting is sketched after this list.)
  6. Build the kernel, its modules, necessary statically linked software, and install it to /boot:
    genkernel all
  7. Determine any external-to-the-kernel modules for your system:
    module-rebuild populate
  8. Rebuild any external-to-the-kernel modules:
    module-rebuild rebuild
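
As an example of step 5, the MAKEOPTS line in /etc/genkernel.conf might read as follows (the -j9 -l8 values are hypothetical and mirror the i7 example in the parallel builds article later in this document; use values appropriate for your hardware):

MAKEOPTS="-j9 -l8"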

Configuring /etc/fstab and grub

Determining All Partitions' UUIDs

Run blkid and note the UUIDs for the following partitions:

  • $BOOT_DEV – This is the unencrypted boot partition.
  • $ROOT_DEV – This is the encrypted root partition UUID which grub will need in order to find the partition at boot time.
  • /dev/$VG_NAME/$SWAP_NAME – This is the swap partition.
  • /dev/$VG_NAME/$ROOT_NAME – This is the root partition that the booted operating system uses.

NOTE: The output from blkid has quotation marks surrounding the UUID values. You'll need to remove the quotation marks when placing the UUIDs in /etc/fstab and in grub.conf.
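
Alternatively, depending on the version of blkid installed, it may be able to print just the bare value of a single tag (see its man page), which avoids the quotation marks entirely. A sketch that captures the UUIDs into shell variables (hypothetical names chosen to match the placeholders used in the /etc/fstab and grub.conf examples below) is:

BOOT_DEV_UUID=$(blkid -s UUID -o value $BOOT_DEV)
ROOT_DEV_UUID=$(blkid -s UUID -o value $ROOT_DEV)
ROOT_NAME_UUID=$(blkid -s UUID -o value /dev/$VG_NAME/$ROOT_NAME)
SWAP_NAME_UUID=$(blkid -s UUID -o value /dev/$VG_NAME/$SWAP_NAME)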

Configuring /etc/fstab

Edit /etc/fstab so that it contains the following:

UUID=ROOT_NAME_UUID   /            ext4    errors=remount-ro     0 1
UUID=BOOT_DEV_UUID    /boot        ext4    noauto,noatime        0 1
UUID=SWAP_NAME_UUID   none         swap    sw                    0 0
/dev/cdrom            /mnt/cdrom   auto    noauto,user,ro        0 0
proc                  /proc        proc    defaults              0 0
shm                   /dev/shm     tmpfs   nodev,nosuid,noexec   0 0

where:

  • ROOT_NAME_UUID is the UUID value for /dev/$VG_NAME/$ROOT_NAME,
  • BOOT_DEV_UUID is the UUID value for $BOOT_DEV, and,
  • SWAP_NAME_UUID is the UUID value for /dev/$VG_NAME/$SWAP_NAME.
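
If the hypothetical *_UUID shell variables from the blkid sketch above are still set in the chroot shell, the placeholders can be substituted in place with sed rather than typed by hand, for example:

sed -i \
  -e "s|ROOT_NAME_UUID|${ROOT_NAME_UUID}|" \
  -e "s|BOOT_DEV_UUID|${BOOT_DEV_UUID}|" \
  -e "s|SWAP_NAME_UUID|${SWAP_NAME_UUID}|" \
  /etc/fstab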

Protecting The Keyfile

When the key was generated earlier, it was generated as a binary file. One, however, does not want to walk around with a key that has no passphrase: should it be lost or stolen, an attacker would then also need to crack the passphrase in order to obtain the key. To convert the key to a GPG-protected key, do the following:

  1. Mount the USB memory stick with the key.
  2. Run:
    gpg --symmetric -o $KEY_FILE.gpg $KEY_FILE
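
Before relying on the GPG-protected key, it may be prudent to confirm that the passphrase works and that the file decrypts back to the original key material. A minimal check (which writes the decrypted key to /dev/null so no plaintext copy is left behind) is:

gpg --decrypt $KEY_FILE.gpg >/dev/null

Only after this succeeds should you consider removing the unprotected $KEY_FILE from the memory stick (e.g., with shred -u $KEY_FILE), if that is desired.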

Configuring grub

Except for editing /boot/grub/grub.conf, installing and configuring grub is straightforward:

  • If grub has not been installed yet, do so:
    emerge -av grub
  • Using the UUID for $ROOT_DEV from the previous section, edit /boot/grub/grub.conf (see below).
  • Copy the active mounts:
    cat /proc/mounts >/etc/mtab
  • Install grub to the hard drive's boot sector:
    grub-install --no-floppy $INST_HDD

Examine $ROOT_DEV and note the drive letter and partition number, which are the last two characters in its name. The grub boot loader uses numbers to identify drives and partitions starting from 0; therefore, drive letters 'a', 'b', 'c', etc. correspond to 'hd0', 'hd1', 'hd2', etc. and partition numbers '1', '2', '3', etc. correspond to '0', '1', '2', etc. This matters since the root (hdX,Y) line needs to be set to the correct drive and partition number for the system being installed. Further, the appropriate file names, UUID, $VG_NAME, $KEY_DEV, $KEY_FILE, $KEY_MOUNT, and $ROOT_NAME must be set/replaced with the proper values for grub to boot. Knowing this, entries in /boot/grub/grub.conf will be similar to:

title Gentoo Linux (Keyfile Boot)
root (hd0,0)
kernel /boot/kernel-genkernel-x86_64-2.6.39-gentoo-r3 root=/dev/ram0 crypt_root=UUID=ROOT_DEV_UUID real_root=/dev/mapper/$VG_NAME-$ROOT_NAME root_keydev=$KEY_DEV root_key=$KEY_FILE.gpg rootfstype=ext4 key_timeout=0 dolvm
initrd /boot/initramfs-genkernel-x86_64-2.6.39-gentoo-r3

title Gentoo Linux (Passphrase Boot)
root (hd0,0)
kernel /boot/kernel-genkernel-x86_64-2.6.39-gentoo-r3 root=/dev/ram0 crypt_root=UUID=ROOT_DEV_UUID real_root=/dev/mapper/$VG_NAME-$ROOT_NAME rootfstype=ext4 key_timeout=0 dolvm
initrd /boot/initramfs-genkernel-x86_64-2.6.39-gentoo-r3

Reboot

Finally, the system can be rebooted. If everything was set correctly, the system should fully reboot. If not, follow the instructions in the "If You Get Stuck…" section to be able to access the contents of the encrypted partition so any issues can be fixed.

Kindly contact me should any typos or errors be found within this post.

Parallel Builds With Gentoo's Emerge

Overview

Gentoo allows one to specify the degree of concurrency to be employed when emerging packages with Portage in two ways by specifying:

  • the -j and the -l options in the MAKEOPTS variable in /etc/make.conf, or,
  • the --jobs and --load-average options on the emerge command line or in the EMERGE_DEFAULT_OPTS variable in /etc/make.conf.

This article describes some of my experiences with using the above parameters.

Determining Hardware Limits On Parallelism

Assuming the operating system being used to run Portage is Linux, one needs to determine the number of CPU cores and threads available for parallelism. This can be determined by executing the following:

grep '^processor' /proc/cpuinfo | sort -u | wc -l

For brevity, this amount will be referred to as NJOBS below, i.e., as if this were run:

NJOBS=$(grep '^processor' /proc/cpuinfo | sort -u | wc -l)
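
If the installed coreutils provides it, the nproc utility reports the same count and may be used instead:

NJOBS=$(nproc)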

Maximizing Processor Saturation

Ideally one would want all processors busy performing work as this would allow one to minimize the total amount of time compiling code. However, one does not want the system load to become excessive for the following reasons:

  • to minimize process switching, and,
  • to keep the system responsive to user input.

The first step in this process is to set MAKEOPTS in /etc/make.conf so that its maximum number of parallel tasks is NJOBS+1 and so that no new jobs are started if the load is NJOBS or higher:

MAKEOPTS="-j$((NJOBS+1)) -l${NJOBS}"

where the -j option sets the maximum number of parallel jobs that can be run via make and the -l prevents any new parallel job starting unless the load is below the amount specified. (The reason the number of jobs is set to one higher than the number of processors is to help ensure saturation of processor utilization.)

There is no need to set an NJOBS variable in /etc/make.conf as the hardware will not change; simply substitute the proper values in MAKEOPTS. For example, MAKEOPTS suitable for an i7 processor would be:

MAKEOPTS="-j9 -l8"

Setting MAKEOPTS is very safe as there are very few packages with parallel make issues. Typically, if there are issues, the ebuild for that package will turn off the option or will give notice that -j1 must be used.
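
For packages that are known to need -j1, recent versions of Portage also support per-package environment overrides via /etc/portage/package.env, which saves having to remember to pass -j1 by hand. A sketch (category/fussy-package is a hypothetical package name):

# /etc/portage/env/serial-make.conf
MAKEOPTS="-j1"

# /etc/portage/package.env
category/fussy-package serial-make.conf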

Setting MAKEOPTS will improve the build times of a number of packages, however:

  • many packages don't perform parallel makes, and,
  • many packages' build procedures won't saturate all processors with load.

This is most apparent when running long-build tasks such as emerge -eav world or when building large software programs such as Firefox, Chromium, or LibreOffice. In my experience an i7 processor will have a load between 1.00 and 2.00 most of the time if only MAKEOPTS is set. This is not even close to ideal.

To better utilize processors, one has to tell emerge to also run parallel jobs. This is done using the --jobs and --load-average options with emerge:

emerge --jobs=${NJOBS} --load-average=${NJOBS} world

Here, unlike MAKEOPTS, one need only set the jobs and the load limit to the number of processors, as the tasks run within each job will themselves generate load. Since a load average limit is specified both here and in MAKEOPTS, the system should not become overly busy, start thrashing, or need excessive amounts of process switching. (If such is an issue, then lower the load average settings.)

Since emerge's command line overrides anything set in /etc/make.conf, I set EMERGE_DEFAULT_OPTS in /etc/make.conf as follows:

EMERGE_DEFAULT_OPTS="--jobs=${NJOBS} --load-average=${NJOBS}"

i.e., for an i7 this would be:

EMERGE_DEFAULT_OPTS="--jobs=8 --load-average=8"

so I don't have to specify such on the command line. For packages that have issues being compiled in parallel, one need only override --jobs on the command line setting it to 1:

emerge -j1 world

Dealing With Failed Builds

There are some packages that will simply not build when emerged as parallel jobs. This poses a significant issue if one is performing an emerge world of hundreds or thousands of packages. Fortunately, emerge has the ability to resume a merge as well as to skip the first package when resuming. Assuming the failed package is already installed on the computer, you can safely skip it and rebuild it later with -j1. If you are careful not to do this more than once (or you'll lose the ability to resume), you can even build the failed package right away (e.g., in another window) and, when done, run emerge -rav --skipfirst to resume the build process.
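
As a sketch of that workflow (category/failed-package is a hypothetical placeholder for whichever package actually failed):

# Build the problematic package on its own without parallel jobs:
emerge -j1 -av category/failed-package

# Then resume the interrupted emerge, skipping the package just handled:
emerge --resume --skipfirst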

Want More Output?

When the number of jobs is greater than one, there is only a summary output. To see the tasks actually being performed, run:

qlop -c

or the continuously updating:

while true ; do clear ; qlop -c ; sleep 2 ; done

To see the actual build output that one sees with emerge -j1, simply run tail for the package, PKG, being built:

tail -f /var/tmp/portage/*/${PKG}*/temp/build.log

by replacing ${PKG} with the name of the package.

Different Values For Jobs and Load Averages

Elsewhere on the Internet, I've seen people use different values, such as taking the number of processors and assigning half to MAKEOPTS and the other half to emerge. Unfortunately, this will on average saturate only half of the processors. In my experience, even this "average" does not happen frequently!

Logically, it is much better to set the jobs to the number of processors and then to use the load average settings to limit the total number of parallel tasks created during the build process. This strategy, as described above, allows all processors to be used without excessive loads being created (e.g., on an i7 the highest load I've seen with the above settings is approximately 9.0). It also allows parallelism to occur when packages are being built that don't use parallel make (or are limited in how much parallelism they can use). This is why the emerge load average value is set as it is above: the MAKEOPTS load average ensures parallel tasks only start running if the load is not too high. My experience with this has been very positive, with good to excellent load averages when things can be done in parallel.

Conclusion

In my experience, using the settings outlined above significantly speeds up compiles on multi-core machines. There are only a couple of packages that don't like the parallel emerge and require intervention, but the rest build without a problem and my cores/processors are mostly utilized instead of mostly idle.

Flashing Firmware On Older Hardware

Overview

Sometimes one has to reinstall the firmware on a piece of hardware. This is a process of connecting the device via an Ethernet cable, a USB cable, or a serial port, and then using software capable of talking over that connection. I recently had to do this via a serial port. A small but significant problem is that old-fashioned serial ports and cables are a little harder to come by these days! Fortunately, USB-to-serial-port cables are available; however, getting software that would cooperate with the hardware's quirks to upload the firmware turned out to be the bigger issue.

Firmware Flashing Issues

The hardware that I had to reinstall the firmware on requires one to type in confirmations using two characters. If any extra characters are added or those two characters are incorrect, it fails. Worse, to upload the firmware, one has to:

  1. start the firmware upload,
  2. initiate an ASCII upload (without any character translations) of the firmware that dumps character-by-character data to the serial port,
  3. when done the user must confirm such, and,
  4. hardware flow control must be used.

As I discovered, item (2) was an issue with many programs (e.g., minicom, cutecom, etc.) under Linux. Some programs, like cutecom, would only send one line at a time, and this causes carriage returns to be sent as well, causing the failure of any firmware upload because of the extra characters (i.e., the ENTER key / ASCII CR). Under Windows, there is no longer a bundled serial port program, and the ones I sought out on the Internet crashed. After some more searching, I found a suitable program called picocom that worked under Linux!

picocom: Simple Software That Works

The program picocom is very simple but effective, as there are no high-level "features" to get in the way of any of the required steps above. The program provides the ability to interact with the serial port, to send and receive files, and to update a number of settings (e.g., baud rate) either while running the program or via command line options. It does this using an escape keystroke (i.e., Ctrl-A) which, when followed by a specific keystroke, invokes an operation. For example, to send a file one would key in Ctrl-A Ctrl-S.

Normally most programs decide how to send and receive files, but picocom requires the transfer commands to be passed in as command line arguments. For example, to connect to /dev/ttyUSB0 at 19200 bps and to use an ASCII transfer protocol to send files, this is the picocom command line required:

picocom -b 19200 --send-cmd "ascii-xfr -sv" /dev/ttyUSB0

There is little to no feedback when Ctrl-A features are invoked, and it is absolutely necessary to read the man page for the keystrokes required to invoke certain features, as picocom is very minimal with respect to its user interface.

In the end, this program did everything I could ask for: I could key in commands required to initiate the firmware upload, confirm that it is to be installed once it has been uploaded, and it properly uploaded the firmware as-is. So if you are looking for a simple, bare bones program to upload something like firmware to a device, try picocom!

P.S. To quit the program use the keystrokes: Ctrl-A Ctrl-Q.

A Template Argument Extender

Overview

When writing C++ code, one may come across the need to have a set of N arguments, all of the same type, for a variadic template type. For example, it would be nicer to write:

typedef type_extend<0, std::tuple, std::string>::type t0;
typedef type_extend<1, std::tuple, std::string>::type t1;
typedef type_extend<2, std::tuple, std::string>::type t2;
typedef type_extend<3, std::tuple, std::string>::type t3;
typedef type_extend<4, std::tuple, std::string>::type t4;

instead of:

typedef std::tuple<> t0;
typedef std::tuple<std::string> t1;
typedef std::tuple<std::string, std::string> t2;
typedef std::tuple<std::string, std::string, std::string> t3;
typedef std::tuple<std::string, std::string, std::string, std::string> t4;

In other words, the goal is to have a positive integer (i.e., the first parameter to type_extend) that determines how many times the last argument passed to type_extend will be duplicated in an argument list that will be applied as the template arguments to the type denoted by the second parameter passed to type_extend.

This article explores how to do this using variadic templates which will be part of the upcoming C++ standard.


Template Template Parameters

In the following code, notice the use of std::tuple appears without any template arguments even though std::tuple requires a list of types as its template arguments:

typedef type_extend<2, std::tuple, std::string>::type t2;

C++ allows the programmer to specify that template arguments are placeholders for class templates. Such placeholders are called template template parameters and are declared like class templates, except that only the class keyword may be used (i.e., typename, struct, and union cannot be used):

template <template<typename X> class T> // OK
void some_func(T<float>*);

template <template<typename X> typename T> // ERROR
void some_func(T<double>*);

template <template<typename X> struct T> // ERROR
void some_func(T<short>*);

template <template<typename X> union T> // ERROR
void some_func(T<long>*);

Notice that such code allows the user to pass a class template that has only one template parameter without specifying that parameter's argument:

template <typename N>
struct Number
{
  N num;
};

template <template <typename> class T>
struct SomeClass
{
  // This class sets the argument to T as float...
  T<float> something;
};

int main()
{
  // Notice Number has no arguments...
  typedef SomeClass<Number> N;

  N n;
  n.something.num = 3.14F; // NOTE: num is a float
}

The usefulness of template template parameters is that they allow one to omit the template arguments when passing a class template as a template argument. The downsides of using template template parameters are that (a) any use must fully specify all template arguments of the template template placeholder name and (b) the number of template parameters must exactly match those of the template being passed in. Restriction (b) is important to note: the STL containers all have default template arguments and therefore they all have at least two template parameters, not one!


Implementing type_extend

In order to build up a variable argument list of types, two class templates will be defined: type_extend and type_extend_impl. The type_extend class template has the three template parameters previously discussed. The type_extend_impl class template will need to build up the list of template arguments up to the number specified. The purpose of type_extend is to handle the N-th case and the 0-th (i.e., base) case, and its definition is straightforward:

template <unsigned N, template <typename...> class T, typename Arg>
struct type_extend
{
  typedef typename type_extend_impl<N-1,T,Arg>::type type;
};

template <template <typename...> class T, typename Arg>
struct type_extend<0,T,Arg>
{
  typedef T<> type;
};

In the base case (i.e., in the template specialization type_extend<0,T,Arg>), the computed type is T without any template arguments. In all other cases, the computed type is determined by the definition of type in type_extend_impl.


Implementing type_extend_impl

Since the only way to go from some integer template argument N down to 0 is to use recursion, the first step is to determine the type result that needs to be computed in the base case. Since the type_extend class template handled the case where there were no arguments, the base case of type_extend_impl will need to be T having only one template argument. As for all other cases, the number of arguments needs to be increased by one per recursive iteration. The following code accomplishes this:

template <
  unsigned N,
  template <typename...> class T,
  typename Arg,
  typename... Args
>
struct type_extend_impl
{
  // This class was invoked with Arg, Args... and to increase the
  // Arg one more time we need to add it again which is why Arg
  // appears twice below...
  typedef
    typename type_extend_impl<N-1,T,Arg,Arg,Args...>::type
    type
  ;
};

template <
  template <typename...> class T,
  typename Arg,
  typename... Args
>
struct type_extend_impl<0,T,Arg,Args...>
{
  // Base case: stop the recursion and expose T with the accumulated arguments.
  typedef T<Arg,Args...> type;
};

Certainly do take time to digest the above, but the code works because each recursive instantiation of type_extend_impl except the last adds one Arg to the template argument list and reduces N by one.


Testing type_extend

All that is left is to test that this works. This test assumes the use of the GNU C++ compiler version 4.5 or higher. Suitable test code would be:

#include <tuple>

// Place definition of type_extend_impl here.
// Place definition of type_extend here.

#include <iostream>
#include <typeinfo>
#include <string>
#include <cstdlib>

#include <cxxabi.h>   // GCC-specific

template <typename T>
std::string demangled_name_of_type()
{
  // NOTE: See G++ Manual Chapter 27.
  //       http://gcc.gnu.org/onlinedocs/libstdc++/
  int status;
  char *name = abi::__cxa_demangle(typeid(T).name(), 0, 0, &status);
  std::string realname(name);
  free(name);
  return realname;
}

int main()
{
  typedef type_extend<0,std::tuple,unsigned>::type test0;
  typedef type_extend<1,std::tuple,unsigned>::type test1;
  typedef type_extend<2,std::tuple,unsigned>::type test2;
  typedef type_extend<3,std::tuple,unsigned>::type test3;
  typedef type_extend<4,std::tuple,unsigned>::type test4;

  std::cout
    << "0: " << demangled_name_of_type<test0>() << std::endl
    << "1: " << demangled_name_of_type<test1>() << std::endl
    << "2: " << demangled_name_of_type<test2>() << std::endl
    << "3: " << demangled_name_of_type<test3>() << std::endl
    << "4: " << demangled_name_of_type<test4>() << std::endl
  ;
}

Save it in a file (e.g., called program.cxx) and compile it:

g++ -std=c++0x program.cxx

When it is run, the following output will appear:

0: std::tuple<>
1: std::tuple<unsigned int>
2: std::tuple<unsigned int, unsigned int>
3: std::tuple<unsigned int, unsigned int, unsigned int>
4: std::tuple<unsigned int, unsigned int, unsigned int, unsigned int>

Success!


Template Aliases

Another feature in the next standard, not yet implemented by GCC, is the ability to alias templates:

template <unsigned N, typename Arg>
using n_tuple = typename type_extend<N,std::tuple,Arg>::type;

Using template aliases would allow one to easily eliminate the need to write ::type and to partially substitute template arguments. For example, the template alias n_tuple could subsequently be used as follows:

n_tuple<3,std::string>

which would be equivalent to the t3 typedef at the very top of this post in the Overview section.

Installing Ubuntu In A VirtualBox Session For Web Development

Overview

Often when I teach a web development class I get a number of questions about a good way to install software suitable for web development on one's own computer. Of course, officially, this is all provided on the school's computers, but computer science students (i) want to learn how to set such things up on their own machines and (ii) want to be able to do web development even when they are not connected to the Internet. Although outside the official support of any web course(s) that I may be teaching, the purpose of this article is to outline how one would do this using Ubuntu and VirtualBox.

Disclaimer: While I have made the effort to provide accurate information, it is entirely possible there are errors, etc. Do read the documentation for everything mentioned herein and your use of such is completely at your own risk.

Step I: Acquire VirtualBox & Ubuntu

Download and Install VirtualBox

This is the easiest step! Simply go to http://www.virtualbox.org and download the latest VirtualBox program and install it. If VirtualBox will be run on a 64-bit computer, be sure to download the 64-bit version.

VirtualBox is a software program that is capable of running an entire PC inside a virtual machine. This means other operating systems can be installed inside of it, booted, and then run, all inside a window on your computer. Nicely, this allows you to have and use other operating systems on your computer without actually having to resize, create, and format partitions, set up your computer for dual booting, etc. It also allows you to run the operating system you use the most at the same time as other operating systems (one per virtual machine that you run), which is otherwise impossible using only a single computer.

Download Ubuntu

Before one can do anything with VirtualBox, one needs installation disc(s) for an operating system so that it can be installed. For this article, download the Ubuntu CD-ROM ISO image from http://www.ubuntu.com. Again, if this will be run on a 64-bit computer, be sure to download the 64-bit "amd64" version of Ubuntu. The "amd64" version will work with 64-bit Intel or AMD processors.

Now that VirtualBox has been installed and the Ubuntu installation CD-ROM ISO file has been downloaded, the next step is to create a virtual machine session in VirtualBox and install Ubuntu!

Step II: Establish The Virtual Machine

Create The Virtual Machine

Unlike a real computer, with VirtualBox one can configure which hardware resources, as well as how much RAM, hard disk space, etc., the virtual machine will be allowed to have. To use any software program that lets you run virtual machines, it is highly recommended that you have extra of all of the resources required to run the operating system and the software in it. To boot your computer and run software, your computer needs a certain amount of RAM and hard disk space, plus free RAM and free hard disk space to do things. When running a virtual machine, the same is true, but you will also need the additional RAM and hard disk space that the virtual machine itself requires.

Important: If you don't have enough RAM or disk space to do this step, then you will have to acquire it in order to proceed further.

The amount of hard disk space available to your virtual machine is easy to set at the beginning and harder to increase later. Since VirtualBox has a "grow" option for hard disk images that increase in size as more space is used, it is best to create an appropriately large virtual hard drive up front. As for RAM, this can easily be changed at a later point in time.

To create the virtual machine in VirtualBox, follow these steps:

  1. Run VirtualBox.
  2. Click the New button to create a virtual machine.
  3. A Create New Virtual Machine dialog will appear.
  4. Click the Next > button.
  5. Type in a Name for the virtual machine (e.g., Ubuntu).
  6. Select the Operating System (OS) Type for the machine (e.g., Operating System: Linux, Version: Ubuntu or Ubuntu 64-bit).
  7. Click the Next > button.
  8. Select the Base Memory Size. If unsure, go with the recommended amount (e.g., 384 MiB).
  9. Click the Next > button.
  10. Ensure that the Boot Hard Disk checkbox is selected.
  11. Select the Create new hard disk radio button.
  12. Click the Next > button.
  13. A Create New Virtual Disk dialog window will appear. Click the Next > button.
  14. Under Storage Type select the Dynamically expanding storage radio button.
  15. Click the Next > button.
  16. Choose a Location to store the virtual hard disk image if the default is not liked.
  17. Choose a Size for the disk (e.g., 8 GiB). Remember: this is not easily changed later!
  18. Click the Next > button.
  19. Click the Finish button to close the Create New Virtual Disk dialog.
  20. Click the Finish button to close the Create New Virtual Machine dialog.
  21. Ensure that the newly created virtual machine is selected.
  22. Click the Settings button in the tool bar.
  23. Click the Storage item in the left-hand side listbox.
  24. Click the Empty CD-ROM disc under IDE Controller.
  25. Click the CD-ROM drop-down button and select Choose a virtual CD/DVD disk file….
  26. Find and select the ISO file that was downloaded for Ubuntu.
  27. Click OK in the Settings dialog window.
  28. Click the Start button in the toolbar to start the virtual machine and begin installing Ubuntu!

Step III: Install Ubuntu

Installing Ubuntu

The last step in the previous section started the virtual machine. Just like a real computer would, you can see it (quickly!) count up the memory, look for the disk drives present, and boot off the Ubuntu installation CD-ROM. There are two modes to run Ubuntu off the CD-ROM: (i) as a LiveCD or (ii) as an installer. You can install Ubuntu with either option, but the LiveCD option lets you try Ubuntu without installing anything on your computer. There isn't a need to select the LiveCD option when using a virtual machine, except if you need to "rescue" an unbootable system, so choose to just install Ubuntu.

Do the following steps when the installation screen appears:

  1. Click on English and choose Install Ubuntu.
  2. On the Time page choose the appropriate region (e.g., Canada) and an appropriate timezone city/locale (e.g., Toronto).
  3. Click the Forward button.
  4. On the keyboard page, choose the appropriate keyboard layout (e.g., USA keyboard layout).
  5. Click the Forward button.
  6. On the disk partitioning screen, select Use the entire disk.
  7. Click the Forward button.
  8. Enter in your full name, a login, a password, and a machine name.
  9. Click the Forward button.
  10. Click Install.
  11. Wait for all of the files to be downloaded and copied.
  12. Click Restart Now.
  13. Since your CD-ROM isn't real, removing it is a little different than what you may be used to! When prompted to remove the CD-ROM do the following:
    1. From the VirtualBox machine window's pull down menu, select the Devices menu.
    2. Select the CD/DVD Devices menu item.
    3. Uncheck the CD-ROM ISO file. This will "eject" the installation CD-ROM.
    4. Press enter inside the virtual machine window to reboot.
  14. After rebooting, login using your login and password.
  15. If prompted, install any updates.

Step IV: Post-OS-Install Software Configuration

Software Installation

With Ubuntu installed, all that remains is to install Apache web server, PHP, PostgreSQL, and a set of handy utilities! Here are the steps to do this:

  1. In Ubuntu, open a Terminal window via Applications | Accessories | Terminal.
  2. Run:
    sudo apt-get install aptitude
  3. Run:
    sudo aptitude install openssh-server
  4. Run:
    sudo aptitude install apache2 apache2-doc php5 \
    php5-mcrypt php5-ps php5-timezonedb php5-xmlrpc php5-xsl \
    php5-cli php5-curl php5-dev php5-gd php5-imagick php5-mysql php5-pgsql \
    php5-tidy php5-xmlrpc php5-xsl php5-xdebug php5-uuid php5-gmp php5-recode \
    sablotron libapache2-modxslt
  5. To install the PostgreSQL database run:
    sudo aptitude install postgresql postgresql-client postgresql-doc php5-pgsql
  6. To install the MySQL database run:
    sudo aptitude install mysql-client mysql-server php5-mysql
  7. Run:
    sudo aptitude install tinyca
  8. Run:
    sudo aptitude install vim ctags vim-doc vim-scripts
  9. Run:
    sudo aptitude install xsel
  10. Run:
    sudo aptitude install gedit-plugins
  11. Run:
    sudo aptitude install gimp
  12. Run:
    sudo aptitude install dwww w3-recs doxygen doxygen-doc
  13. Run:
    sudo aptitude install w3c-dtd-xhtml w3c-markup-validator
  14. Run:
    sudo aptitude install csstidy xmlstarlet
  15. Run:
    sudo aptitude install mutt postfix qpopper
    1. Choose Local only as the delivery option.
    2. Type in the name exactly the same as what was given the system during the install.
    3. Run:
      sudo nano -w /etc/postfix/main.cf
    4. Edit the line starting with mydestination= to read:
      mydestination = 60-334, localhost.localdomain, localhost, local.dev, www.local.dev
    5. Run:
      sudo /etc/init.d/postfix reload
    6. Run:
      sudo nano -w /etc/aliases
    7. Ensure that the aliases file has this content (with YOUR_LOGIN_NAME replaced with your login):
      root: YOUR_LOGIN_NAME
      postmaster: YOUR_LOGIN_NAME
      webmaster: YOUR_LOGIN_NAME
      webadmin: YOUR_LOGIN_NAME
  16. Before the mail program can be configured to deliver (localhost) mail, an email needs to be sent as follows:
    1. Run:
      mutt
    2. Answer No to the Create mail folder in home folder question.
    3. Hit m to generate an email to send.
    4. Enter your login name in the To: line.
    5. Enter Test in the Subject: line.
    6. Enter This is a test. in the body.
    7. Hit Ctrl-X and save the message (using the given file name).
    8. Hit y to send the message.
    9. Hit q to quit.
  17. With an email waiting, configure the Evolution mail client as follows:
    1. Open Applications | Office | Evolution Mail and Calendar.
    2. Since this is the first time you are opening Evolution, a dialog window will appear to allow you to configure its settings. Click Forward.
    3. Click Forward (if there is a backup screen).
    4. Enter your Full Name, your email address within the virtual machine (i.e., YOUR_LOGIN_NAME@localhost), and make it the default account.
    5. Click Forward.
    6. Set the Server Type to Local delivery and the path to /var/mail/YOUR_LOGIN_NAME.
    7. Click Forward.
    8. Check messages every minute.
    9. Click Forward.
    10. Set the Outgoing Mail Server Type to SMTP and the Server Name to localhost.
    11. Click Forward.
    12. Click Forward.
    13. Click Apply.
  18. Right-click Applications | Office | Evolution Mail and Calendar and choose Add to Panel. This will add the Evolution icon to the panel at the top of the screen.
  19. Run Firefox:
    1. Create a bookmark for dwww:
      http://localhost/dwww/
    2. Create a bookmark for the w3-recs package's directory (via dwww):
      http://localhost/cgi-bin/dwww/usr/share/doc/w3-recs/html/index.html
    3. Create a bookmark for dwww's Help | Standards menu (i.e., for w3-recs package files):
      http://localhost/dwww/menu/shelp_standards.html
    4. Create a bookmark for the W3C Markup Validator:
      http://localhost/w3c-markup-validator
    5. Install these Add-Ons: Firebug, FirePHP, Firefinder for Firebug, FireQuery, FireXPath, Inline Code Finder for Firebug, and Web Developer.
  20. Configure the Apache web server for virtual hosting:
    1. Ensure that /etc/hosts has in it (e.g., sudo nano -w /etc/hosts) while noting that any 127/8 address will work:
      127.0.1.2 www.local.dev www
    2. Run:
      sudo nano -w /etc/apache2/sites-available/local.dev

      and add:

      <VirtualHost 127.0.1.2:80>
        ServerName www.local.dev
        ServerAlias local.dev
        ServerAdmin webmaster@localhost

        DocumentRoot /home/YOUR_LOGIN_NAME/web/public
        <Directory /home/YOUR_LOGIN_NAME/web/public>
          Options Indexes FollowSymLinks MultiViews
          AllowOverride None
          Order allow,deny
          Allow from all
        </Directory>

        LogLevel warn
        ErrorLog /home/YOUR_LOGIN_NAME/web/logs/error.log
        CustomLog /home/YOUR_LOGIN_NAME/web/logs/access.log combined
        ServerSignature Off

        <IfModule mod_php5.c>
          php_flag magic_quotes_gpc Off
          php_flag magic_quotes_runtime Off
          php_flag file_uploads On
          php_flag short_open_tag On
          php_flag session.auto_start Off
          php_flag session.bug_compat_warn Off

          php_value upload_max_filesize 16M
          php_value post_max_size 16M

          php_value error_log /home/YOUR_LOGIN_NAME/web/logs/php_errors.log
          php_flag display_errors Off
          php_flag display_startup_errors Off
        </IfModule>

        <IfModule mod_dir.c>
            DirectoryIndex index.php index.xml index.html index.htm
        </IfModule>
      </VirtualHost>
  21. Run:
    sudo nano -w /etc/apache2/ports.conf

    to have the following line in it (which must match the IP in the /etc/hosts file):

    NameVirtualHost 127.0.1.2:80
  22. Run:
    mkdir -p /home/YOUR_LOGIN_NAME/web/{public,logs}
  23. Run:
    sudo chown -R www-data:www-data /home/YOUR_LOGIN_NAME/web/logs
  24. Run:
    sudo a2ensite local.dev
  25. Run:
    sudo /etc/init.d/apache2 force-reload
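
With the virtual host enabled and Apache reloaded, a quick smoke test from inside the virtual machine can confirm that Apache and PHP are serving the new site. A sketch (the index.php below is a hypothetical placeholder page; substitute your login name as before):

echo '<?php phpinfo(); ?>' > /home/YOUR_LOGIN_NAME/web/public/index.php
wget -q -O - http://www.local.dev/ | head

If the virtual host is working, the wget output should show the beginning of the phpinfo() HTML page.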

Step V: Accelerate VirtualBox (Optional)

Installing VirtualBox Additions

VirtualBox provides a number of device drivers that add features (e.g., Shared Folders) and accelerate the virtual machine. (The instructions below applied to VirtualBox version 3.x. I've not checked that they still apply to version 4.x.) To install these VirtualBox Additions do the following:

  1. Ensure you are logged into the virtual machine.
  2. Choose from the VirtualBox Session window Devices | Install Guest Additions….
  3. In the Ubuntu window, click on the Places menu.
  4. Open the CD-ROM menu item.
  5. After the CD-ROM drive appears, open Applications | Accessories | Terminal.
  6. Run:
    cd /media
  7. Run:
    ls
  8. Note the CD-ROM name, e.g., VBOXADDITIONS_3.1.8_61349.
  9. Run:
    cd VBOX*
  10. Run (if a 32-bit install, otherwise change the name appropriately):
    sudo bash ./VBoxLinuxAdditions-x86.run
  11. Wait for the drivers to build.
  12. Run:
    exit
  13. In the CD-ROM window, click on the Eject icon for the CD-ROM. You may have to make the Window larger to see it.
  14. In the VirtualBox window menu, click on Devices | CD / DVD Devices… and uncheck VBoxGuestAdditions.iso.
  15. If you get slow screen updates, then turn off 3D in Ubuntu by:
    1. Opening System | Preferences | Appearance.
    2. Click on the Visual Effects tab.
    3. Choose None.
    4. Click Close.
  16. If VirtualBox is upgraded, remember to redo the above to upgrade your Guest Additions!

Step VI: Enable Shared Folders And Bridged Networking

Web development tasks can be simplified considerably by enabling the Bridged Networking feature and using VirtualBox' Shared Folders. Both of these are found in the Settings window.

The bridged networking feature allows the host OS to talk directly to the guest OS (i.e., Ubuntu). After the network setting is changed from NAT to Bridged and the virtual machine is restarted, the IP address of Ubuntu can be found by running:

sudo /sbin/ifconfig

and then using that to connect to your virtual machine from the host OS.
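
For example, since openssh-server was installed earlier, one simple way to confirm the bridged connection works is to log in to the guest from the host (GUEST_IP being a placeholder for whatever address ifconfig reported):

ssh YOUR_LOGIN_NAME@GUEST_IP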

The Shared Folders feature makes it very easy to move files to/from the guest and host OSes. To configure Shared Folders do the following:

  1. Before booting Ubuntu, edit the Shared Folders settings and add the desired folder(s). Be sure to write down each shared folder's name as that will be needed below.
  2. One can mount the shared folder by running in a Terminal window:
    sudo mount -t vboxsf SHARED_FOLDER_NAME A_MOUNT_DIRECTORY
  3. Note that to mount anything in Linux, the A_MOUNT_DIRECTORY must already exist as a directory, e.g.:
    sudo mkdir /media/windows
  4. For example, if the A_MOUNT_DIRECTORY was /media/windows and the share SHARED_FOLDER_NAME was myspace, you could mount it using:
    sudo mount -t vboxsf -o rw,uid=1000 myspace /media/windows
  5. You could also have the share mounted automatically every time you boot by adding something like this to /etc/fstab (where uid=1000 is only correct if you are the only user in your virtual machine):
    public_html /home/YOUR_LOGIN_NAME/web/public vboxsf rw,auto,uid=1000 0 0

Step VII: Read Documentation

There are a lot of commands and details above. Be sure to read the installed online documentation, accessible via dwww right on the virtual machine. Additionally, seek out the documentation on the Internet for the various programs. Kindly do not send me "How do I do this [but it is already on the page]? Can you do this for me [because I don't want to invest any time towards such]?" emails, especially if you have not read all the relevant documentation for the items of concern and made a serious effort towards resolving the issues yourself. (Sadly, people do send emails like that, and they are very inappropriate.)

If you are new at this and are learning how to do all of it, truly the best way is to struggle through it on your own, especially armed with the level of detail I have provided on this page. If you do, you will learn a lot and you will remember it.

Linux Containers, /dev/pts/ptmx, and Gentoo

For virtualization technologies (e.g., virtual environments and machines), Linux now supports containers, which allow multiple instances of the devpts filesystem. This allows the /dev/pts instance indices in one virtual environment to be independent of the indices allocated in another environment.

As detailed in the kernel file Documentation/filesystems/devpts.txt, this requires the kernel option:

CONFIG_DEVPTS_MULTIPLE_INSTANCES=y

and the "-o newinstance" mount option when mounting devpts.

Gentoo's sys-apps/openrc-0.8.3-r1 configuration has an /etc/init.d/devfs script in the sysinit runlevel. Despite its name and the fact that the system uses udev, the devfs initscript states that it mounts "required stuff as user may not have [them] in /etc/fstab". Unfortunately, the devfs script does not check for entries in fstab before mounting the devpts and shm filesystems. Fortunately, Documentation/filesystems/devpts.txt nicely mentions what to do if initscripts are not updated to handle CONFIG_DEVPTS_MULTIPLE_INSTANCES. Since one doesn't likely want to do such things manually after every reboot, and it is generally a bad idea to start editing a distribution's init scripts, here is an initscript that performs the devpts.txt file's instructions:

#!/sbin/runscript

description="Multiple PTY fix for /dev/ptmx"

depend()
{
  after udev devfs
}

start()
{
  if [ -e /dev/pts/ptmx ]; then
    # Use next line if devpts is in /etc/fstab...
    mount -o remount /dev/pts
    # Or uncomment next line...
    #chmod 0666 /dev/pts/ptmx
    rm /dev/ptmx
    ln -s pts/ptmx /dev/ptmx
    return 0
  else
    echo "/dev/ptmx does not exist!"
    return 1
  fi
}

which also requires this line to be in /etc/fstab (but adjust the perms to your liking):

devpts /dev/pts devpts rw,relatime,gid=5,mode=666,ptmxmode=666 0 0

and to run in the sysinit runlevel:

# rc-update add ptmx-fix sysinit

Test it by rebooting and checking that /dev/ptmx is a symlink to /dev/pts/ptmx and that the permissions of /dev/pts/ptmx are 0666. :-)
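
A quick way to perform that check, assuming GNU coreutils' stat is installed, is:

ls -l /dev/ptmx
stat -c '%a %n' /dev/pts/ptmx

The first command should show /dev/ptmx -> pts/ptmx and the second should report 666.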