apply_tuple(op, t) (C++11 Final Version)

Overview

It has been a long while since my last post! All of my previous posts were written at a time when GCC and Clang had various issues with C++11 and did not have various portions of C++11 implemented. Very nicely, GCC and CLang now fully implement C++11! Despite the hiatus on this blog, I have been very busy writing C++ code including a lot of C++ template metaprogramming code. Included within such has been updates to apply_tuple() function. This post describes the (final) updates for apply_tuple() freely using appropriate C++11 features as needed.


Deleting Old Stuff

When I started messing around with coding apply_tuple() in 2011 (see the Articles By Topic link):

  • the C++11 standard had only been recently approved,
  • the C++ compilers were beginning to implement C++11,
  • C++11 features could not necessarily be used as they triggered compiler bugs or they were not yet implemented, and
  • everyone that wrote code in C++98/C++03 were all starting to explore how to do things in new and better ways in C++11.

As a result, the older apply_tuple() was done using older C++98 techniques –the last vestiges of such appeared in the Part III of my last apply_tuple() blog post which I am happy to delete from the code. Specifically, the following are completely deleted from the previous blog post:

  • struct make_seq_indices_impl
  • return_type type alias
  • apply_tuple_return_type_impl()
  • apply_tuple_return_type()
  • tuple_size() since it is not needed for one line of code

Since C++11 supports the decltype keyword to determine the type of an expression. This allows one to delete return_type, apply_tuple_return_type_impl(), and apply_tuple_return_type(). Additionally, SFINAE can be done with template parameters and this allows the deletion of struct make_seq_indices_impl. Finally, since tuple_size() is only one line of code, it was removed and replaced with a template parameter with a default value that computed the size of a tuple.

Rather than explain everything relative to the last version which forces one to read all of the previous blog posts, I will explain the solution from start to finish here. This is the final version of apply_tuple(). If you are interested in the earlier versions then look at my older posts –although after reading this, you might wonder along with me if I should just delete the older posts!

If you still need to write C++98 code, then the older posts are useful. Except for variadic template parameters, everything can be easily done using C++98 features. Variadic parameters can be handled in C++98 by hard coding definitions having 1, 2, 3, etc. parameters as needed. The use of constexpr (e.g., tuple_size) can be defined as C++ template metaprogramming (TMP) structs instead. The definitions of std::decay, etc. can be used via Boost or TR1 library code.

The discussion of the final apply_tuple() solution will start at the top of the source code file and work its way to the end. Simply concatenate all of the code for the complete working program.


The Required #include Files

This solution requires the use of std::tuple-related items and std::size_t, therefore these #include files are needed:

#include <cstddef>
#include <type_traits>


Generating the Tuple Indices

C++'s std::tuple requires using get<index>(tuple_var)</index> to access the tuple's elements. Unfortunately for newcomers to C++, INDEX must be a compile-time constant std::size_t that is permitted to be used as a template parameter. This means one cannot use a traditional for-loop to iterate through all elements to process them. While one could write a recursive function template to do this, that is too much work. It is better to create a list of std::size_t indices inside of a template parameter list that can be unrolled with a variadic template parameter list of tuples. In essence, if the tuple has 5 elements, one wants to obtain these indices:

indices<0,1,2,3,4>

These indices can be generated in a straight-forward manner. First one needs to tell the C++ compiler that indices is an empty struct that has zero or more std::size_t valued template parameters:

template <std::size_t...>
struct indices { };

The purpose of indices is only to hold the index values needed to be used with:

get<INDEX>(tuple_var)

The actual generating of these values is done in a C++ TMP metafunction called make_seq_indices.

If you are familiar with what was discussed in previous posts concerning make_seq_indices, know that the solution below was rewritten to be simpler and safer (since it checks that Begin <= End).

The definition of make_seq_indices is in several parts. It is defined using a struct that is partially specialized over its template arguments. The partial specializations cover these conditions:

  • the general case, i.e., when Begin > End, if invoked generates a compile-time error message via static_assert,
  • when Begin < End, and
  • when Begin == End which ends the make_seq_indices metafunction.

The definition of the general case requires Begin and End to be set, it sets Indices to be an empty list, and the final parameter is needed to use std::enable_if to define an appropriate partial specialization based on the values of Begin and End. The definition of the general case of make_seq_indices is:

template <
  std::size_t Begin,
  std::size_t End,
  typename Indices = indices<>,
  typename Enable = void
>
struct make_seq_indices
{
  static_assert(Begin <= End, "Begin must be <= End");
};

Due to the use of std::enable_if below, the general case will only be used when Begin <= End.

In the case when Begin < End one needs to add the current value of Begin to the end of the list of indices, I<Indices...> and recursively invoke make_seq_indices with the value of Begin plus one:

template <
  std::size_t Begin,
  std::size_t End,
  template <std::size_t...> class I,
  std::size_t... Indices
>
struct make_seq_indices<
  Begin, End,
  I<Indices...>,
  typename std::enable_if<Begin < End, void>::type
>
{
  using type =
    typename make_seq_indices<
      Begin+1, End,
      I<Indices..., Begin>
    >::type
  ;
};

Finally, when Begin == End the recursive calls must stop and the resulting Indices type is returned:

template <
  std::size_t Begin,
  std::size_t End,
  typename Indices
>
struct make_seq_indices<
  Begin, End,
  Indices,
  typename std::enable_if<Begin == End, void>::type
>
{
  using type = Indices;
};

Since it is very annoying to write expressions such as:

typename make_seq_indices<0, 10>::type

to obtain a list of indices, a C++11 type alias will be used instead like this:

make_seq_indices_T<0, 10>

to obtain a list of indices. The type alias is defined as:

template <std::size_t Begin, std::size_t End>
using make_seq_indices_T =
  typename make_seq_indices<Begin, End>::type
;


Applying an Operation on a List of Function Arguments

It is useful to consider how to pass a set of function arguments to a function or functor. The code to do this is:

template <typename Op, typename... Args>
inline constexpr auto apply(Op&& op, Args&&... args) ->
  decltype(std::forward<Op>(op)(std::forward<Args>(args)...))
{
  return std::forward<Op>(op)(std::forward<Args>(args)...);
}

The use of std::forward in conjunction with Op&& and Args&&... allows the following:

  • the compiler will perfectly forward op and all arguments, args... to the function/functor being called, and,
  • the std::forward used with op permits op to be treated as an rvalue if it is an rvalue.

The second bullet is important since C++11 supports "Move Semantics For *this" member functions and using std::forward on op will ensure op is an rvalue if it was originally an rvalue.

For more information on "Move Semantics For *this" see N2439.

Unlike the tuple solution below, nothing special needs to be done to pass the arguments in this version to op since the Args parameter pack is simply expanded inside the call to op:

std::forward<Args>(args)...


Applying an Operation to Each and Every std::tuple Element

Unlike the apply() function above, to apply a tuple the apply_tuple() function will receive a single std::tuple argument –but it will need to retrieve each tuple element for that single argument. Since the single std::tuple function argument is not and cannot be a template parameter pack, one needs a way of obtaining a template parameter pack containing the indices required to invoke the std::get() function on each tuple argument. Doing so enables one to expand the template parameter pack over the indices –which will give us the desired result:

// This function overload applies op to all tuple indices...
template <
  typename Op,
  typename Tuple,
  template <std::size_t...> class I,
  std::size_t... Indices
>
inline constexpr auto apply_tuple(
  Op&& op,
  Tuple&& t,
  I<Indices...>&&
) ->
  decltype(
    std::forward<Op>(op)(
      std::get<Indices>(std::forward<Tuple>(t))...
    )
  )
{
  return
    std::forward<Op>(op)(
      std::get<Indices>(std::forward<Tuple>(t))...
    )
  ;
}

But this is not the function the client user will invoke since one wants the indices to be generated automatically. Generating indices was the purpose of make_seq_indices and make_seq_indices_T metafunction! Thus, the definition of apply_tuple() client will directly invoke is:

// This function overload forwards op and t along with the
// indices of the tuple generated by make_seq_indices...
template <
  typename Op,
  typename Tuple,
  typename Indices =
    make_seq_indices_T<
      0,
      std::tuple_size<typename std::decay<Tuple>::type>::value
    >
>
inline constexpr auto apply_tuple(Op&& op, Tuple&& t) ->
  decltype(
    apply_tuple(
      std::forward<Op>(op),
      std::forward<Tuple>(t),
      Indices{}
    )
  )
{
  return
    apply_tuple(
      std::forward<Op>(op),
      std::forward<Tuple>(t),
      Indices{}
    )
  ;
}

Notice that there is an extra template parameter, Indices, that has a default value set. The caller of this function will never set its value. This is a handy way of creating a template parameter "variable" to hold a compile-time computed value in a function template's parameter list. In this particular instance, the Indices is used twice and by storing Indices in a template parameter, one does not have to copy-and-paste the code for it twice.

Earlier code went through some extraordinary tricks to obtain the return type of the call, op(args...). With C++11's decltype, the return value of such can be obtained by writing, decltype(op(args...)) provided the new function declarator syntax is used where the return type appears after the function arguments. This code is easier to read and understand. Unfortunately, one does have to duplicate the code in with the function's return type and with the return statement in the function.


A Sample Use

The following code is an example of using apply() and apply_tuple():

#include <iostream>

double add(int a, double b) { return a+b; }
std::tuple<int,double> some_tuple{2, 34.2};

int main()
{
  using namespace std;
  cout
    << apply(add, 2, 34.2) << endl
    << apply_tuple(add, some_tuple) << endl
  ;
}


Closing Comments

This solution is shorter and much easier to maintain than the previous solutions in my earlier blog posts. This is understandable given the amount of time that has elapsed since I first started and the evolution of C++ compilers from C++98 support to fully supporting C++11. I have been doing a lot of C++11 coding (normal and C++ TMP) within the last year and have been able to explore all of the new C++11 language features and having been learning new and better ways to write code that is more easily maintained and more efficient (than C++98, e.g., fully exploiting move semantics). One of the tasks that I have been doing now that GCC and CLang fully support C++11, is revisiting and updating old code to better use C++11's features. This post clearly shows doing such can be well-worth the effort!

I will again resume blogging concerning C++11 topics and other cool things in the fall. Between now and then, I will be posting some simpler coding-related posts as fun things to try out! 🙂

C++11's Move Semantics Are Not Free

Overview

I was taking a break on StackOverflow and discovered this question about expression templates (ETs):

I was wondering whether lambdas together with move semantics or any other new feature can do as good as ETs. Any thoughts?

So I answered the question: no, expression templates are still needed! This blog post is a rewritten version of my response on StackOverflow.

C++11 Move Operations Are Not Free

If you are new to C++11 think of move semantics as optimized copy operations. Never think of them as operations with zero overhead because they don't have zero overhead. In the worst case, moving something has the same complexity as copying: i.e., O(moving data) = O(copying data). In the best case (for classes/structs with data members), the cost of moving is at least the cost of copying pointers to internal data structures: i.e., Ω(moving data) = Ω(copying pointers to internal data structures). In no case, except when the class/struct has no data members, will the cost of moving something be zero: i.e., Ω(moving data) ≠ 0 just as Ω(copying data) ≠ 0 under the same circumstances.

If it ever appears that the cost of moving or copying data in C++ is zero, it is because of the copy elision optimization and the return value optimization (RVO) are being performed by the C++ compiler –or in a very niche set of cases your classes/structs have no member variables.

The example presented below and in the StackOverflow answer I made shows why the costs of moves needs to be considered. In the example, a math_vector class using an statically-sized std::array as its internal representation. Thus, the cost of moving a math_vector object is the same as copying it and so avoiding any extra copies or moves involving temporaries is of utmost importance. However, I don't stop there as the next question one would likely ask is: how does one get rid of the extra copies and or moves? The method used to accomplish this was the C++ template metaprogramming technique called expression templates.

The Objective and the Problem

When code is written, the goal usually sought after is to have maximum efficiency while keeping the code understandable and maintainable by humans. Object-oriented and object-based programming can help achieve this goal as it groups the operations that can be performed on types together. This leads to code that looks like:

{
  cout << "CASE 1:\n";
  math_vector<3> a{1.0, 1.1, 1.2};
  math_vector<3> b{2.0, 2.1, 2.2};
  math_vector<3> c{3.0, 3.1, 3.2};
  math_vector<3> d{4.0, 4.1, 4.2};
  math_vector<3> result = a + b + c + d;
  cout << '[' << &result << "]: " << result << "\n";
}
cout << endl;
{
  cout << "CASE 2:\n";
  math_vector<3> result =
    math_vector<3>{1.0, 1.1, 1.2} +
    math_vector<3>{2.0, 2.1, 2.2} +
    math_vector<3>{3.0, 3.1, 3.2} +
    math_vector<3>{4.0, 4.1, 4.2}
  ;
  cout << '[' << &result << "]: " << result << "\n";
}

where: math_vector<3> is represents a (mathematical) vector object that has 3 dimensions representing internally by a statically-sized std::array<long double, 3>

When compiled with C++11 move-aware code but without using any expression template coding techniques, the above code outputs:

CASE 1:
0x7ffff9902980: math_vector(initlist)
0x7ffff99029b0: math_vector(initlist)
0x7ffff99029e0: math_vector(initlist)
0x7ffff9902a10: math_vector(initlist)
0x7ffff9902ae0: math_vector(copy: 0x7ffff9902980)
0x7ffff9902ab0: math_vector(move: 0x7ffff9902ae0)
0x7ffff9902a40: math_vector(move: 0x7ffff9902ab0)
0x7ffff9902ab0: ~math_vector()
0x7ffff9902ae0: ~math_vector()
[0x7ffff9902a40]: (10,10.4,10.8)
0x7ffff9902a40: ~math_vector()
0x7ffff9902a10: ~math_vector()
0x7ffff99029e0: ~math_vector()
0x7ffff99029b0: ~math_vector()
0x7ffff9902980: ~math_vector()

CASE 2:
0x7ffff9902b10: math_vector(initlist)
0x7ffff9902b80: math_vector(initlist)
0x7ffff9902bf0: math_vector(initlist)
0x7ffff9902c30: math_vector(initlist)
0x7ffff9902bc0: math_vector(move: 0x7ffff9902c30)
0x7ffff9902b50: math_vector(move: 0x7ffff9902bc0)
0x7ffff9902a40: math_vector(move: 0x7ffff9902b50)
0x7ffff9902b50: ~math_vector()
0x7ffff9902bc0: ~math_vector()
0x7ffff9902c30: ~math_vector()
0x7ffff9902bf0: ~math_vector()
0x7ffff9902b80: ~math_vector()
0x7ffff9902b10: ~math_vector()
[0x7ffff9902a40]: (10,10.4,10.8)
0x7ffff9902a40: ~math_vector()

i.e., CASE 1 requires four explicit constructed instances using initializer lists (i.e., the "initlist" items), the "result" variable (i.e., `0x7fff8d6eddd0`), and an additional three objects for copying and moving; CASE 2 is better only requiring three extra objects for moving.

But one can do better: zero extra temporaries should be possible with the one caveat: all explicitly instantiated types would still be created (i.e., the four "initlist" constructors and "result"). This will be achieved using the expressions templates technique by creating:

  • a proxy math_vector_expr<leftexpr ,BinaryOp,RightExpr> class to hold an expression not computed yet,
  • a proxy plus_op class was created to hold the addition operation not computed yet,
  • a constructor added to math_vector to accept a math_vector_expr object,
  • "starter" member functions in math_vector to trigger the creation of the expression template (e.g., operator + and operator +=), and,
  • "end" member functions in math_vector to force the computation to occur (e.g., by assigning a math_vector_expr to a math_vector and by passing a math_vector_expr to a math_vector constructor).

Key is realizing that care was taken to ensure that all function arguments are perfectly forwarded and all return values are moved with the proxy objects and all code invoked from the use of such. Additionally, code was written to ensure that copies would not be needlessly be made by writing overloads for rvalue references –not just constant references– and moving returned values when they are lvalues referring to rvalue references.

If you want or need to more about move semantics, lvalues, and rvalues see my series that starting with Applying std::tuple to Functors Efficiently.

The results using expression templates are wonderful: no extra temporaries are created in CASE 1 or CASE 2. This is demonstrated by the following output:

CASE 1:
0x7fff4d2ed820: math_vector(initlist)
0x7fff4d2ed850: math_vector(initlist)
0x7fff4d2ed880: math_vector(initlist)
0x7fff4d2ed8b0: math_vector(initlist)
0x7fff4d2ed8e0: math_vector(expr: 0x7fff4d2ed950)
[0x7fff4d2ed8e0]: (10,10.4,10.8)
0x7fff4d2ed8e0: ~math_vector()
0x7fff4d2ed8b0: ~math_vector()
0x7fff4d2ed880: ~math_vector()
0x7fff4d2ed850: ~math_vector()
0x7fff4d2ed820: ~math_vector()

CASE 2:
0x7fff4d2ed990: math_vector(initlist)
0x7fff4d2ed9e0: math_vector(initlist)
0x7fff4d2eda30: math_vector(initlist)
0x7fff4d2eda70: math_vector(initlist)
0x7fff4d2ed8e0: math_vector(expr: 0x7fff4d2ed980)
0x7fff4d2eda70: ~math_vector()
0x7fff4d2eda30: ~math_vector()
0x7fff4d2ed9e0: ~math_vector()
0x7fff4d2ed990: ~math_vector()
[0x7fff4d2ed8e0]: (10,10.4,10.8)
0x7fff4d2ed8e0: ~math_vector()

i.e., only five math_vector's are created in each case: the four "initlist" objects and the "result" variable. If one looks at the resulting assembler code the results are even better:

fldt  128(%rsp)
leaq  128(%rsp), %rdi
leaq  80(%rsp), %rbp
fldt  176(%rsp)
faddp %st, %st(1)
fldt  224(%rsp)
faddp %st, %st(1)
fldt  272(%rsp)
faddp %st, %st(1)
fstpt 80(%rsp)
fldt  144(%rsp)
fldt  192(%rsp)
faddp %st, %st(1)
fldt  240(%rsp)
faddp %st, %st(1)
fldt  288(%rsp)
faddp %st, %st(1)
fstpt 96(%rsp)
fldt  160(%rsp)
fldt  208(%rsp)
faddp %st, %st(1)
fldt  256(%rsp)
faddp %st, %st(1)
fldt  304(%rsp)
faddp %st, %st(1)
fstpt 112(%rsp)

which appears between the initialization of the four "initlist" variables and the output of the "result" data. The reason this is an "even better" result is that absolutely no function calls are done here: only CPU operations related to loading and adding floating point numbers. This is what one would do when writing hand-optimizing the code, e.g., unrolling loops, etc.

Expression Templates

Before showing the code below, a short bit concerning expression templates should be provided. Expression templates are a form of template metaprogramming. Template metaprogramming is a style of programming using C++ templates that has the compiler compute the code to generate at compile-time. This can be done because the C++ template mechanism is itself Turing-complete. Unlike "normal" programming it involves the use of types and/or constants.

Aside: Overall template metaprogramming is not easy, so if you are new to C++ wait until you are at least an intermediate-skilled C++ programmer before jumping into the C++ metaprogramming world as you will need to understand key subtle points of the language. Many of my other C++ posts on this site use template metaprogramming if you want to see some of the things that can be done and some of what it entails.

The expression template technique involves building parse trees on recognized expressions using user-defined types. Each type in these parse trees is a proxy object representing the values of what is to be computed at a later point in time. Thus, each proxy object stores its arguments and any operations to be performed without actually performing the operation until such absolutely needs to be done or otherwise desired.

In the code below, this is exactly what the math_vector_expr class does. Notice that it stores the "left expression" and the "right expression" for some "binary operator". Notice that both the left and right expressions are stored, but, the binary operation was not since plus_op class does not have any state in this example. If it did, then the binary operation would have also been stored.

The actual construction of the parse tree is triggered from operator + code in the math_vector class since it returns a math_vector_expr. Notice that the math_vector_expr class doesn't ever compute anything –it merely stores its arguments and returns another math_vector_expr. Key here is to have all math_vector_expr instances rvalues or at least constant references to exploit compiler optimizations and avoid generating code with run-time overhead. Finally, notice that only by passing a math_vector_expr to the math_vector constructor or by assigning one to a math_veector will the math_vector_expr finally be asked to compute the values belonging to each vector element. When coupled with C++11 compiler optimizations the end result is the elision of all compiler temporary variables and total code efficiency. It is totally and very cool!

C++ Code

The following C++ code is the example code used to produce all output above. Be sure to compile the code with at least level one optimizations. This code will work with both clang++ v3.1 and g++ v4.8 (and likely v4.7). I compiled the code using these options: -std=c++11 -O3.

If you want to build the code so that it does not use expression templates, then #define DONT_USE_EXPR_TEMPL. If not, then ensure the macro is not defined.

#include <array>
#include <algorithm>
#include <initializer_list>
#include <type_traits>
#include <iostream>

//#define DONT_USE_EXPR_TEMPL

//=====================================================================

template <std::size_t N> class math_vector;

template <
  typename LeftExpr,
  typename BinaryOp,
  typename RightExpr
>
class math_vector_expr
{
  public:
    math_vector_expr() = delete;

    math_vector_expr(LeftExpr l, RightExpr r) :
      l_(std::forward<LeftExpr>(l)),
      r_(std::forward<RightExpr>(r))
    {
    }

    // Prohibit copying...
    math_vector_expr(math_vector_expr const&) = delete;
    math_vector_expr& operator =(math_vector_expr const&) = delete;

    // Allow moves...
    math_vector_expr(math_vector_expr&&) = default;
    math_vector_expr& operator =(math_vector_expr&&) = default;

    template <typename RE>
    auto operator +(RE&& re) const ->
      math_vector_expr<
        math_vector_expr<LeftExpr,BinaryOp,RightExpr> const&,
        BinaryOp,
        decltype(std::forward<RE>(re))
      >
    {
      return
        math_vector_expr<
          math_vector_expr<LeftExpr,BinaryOp,RightExpr> const&,
          BinaryOp,
          decltype(std::forward<RE>(re))
        >(*this, std::forward<RE>(re))
      ;
    }

    auto le() ->
      typename std::add_lvalue_reference<LeftExpr>::type
      { return l_; }

    auto le() const ->
      typename std::add_lvalue_reference<
        typename std::add_const<LeftExpr>::type
      >::type
      { return l_; }

    auto re() ->
      typename std::add_lvalue_reference<RightExpr>::type
      { return r_; }

    auto re() const ->
      typename std::add_lvalue_reference<
        typename std::add_const<RightExpr>::type
      >::type
      { return r_; }

    auto operator [](std::size_t index) const ->
      decltype(
        BinaryOp::apply(this->le()[index], this->re()[index])
      )
    {
      return BinaryOp::apply(le()[index], re()[index]);
    }

  private:
    LeftExpr l_;
    RightExpr r_;
};

//=====================================================================

template <typename T>
struct plus_op
{
  static T apply(T const& a, T const& b)
  {
    return a + b;
  }

  static T apply(T&& a, T const& b)
  {
    a += b;
    return std::move(a);
  }

  static T apply(T const& a, T&& b)
  {
    b += a;
    return std::move(b);
  }

  static T apply(T&& a, T&& b)
  {
    a += b;
    return std::move(a);
  }
};

//=====================================================================

template <std::size_t N>
class math_vector
{
  using impl_type = std::array<long double, N>;

  public:
    math_vector()
    {
      using namespace std;
      fill(begin(v_), end(v_), impl_type{});
      std::cout << this << ": math_vector()" << endl;
    }

    math_vector(math_vector const& mv) noexcept
    {
      using namespace std;
      copy(begin(mv.v_), end(mv.v_), begin(v_));
      std::cout << this << ": math_vector(copy: " << &mv << ")" << endl;
    }

    math_vector(math_vector&& mv) noexcept
    {
      using namespace std;
      move(begin(mv.v_), end(mv.v_), begin(v_));
      std::cout << this << ": math_vector(move: " << &mv << ")" << endl;
    }

    math_vector(std::initializer_list<typename impl_type::value_type> l)
    {
      using namespace std;
      copy(begin(l), end(l), begin(v_));
      std::cout << this << ": math_vector(initlist)" << endl;
    }

    math_vector& operator =(math_vector const& mv) noexcept
    {
      using namespace std;
      copy(begin(mv.v_), end(mv.v_), begin(v_));
      std::cout << this << ": math_vector op =(copy: " << &mv << ")" << endl;
      return *this;
    }

    math_vector& operator =(math_vector&& mv) noexcept
    {
      using namespace std;
      move(begin(mv.v_), end(mv.v_), begin(v_));
      std::cout << this << ": math_vector op =(move: " << &mv << ")" << endl;
      return *this;
    }

    ~math_vector()
    {
      using namespace std;
      std::cout << this << ": ~math_vector()" << endl;
    }

    void swap(math_vector& mv)
    {
      using namespace std;
      for (std::size_t i = 0; i<N; ++i)
        swap(v_[i], mv[i]);
    }

    auto operator [](std::size_t index) const
      -> typename impl_type::value_type const&
    {
      return v_[index];
    }

    auto operator [](std::size_t index)
      -> typename impl_type::value_type&
    {
      return v_[index];
    }

    math_vector& operator +=(math_vector const& b)
    {
      for (std::size_t i = 0; i<N; ++i)
        v_[i] += b[i];
      return *this;
    }

  #ifndef DONT_USE_EXPR_TEMPL

    template <typename LE, typename Op, typename RE>
    math_vector(math_vector_expr<LE,Op,RE>&& mve)
    {
      for (std::size_t i = 0; i < N; ++i)
        v_[i] = mve[i];
      std::cout << this << ": math_vector(expr: " << &mve << ")" << std::endl;
    }

    template <typename RightExpr>
    math_vector& operator =(RightExpr&& re)
    {
      for (std::size_t i = 0; i<N; ++i)
        v_[i] = re[i];
      return *this;
    }

    template <typename RightExpr>
    math_vector& operator +=(RightExpr&& re)
    {
      for (std::size_t i = 0; i<N; ++i)
        v_[i] += re[i];
      return *this;
    }

    template <typename RightExpr>
    auto operator +(RightExpr&& re) const ->
      math_vector_expr<
        math_vector const&,
        plus_op<typename impl_type::value_type>,
        decltype(std::forward<RightExpr>(re))
      >
    {
      return
        math_vector_expr<
          math_vector const&,
          plus_op<typename impl_type::value_type>,
          decltype(std::forward<RightExpr>(re))
        >(
          *this,
          std::forward<RightExpr>(re)
        )
      ;
    }
  #endif // #ifndef DONT_USE_EXPR_TEMPL

  private:
    impl_type v_;
};

//=====================================================================

template <std::size_t N>
inline void swap(math_vector<N>& a, math_vector<N>& b)
{
  a.swap(b);
}

//=====================================================================

#ifdef DONT_USE_EXPR_TEMPL

template <std::size_t N>
inline math_vector<N> operator +(
  math_vector<N> const& a,
  math_vector<N> const& b
)
{
  math_vector<N> retval(a);
  retval += b;
  return retval;
}

template <std::size_t N>
inline math_vector<N> operator +(
  math_vector<N>&& a,
  math_vector<N> const& b
)
{
  a += b;
  return std::move(a);
}

template <std::size_t N>
inline math_vector<N> operator +(
  math_vector<N> const& a,
  math_vector<N>&& b
)
{
  b += a;
  return std::move(b);
}

template <std::size_t N>
inline math_vector<N> operator +(
  math_vector<N>&& a,
  math_vector<N>&& b
)
{
  a += std::move(b);
  return std::move(a);
}

#endif // #ifdef DONT_USE_EXPR_TEMPL

//=====================================================================

template <std::size_t N>
std::ostream& operator <<(std::ostream& os, math_vector<N> const& mv)
{
  os << '(';
  for (std::size_t i = 0; i < N; ++i)
    os << mv[i] << ((i+1 != N) ? ',' : ')');
  return os;
}

//=====================================================================

int main()
{
  using namespace std;

  try
  {
    {
      cout << "CASE 1:\n";
      math_vector<3> a{1.0, 1.1, 1.2};
      math_vector<3> b{2.0, 2.1, 2.2};
      math_vector<3> c{3.0, 3.1, 3.2};
      math_vector<3> d{4.0, 4.1, 4.2};
      math_vector<3> result = a + b + c + d;
      cout << '[' << &result << "]: " << result << "\n";
    }
    cout << endl;
    {
      cout << "CASE 2:\n";
      math_vector<3> result =
        math_vector<3>{1.0, 1.1, 1.2} +
        math_vector<3>{2.0, 2.1, 2.2} +
        math_vector<3>{3.0, 3.1, 3.2} +
        math_vector<3>{4.0, 4.1, 4.2}
      ;
      cout << '[' << &result << "]: " << result << "\n";
    }
  }
  catch (...)
  {
    return 1;
  }
}

//=====================================================================

An Enhanced Template Parameter Extender

Overview

In A Template Argument Extender C++ template metaprogramming was used to repeat a single template parameter type N times. This article outlines how such can be enhanced to repeat a set of template parameter types N types.


Defining The Problem

The previous article outlined how to use C++ template metaprogramming (C++ TMP) to create a template class having a single type repeated N times as a parameter. For example, the type:

n_tuple<0, int>

would become:

std::tuple<>

whereas:

n_tuple<1, int> // 1
n_tuple<2, int> // 2
n_tuple<3, int> // 3

would become:

std::tuple<int> // 1
std::tuple<int, int> // 2
std::tuple<int, int, int> // 3

Unfortunately, if one wanted to repeat a list of types:

n_tuple<2, int, double>

to produce:

std::tuple<int, double, int, double>

one was out of luck with my previous article. But as you might have guessed it is possible to easily write C++ TMP code to handle such with a little cleverness! 🙂


Defining A Placeholder Type

One of the weird clever things one does when writing C++ TMP code is creating types that are not at all used at run-time. These types are simply placeholders so that the C++ TMP code transformations can be made to work. In this article, a placeholder type is defined and used purely for compile-time only code transformations. The purpose of the placeholder is to represent a type list for a set of template parameters:

template <typename... T>
struct tl { };

The reason one needs such a type is because template parameter packs:

  • are not types (e.g., one cannot hold a template parameter pack in a typedef),
  • cannot have their parameters directly manipulated, and,
  • they can only be expanded (e.g., using ...) or queried for their size (e.g., using sizeof...).

However when using a placeholder type template parameter packs:

  • can be held as the template parameters of the placeholder types,
  • can be passed around and stored in typedefs via the placeholder type since the placeholder type is a type,
  • can be (indirectly) queried/manipulated using template template parameters and partial template specialization,
  • can even have a concrete run-time implementation if needed.

Thus, by using a placeholder type, one can circumvent the limitations of template parameter packs. Some of these benefits of handling template parameter packs by using a placeholder type will be apparent in the code below.


Defining A Template Alias

Although one cannot have template typedefs in C++, the C++11 standard introduced template aliases using the using keyword. The definition of the above n_tuple as a template alias is:

template <unsigned N, typename... Args>
using n_tuple = typename repeater<std::tuple, N, tl<Args...>>::type;

where N is the number of times to repeat Args... as the parameter list for std::tuple. Notice Args... is passed using the tl placeholder type defined above.

So all that is left is to define the repeater class.


Defining The repeater Class

General Template Definition

The repeater class is where the real work is done. For the repeater class to work, one needs:

  • a template template parameter of the type that when provided the repeated template parameters will determine the type of the final result,
  • the number of times to repeat the types, N,
  • the types to be repeated (passed using the placeholder type to keep such separate from the actual repeated results), and,
  • a temporary list of already repeated arguments (initially empty).

Thus the general definition of repeater (which must appear before any specializations) is:

template <
  template <typename...> class ResultHolder,
  unsigned N,
  typename Repeated,
  typename... ArgsList
>
struct repeater;

From the definition of n_tuple:

  1. ResultHolder is set to std::tuple,
  2. N is the number of times to repeat the parameters contained in Repeated,
  3. Repeated needs to be separated out from the wrapper placeholder type (i.e., by using a template template type in the specializations) from the parameters it holds, and,
  4. ArgsList is initially empty but will hold the recursive repeated-expansion of the template arguments.

Recursive Case

This case applies when N > 0. When such is true, we'll append ToBeRepeated... to ArgsList..., decrement N, and re-invoke repeater:

template <
  template <typename...> class ResultHolder,
  unsigned N,
  template <typename...> class Holder,
  typename... ToBeRepeated,
  typename... ArgsList
>
struct repeater<
  ResultHolder,
  N,
  Holder<ToBeRepeated...>,
  ArgsList...
>
{
  typedef
    typename repeater<
      ResultHolder,
      N-1,
      Holder<ToBeRepeated...>,
      ArgsList..., ToBeRepeated...
    >::type
    type
  ;
};

Base Case

This case applies when N == 0. In this case ArgsList... is now the complete list of template parameter types that must be placed in ResultHolder and returned via the type typedef:

template <
  template <typename...> class ResultHolder,
  template <typename...> class Holder,
  typename... ToBeRepeated,
  typename... ArgsList
>
struct repeater<
  ResultHolder,
  0U,
  Holder<ToBeRepeated...>,
  ArgsList...
>
{
  typedef ResultHolder<ArgsList...> type;
};

As done within this article, one should build up ArgsList... before placing it in ResultHolder to avoid any potential errors with ResultHolder requiring a certain minimum of type parameters.


Testing The Solution

The resulting code can now be tested (again I am using code specific to GCC as I did in the original article):

#include <tuple>

// Definition of tl
// Definition of repeater
// Definition of n_tuple

#include <string>
#include <typeinfo>
#include <cstdlib>
#include <cxxabi.h> // GCC-specific

template <typename T>
std::string demangled_name_of_type()  // GCC-specific
{
  // NOTE: See GCC Manual Chapter 27.
  //       http://gcc.gnu.org/onlinedocs/libstdc++/
  int status;
  char *name = abi::__cxa_demangle(typeid(T).name(), 0, 0, &status);
  std::string realname(name);
  free(name);
  return realname;
}

#include <iostream>

int main()
{
  using namespace std;

  cout
    << demangled_name_of_type<n_tuple<0>>() << endl
    << demangled_name_of_type<n_tuple<0,int>>() << endl
    << demangled_name_of_type<n_tuple<1,int>>() << endl
    << demangled_name_of_type<n_tuple<2,int>>() << endl
    << demangled_name_of_type<n_tuple<0,char,short,long>>() << endl
    << demangled_name_of_type<n_tuple<1,char,short,long>>() << endl
    << demangled_name_of_type<n_tuple<2,char,short,long>>() << endl
  ;
}

For example under Gentoo Linux I compiled main.cxx above using GCC v4.7 (snapshot 20111217) using this command:

g++-4.7.0-alpha20111217 \
  -Xlinker -R /usr/lib/gcc/x86_64-pc-linux-gnu/4.7.0-alpha20111217 \
  main.cxx

which produces this output when run:

$ ./a.out
std::tuple<>
std::tuple<>
std::tuple<int>
std::tuple<int, int>
std::tuple<>
std::tuple<char, short, long>
std::tuple<char, short, long, char, short, long>
$

The goal of having a list-of-types extended automatically has been reached!


Closing Comments

As is typical when one revisits old code, one improves the previous code. Such is true here: the above code is simpler and easier-to-understand than my previous article's code. The use of a template alias further enhances readability and end-user usability. The alias also avoids the need for a special class simply to properly invoke the repeater class.

Newcomers to C++ TMP should note that the placeholder type used above could be any template class type accepting 0 or more types as template parameters. The placeholder's only use is to permit the extraction of its template parameters using partial template specialization. If combined with other C++ TMP code then not hard-coding the placeholder type is especially convenient: should one use repeater directly, one does not need to extract the parameters from one template class type and then place those types into a special placeholder type just to use repeater.

If one is using repeater directly, one can pass an initial set of template parameters for ArgsList.... These parameters will not be repeated but will be used as the first parameters of the Result type regardless of the value of N.

Finally note that one could have placed ArgsList... within a placeholder type as well. I wanted it to be easy to pass in a list of mandatory initial template parameters when using repeater directly. If one wanted to process multiple parameter lists independently then using a placeholder type would be essential.

Happy Coding!

A Template Argument Extender

Overview

When writing C++ code, one may come across the need to have a set of N arguments of all identical types for a variadic template type. In other words, it would be nicer to write:

typedef type_extend<0, std::tuple, std::string>::type t0;
typedef type_extend<1, std::tuple, std::string>::type t1;
typedef type_extend<2, std::tuple, std::string>::type t2;
typedef type_extend<3, std::tuple, std::string>::type t3;
typedef type_extend<4, std::tuple, std::string>::type t4;

instead of:

typedef std::tuple<> t0;
typedef std::tuple<std::string> t1;
typedef std::tuple<std::string, std::string> t2;
typedef std::tuple<std::string, std::string, std::string> t3;
typedef std::tuple<std::string, std::string, std::string, std::string> t4;

In other words, the goal is to have a positive integer (i.e., the first parameter to type_extend) that determines how many times the last argument passed to type_extend will be duplicated in an argument list that will be applied as the template arguments to the type denoted by the second parameter passed to type_extend.

This article explores how to do this using variadic templates which will be part of the upcoming C++ standard.


Template Template Parameters

In the following code, notice the use of std::tuple appears without any template arguments even though std::tuple requires a list of types as its template arguments:

typedef type_extend<2, std::tuple, std::string>::type t2;

C++ allows the programmer to specify that template arguments are placeholders for class templates. Such placeholders are called template template parameters and are declared the same as class templates except that struct and union cannot be used:

template <template<typename X> class T> // OK
void some_func(T<float>*);

template <template<typename X> typename T> // ERROR
void some_func(T<double>*);

template <template<typename X> struct T> // ERROR
void some_func(T<short>*);

template <template<typename X> union T> // ERROR
void some_func(T<long>*);

Notice that the above code allows the user to specify a class that has only one template argument without specifying the template argument:

template <typename N>
struct Number
{
  N num;
};

template <template <typename> class T>
struct SomeClass
{
  // This class sets the argument to T as float...
  T<float> something;
};

int main()
{
  // Notice Number has no arguments...
  typedef SomeClass<Number> N;

  N n;
  n.something.num = 3.14F; // NOTE: num is a float
}

The usefulness of template template parameters is to allow one to omit the template arguments when passing a class template type as a template argument. The downside in using template template parameters are that (a) any use must fully specify all template parameters to the template template placeholder name and (b) the number of template arguments must exactly match the type being passed in. The restriction that (b) places is important to note: the STL containers all have default arguments and therefore they all have at least two arguments –not one!


Implementing type_extend

In order to build up a variable argument list of types, two class templates will be defined: type_extend and type_extend_impl. The type_extend class templates have the three template arguments previously discussed. The type_extend_impl class templates will need to build up the list of template arguments up to the number specified. The purpose of type_extend is to handle the N-th case and the 0-th (i.e., base) case and its definition is straight-forward:

template <unsigned N, template <typename...> class T, typename Arg>
struct type_extend
{
  typedef typename type_extend_impl<N-1,T,Arg>::type type;
};

template <template <typename...> class T, typename Arg>
struct type_extend<0,T,Arg>
{
  typedef T<> type;
};

In the base case (i.e., in the template specialization type_extend<0,T,Arg>), the computed type is T without any template arguments. In all other cases, the computed type is determined by the definition of type in type_extend_impl.


Implementing type_extend_impl

Since the only way to go from some value of an integer N as a template argument to 0 is to use recursion, the first step is to determine what is the base case's type result that needs to be computed. Since the type_extend class template handled the case where there were no arguments, then the base case of type_extend_impl will need to be T having only one template argument. As for all other cases, the number of arguments needs to be increased by one per recursive iteration. To following code accomplishes this:

template <
  unsigned N,
  template <typename...> class T,
  typename Arg,
  typename... Args
>
struct type_extend_impl
{
  // This class was invoked with Arg, Args... and to increase the
  // Arg one more time we need to add it again which is why Arg
  // appears twice below...
  typedef
    typename type_extend_impl<N-1,T,Arg,Arg,Args...>::type
    type
  ;
};

template <
  template <typename...> class T,
  typename Arg,
  typename... Args
>
struct type_extend_impl<0,T,Arg,Args...>
{
  // Base case: Stop the recursion and expose Arg.
  typedef T<Arg,Args...> type;
};

Certainly do take time to digest the above, but, the above code works since each recursive call to type_extend_impl except the last adds one Arg to the template argument list and reduces N by one.


Testing type_extend

All that is left is to test that this works. This test will assume the use of one of the GNU C++ version 4.5 or higher compilers. Suitable test code would be:

#include <tuple>

// Place definition of type_extend_impl here.
// Place definition of type_extend here.

#include <iostream>
#include <typeinfo>
#include <string>
#include <cstdlib>

#include <cxxabi.h>   // GCC-specific

template <typename T>
std::string demangled_name_of_type()
{
  // NOTE: See G++ Manual Chapter 27.
  //       http://gcc.gnu.org/onlinedocs/libstdc++/
  int status;
  char *name = abi::__cxa_demangle(typeid(T).name(), 0, 0, &status);
  std::string realname(name);
  free(name);
  return realname;
}

int main()
{
  typedef type_extend<0,std::tuple,unsigned>::type test0;
  typedef type_extend<1,std::tuple,unsigned>::type test1;
  typedef type_extend<2,std::tuple,unsigned>::type test2;
  typedef type_extend<3,std::tuple,unsigned>::type test3;
  typedef type_extend<4,std::tuple,unsigned>::type test4;

  std::cout
    << "0: " << demangled_name_of_type<test0>() << std::endl
    << "1: " << demangled_name_of_type<test1>() << std::endl
    << "2: " << demangled_name_of_type<test2>() << std::endl
    << "3: " << demangled_name_of_type<test3>() << std::endl
    << "4: " << demangled_name_of_type<test4>() << std::endl
  ;
}

Save it in a file (e.g., called program.cxx and compile it:

g++ -std=c++0x program.cxx

When it is run, the following output will appear:

0: std::tuple<>
1: std::tuple<unsigned int>
2: std::tuple<unsigned int, unsigned int>
3: std::tuple<unsigned int, unsigned int, unsigned int>
4: std::tuple<unsigned int, unsigned int, unsigned int, unsigned int>

Success!


Template Aliases

Another feature, not yet implemented by GCC, in the next standard is the ability to alias template types:

template <unsigned N, typename Arg>
using n_tuple = typename type_extend<N,std::tuple,Arg>::type;

Using template aliases would allow one to easily eliminate the need to write ::type and to partially substitute template arguments. For example, the template alias n_tuple could subsequently be used as follows:

n_tuple<3,std::string>

which would be equivalent to the t3 typedef at the very top of this file in the Overview section.