apply_tuple(op, t) (C++11 Final Version)

Overview

It has been a long while since my last post! All of my previous posts were written at a time when GCC and Clang had various issues with C++11 and did not have various portions of C++11 implemented. Very nicely, GCC and CLang now fully implement C++11! Despite the hiatus on this blog, I have been very busy writing C++ code including a lot of C++ template metaprogramming code. Included within such has been updates to apply_tuple() function. This post describes the (final) updates for apply_tuple() freely using appropriate C++11 features as needed.


Deleting Old Stuff

When I started messing around with coding apply_tuple() in 2011 (see the Articles By Topic link):

  • the C++11 standard had only been recently approved,
  • the C++ compilers were beginning to implement C++11,
  • C++11 features could not necessarily be used as they triggered compiler bugs or they were not yet implemented, and
  • everyone that wrote code in C++98/C++03 were all starting to explore how to do things in new and better ways in C++11.

As a result, the older apply_tuple() was done using older C++98 techniques –the last vestiges of such appeared in the Part III of my last apply_tuple() blog post which I am happy to delete from the code. Specifically, the following are completely deleted from the previous blog post:

  • struct make_seq_indices_impl
  • return_type type alias
  • apply_tuple_return_type_impl()
  • apply_tuple_return_type()
  • tuple_size() since it is not needed for one line of code

Since C++11 supports the decltype keyword to determine the type of an expression. This allows one to delete return_type, apply_tuple_return_type_impl(), and apply_tuple_return_type(). Additionally, SFINAE can be done with template parameters and this allows the deletion of struct make_seq_indices_impl. Finally, since tuple_size() is only one line of code, it was removed and replaced with a template parameter with a default value that computed the size of a tuple.

Rather than explain everything relative to the last version which forces one to read all of the previous blog posts, I will explain the solution from start to finish here. This is the final version of apply_tuple(). If you are interested in the earlier versions then look at my older posts –although after reading this, you might wonder along with me if I should just delete the older posts!

If you still need to write C++98 code, then the older posts are useful. Except for variadic template parameters, everything can be easily done using C++98 features. Variadic parameters can be handled in C++98 by hard coding definitions having 1, 2, 3, etc. parameters as needed. The use of constexpr (e.g., tuple_size) can be defined as C++ template metaprogramming (TMP) structs instead. The definitions of std::decay, etc. can be used via Boost or TR1 library code.

The discussion of the final apply_tuple() solution will start at the top of the source code file and work its way to the end. Simply concatenate all of the code for the complete working program.


The Required #include Files

This solution requires the use of std::tuple-related items and std::size_t, therefore these #include files are needed:

#include <cstddef>
#include <type_traits>


Generating the Tuple Indices

C++'s std::tuple requires using get<index>(tuple_var)</index> to access the tuple's elements. Unfortunately for newcomers to C++, INDEX must be a compile-time constant std::size_t that is permitted to be used as a template parameter. This means one cannot use a traditional for-loop to iterate through all elements to process them. While one could write a recursive function template to do this, that is too much work. It is better to create a list of std::size_t indices inside of a template parameter list that can be unrolled with a variadic template parameter list of tuples. In essence, if the tuple has 5 elements, one wants to obtain these indices:

indices<0,1,2,3,4>

These indices can be generated in a straight-forward manner. First one needs to tell the C++ compiler that indices is an empty struct that has zero or more std::size_t valued template parameters:

template <std::size_t...>
struct indices { };

The purpose of indices is only to hold the index values needed to be used with:

get<INDEX>(tuple_var)

The actual generating of these values is done in a C++ TMP metafunction called make_seq_indices.

If you are familiar with what was discussed in previous posts concerning make_seq_indices, know that the solution below was rewritten to be simpler and safer (since it checks that Begin <= End).

The definition of make_seq_indices is in several parts. It is defined using a struct that is partially specialized over its template arguments. The partial specializations cover these conditions:

  • the general case, i.e., when Begin > End, if invoked generates a compile-time error message via static_assert,
  • when Begin < End, and
  • when Begin == End which ends the make_seq_indices metafunction.

The definition of the general case requires Begin and End to be set, it sets Indices to be an empty list, and the final parameter is needed to use std::enable_if to define an appropriate partial specialization based on the values of Begin and End. The definition of the general case of make_seq_indices is:

template <
  std::size_t Begin,
  std::size_t End,
  typename Indices = indices<>,
  typename Enable = void
>
struct make_seq_indices
{
  static_assert(Begin <= End, "Begin must be <= End");
};

Due to the use of std::enable_if below, the general case will only be used when Begin <= End.

In the case when Begin < End one needs to add the current value of Begin to the end of the list of indices, I<Indices...> and recursively invoke make_seq_indices with the value of Begin plus one:

template <
  std::size_t Begin,
  std::size_t End,
  template <std::size_t...> class I,
  std::size_t... Indices
>
struct make_seq_indices<
  Begin, End,
  I<Indices...>,
  typename std::enable_if<Begin < End, void>::type
>
{
  using type =
    typename make_seq_indices<
      Begin+1, End,
      I<Indices..., Begin>
    >::type
  ;
};

Finally, when Begin == End the recursive calls must stop and the resulting Indices type is returned:

template <
  std::size_t Begin,
  std::size_t End,
  typename Indices
>
struct make_seq_indices<
  Begin, End,
  Indices,
  typename std::enable_if<Begin == End, void>::type
>
{
  using type = Indices;
};

Since it is very annoying to write expressions such as:

typename make_seq_indices<0, 10>::type

to obtain a list of indices, a C++11 type alias will be used instead like this:

make_seq_indices_T<0, 10>

to obtain a list of indices. The type alias is defined as:

template <std::size_t Begin, std::size_t End>
using make_seq_indices_T =
  typename make_seq_indices<Begin, End>::type
;


Applying an Operation on a List of Function Arguments

It is useful to consider how to pass a set of function arguments to a function or functor. The code to do this is:

template <typename Op, typename... Args>
inline constexpr auto apply(Op&& op, Args&&... args) ->
  decltype(std::forward<Op>(op)(std::forward<Args>(args)...))
{
  return std::forward<Op>(op)(std::forward<Args>(args)...);
}

The use of std::forward in conjunction with Op&& and Args&&... allows the following:

  • the compiler will perfectly forward op and all arguments, args... to the function/functor being called, and,
  • the std::forward used with op permits op to be treated as an rvalue if it is an rvalue.

The second bullet is important since C++11 supports "Move Semantics For *this" member functions and using std::forward on op will ensure op is an rvalue if it was originally an rvalue.

For more information on "Move Semantics For *this" see N2439.

Unlike the tuple solution below, nothing special needs to be done to pass the arguments in this version to op since the Args parameter pack is simply expanded inside the call to op:

std::forward<Args>(args)...


Applying an Operation to Each and Every std::tuple Element

Unlike the apply() function above, to apply a tuple the apply_tuple() function will receive a single std::tuple argument –but it will need to retrieve each tuple element for that single argument. Since the single std::tuple function argument is not and cannot be a template parameter pack, one needs a way of obtaining a template parameter pack containing the indices required to invoke the std::get() function on each tuple argument. Doing so enables one to expand the template parameter pack over the indices –which will give us the desired result:

// This function overload applies op to all tuple indices...
template <
  typename Op,
  typename Tuple,
  template <std::size_t...> class I,
  std::size_t... Indices
>
inline constexpr auto apply_tuple(
  Op&& op,
  Tuple&& t,
  I<Indices...>&&
) ->
  decltype(
    std::forward<Op>(op)(
      std::get<Indices>(std::forward<Tuple>(t))...
    )
  )
{
  return
    std::forward<Op>(op)(
      std::get<Indices>(std::forward<Tuple>(t))...
    )
  ;
}

But this is not the function the client user will invoke since one wants the indices to be generated automatically. Generating indices was the purpose of make_seq_indices and make_seq_indices_T metafunction! Thus, the definition of apply_tuple() client will directly invoke is:

// This function overload forwards op and t along with the
// indices of the tuple generated by make_seq_indices...
template <
  typename Op,
  typename Tuple,
  typename Indices =
    make_seq_indices_T<
      0,
      std::tuple_size<typename std::decay<Tuple>::type>::value
    >
>
inline constexpr auto apply_tuple(Op&& op, Tuple&& t) ->
  decltype(
    apply_tuple(
      std::forward<Op>(op),
      std::forward<Tuple>(t),
      Indices{}
    )
  )
{
  return
    apply_tuple(
      std::forward<Op>(op),
      std::forward<Tuple>(t),
      Indices{}
    )
  ;
}

Notice that there is an extra template parameter, Indices, that has a default value set. The caller of this function will never set its value. This is a handy way of creating a template parameter "variable" to hold a compile-time computed value in a function template's parameter list. In this particular instance, the Indices is used twice and by storing Indices in a template parameter, one does not have to copy-and-paste the code for it twice.

Earlier code went through some extraordinary tricks to obtain the return type of the call, op(args...). With C++11's decltype, the return value of such can be obtained by writing, decltype(op(args...)) provided the new function declarator syntax is used where the return type appears after the function arguments. This code is easier to read and understand. Unfortunately, one does have to duplicate the code in with the function's return type and with the return statement in the function.


A Sample Use

The following code is an example of using apply() and apply_tuple():

#include <iostream>

double add(int a, double b) { return a+b; }
std::tuple<int,double> some_tuple{2, 34.2};

int main()
{
  using namespace std;
  cout
    << apply(add, 2, 34.2) << endl
    << apply_tuple(add, some_tuple) << endl
  ;
}


Closing Comments

This solution is shorter and much easier to maintain than the previous solutions in my earlier blog posts. This is understandable given the amount of time that has elapsed since I first started and the evolution of C++ compilers from C++98 support to fully supporting C++11. I have been doing a lot of C++11 coding (normal and C++ TMP) within the last year and have been able to explore all of the new C++11 language features and having been learning new and better ways to write code that is more easily maintained and more efficient (than C++98, e.g., fully exploiting move semantics). One of the tasks that I have been doing now that GCC and CLang fully support C++11, is revisiting and updating old code to better use C++11's features. This post clearly shows doing such can be well-worth the effort!

I will again resume blogging concerning C++11 topics and other cool things in the fall. Between now and then, I will be posting some simpler coding-related posts as fun things to try out! 🙂

Applying Tuples To Functors and Functions: The Home Stretch (Part III)

Overview

I missed three things that allow the previously posted solution to require half of the functions (3 instead of 6):

  1. it is legal to return the result of an expression whose type is void() if the function itself is a void function,
  2. std::get() is sufficiently overloaded for lvalues and rvalues so some of the uses of std::forward need not be used,
  3. the use of a class to extract the template parameter list can be eliminated if passed as an unused function argument (i.e., to rely on function argument type deduction).

The code presented below applies these changes.

Bryan (a.k.a. Bearded Code Warrior) has updated his blog to of his distinct alternate version of apply_tuple().


Supporting Code

The supporting code to the solution is the same except std::enable_if is no longer needed:

#include <cstddef>
#include <tuple>
#include <type_traits>

//===================================================================

template <std::size_t...>
struct indices { };

//===================================================================

template <
  std::size_t Begin,
  std::size_t End,
  typename Indices
>
struct make_seq_indices_impl;

template <
  std::size_t Begin,
  std::size_t End,
  std::size_t... Indices
>
struct make_seq_indices_impl<Begin, End, indices<Indices...>>
{
  using type =
    typename
      make_seq_indices_impl<Begin+1, End,indices<Indices..., Begin>
    >::type
  ;
};

template <std::size_t End, std::size_t... Indices>
struct make_seq_indices_impl<End, End, indices<Indices...>>
{
  using type = indices<Indices...>;
};

template <std::size_t Begin, std::size_t End>
using make_seq_indices =
  typename make_seq_indices_impl<Begin, End, indices<>>::type;

//===================================================================

template <typename F>
using return_type = typename std::result_of<F>::type;

template <typename Tuple>
constexpr std::size_t tuple_size()
{
  return std::tuple_size<typename std::decay<Tuple>::type>::value;
}

// No definition exists for the next prototype...
template <
  typename Op,
  typename T,
  template <std::size_t...> class I, std::size_t... Indices
>
constexpr auto apply_tuple_return_type_impl(
  Op&& op,
  T&& t,
  I<Indices...>
) ->
  return_type<Op(
    decltype(std::get<Indices>(std::forward<T>(t)))...
  )>
;

// No definition exists for the next prototype...
template <typename Op, typename T>
constexpr auto apply_tuple_return_type(Op&& op, T&& t) ->
  decltype(apply_tuple_return_type_impl(
    op, t, make_seq_indices<0,tuple_size<T>()>{}
  ));


One Function For apply()

The previous version of apply() required two versions: one for void returns and one for non-void. Since C++ permits returning a void expression when the function is void, e.g.,

void f()
{
}

void g()
{
  return f();
}

only one version of apply() is needed to handle both void and non-void versions:

template <typename Op, typename... Args>
inline auto apply(Op&& op, Args&&... args)
  -> return_type<Op(Args...)>
{
  return op( std::forward<Args>(args)... );
}


Two Functions For apply_tuple()

Similarly, the previous version of apply_tuple() required four versions: two for non-void and two for void. Thus, one can reduce the number of functions needed to two from four:

template <
  typename Ret, typename Op, typename T,
  template <std::size_t...> class I, std::size_t... Indices
>
inline Ret apply_tuple(Op&& op, T&& t, I<Indices...>)
{
  return op(std::get<Indices>(std::forward<T>(t))...);
}

template <typename Op, typename Tuple>
inline auto apply_tuple(Op&& op, Tuple&& t)
  -> decltype(apply_tuple_return_type(
    std::forward<Op>(op), std::forward<Tuple>(t)
  ))
{
  return
    apply_tuple<
      decltype(apply_tuple_return_type(
        std::forward<Op>(op), std::forward<Tuple>(t)
      ))
    >(
      std::forward<Op>(op), std::forward<Tuple>(t),
      make_seq_indices<0,tuple_size<Tuple>()>{}
    );
  ;
}

Notice that the indices are passed as an unused third argument for the function template to infer Indices. The C++ compiler will (or should!) eliminate the overhead of passing such a parameter.


Closing Comments

The above version is terse, short, efficient, and substantially easier to understand than any of the previous versions. (My apologies for missing (oops!) the above easy simplications which were pointed out in a comment to my previously posted article.)

The technique used by Bryan mentioned earlier conceptually does the same thing. The difference with Bryan's solution is how it works. Bryan's solution recursively unrolls all template arguments and then applies them to the function, whereas, the solutions I've presented expands all template arguments all at once (i.e., without using recursion to unroll them).

Applying Tuple To Functors and Functions: The Home Stretch (Part II)

Overview

If you've been following this article series, this is the article you've been waiting to read! It is the one where I show the final versions of apply() and apply_tuple(). These versions have the following features:

  • perfect forwarding of all tuple elements to the corresponding arguments of a function (as with all previous versions), and,
  • rvalue template argument deduction is used to reduce the previous solution's apply_tuple() and apply_tuple_impl::apply_tuple() total of eight (8) functions to four (4),

This article will present the full solution while briefly discussing the elements that make it work.

Update (Feb. 15, 2012): After posting this article, I realized I missed a few (obvious!) simplifications to the code which make it substantially easier to read and understand. The main change was to make apply_tuple() use an easy-to-use template alias as a compile-time type function guard for std::enable_if. A non-cosmetic change was also made as well: the apply_tuple_return_type_impl() prototype was modified to use std::get to remove any possibility of any type mismatches (e.g., differences with const/volatile qualifiers) within the code. I have replaced the previous code and updated the text only in places that make specific code references that would otherwise no longer match the code.


Supporting Code

Required #include Files

The following are the required #include files:

#include <cstddef>
#include <tuple>
#include <functional>
#include <type_traits>

Computing The Indices

In order to extract the elements of a std::tuple, the tuple index is needed as a template parameter, and, to extract all elements of a tuple in a single parameter expansion, a complete set of tuple parameter indices is needed to index the provided tuple. This solution stored the indices using the following type:

template <std::size_t...>
struct indices;

which has no definition since there will no (run-time) instance of indices created.

Generating The Indices

Previously, the indices were generated in a slightly different way. In order to make it easier to reduce the number of apply_tuple()-related functions from eight (8) to four (4), the method of generating indices was changed. Now, to generate the indices, one uses make_seq_indices<BEGIN,END> where BEGIN is the starting index (i.e., 0) and END is one past the last index. Thus the definition of make_seq_indices and its supporting code is:

template <std::size_t Begin, std::size_t End, typename Indices>
struct make_seq_indices_impl;

template <std::size_t Begin, std::size_t End, std::size_t... Indices>
struct make_seq_indices_impl<Begin, End, indices<Indices...>>
{
  using type =
    typename
      make_seq_indices_impl<Begin+1, End,indices<Indices..., Begin>
    >::type
  ;
};

template <std::size_t End, std::size_t... Indices>
struct make_seq_indices_impl<End, End, indices<Indices...>>
{
  using type = indices<Indices...>;
};

template <std::size_t Begin, std::size_t End>
using make_seq_indices =
  typename make_seq_indices_impl<Begin, End, indices<>>::type;

Indirectly Determining the Return Type

The first argument to apply() and apply_tuple() is a function or a functor. For the code to work, one needs to be able to easily determine the return type without directly using template template parameters within apply_tuple(). I came up with a clever way to do this only using function prototypes:

template <typename F>
using return_type = typename std::result_of<F>::type;

template <typename Tuple>
constexpr std::size_t tuple_size()
{
  return std::tuple_size<typename std::decay<Tuple>::type>::value;
}

template <
  typename Op,
  typename T,
  template <std::size_t...> class I,
  std::size_t... Indices
>
constexpr auto apply_tuple_return_type_impl(
  Op&& op, T&& t, I<Indices...>
)
  -> return_type<
    Op(decltype(std::get<Indices>(std::forward<T>(t)))...)
  >;

template <typename Op, typename T>
constexpr auto apply_tuple_return_type(Op&& op, T&& t)
  -> decltype(
    apply_tuple_return_type_impl(
      op, t, make_seq_indices<0,tuple_size<T>()>{}
    )
  );

where:

  • return_type and tuple_size are convenience compile-time type functions, and,
  • apply_tuple_return_type accepts forwarded types and passes them to apply_tuple_return_type_impl for template argument deduction so that function's/functor's return type can be determined.

The trick here is to:

  • use function prototypes to help ensure run-time code for these functions is never generated,
  • use function prototypes to trigger template argument deduction,
  • use the auto return type to permit the use of std::result_of to find the return type of Op(OpArgs...), and,
  • the auto return type to allow decltype to be used with the function's arguments applied to apply_tuple_return_type_impl to enable computing the return type.

A consequence of the above is that it allows any client code to determine the return type of a function/functor without directly using template template parameters. For example, later in this article apply_tuple_return_type_impl is used like this:

decltype(apply_tuple_return_type(
  std::forward<Op>(op),
  std::forward<Tuple>(t)
))

Since template template parameters are not (directly) used used to determine the types that will be passed to the function/functor, there is no need to write both lvalue and rvalue overloaded functions to handle the tuple parameter. Since the above definition also uses the same function, std::get, to determine tuple element type as apply_tuple() no type mismatches will occur as a result of using this prototype and the apply_tuple() functions.

Aside: Ensuring that template template parameters need not be directly used along with another item mentioned later was key to reducing the number of functions required from eight (8) to four (4) with apply_tuple and apply_tuple_impl::apply_tuple().

Helper Functions For Readability

Without the following definitions, the presented code is harder to read and understand:

template <typename T>
constexpr bool is_void()
{
  return std::is_void<T>::value;
}

template <typename F>
constexpr bool returns_void()
{
  return std::is_void<return_type<F>>::value;
}

template <bool Cond, typename T>
using enable_if = typename std::enable_if<Cond,T>::type;

template <typename Ret>
using if_returns_void_case = enable_if<is_void<Ret>(),Ret>;

template <typename Ret>
using if_returns_nonvoid_case = enable_if<!is_void<Ret>(),Ret>;

where:

  • is_void, returns_void, and enable_if were defined to avoid having to write typename and ::type in numerous places,
  • if_returns_nonvoid_case invokes std::enable_if producing a valid type when op does not return void, and,
  • if_returns_void_case invokes std::enable_if producing a valid type when op returns void.

These definitions will beautify the code below.


Defining apply() and apply_tuple()

Definition of apply()

Recall that the definition of apply() does not use tuples: it simply forwards all arguments after the first argument to Op. So its definition remains unchanged:

template <typename Op, typename... Args>
inline auto apply(Op&& op, Args&&... args)
  -> if_returns_nonvoid_case<return_type<Op(Args...)>>
{
  return op( std::forward<Args>(args)... );
}

template <typename Op, typename... Args>
inline auto apply(Op&& op, Args&&... args)
  -> if_returns_void_case<return_type<Op(Args...)>>
{
  op( std::forward<Args>(args)... );
}

Certainly, this was the easy case! 🙂

Definition of apply_tuple()

The definitions below forwards all tuple elements to a functor or a function:

// void cannot be the default value for Enable...
template <typename Indices, typename Ret, typename Enable = Ret>
struct apply_tuple_impl;

// void return case...
template <
  template <std::size_t...> class I, std::size_t... Indices,
  typename Ret
>
struct apply_tuple_impl<I<Indices...>, Ret, void>
{
  template <typename Op, typename T>
  inline static Ret apply_tuple(Op&& op, T&& t)
  {
    op(
      std::forward<
        decltype(std::get<Indices>(std::forward<T>(t)))
      >(std::get<Indices>(std::forward<T>(t)))...
    );
  }
};

// non-void return case...
template <
  template <std::size_t...> class I, std::size_t... Indices,
  typename Ret
>
struct apply_tuple_impl<
  I<Indices...>,
  Ret,
  if_returns_nonvoid_case<Ret>
>
{
  template <typename Op, typename T>
  inline static Ret apply_tuple(Op&& op, T&& t)
  {
    return op(
      std::forward<
        decltype(std::get<Indices>(std::forward<T>(t)))
      >(std::get<Indices>(std::forward<T>(t)))...
    );
  }
};

// non-void return case...
template <typename Op, typename Tuple>
inline auto apply_tuple(Op&& op, Tuple&& t) ->
  if_returns_nonvoid_case<
    decltype(
      apply_tuple_return_type(
        std::forward<Op>(op), std::forward<Tuple>(t)
      )
    )
  >
{
  return
    apply_tuple_impl<
      make_seq_indices<0,tuple_size<Tuple>()>,
      decltype(
        apply_tuple_return_type(
          std::forward<Op>(op), std::forward<Tuple>(t)
        )
      )
    >::apply_tuple(std::forward<Op>(op),std::forward<Tuple>(t))
  ;
}

// void return case...
template <typename Op, typename Tuple>
inline auto apply_tuple(Op&& op, Tuple&& t) ->
  if_returns_void_case<
    decltype(
      apply_tuple_return_type(
        std::forward<Op>(op), std::forward<Tuple>(t)
      )
    )
  >
{
  apply_tuple_impl<
    make_seq_indices<0,tuple_size<Tuple>()>,
    decltype(
      apply_tuple_return_type(
        std::forward<Op>(op), std::forward<Tuple>(t)
      )
    )
  >::apply_tuple(std::forward<Op>(op),std::forward<Tuple>(t));
}

Starting with apply_tuple(), note the following:

  1. Two versions of apply_tuple() are needed: one for operations returning void and one for those returning a type. The use of enable_if allows the compiler to select the correct function based on Op.
  2. std::decay is used to strip Tuple down to is base type so that the tuple's size can be determined.
  3. With the size, make_seq_indices is invoked to generate the indices.
  4. decltype is used to determine the operation's return type.

The purpose of apply_tuple() is to forward everything to apply_tuple_impl::apply_tuple(). This is needed since the indices must be a parameter pack. Unlike previous version, the indices and the return type are computed without using a template template type within apply_tuple(). With apply_tuple() setting things up, the definition of apply_tuple_impl::apply_tuple() is straight-forward. The reader will note the code is identical to previous code since it still invokes std::get<Indices>(std::forward<T>(t)). This is what has changed:

  1. The template arguments do not make use of any template template parameters. This allows rvalue template argument deduction and reference collapsing to take place. Consequently, this allows one to write one function for the two cases: the void return type and non-void return type.
  2. What allowed Item 1 to be done was realizing that one wants to use decltype within the std::forward template parameter to determine the type of each element instead of the tuple's template argument parameter pack. While this duplicates the get code twice, it permits the parameter pack expansion to work identically without explicitly using any template template parmeters. It also avoids any possibility of incorrect types between what is passed to the function and what is provided to std::forward.

The choice of not directly using template template parameters is what enables this solution to eliminate four (4) function definitions (i.e., the lvalue overloads). As a bonus the solution's type safety is now robust since std::forward's parameter type exactly matches what is provided to the function. This avoids the possibility of compiler "error novels" should somehow some of the types don't match up.


Closing Comments

Overall, this solution is not pretty, but, it is straight-forward, maintainable (especially if some comments are added), and, best of all, its use by end-users is almost effortless. The end-user need not ever see the above as he/she would simply write code like this:

double add(int a, double b)
{
  return a+b;
}

//...
std::tuple<int,double> some_tuple;
// ...

apply(add, 2, 34.2);  // or
apply_tuple(add, some_tuple);

Such code can be easily written by newcomers and experienced C++ programmers alike!

Happy Coding! 🙂

Applying Tuple To Functors and Functions: The Home Stretch (Part I)

Overview

This article updates my previous article on writing an efficient implementation of applying a tuple to a function using perfect forwarding.

This coding endeavour and article series started last year when fellow student, Bryan St. Amour, wrote an initial version and we've been hacking and discussing this (and other metaprogramming topics) off-and-on every since with our own separate code solutions:

  • his implementation unrolls the tuple to be applied using function calls, and,
  • my version relies on expanding the arguments pack within a single function call.

Bryan's solution syntactically resembles a functional programming style, e.g., his latest version uses template aliases as type functions and constexpr functions to perform the "magic" in a terse way. My approach has been to generate the function call without "recursively" unrolling the tuple elements using a function, i.e., to unroll all tuple elements in a single parameter pack expansion. Both of our solutions perform exactly the same operation from the end-user's perspective and are efficient with various pros and cons to each version –which will be a future article.

Aside: Bryan has an excellent update to his code and tells me he will be blogging about it soon. When he does I will delete this comment and link directly to it in the above paragraph.

Bryan and I agree we have worked out the details of what is necessary to extract a std::tuple's elements and efficiently pass them to a function call: we have been (and are) polishing our code and better exploiting C++11. Hence, we are in "The Home Stretch".

To avoid having far too much in one article, I will split up The Home Stretch article over a series of blog posts. The solutions presented so far:

  • do not yet handle applying a tuple to function that return void,
  • do not yet employ rvalue template argument deduction for apply_tuple() and apply_tuple_impl::apply_tuple() functions, and,
  • do not yet make use of template aliases and/or constexpr functions in C++11.

Clearly a final solution should have such with the same efficiency of the previously discussed versions. Know that such is possible and will be presented.

Article Source Code: cpp_code-apply_tuple-v6.zip


Handling void Functions/Functors

A small oversight in the previously presented code was that it could not handle functions and functors that return void. Fortunately this is easy to fix by using std::enable_if found in <type_traits>.

How std::enable_if Works

Details concerning std::enable_if can be found in the Boost Enable If library which explains one can use std::enable_if in practice with class and function templates.

NOTE: This article will only explore one aspect of its use as it pertains to the presented code.

A C++11 definition of std::enable_if could be:

template <bool Cond, class T>
struct enable_if { typedef T type; };

template <class T>
struct enable_if<false, T> { };

which makes it a compile-time C++ template metaprogramming function where:

  • if Cond is true, then std::enable_if<true,T>::type is defined and is a valid type, otherwise,
  • if Cond is false, then std::enable_if<false,T>::type is not defined and is, therefore, an invalid type.

It is important to recognize that templates are patterns and when template parameters are substituted in the template, the result may or may not contain valid code and/or type constructs. Equally important is recognizing, in the case of function templates, that the C++ compiler will look for the best match of a given function when it is overloaded. Both of these together will permit std::enable_if to be used without triggering a compile-time error and to properly select code for a function that returns void versus one that does not. For example, consider:

template <class T>
auto example(T)
  ->
    typename std::enable_if<
      std::is_same<T,void*>::value,
      void
    >::type
{
  std::cout << "void return!" << std::endl;
}

template <class T>
auto example(T t)
  ->
    typename std::enable_if<
      !std::is_same<T,void*>::value,
      T
    >::type
{
  std::cout << "This one returns t!" << std::endl;
  return t;
}

for which one should note:

  • if the type of the argument is void*, then std::enable_if<Cond,T>::type in the first example function will be void since std::is_same<T,void*>::value is true but it is not defined in the second example function; or,
  • if the type of the argument is not void*, then std::enable_if<Cond,T>::type in the first example function but it is not defined in the second.

Since one of the definitions always results in an invalid definition, the compiler discards the function template with the invalid return type (which would be a compile-time error if there wasn't another function that could be selected) in favour of the one with a valid return type. Thus, given any type T, there exists an unambiguous definition of example for the compiler to use. 🙂

Aside: C and C++ compilers only select the function to call based on its name and the number and types of its arguments. Never is the return type of the function used to select which function to call. This has not changed here: std::enable_if simply permits one to select a function template based on one or more of its template parameters.

Using std::enable_if With apply_tuple

It is now straight-forward to use std::enable_if with apply_tuple to handle void functions as follows:

  • given a void function, apply_tuple cannot be defined to return the result of calling apply_tuple_impl::apply_tuple, and,
  • given a non-void function, apply_tuple must return what apply_tuple_impl::apply_tuple returns.

Except for the addition of std::enable_if, the previous code's definitions of apply_tuple and apply_tuple_impl::apply_tuple represent the latter. Thus, like example(T) above, one only needs to write versions of apply_tuple and apply_tuple_impl::apply_tuple when a void function is passed as an argument. Before presenting the solution, let's define two C++11 template aliases to determine the return type using std::enable_if so the code is more readable and maintainable:

template <typename Op, typename... Args>
using enable_if_op_does_not_return_void =
  typename std::enable_if<
    !std::is_void<
      typename std::result_of<Op(Args...)>::type
    >::value,
    typename std::result_of<Op(Args...)>::type
  >::type
;

template <typename Op, typename... Args>
using enable_if_op_returns_void =
  typename std::enable_if<
    std::is_void<
      typename std::result_of<Op(Args...)>::type
    >::value,
    typename std::result_of<Op(Args...)>::type
  >::type
;

which permits apply() to be defined as:

template <typename Op, typename... Args>
inline auto apply(Op&& op, Args&&... args)
  -> enable_if_op_does_not_return_void<Op,Args...>
{
  return op( std::forward<Args>(args)... );
}

template <typename Op, typename... Args>
inline auto apply(Op&& op, Args&&... args)
  -> enable_if_op_returns_void<Op,Args...>
{
  op( std::forward<Args>(args)... );
}

and apply_tuple() to be defined as:

template <
  typename Indices
>
struct apply_tuple_impl;

template <
  template <std::size_t...> class I,
  std::size_t... Indices
>
struct apply_tuple_impl<I<Indices...>>
{
  template <
    typename Op,
    typename... OpArgs,
    template <typename...> class T
  >
  static auto apply_tuple(Op&& op, T<OpArgs...>&& t)
    -> enable_if_op_does_not_return_void<Op,OpArgs...>
  {
    return op( std::forward<OpArgs>(std::get<Indices>(t))... );
  }

  template <
    typename Op,
    typename... OpArgs,
    template <typename...> class T
  >
  static auto apply_tuple(Op&& op, T<OpArgs...>&& t)
    -> enable_if_op_returns_void<Op,OpArgs...>
  {
    op( std::forward<OpArgs>(std::get<Indices>(t))... );
  }

  template <
    typename Op,
    typename... OpArgs,
    template <typename...> class T
  >
  static auto apply_tuple(Op&& op, T<OpArgs...> const& t)
    -> enable_if_op_does_not_return_void<Op,OpArgs...>
  {
    return op( std::forward<OpArgs const>(std::get<Indices>(t))... );
  }

  template <
    typename Op,
    typename... OpArgs,
    template <typename...> class T
  >
  static auto apply_tuple(Op&& op, T<OpArgs...> const& t)
    -> enable_if_op_returns_void<Op,OpArgs...>
  {
    op( std::forward<OpArgs const>(std::get<Indices>(t))... );
  }
};

template <
  typename Op,
  typename... OpArgs,
  typename Indices = typename make_indices<0, OpArgs...>::type,
  template <typename...> class T
>
inline auto apply_tuple(Op&& op, T<OpArgs...>&& t)
  -> enable_if_op_does_not_return_void<Op,OpArgs...>
{
  return apply_tuple_impl<Indices>::apply_tuple(
    std::forward<Op>(op),
    std::forward<T<OpArgs...>>(t)
  );
}

template <
  typename Op,
  typename... OpArgs,
  typename Indices = typename make_indices<0, OpArgs...>::type,
  template <typename...> class T
>
inline auto apply_tuple(Op&& op, T<OpArgs...> const& t)
  -> enable_if_op_does_not_return_void<Op,OpArgs...>
{
  return apply_tuple_impl<Indices>::apply_tuple(
    std::forward<Op>(op),
    std::forward<T<OpArgs...> const>(t)
  );
}

template <
  typename Op,
  typename... OpArgs,
  typename Indices = typename make_indices<0, OpArgs...>::type,
  template <typename...> class T
>
inline auto apply_tuple(Op&& op, T<OpArgs...>&& t)
  -> enable_if_op_returns_void<Op,OpArgs...>
{
  apply_tuple_impl<Indices>::apply_tuple(
    std::forward<Op>(op),
    std::forward<T<OpArgs...>>(t)
  );
}

template <
  typename Op,
  typename... OpArgs,
  typename Indices = typename make_indices<0, OpArgs...>::type,
  template <typename...> class T
>
inline auto apply_tuple(Op&& op, T<OpArgs...> const& t)
  -> enable_if_op_returns_void<Op,OpArgs...>
{
  apply_tuple_impl<Indices>::apply_tuple(
    std::forward<Op>(op),
    std::forward<T<OpArgs...> const>(t)
  );
}

Success! 🙂


Closing Comments

Although the above code now supports void functions, the number of apply-related functions has doubled even though the code within each function is essentially the same. Fortunately, it is possible to exploit C++11's ability to use rvalue template argument deduction to accept lvalues or rvalues (by collapsing references). This feature only works if the function argument is a simple template type (which it is not here). Fortunately, with some cleverness, this is possible and will be the focus of the next article. The resulting solution will eliminate one half of the functions above (i.e., those taking constant references will be eliminated). 🙂

Tweaking Applying std::tuple To Functors Efficiently

Overview

In previous posts I developed and tested code that could apply tuples to functions and function objects (functors) efficiently. In a reply comment to my last post, it was noted that apply_tuple() does not work with lvalues. This post outlines how to fix the code so it works with both lvalues and rvalues.


Some Code Demonstrating That Lvalues Won't Compile

Nothing fancy has to be done to show that passing an lvalue to apply_tuple() will fail to compile. One need only write some code like this:

// place apply_tuple-related code here

int add(int a, int b)
{
  return a + b;
}

int main()
{
  std::tuple<int,int> t;
  apply_tuple(add, t);
  // etc.
}

This code fails to compile since the tuple argument is not an rvalue and the tuple argument type is not an exactly-as-passed template type parameter.

C++11 defines a special template argument deduction rule when a function template accepts an argument as an rvalue reference to a template type. When this occurs, if the argument is an lvalue reference of type T, then it is treated as T&; if the argument is an rvalue reference of type T, then it is treated as T&&. This convenience is only applied when T is an exactly-as-passed template argument.

Note the tuple type is not exactly-as-passed (it is computed):

template <
  typename Op,
  typename... OpArgs,
  typename Indices = typename make_indices<0, OpArgs...>::type,
  template <typename...> class T
>
auto apply_tuple(Op&& op, T<OpArgs...>&& t)
  -> typename std::result_of<Op(OpArgs...)>::type
{
  return apply_tuple_impl<Indices>::apply_tuple(
    std::forward<Op>(op),
    std::forward<T<OpArgs...>>(t)
  );
}


Fixing The Problem

When such situations arise the easiest way to fix it is to add an overload of the function that accepts an rvalue reference so that it accepts an lvalue reference, i.e., use a const&, instead:

// Rvalue apply_tuple()...
template <
  typename Op,
  typename... OpArgs,
  typename Indices = typename make_indices<0, OpArgs...>::type,
  template <typename...> class T
>
auto apply_tuple(Op&& op, T<OpArgs...>&& t)
  -> typename std::result_of<Op(OpArgs...)>::type
{
  return apply_tuple_impl<Indices>::apply_tuple(
    std::forward<Op>(op),
    std::forward<T<OpArgs...>>(t)
  );
}

// Lvalue apply_tuple()...
// std::forward could have been removed. It was kept
//   - to keep appearance of code close to original
//   - to show that, if kept, const needs to be used
template <
  typename Op,
  typename... OpArgs,
  typename Indices = typename make_indices<0, OpArgs...>::type,
  template <typename...> class T
>
auto apply_tuple(Op&& op, T<OpArgs...> const& t)
  -> typename std::result_of<Op(OpArgs...)>::type
{
  return apply_tuple_impl<Indices>::apply_tuple(
    std::forward<Op>(op),
    std::forward<T<OpArgs...> const>(t)
  );
}

and apply_tuple_impl<Indices>::apply_tuple() also needs to be similarly overloaded:

template <
  typename Indices
>
struct apply_tuple_impl;

template <
  template <std::size_t...> class I,
  std::size_t... Indices
>
struct apply_tuple_impl<I<Indices...>>
{
  // Rvalue apply_tuple()...
  template <
    typename Op,
    typename... OpArgs,
    template <typename...> class T
  >
  static auto apply_tuple(Op&& op, T<OpArgs...>&& t)
    -> typename std::result_of<Op(OpArgs...)>::type
  {
    return op(
      std::forward<OpArgs>(std::get<Indices>(t))...
    );
  }

  // Lvalue apply_tuple()...
  // Notice the added const used with std::forward's OpArgs.
  template <
    typename Op,
    typename... OpArgs,
    template <typename...> class T
  >
  static auto apply_tuple(Op&& op, T<OpArgs...> const& t)
    -> typename std::result_of<Op(OpArgs...)>::type
  {
    return op(
      std::forward<OpArgs const>(std::get<Indices>(t))...
    );
  }
};

Thus, by adding overloads for constant references (i.e., lvalues) apply_tuple() can now be used with lvalues. Adding these lvalues overloads is necessary since we could not exploit the special template argument rvalue deduction rule.


Closing Comments

The special template argument rvalue deduction rule is very convenient but its applicability is limited. When the rule cannot be exploited, one needs to write overloads to handle both lvalues and rvalues. With apply_tuple() one needs to extract the OpArgs... non-type parameter which unfortunately seems to prevent one from exploiting the rule.

Testing If std::tuple Is Being Applied To Functors Efficiently

Overview

In computer science it is never good enough to make a claim when the claim can be verified through testing. Fortunately C++ is a rich language which significantly and often aids the writing of test code.

In my previous blog article, Applying std::tuple To Functors Efficiently and Bearded Code Warrior's (BCW) most recent blog entry, Applying a function to a tuple – once again, claims are made that the C++ compiler will pass the tuple arguments efficiently to the desired function.

In my previous post, I had a long preamble explaining how the solution would work before presenting the solution near the end. In this post, I will start with the end results and explain how the code was written to obtain those results. This is done to emphasize why verifying claims/properties of code you've written is important.

Please note that BCW's apply() code is called apply_tuple() within my code. To avoid confusion, within this post all names of any BCW apply() functions are renamed to apply_tuple().

Update Feb. 3, 2012: I have changed Operand's operator ()(T…) function overloads to return Operand*. This ensures no copies/move are made with the returned value from the functor (as such is not relevant to this article series). All code in this series has been updated accordingly.

Article Source Code: cpp_code-apply_tuple-v3.zip.


What Is The Optimal Solution?

Before seeing any test results or writing any test code, first consider what is the optimal solution given a function applied to some arguments:

apply(foo, arg1, arg2, ..., argN);
apply_tuple(foo, make_tuple(arg1, arg2, ..., argN);

Assuming arg1, arg2, …, argN (ARGS) are all C++ class types; assuming all arguments were created prior to the call; and all arguments are valid rvalue references, then the optimal solutions are:

Table 1: Theoretical Optimal Apply Function Results
Call foo Function All N arguments
apply 0 misc
0 copies
0 moves
0 assign
0 misc
0 copies
0 moves
0 assign
apply_tuple 0 misc
0 copies
0 moves
0 assign
0 misc
0 copies
0 moves
0 assign

where:

Table 2: Definitions Of Terms In Table 1
Term Definition
misc Refers to the invocation of a default or any other constructor excluding any copy or move constructors.
copies Refers to the invocation of any copy constructor.
moves Refers to the invocation of any move constructor.
assign Refers to the invocation of any copy/move assignment operator.

For all tests all arguments (except foo) will be created as unnamed or compiler-generated temporary variables. Using only such temporary variables implies that the compiler should be able to reuse them to avoid doing any unnecessary copies/moves, etc. Provided the code being tested is properly written, optimal code should be generated as a result. If such is seen, then this test ought to also be a nice way to verify and test the claim that C++11 allows perfect forwarding!

Table 1 needs to be adjusted since temporary variables will be used throughout for all arguments except the function call argument. Specifically, the following items need to be added to Table 1:

  • the creation of the required (unnamed) initial variables, and,
  • any necessary copies, moves, and/or assignments that must be done for those variables.

These adjustments are reflected in Table 3:

Table 3: Theoretical Optimal Apply Function Results Adjusted For Required Initial Rvalue Reference Variables
Call foo Function All N arguments
apply 1 misc
0 copies
0 moves
0 assign
N misc
0 copies
0 moves
0 assign
apply_tuple 1 misc
0 copies
0 moves
0 assign
N misc
0 copies
N moves
0 assign

Clearly to populate a std::tuple with N rvalue references, a total of N moves, not copies, must be done. Additionally the actual original variables/functor must be created and this accounts for the misc constructor invocations listed in Table 3. Finally, in the case when the foo operation has no arguments, since zero-argument functions often do return a value, a value will be returned in those tests. Thus, there will be "1 misc" foo constructor invocation for any zero-argument cases in the results.

Finally note that within all tests performed within this article, all calls to apply_tuple will use std::make_tuple. Now that the theory looks good, let's check out reality!


The Test Results

Results For My Version Of apply() And apply_tuple()

Table 4 summarizes the results of using my previous blog entry's apply() and apply_tuple():

Table 4: Test Results Using Paul Preney's Apply Code
Call Functor Overhead N Argument Overhead
0 1 2 3
apply 1 misc
0 copies
0 moves
0 assign
1 misc
0 copies
0 moves
0 assign
1 misc
0 copies
0 moves
0 assign
2 misc
0 copies
0 moves
0 assign
3 misc
0 copies
0 moves
0 assign
apply_tuple 1 misc
0 copies
0 moves
0 assign
1 misc
0 copies
0 moves
0 assign
1 misc
0 copies
1 moves
0 assign
2 misc
0 copies
2 moves
0 assign
3 misc
0 copies
3 moves
0 assign

which matches the optimal results in Table 3. (Recall the zero-argument case creates one Operand as its return value hence the "1 misc" in those cases.) The actual test program output for apply() was:

Op() = 1.

Op LOG ============
default_constructor[this=0x7fff7c331c48]
destructor[this=0x7fff7c331c48]
Op HIST ===========
default_constructor 1
destructor 1
Op END ============
Operand LOG ============
other_constructor[this=0x7fff7c331c50]
destructor[this=0x7fff7c331c50]
Operand HIST ===========
other_constructor 1
destructor 1
Operand END ============

Op()(Operand(-3)) = -3.

Op LOG ============
default_constructor[this=0x7fff7c331c49]
destructor[this=0x7fff7c331c49]
Op HIST ===========
default_constructor 1
destructor 1
Op END ============
Operand LOG ============
other_constructor[this=0x7fff7c331c60]
destructor[this=0x7fff7c331c60]
Operand HIST ===========
other_constructor 1
destructor 1
Operand END ============

Op()(Operand(-3),Operand(-10)) = -13.

Op LOG ============
default_constructor[this=0x7fff7c331c4a]
destructor[this=0x7fff7c331c4a]
Op HIST ===========
default_constructor 1
destructor 1
Op END ============
Operand LOG ============
other_constructor[this=0x7fff7c331c80]
other_constructor[this=0x7fff7c331c70]
destructor[this=0x7fff7c331c70]
destructor[this=0x7fff7c331c80]
Operand HIST ===========
other_constructor 2
destructor 2
Operand END ============

Op()(Operand(-3),Operand(-10),Operand(-100)) = -113.

Op LOG ============
default_constructor[this=0x7fff7c331c4b]
destructor[this=0x7fff7c331c4b]
Op HIST ===========
default_constructor 1
destructor 1
Op END ============
Operand LOG ============
other_constructor[this=0x7fff7c331cb0]
other_constructor[this=0x7fff7c331ca0]
other_constructor[this=0x7fff7c331c90]
destructor[this=0x7fff7c331c90]
destructor[this=0x7fff7c331ca0]
destructor[this=0x7fff7c331cb0]
Operand HIST ===========
other_constructor 3
destructor 3
Operand END ============

and the output for apply_tuple() was:

Op() = 1.

Op LOG ============
default_constructor[this=0x7fff7c331c4c]
destructor[this=0x7fff7c331c4c]
Op HIST ===========
default_constructor 1
destructor 1
Op END ============
Operand LOG ============
other_constructor[this=0x7fff7c331cc0]
destructor[this=0x7fff7c331cc0]
Operand HIST ===========
other_constructor 1
destructor 1
Operand END ============

Op()(Operand(-3)) = -3.

Op LOG ============
default_constructor[this=0x7fff7c331c4d]
destructor[this=0x7fff7c331c4d]
Op HIST ===========
default_constructor 1
destructor 1
Op END ============
Operand LOG ============
other_constructor[this=0x7fff7c331cd0]
move_constructor[this=0x7fff7c331ce0,src=0x7fff7c331cd0]
destructor[this=0x7fff7c331ce0]
destructor[this=0x7fff7c331cd0]
Operand HIST ===========
move_constructor 1
other_constructor 1
destructor 2
Operand END ============

Op()(Operand(-3),Operand(-10)) = -13.

Op LOG ============
default_constructor[this=0x7fff7c331c4e]
destructor[this=0x7fff7c331c4e]
Op HIST ===========
default_constructor 1
destructor 1
Op END ============
Operand LOG ============
other_constructor[this=0x7fff7c331d00]
other_constructor[this=0x7fff7c331cf0]
move_constructor[this=0x7fff7c331d40,src=0x7fff7c331d00]
move_constructor[this=0x7fff7c331d44,src=0x7fff7c331cf0]
destructor[this=0x7fff7c331d44]
destructor[this=0x7fff7c331d40]
destructor[this=0x7fff7c331cf0]
destructor[this=0x7fff7c331d00]
Operand HIST ===========
move_constructor 2
other_constructor 2
destructor 4
Operand END ============

Op()(Operand(-3),Operand(-10),Operand(-100)) = -113.

Op LOG ============
default_constructor[this=0x7fff7c331c4f]
destructor[this=0x7fff7c331c4f]
Op HIST ===========
default_constructor 1
destructor 1
Op END ============
Operand LOG ============
other_constructor[this=0x7fff7c331d30]
other_constructor[this=0x7fff7c331d20]
other_constructor[this=0x7fff7c331d10]
move_constructor[this=0x7fff7c331d50,src=0x7fff7c331d30]
move_constructor[this=0x7fff7c331d54,src=0x7fff7c331d20]
move_constructor[this=0x7fff7c331d58,src=0x7fff7c331d10]
destructor[this=0x7fff7c331d58]
destructor[this=0x7fff7c331d54]
destructor[this=0x7fff7c331d50]
destructor[this=0x7fff7c331d10]
destructor[this=0x7fff7c331d20]
destructor[this=0x7fff7c331d30]
Operand HIST ===========
move_constructor 3
other_constructor 3
destructor 6
Operand END ============

where:

  • Op os foo,
  • Op() is the creation of a functor,
  • Op()(...) is the invocation of the functor with the … arguments passed in,
  • Operand is the type representing all arguments and return values,
  • Operand(value) is the creation of an argument with the value of value,
  • Op_LOG and Operand_LOG are the constructor, assignment, and destructor actions performed during the test in chronological order for all Op and Operand types respectively, and,
  • Op_HIST and Operand_HIST are the frequency histograms of the actions performed during the test for all Op and Operand types respectively.

These outputs were produced by the following code:

// TEST apply()...
APPLY_FULL0(Op);
APPLY_FULL1(Op, -3);
APPLY_FULL2(Op, -3, -10);
APPLY_FULL3(Op, -3, -10, -100);

// TEST apply_tuple()...
APPLY_TUPLE_FULL0(Op);
APPLY_TUPLE_FULL1(Op, -3);
APPLY_TUPLE_FULL2(Op, -3, -10);
APPLY_TUPLE_FULL3(Op, -3, -10, -100);

where APPLY_FULLn and APPLY_TUPLE_FULLn are macros that will be defined below. The reason macros were used was to ensure no side-effects were incurred through the use of function calls.

This test demonstrates:

  • my apply() and apply_tuple() implementations are optimal, and,
  • perfect forwarding of function arguments is a real achievement of C++11. 🙂

The code enabling these tests to be performed will be discussed after discussing the results of BCW's apply_tuple().


Results For BCW's Version Of apply_tuple()

BCW's apply_tuple() does not use any rvalue references and instead relies on the compiler optimizing the use of all recursive calls. Consequently, my hypothesis was that there would be more overhead in the general case using objects as the compiler would need to make copies of class instances. This is understandable since C++ compilers cannot use the multitude of assumptions it uses for any primitive types such as int, double, etc. That said, I was shocked to see just how bad the results were using BCW's code (after adding appropriate Operand operations supporting lvalues so BCW's code would compile):

Table 5: Test Results Using BCW's Apply Code's Function Call Overhead
Call Functor Overhead With N Arguments
0 1 2 3
apply N/A
apply_tuple 1 misc
0 copies
0 moves
0 assign
1 misc
1 copies
0 moves
0 assign
1 misc
2 copies
0 moves
0 assign
1 misc
3 copies
0 moves
0 assign

which shows how BCW's code unnecessarily copies the function object with each recursive call and:

Table 6: Test Results Using BCW's Apply Code's Argument Overhead
Call N Argument Overhead
0 1 2 3
apply N/A
apply_tuple 1 misc
0 copies
0 moves
0 assign
1 misc
3 copies
2 moves
0 assign
2 misc
8 copies
3 moves
0 assign
3 misc
16 copies
4 moves
0 assign

which shows how quickly the number of copies explodes as the argument size increases as well that the number of moves is equal to the number of arguments + 1, except in the zero-argument case as a special function overload is used to handle tuples with no elements.

Thus, it appears that BCW's code requires:

  • N copies of its function object, and,
  • some 2^(N+1) copies with N+1 moves of its arguments

for a single N-argument function call instead of the optimal:

  • zero copies of its function object, and,
  • zero copies with N moves of the function arguments.

So for N arguments:

  • N extra copies of the function argument are created,
  • it appears some O(2^(N+1)) copies of the arguments are created, and,
  • N extra moves of the arguments are performed.

all with GCC v4.7 (snapshot 2011-10-22) compiled using -std=c++0x and -O3 options! Wow!

This clearly shows just how relevant and practically useful rvalue references are in C++11 or any language that supports objects. Moreover, given how common functions with zero, one, two, and three arguments are, certainly BCW's code shouldn't be used unmodified with class/struct/union types. Thus, although BCW's code may compile and optimize well with the built-in C++ types, it fails to be anywhere near optimal for general class types.

Interestingly, one should note that BCW's solution, as posted in the blog article, performs, for N arguments, N recursive calls each passing increasing numbers of arguments by value. Such is clearly reflected in the values summarized in Table 5 and Table 6 whose values which were obtained from this test program output (whose computed output is also not completely correct due to some serious compiler warnings about references to temporaries in the recursive step of BCW's solution):

Op() = 1.

Op LOG ============
default_constructor[this=0x7fffede986cc]
destructor[this=0x7fffede986cc]
Op HIST ===========
default_constructor 1
destructor 1
Op END ============
Operand LOG ============
other_constructor[this=0x7fffede986d0]
destructor[this=0x7fffede986d0]
Operand HIST ===========
other_constructor 1
destructor 1
Operand END ============

Op()(Operand(-3)) = 20.

Op LOG ============
default_constructor[this=0x7fffede986cd]
copy_constructor[this=0x7fffede985bf,src=0x7fffede986cd]
destructor[this=0x7fffede985bf]
destructor[this=0x7fffede986cd]
Op HIST ===========
default_constructor 1
copy_constructor 1
destructor 2
Op END ============
Operand LOG ============
other_constructor[this=0x7fffede986e0]
move_constructor[this=0x7fffede986f0,src=0x7fffede986e0]
copy_constructor[this=0x7fffede985d0,src=0x7fffede986f0]
copy_constructor[this=0x7fffede985c0,src=0x7fffede986f0]
copy_constructor[this=0x7fffede985f0,src=0x7fffede985d0]
move_constructor[this=0x7fffede985e0,src=0x7fffede985f0]
destructor[this=0x7fffede985f0]
destructor[this=0x7fffede985e0]
destructor[this=0x7fffede985c0]
destructor[this=0x7fffede985d0]
destructor[this=0x7fffede986f0]
destructor[this=0x7fffede986e0]
Operand HIST ===========
copy_constructor 3
move_constructor 2
other_constructor 1
destructor 6
Operand END ============

Op()(Operand(-3),Operand(-10)) = -13.

Op LOG ============
default_constructor[this=0x7fffede986ce]
copy_constructor[this=0x7fffede984ce,src=0x7fffede986ce]
copy_constructor[this=0x7fffede984cf,src=0x7fffede984ce]
destructor[this=0x7fffede984cf]
destructor[this=0x7fffede984ce]
destructor[this=0x7fffede986ce]
Op HIST ===========
default_constructor 1
copy_constructor 2
destructor 3
Op END ============
Operand LOG ============
other_constructor[this=0x7fffede98710]
other_constructor[this=0x7fffede98700]
move_constructor[this=0x7fffede98750,src=0x7fffede98710]
move_constructor[this=0x7fffede98754,src=0x7fffede98700]
copy_constructor[this=0x7fffede984d0,src=0x7fffede98754]
copy_constructor[this=0x7fffede98520,src=0x7fffede98750]
copy_constructor[this=0x7fffede98524,src=0x7fffede98754]
copy_constructor[this=0x7fffede984f0,src=0x7fffede98520]
copy_constructor[this=0x7fffede984e0,src=0x7fffede984d0]
copy_constructor[this=0x7fffede98530,src=0x7fffede98520]
copy_constructor[this=0x7fffede98534,src=0x7fffede98524]
copy_constructor[this=0x7fffede98510,src=0x7fffede984e0]
move_constructor[this=0x7fffede98500,src=0x7fffede98510]
destructor[this=0x7fffede98510]
destructor[this=0x7fffede98500]
destructor[this=0x7fffede98534]
destructor[this=0x7fffede98530]
destructor[this=0x7fffede984e0]
destructor[this=0x7fffede984f0]
destructor[this=0x7fffede98524]
destructor[this=0x7fffede98520]
destructor[this=0x7fffede984d0]
destructor[this=0x7fffede98754]
destructor[this=0x7fffede98750]
destructor[this=0x7fffede98700]
destructor[this=0x7fffede98710]
Operand HIST ===========
copy_constructor 8
move_constructor 3
other_constructor 2
destructor 13
Operand END ============

Op()(Operand(-3),Operand(-10),Operand(-100)) = -113.

Op LOG ============
default_constructor[this=0x7fffede986cf]
copy_constructor[this=0x7fffede984de,src=0x7fffede986cf]
copy_constructor[this=0x7fffede984df,src=0x7fffede984de]
copy_constructor[this=0x7fffede9831f,src=0x7fffede984df]
destructor[this=0x7fffede9831f]
destructor[this=0x7fffede984df]
destructor[this=0x7fffede984de]
destructor[this=0x7fffede986cf]
Op HIST ===========
default_constructor 1
copy_constructor 3
destructor 4
Op END ============
Operand LOG ============
other_constructor[this=0x7fffede98740]
other_constructor[this=0x7fffede98730]
other_constructor[this=0x7fffede98720]
move_constructor[this=0x7fffede98760,src=0x7fffede98740]
move_constructor[this=0x7fffede98764,src=0x7fffede98730]
move_constructor[this=0x7fffede98768,src=0x7fffede98720]
copy_constructor[this=0x7fffede984e0,src=0x7fffede98768]
copy_constructor[this=0x7fffede98510,src=0x7fffede98760]
copy_constructor[this=0x7fffede98514,src=0x7fffede98764]
copy_constructor[this=0x7fffede98518,src=0x7fffede98768]
copy_constructor[this=0x7fffede98500,src=0x7fffede98514]
copy_constructor[this=0x7fffede984f0,src=0x7fffede984e0]
copy_constructor[this=0x7fffede98520,src=0x7fffede98510]
copy_constructor[this=0x7fffede98524,src=0x7fffede98514]
copy_constructor[this=0x7fffede98528,src=0x7fffede98518]
copy_constructor[this=0x7fffede98340,src=0x7fffede98520]
copy_constructor[this=0x7fffede98330,src=0x7fffede98500]
copy_constructor[this=0x7fffede98320,src=0x7fffede984f0]
copy_constructor[this=0x7fffede98370,src=0x7fffede98520]
copy_constructor[this=0x7fffede98374,src=0x7fffede98524]
copy_constructor[this=0x7fffede98378,src=0x7fffede98528]
copy_constructor[this=0x7fffede98360,src=0x7fffede98320]
move_constructor[this=0x7fffede98350,src=0x7fffede98360]
destructor[this=0x7fffede98360]
destructor[this=0x7fffede98350]
destructor[this=0x7fffede98378]
destructor[this=0x7fffede98374]
destructor[this=0x7fffede98370]
destructor[this=0x7fffede98320]
destructor[this=0x7fffede98330]
destructor[this=0x7fffede98340]
destructor[this=0x7fffede98528]
destructor[this=0x7fffede98524]
destructor[this=0x7fffede98520]
destructor[this=0x7fffede984f0]
destructor[this=0x7fffede98500]
destructor[this=0x7fffede98518]
destructor[this=0x7fffede98514]
destructor[this=0x7fffede98510]
destructor[this=0x7fffede984e0]
destructor[this=0x7fffede98768]
destructor[this=0x7fffede98764]
destructor[this=0x7fffede98760]
destructor[this=0x7fffede98720]
destructor[this=0x7fffede98730]
destructor[this=0x7fffede98740]
Operand HIST ===========
copy_constructor 16
move_constructor 4
other_constructor 3
destructor 23
Operand END ============

using the same macro code as the previous section's apply_tuple with my apply and apply_tuple code was deleted and replaced with BCW's code instead.

BCW's work is typically thorough but this instance shows that optimizations aren't magical, the programmer must use rvalue references where they ought to be used, and one should always double-check efficiency claims for code written. While in this instance BCW's code was found to have issues, we're all human and all coders all been guilty on various occassions of the assuming code efficiencies that weren't true!

I tried tweaking BCW's code while remaining faithful to the style of his solution by adding rvalue references. Unfortuantely, I ran into a snag: the snapshot of GCC v4.7 I am using gives "Sorry, unimplemented" messages when one tries to expand the TS… parameter pack inside a template argument. The latter is needed to obtain the type of each tuple argument using std::tuple_element so it can be passed to std::forward. Without such, one must use a class template as per my solution.


Tracking All Constructors, Assignments, and Destructors

Defining The Tracking Class

So how would one go about easily tracking and logging all constructor invocations, copy/move assignments, and when destructors are called? First one starts by writing a class template that does exactly this (say in a file called tracker.hxx):

#ifndef tracker_hxx_
#define tracker_hxx_

#include <list>
#include <map>
#include <tuple>
#include <ostream>

template <typename T>
struct tracker
{
  public:
    // Handle default constructor tracking...
    tracker()
    {
      update(DefaultConstr, this);
    }

    // Invoke for tracking any other constructors...
    explicit tracker(int)
    {
      update(OtherConstr, this);
    }

    // Handle copy constructor tracking...
    tracker(tracker const& src)
    {
      update(
        CopyConstr,
        this,
        &const_cast<tracker&>(src)
      );
    }

    // Handle move constructor tracking...
    tracker(tracker&& src)
    {
      update(MoveConstr, this, &src);
    }

    // Handle copy-assignment operator tracking...
    tracker& operator =(tracker const& src)
    {
      update(
        CopyAssign,
        this,
        &const_cast<tracker&>(src)
      );
      return *this;
    }

    // Handle move-assignment operator tracking...
    tracker& operator =(tracker&& src)
    {
      update(MoveAssign, this, &src);
      return *this;
    }

    // Handle destructor tracking...
    ~tracker()
    {
      update(Destruct, this);
    }

    // Clear out all log and histogram data...
    static void clear()
    {
      log_.clear();
      hist_.clear();
    }

    // Output the log to a stream...
    static std::ostream& output_log(std::ostream& os)
    {
      for (log_entry_type& le : log_)
      {
        output_log_entry(os, le);
        os << '\n';
      }
      return os;
    }

    // Output the histogram to a stream...
    static std::ostream& output_hist(std::ostream& os)
    {
      for (typename hist_type::value_type& he : hist_)
      {
        output_hist_entry(os, he);
        os << '\n';
      }
      return os;
    }

  private:
    // Enumeration to identify the various operations being logged
    // and tracked in the frequency histogram...
    enum op
    {
      DefaultConstr, CopyConstr, MoveConstr, OtherConstr,
      CopyAssign, MoveAssign,
      Destruct
    };

    // Define log_entry_type as a std::tuple such that:
    //   <0> is an instance of op
    //   <1> indicates the this object
    //   <2> indicates the source object (e.g., for copies/moves)
    typedef std::tuple<op, tracker*, tracker*> log_entry_type;

    // Define log_type as a list of log_entry_types...
    typedef std::list<log_entry_type> log_type;

    // Define hist_type as a map from op to size_t...
    typedef std::map<op, std::size_t> hist_type;

    // Update tracked information in the log and histogram...
    static void update(op o, tracker* this_v, tracker* src = nullptr)
    {
      auto log_entry = log_entry_type(o, this_v, src);

      // Append to log...
      log_.push_back(log_entry);

      // Update histogram...
      ++hist_[o];
    }

    // Output a log entry...
    static std::ostream& output_log_entry(std::ostream& os,
      log_entry_type const& le)
    {
      switch (std::get<0>(le))
      {
        case DefaultConstr:
          os
            << "default_constructor[this="
            << std::get<1>(le)
            << ']'
          ;
          break;

        case CopyConstr:
          os
            << "copy_constructor[this="
            << std::get<1>(le)
            << ",src="
            << std::get<2>(le)
            << ']'
          ;
          break;

        case MoveConstr:
          os
            << "move_constructor[this="
            << std::get<1>(le)
            << ",src="
            << std::get<2>(le)
            << ']'
          ;
          break;

        case OtherConstr:
          os
            << "other_constructor[this="
            << std::get<1>(le)
            << ']'
          ;
          break;

        case CopyAssign:
          os
            << "copy_assignment[this="
            << std::get<1>(le)
            << ",src="
            << std::get<2>(le)
            << ']'
          ;
          break;

        case MoveAssign:
          os
            << "move_assignment[this="
            << std::get<1>(le)
            << ",src="
            << std::get<2>(le)
            << ']'
          ;
          break;

        case Destruct:
          os
            << "destructor[this="
            << std::get<1>(le)
            << ']'
          ;
          break;
      }
      return os;
    }

    // Output a hist entry...
    static std::ostream& output_hist_entry(std::ostream& os,
      typename hist_type::value_type const& he)
    {
      switch (he.first)
      {
        case DefaultConstr:
          os << "default_constructor " << he.second;
          break;

        case CopyConstr:
          os << "copy_constructor " << he.second;
          break;

        case MoveConstr:
          os << "move_constructor " << he.second;
          break;

        case OtherConstr:
          os << "other_constructor " << he.second;
          break;

        case CopyAssign:
          os << "copy_assignment " << he.second;
          break;

        case MoveAssign:
          os << "move_assignment " << he.second;
          break;

        case Destruct:
          os << "destructor " << he.second;
          break;
      }
      return os;
    }

    // Define the log_ and hist_ variables statically...
    static log_type log_;
    static hist_type hist_;
};

// All class static variables must be instantiated outside the class...
template <typename T>
typename tracker<T>::log_type tracker<T>::log_;

template <typename T>
typename tracker<T>::hist_type tracker<T>::hist_;

#endif // #ifndef tracker_hxx_

Noteworthy is how almost everything is defined as private in order to not pollute the derived class with symbols. Even the static functions could have been defined outside the tracker class as friends to further avoid any pollution.

The idea behind tracker<T> is to:

  • log all constructor, copy/move assignment, and destructor invocations as they occur for some type T,
  • to log all object names (i.e., their addresses) upon which these invocations occur,
  • to log any source objects for all copying and move operations, and,
  • to keep an up-to-date frequency histogram of how many times each type of invocation occurred.

Using The Tracker Class With The Curiously Recurring Template Pattern (CRTP)

With such, one need only inherit from tracker passing in the name of the class doing the inheriting:

class Op : tracker<Op>
{
  public:
    using tracker<Op>::tracker<Op>;
};

Unfortunately, it appears the version of GCC I am using does not implement using syntax with inheriting constructors yet. As a work-around, one needs to define each constructor, assignment operator, and destructor in the derived class and invoke the same in the parent:

struct Op : tracker<Op>
{
  Op()
    : tracker()
  {
  }

  Op(Op const& a)
    : tracker(a)
  {
  }

  Op(Op&& a)
    : tracker(std::move(a))
  {
  }

  Op& operator =(Op const& a)
  {
    tracker::operator =(a);
    return *this;
  }

  Op& operator =(Op&& a)
  {
    tracker::operator =(std::move(a));
    return *this;
  }

  ~Op() = default;
};

Passing the class one is defining as a template argument to its base class is permitted and extremely useful in C++. It is a C++ programming pattern is called the Curiously Recurring Template Pattern (CRTP) and it works because all template instantiations are lazy (i.e., they only occur when they are required) and C++ permits forward declarations of types. When the compiler encounters class Op it now knows that Op is a class, so when it is told to inherit from tracker<Op> it is not an error provided that the use of Op inside of tracker does not need to know anything inside Op's definition outside of any member function's definition (not prototype)! If one tries to access members of Op from within tracker (but not within its member functions) the C++ compiler will give an error message about Op being an "incomplete type". Since tracker does no such thing, the compiler inherits from tracker<Op> and then processes the definition of Op!

The CRTP design pattern is extremely useful in C++ programs especially to perform the following:

  • to automatically perform tracking/logging functions including automatically adding/removing all instances of a type to data structures, and,
  • to eliminate the overhead of virtual function calls when run-time object polymorphism is not needed.

Since we want to track the function object instances and the function argument instances, two classes will need to be defined using CRTP:

  • Op representing the function object type, and,
  • Operand representing the function arguments' type.

Such would be defined as follows (say in a file called op_operand.hxx):

#ifndef op_operand_hxx_
#define op_operand_hxx_

#include "tracker.hxx"
#include <ostream>
#include <iostream>

//====================================================================

// Define the function argument type. It will hold an int value...
struct Operand : tracker<Operand>
{
  Operand()
    : tracker(), value(0) { }
  Operand(int v)
    : tracker(0), value(v) { }
  Operand(Operand const& a)
    : tracker(a), value(a.value) { }
  Operand(Operand&& a)
    : tracker(std::move(a)), value(a.value) { a.value = 0; }
  Operand& operator =(Operand const& a)
    { tracker::operator =(a); value = a.value; return *this; }
  Operand& operator =(Operand&& a)
  {
    tracker::operator =(std::move(a));
    value = a.value;
    a.value = 0;
    return *this;
  }
  ~Operand() = default;

  int value;
};

// Define the ability for Operands to be written to a stream...
std::ostream& operator <<(std::ostream& os, Operand const& o)
{
  os << o.value;
  return os;
}

//====================================================================

// Define the functor type...
struct Op : tracker<Op>
{
  Op() : tracker() { }
  Op(Op const& a) : tracker(a) { }
  Op(Op&& a) : tracker(std::move(a)) { }
  Op& operator =(Op const& a)
    { tracker::operator =(a); return *this; }
  Op& operator =(Op&& a)
    { tracker::operator =(std::move(a)); return *this; }
  ~Op() = default;

  // Define calling a function with no arguments...
  Operand* operator ()() const
    { return nullptr; }

  // Define calling a function with one argument...
  Operand* operator ()(Operand&& x) const
    { return nullptr; }
#ifdef OPERAND_LVALUE_REF_SUPPORT
  Operand* operator ()(Operand const& x) const
    { return nullptr; }
#endif

  // Define calling a function with two arguments...
  Operand* operator ()(Operand&& x, Operand&& y) const
    { x.value += y.value; return std::move(x); }
#ifdef OPERAND_LVALUE_REF_SUPPORT
  Operand* operator ()(Operand const& x, Operand const& y) const
  {
    Operand retval(x);
    retval.value += y.value;
    return nullptr;
  }
#endif

  // Define calling a function with three arguments...
  Operand* operator ()(Operand&& x, Operand&& y,
    Operand&& z) const
    { x.value += y.value + z.value; return nullptr; }
#ifdef OPERAND_LVALUE_REF_SUPPORT
  Operand* operator ()(Operand const& x, Operand const& y,
    Operand const& z) const
  {
    Operand retval(x);
    retval.value += y.value + z.value;
    return nullptr;
  }
#endif
};

//====================================================================

// Define all testing macros...

#define APPLY_TEST0(i) \
  std::cout << #i " = " << apply(i) << "." << std::endl

#define APPLY_TEST1(i,a) \
  std::cout << #i "(" #a ") = " << apply(i,a) << "." << std::endl

#define APPLY_TEST2(i,a,b) \
  std::cout << #i "(" #a "," #b ") = " << apply(i,a,b) \
  << "." << std::endl

#define APPLY_TEST3(i,a,b,c) \
  std::cout << #i "(" #a "," #b "," #c ") = " \
    << apply(i,a,b,c) << "." << std::endl


#define APPLY_TUPLE_TEST0(i) \
  std::cout << #i " = " << apply_tuple(i,std::tuple<>()) \
  << "." << std::endl

#define APPLY_TUPLE_TEST1(i,a) \
  std::cout << #i "(" #a ") = " \
    << apply_tuple(i,std::make_tuple(a)) << "." << std::endl

#define APPLY_TUPLE_TEST2(i,a,b) \
  std::cout << #i "(" #a "," #b ") = " \
    << apply_tuple(i,std::make_tuple(a,b)) << "." << std::endl

#define APPLY_TUPLE_TEST3(i,a,b,c) \
  std::cout << #i "(" #a "," #b "," #c ") = " \
    << apply_tuple(i,std::make_tuple(a,b,c)) << "." << std::endl


#define OUTPUT(type) \
  std::cout \
    << #type " LOG ============\n" \
    << tracker<type>::output_log \
    << #type " HIST ===========\n" \
    << tracker<type>::output_hist \
    << #type " END ============\n" \
  ;


#define CLEAR(type) tracker<type>::clear();

#define APPLY_FULL0(type) \
  CLEAR(type); \
  CLEAR(Operand); \
  { APPLY_TEST0(type()); } \
  std::endl(std::cout); \
  OUTPUT(type); \
  OUTPUT(Operand); \
  std::endl(std::cout);


#define APPLY_FULL1(type,a) \
  CLEAR(type); \
  CLEAR(Operand); \
  { APPLY_TEST1(type(),Operand(a)); } \
  std::endl(std::cout); \
  OUTPUT(type); \
  OUTPUT(Operand); \
  std::endl(std::cout);


#define APPLY_FULL2(type,a,b) \
  CLEAR(type); \
  CLEAR(Operand); \
  { APPLY_TEST2(type(),Operand(a),Operand(b)); } \
  std::endl(std::cout); \
  OUTPUT(type); \
  OUTPUT(Operand); \
  std::endl(std::cout);


#define APPLY_FULL3(type,a,b,c) \
  CLEAR(type); \
  CLEAR(Operand); \
  { APPLY_TEST3(type(),Operand(a),Operand(b),Operand(c)); } \
  std::endl(std::cout); \
  OUTPUT(type); \
  OUTPUT(Operand); \
  std::endl(std::cout);


#define APPLY_TUPLE_FULL0(type) \
  CLEAR(type); \
  CLEAR(Operand); \
  { APPLY_TUPLE_TEST0(type()); } \
  std::endl(std::cout); \
  OUTPUT(type); \
  OUTPUT(Operand); \
  std::endl(std::cout);


#define APPLY_TUPLE_FULL1(type,a) \
  CLEAR(type); \
  CLEAR(Operand); \
  { APPLY_TUPLE_TEST1(type(),Operand(a)); } \
  std::endl(std::cout); \
  OUTPUT(type); \
  OUTPUT(Operand); \
  std::endl(std::cout);


#define APPLY_TUPLE_FULL2(type,a,b) \
  CLEAR(type); \
  CLEAR(Operand); \
  { APPLY_TUPLE_TEST2(type(),Operand(a),Operand(b)); } \
  std::endl(std::cout); \
  OUTPUT(type); \
  OUTPUT(Operand); \
  std::endl(std::cout);


#define APPLY_TUPLE_FULL3(type,a,b,c) \
  CLEAR(type); \
  CLEAR(Operand); \
  { APPLY_TUPLE_TEST3(type(),Operand(a),Operand(b),Operand(c)); } \
  std::endl(std::cout); \
  OUTPUT(type); \
  OUTPUT(Operand); \
  std::endl(std::cout);


//====================================================================

#endif // #ifndef op_operand_hxx_


Running The Tests

The C++ code to run tests on my code is:

#include <cstddef>

//====================================================================

template <std::size_t...>
struct indices;

template <std::size_t N, typename Indices, typename... Types>
struct make_indices_impl;

template <
  std::size_t N,
  std::size_t... Indices,
  typename Type, typename... Types
>
struct make_indices_impl<N, indices<Indices...>, Type, Types...>
{
  typedef
    typename make_indices_impl<
      N+1,
      indices<Indices...,N>,
      Types...>::type
    type
  ;
};

template <std::size_t N, std::size_t... Indices>
struct make_indices_impl<N, indices<Indices...>>
{
  typedef indices<Indices...> type;
};

template <std::size_t N, typename... Types>
struct make_indices
{
  typedef
    typename make_indices_impl<
      0,
      indices<>,
      Types...
    >::type
    type
  ;
};

//====================================================================

#include <tuple>
#include <functional>

//====================================================================

template <typename Op, typename... Args>
auto apply(Op&& op, Args&&... args)
  -> typename std::result_of<Op(Args...)>::type
{
  return op( std::forward<Args>(args)... );
}

//====================================================================

template <
  typename Indices
>
struct apply_tuple_impl;

template <
  template <std::size_t...> class I,
  std::size_t... Indices
>
struct apply_tuple_impl<I<Indices...>>
{
  template <
    typename Op,
    typename... OpArgs,
    template <typename...> class T = std::tuple
  >
  static auto apply_tuple(Op&& op, T<OpArgs...>&& t)
    -> typename std::result_of<Op(OpArgs...)>::type
  {
    return op( std::forward<OpArgs>(std::get<Indices>(t))... );
  }
};

template <
  typename Op,
  typename... OpArgs,
  typename Indices = typename make_indices<0, OpArgs...>::type,
  template <typename...> class T = std::tuple
>
auto apply_tuple(Op&& op, T<OpArgs...>&& t)
  -> typename std::result_of<Op(OpArgs...)>::type
{
  return apply_tuple_impl<Indices>::apply_tuple(
    std::forward<Op>(op),
    std::forward<T<OpArgs...>>(t)
  );
}

//====================================================================

#include "tracker.hxx"
#include "op_operand.hxx"

int main()
{
  // TEST apply()...
  APPLY_FULL0(Op);
  APPLY_FULL1(Op, -3);
  APPLY_FULL2(Op, -3, -10);
  APPLY_FULL3(Op, -3, -10, -100);

  std::cout
    << "====================================="
    << std::endl
    << std::endl
  ;

  // TEST apply_tuple()...
  APPLY_TUPLE_FULL0(Op);
  APPLY_TUPLE_FULL1(Op, -3);
  APPLY_TUPLE_FULL2(Op, -3, -10);
  APPLY_TUPLE_FULL3(Op, -3, -10, -100);

  return 0;
}

//====================================================================

and the C++ code to run tests on BCW's code is:

#include <cstddef>
#include <tuple>
#include <functional>

//====================================================================

// BCW's apply() renamed to apply_tuple()...

template <typename Func, typename... TupleTypes,
          typename... UnwrappedParams>
inline typename std::result_of<Func(TupleTypes...)>::type
  apply_tuple(Func f, std::tuple<TupleTypes...> ts,
      UnwrappedParams... params)
{
    return apply_tuple(f, ts, params...,
        std::get<sizeof...(UnwrappedParams)>(ts));
}

template <typename Func, typename... TupleTypes>
inline typename std::result_of<Func(TupleTypes...)>::type
  apply_tuple(Func f, std::tuple<TupleTypes...> ts,
      TupleTypes... params)
{
    return f(params...);
}

template <typename Func>
inline auto apply_tuple(Func f, std::tuple<> ts) -> decltype(f())
{
    return f();
}

//====================================================================

#include "tracker.hxx"

// BCW's code requires Op::operator ()() to support lvalues to compile.
// Thus, it is necessary to define the next line to have BCW's code
// to compile...
#define OPERAND_LVALUE_REF_SUPPORT
#include "op_operand.hxx"

int main()
{
  // TEST apply_tuple()...
  APPLY_TUPLE_FULL0(Op);
  APPLY_TUPLE_FULL1(Op, -3);
  APPLY_TUPLE_FULL2(Op, -3, -10);
  APPLY_TUPLE_FULL3(Op, -3, -10, -100);

  return 0;
}

//====================================================================


Closing Comments

Success has been achieved and tested! The apply functions were each tested to see whether or not they would perform optimally using C++11's rvalue references and ability to perfectly forward function arguments. Also demonstrated was how recursively defined code can have very significant overheads but such can be elided if code is written to exploit perfect argument forwarding with only a small amount of C++ template metaprogramming code. Once done, one can use such codes without being concerned about their overheads!

Another important lesson for myself and everyone that reads this post is just how important testing code for efficiency concerns –not just correctness– is before adding such code to a real program or into a library that is used in real programs. Ideally code once tested, debugged, etc. can be used as a blackbox with its documented guarantees. For this ideal to be reached, the properties and claims made in the documentation about the code need to be confirmed through testing –not just code study.

Should any any errors or omissions be found within this post, kindly advise me of such!

Happy Coding!

Applying std::tuple To Functors Efficiently

Overview

The new C++11 standard nicely adds tuples (i.e., std::tuple found in <tuple>) and while it is easy to invoke std::make_tuple and pass a number of arguments to create a new std::tuple there does not appear to be any function that efficiently does the reverse (i.e., apply all tuple elements efficiently to a function or a functor). This post details how to do such.

Update (Dec. 18, 2011): I have added some comments concerning the use of std::move in some return statements. When returning values from a function, one should simply write return somevalue; to enable the compiler to choose between moving and return value optimization (RVO). Because this article focuses on discussing move operations, I did not write the code to exploit RVO –it explicitly moves everythng– in order to avoid any confusion, but, please be aware that normally one would not use std::move on a variable being returned in a return statement.

Article Source Code: See the next article.


Using std::make_tuple

Using std::make_tuple to construct a tuple is very straight-forward:

#include <string>
#include <tuple>

int main()
{
  auto x = std::make_tuple(1, 4.3, std::string{"Hello!"});
}

would induce the variable x to have the type:

std::tuple<int, double, std::string>

The std::make_tuple function is a convenience function that allows one to avoid having to specify both the types and the values for a specific tuple. For example, the above code could have been written:

#include <string>
#include <tuple>

int main()
{
  std::tuple<int,double,std::string> x{1, 4.3, std::string{"Hello!"}};
}

It is also particularly handy when one needs to create and pass a tuple to a function as a temporary variable. By simply invoking std::make_tuple in a way that ensures its returned tuple is passed to the function, avoids having to formally declare tuple variables.

Accessing std::tuple's Elements

Given some std::tuple instance, t, the C++11 standard allows us to retrieve a specific element of a std::tuple by using get<N>(t) where N is the zero-based index of the desired element. It is also possible to retrieve the number of elements as well as their types, but, these will not be used in the solution presented in this article.

Requirements Of The Solution

The final solution should concern using a function called apply which should take these parameters:

  • the first argument must be a function (pointer) or a functor (i.e., a struct/class with an overloaded function call operator), and,
  • the second parameter must be a tuple whose sequence of types and values are the same as the first argument's function/functor arguments.

so when apply is called all of the std::tuple's elements are efficiently passed to the function/functor. For example this code's cout lines:

struct add1
{
  int operator ()(int x, int y) const
  {
    return x + y;
  }
};

int add2(int x, int y)
{
  return x + y;
}

// ...

std::cout << apply(add1(), std::make_tuple(3,4)) << std::endl;
std::cout << apply(add2, std::make_tuple(8,2)) << std::endl;

are equivalent to:

std::cout << add1()(3,4) << std::endl;
std::cout << add2(8,2) << std::endl;

and would output:

$ ./a.out
7
10
$

Before considering how to manipulate std::tuple's one should consider how to write the code that does the same without std::tuples:

std::cout << apply(add1(), 3, 4) << std::endl;
std::cout << apply(add2, 8, 2) << std::endl;

Ideally such should be a simpler version of a solution that works with std::tuple and, in this instance, this is what occurs.

The other requirement of the solution is maximum possible efficiency. To achieve this will require careful attention to ensure that no unnecessary copying or moving of any data is done. Ideally, the compiler should be able to use the get function to retrieve all tuple elements and place them in the function arguments all at compile time.

NOTE: This means that std::bind and std::function should not be used in the solution.

Rvalue References And Perfect Forwarding

What Is An Rvalue?

To meet the above requirements two new C++11 features will need to be exploited: r-value references and perfect forwarding of function arguments. Prior to C++11 only these special operations, with or without using templates, could be defined with user-defined types:

  • the ability to create a default-initialized object by defining a constructor that takes no arguments,
  • the ability to create a copy of another object of the same type by defining a constructor that has a single argument that receives the other object,
  • the ability to define how one object is assigned to another (of the same type) by defining a copy assignment operator, and,
  • the ability to define how an object is destroyed by writing a destructor.

In addition to the above C++11 adds the following:

  • the ability to move the contents of an object that is an rvalue into a newly created object because the former object will no longer be used (e.g., it will be destroyed), and,
  • the ability to move the contents of an object that is an rvalue into an existing object via assignment because the former object will no longer be used (e.g., it will be destroyed).

The first ability is called the move constructor and the second is called move assignment. Both operations take rvalue references for their arguments. Know that rvalue references are references and cannot be const. Since they are references, the following code is erroneous:

struct Example2 { };

Example2&& test()
{
  Example2 e;
  // ...
  return e;
}

It is illegal for the same reasons that this code is erroneous:

struct Example2 { };

Example2& test()
{
  Example2 e;
  // ...
  return e;
}

and that this code is erroneous:

struct Example3 { };

Example3* test()
{
  Example3 e;
  // ...
  return &e;
}

as all three return the address (implicitly for Example1 and Example2 and explicitly for Example3) of a local variable that will be destroyed once the test() returns.

What exactly is an rvalue? The name rvalue means right-hand-side-value and stands in constrast to an lvalue (which means left-hand-side-value). The usage of right and left refers to where it is allowed relative to an assignment operator: an rvalue can only appear on the right-hand side whereas an lvalue can appear on the left or right hand sides.

In terms of C++11 coding what is an rvalue and how does it differ from an lvalue? Loosely, an rvalue is:

  • a compiler-created temporary variable, or
  • any unnamed variable.

Hence an rvalue is a value that one cannot explicitly obtain its address via the address-of operator as there is no name to refer to it. (Because of nuances, the C++11 standard redefines lvalues and rvalues more precisely than this using a taxonomy. This level of detail will not be discussed here.)

Once a variable somehow acquires a name then it can be explicitly referenced and manipulated as an lvalue and so C++ no longer treats as an rvalue (with some caveats for explicitly named rvalue-reference variables) unless std::move(variable_name) is invoked on it.

Warning: Once std::move(variable_name) is invoked on a variable and its owned contents has been moved, that variable should only used as as a "zeroed-out" or "empty" object (whatever that means for that type). Effectively, when you use std::move() know that the ownership of that variable's state may be transferred (i.e., moved) to another variable since std::move() makes that variable become an rvalue.

One of the requirements of this article is to be able to have maximum efficiency: one does not want to copy the contents of a tuple into function arguments. In general, for proper program semantics with arbitrary functions copies cannot be made. Consider:

double test(int& i, double const& y)
{
  double retval = y * 3.141592;
  i = retval;
  return retval;
}

where i is passed by reference. If a copy of a variable was passed as an argument then the original variable wouldn't be modified. Clearly, one needs a way to perfectly forward the types in a tuple into each function argument in a compatible way that preserves the semantics of the program code. Prior to C++11 this was not possible. In C++11, it is provided rvalue references with their associated semantics and templates are used.

Lvalue and Rvalue References

Rvalue references are defined similarly to C++98 references. C++98 references are lvalues that are declared using a single ampersand and must be assigned to an lvalue reference at that point (with one exception). The ampersand is used because it is the address-of operator and a C++98 reference implicitly takes the address-of of the expression it is initialized with. For example, this C/C++ code:

int i = 4;
int * const ref = &i;
// ...
*ref = 46; // i will now be set to 46

is semantically equivalent (in terms of its underlying implementation) to this C++98 code:

int i = 4;
int& ref = i;
// ...
ref = 46; // i will now be set to 46

i.e., any C++98 reference is equivalent (in terms of its underlying implementation) to a constant pointer to whatever its type is. This implies that once a reference's (address) value is set, it cannot be changed –as it is a constant pointer internally.

C++98 references allowed the ugliness of C's pointers to disappear as it allowed one to implicitly pass variables by reference:

void set(int &i, int value)
{
  i = value;
}
// ...
int myvar;
set(myvar, 9);

instead of writing this C equivalent:

void set(int *i, int value)
{
  *i = value;
}
// ...
int myvar;
set(&myvar, 9);

which also required a lot more attention to always correctly using pointers, etc. to avoid misuing them –the likely result of which was a program crash. Thus C++98 references allowed the programmer to:

  • avoid having to explicitly use the address-of operator for any function call arguments that were to be implicitly passed by reference, and,
  • avoid having to explicitly dereference pointer variables in functions being passed as references –as they are typically only manipulated as values within those functions.

Thus in C++98 it is no longer a burden to efficiently and easily use pass-by-reference. No doubt this encourages programmers to use pass-by-reference when it is more efficient and appropriate to do so.

C++98 references are lvalues and can only refer to rvalues with one exception: constant references referring to compiler-created temporary variables (e.g., passed as function arguments). Consider converting the value function parameter in the above example to be a constant lvalue reference:

void set(int &i, int const&value)
{
  i = value;
}
// ...
int myvar;
set(myvar, 9); // NOTE: 9 must become a temporary --an rvalue!!

Perhaps to one's horror the above code correctly compiles and runs! For the compiler to call set(myvar, 9) it needs to create a temporary variable of type int so its address can be taken and passed to the function! The issue here is the temporary variable is an rvalue –not an lvalue. The C++98 ISO committee reviewed this specific issue carefully and resolved that since the reference refers to constant data that such should be legal. Why? In part: it is constant, i.e., its value won't be changing, and, casting away const is bad and can easily have implementation-defined results. Thus, this "violation" is a non-issue. The result of permitting such:

  • it allows more efficient variable passing to functions,
  • it enables compiler to better optimize code, and,
  • it completely removes the burden of writing overloaded versions of functions just so that literals and other rvalues can be passed to arguments taking constant references.

So all was well and good: plenty of cleaner and efficient C++ code was subsequently written and used successfully. However, being a language of absolute efficiency and robustness with high- and low-level styles of coding, there were still problems that caused programmers to twist their code to be efficient. Consider this example code that critics of object-oriented programming (OOP) often cite as an argument against OOP:

class Matrix
{
  public:
    // NOTE: All code below assumes that nrows_ and ncols_
    //       is hard-coded to be 1000 by 1000 for simplicity
    //       here since how to properly write a general matrix
    //       class is not the purpose of this blog post! :-)

    Matrix()
      : nrows_(1000), ncols_(1000), data_(nrows_*ncols_) { }

    // ...

    // Permit access to the sizes...
    std::size_t nrows() const;
    std::size_t ncols() const;

    // Permit access using the function call operator...
    double& operator ()(std::size_t row, std::size_t col);
    double const& operator ()(std::size_t row, std::size_t col) const;

  private:
    std::size_t const nrows_;   // Hard-coded to be 1000
    std::size_t const ncols_;   // Hard-coded to be 1000
    std::vector<double> data_;
};

// Define addition...
Matrix& operator +=(Matrix& a, Matrix const& b)
{
  for (std::size_t i=0; i != b.nrows(); ++i)
    for (std::size_t j=0; j != b.ncols(); ++j)
      a(i,j) += b(i,j);   // No copies: just += each element.

  return a;   // Return a by reference as it was passed!
}

Matrix operator +(Matrix const& a, Matrix const& b)
{
  Matrix retval(a);   // Copy a into retval
  retval += b;        // Add b to retval
  return retval;      // Return retval by value
}

// ...
Matrix a, b, c;
// ...
Matrix d = a + b + c; // Can be very inefficient!

i.e., given three matrices add them (non-destructively) together and place their result in a new matrix. The addition code be written written using two for loops in a procedural manner (assuming the memory was previously allocated, all four matrices are the same size, and we are not trying to optimize anything else):

std::size_t const nrows = a.nrows();
std::size_t ncols = a.ncols();

for (std::size_t i=0; i != nrows; ++i)
  for (std::size_t j=0; j != ncols; ++j)
    d(i,j) = a(i,j) + b(i,j) + c(i,j);

Unfortunately, in OOP languages (except in C++ when using a library that optimizes such using the expression templates technique) this is not the code that is generated by the compiler. Instead the compiler generates code based on the operations being applied to the objects reflect in the expression:

Matrix d = a + b + c;

which does the following (assuming the expression templates code is not written induce optimization):

  1. First a + b is evaluated. This requires calling operator +(a,b) and returns a compiler-created temporary Matrix value. Let's call this value t1.
  2. Now t1 + c is evaluated. This also requires calling operator +(t1,b) again and it too returns a compiler-created temporary Matrix value. Let's call this value t2.
  3. Having evaluated the right-hand size of the initialization (as this is not the assignment operator!) the constructor will invoke the copy constructor for d passing in t2 so its value can be copied.
  4. Finally the compiler destroys all temporary values: t1 and t2.

Clearly the generated code, although it will produce the correct answer, allocates more RAM and will take more time to execute (as there will be more cache misses than hits especially with large matrices). Clearly this is unacceptable. On the other hand, one does want the code to execute according to the functions defined for the classes being used. What should one do? The expression templates technique is a form of C++ template metaprogramming that is considered to be advanced, is there an easier way? The short answer is with C++11, "Yes!" Before doing that, with C++98, many programmers would write:

Matrix d = a;
d += b;
d += c;

instead of Matrix d = a * b * c;. The former is more efficient, the latter is easier to read and understand –not to mention commonly written in actual code. Ideally a programming language should not force the programmer to make such trade-offs.

Important: Using rvalue references does not eliminate the need for the expression templates technique –there are still many things that won't be optimized. The expression templates technique allows one to write code that recognizes how types are used and transform them (to presumably equivalent, more efficient forms). With matrices, such can be used to determine at compile-time an optimal way to perform matrix multiplication (in expressions with varying matrix sizes) such as:

Matrix d = a * b * c;

Moving Stuff Around

In the above example the temporary variables, t1 and t2, were rvalues and the compiler should have somehow reused t1 instead of also making t2. But, the compiler is not human and does not know the programmer's intention or anything about matrices! This means the compiler cannot simply reuse t1 and call operator +=(t1,c) instead: it was told to only call operator +!!.

So how can rvalue references help with expressions like Matrix d = a + b + c;? Through the definition of move constructors and move assignment operators, they allow the programmer to implicitly tell the compiler how to efficient move the contents/state of one rvalue variable to another. This is particularly handy since the compiler actively does create temporary variables when evaluating expressions that the programmer otherwise has no control over! Re-examining Matrix d = a + b + c;:

  1. First the compiler must make a temporary to perform a + b. Let's again call this temporary t1. So there are no changes here.
  2. Next the compiler will perform t1 + c this too will return a copy. Hey, there are no improvements!! Let's stop.

Yes, there are no improvements. To get them, the Matrix class (and the types it internally uses) need to define move constructors and assignment operators. Nicely a C++11 compiler will define default ones –which are of benefit if the underlying types define such (e.g., Matrix' use of std::vector). So, I lied: there were improvements, but, unless one understands the move constructor and move assignment operator, it is hard-to-comprehend what benefits are obtained through the use of rvalues.

Consider this simple array-like class:

#include <algorithm>

class MyArray
{
  public:
    // Default constructor...
    MyArray()
      : size_(s), data_(nullptr)
    {
    }

    // Allow one to specify a size...
    MyArray(std::size_t s)
      : size_(s), data_(new int[s])
    {
    }

    // Define copy constructor...
    // Notice that it accepts an lvalue reference to a constant
    //   of the same type as this class (i.e., MyArray).
    MyArray(MyArray const& a)
      : size_(a.size_), data_(new int[a.size_])
    {
      std::copy(a.data_, a.data_+a.size_, data_);
    }

    // Define move constructor...
    // Notice that it accepts an rvalue reference (&&) to
    //   an instance of the same type as this class (i.e., MyArray).
    // Notice that the data_ in a is simply assigned to this object
    //   and then a.data_ is set to be null. Why? The data is being
    //   MOVED from a to this. a will need to still be properly
    //   destroyed so a.data_ needs to be set to null.
    MyArray(MyArray&& a)
      : size_(a.size_), data_(a.data_)
    {
      a.size = 0;           // Reset size
      a.data_ = nullptr;    // Reset pointer
    }

    // Define copy assignment operator...
    // Notice that exception-safe code is presented here.
    // Notice that making a copy is necessary.
    MyArray& operator =(MyArray const& a)
    {
      MyArray temp(a);               // Make copy of a
      std::swap(size_, temp.size_);  // Swap sizes
      std::swap(data_, temp.data_);  // Swap data pointers

      // Destroy temp as *this is now a copy of a! :-)
      return *this;
    }

    // Define move assignment operator...
    // One could write this without temp to be slightly
    // more efficient. This code reuses the above definitions
    // and keeps the actual movement code in the move
    // constructor only.
    MyArray&& operator =(MyArray&& a)
    {
      MyArray temp(std::move(a));    // Move a into temp
      std::swap(size_, temp.size_);  // Swap sizes
      std::swap(data_, temp.data_);  // Swap data pointers
      return *this;
    }

    // ...

  private:
    std::size_t size_;
    int* data_;
};

Perhaps it is much easier to see that moving is actually moving the data from one object to another! It can only be done with rvalues as rvalues are not to be the receivers of assignment operations: they are intended to only be used as values even though we can also acquire their addresses as they are located somewhere in the computer's RAM.

Notice that std::move(a) was used. Isn't a declared to be an rvalue reference? Yes it is, but, once it acquires a name it has to be treated as an lvalue. The std::move function is C++11's way for a programmer to explicitly make a lvalue an rvalue. Once this is done the novice programmer should not use the variable whose content was moved –as its state has been reset so that its destructor can be executed without error and a minimum of overhead.

Note: The purpose of a named rvalue reference declaration is to ensure that it can only be initialized with an rvalue. It does not mean it is itself an rvalue –although it refers to one!

Let's re-examine:

Matrix d = a + b + c;

which clearly computes a rvalued result which is used to initialize d. Provided the std::vector class has a move constructor (it does) and one is in Matrix then the compiler need not use the copy constructor: it can use the move constructor instead. This implies the data in t2 will be moved/transferred to d directly –not copied into d. This implies one fewer Matrix has to be dynamically allocated in the expression. Awesome!

Concerning the expression:

a + b + c;

all we've told the compiler is how to handle this function:

Matrix operator +(Matrix const& a, Matrix const& b);

which states given two Matrix instances return a new one. Returning any value by value from a function implies that its value can be treated as an rvalue. Thus, instead of making a copy of the Matrix returned, copying the contents over, and then destroying the local variable, the compiler is free to simply move it from the local variable to another one that is available –since the local variable will be destroyed.

One can do even better by telling the compiler how to add values that are rvalues. Only looking at operator + and making simple changes:

Matrix operator +(Matrix const& a, Matrix const& b); // as before

// [2]
inline Matrix operator +(Matrix&& a, Matrix const& b)
{
  Matrix retval(std::move(a));   // Move a into retval
  retval += b;                   // Add b to retval
  return retval;                 // Returns as an rvalue
  // Some may prefer: return std::move(a += b);

  // In general don't use std::move() in a return statement
  // that returns a variable. Simply write: return retval;
  // This will allow compilers to choose between RVO or moving.
}

// [3]
inline Matrix operator +(Matrix const& a, Matrix&& b)
{
  // This works since matrix addition is associative...
  return operator +(std::move(b), a); // call [2] above
}

// [4]
inline Matrix operator +(Matrix&& a, Matrix&& b)
{
  return operator +(std::move(a), b); // call [2] above
}

With these defined when rvalues are being added the cost of creating a new temporary matrix state (i.e., the RAM to hold the 1000 by 1000 matrix!) is now completely avoided. The compiler does still create the temporary variables and needs to (as they are allocated on the stack), but, the overheads have been sliced to a bare minimum.

Revisiting what happens when Matrix d = a + b + c; is run with the above additions and changes (assuming std::vector behaves like MyArray: i.e., moves null out the internal pointers):

  1. First a + b is calculated and a full temporary matrix is returned as temporary t1. No change here.
  2. Now t1 + c is performed. Since t1 is an rvalue as it is a compiler-created temporary, it invokes:
    Matrix operator +(Matrix&& a, Matrix const& b)

    which avoids creating a new temporary with its own separate storage. Instead it creates a new temporary, t2, by move constructing it from the stored result in retval which was moved from t1 (given the above code). The savings here are huge: no unnecessary additional RAM was allocated on the heap!

  3. Finally, t2 is moved into d's constructor without allocating any additional RAM on the heap!

So, if you're counting temporaries one might argue that there is not much improvement. But if one is counting the amount of RAM allocated and copying there are major improvements. Essentially, only one Matrix is dynamically allocated in RAM (to hold the initial temporary) and that winds up ultimately being moved into d.

Note: C++ can elide constructors and perform other operations to further optimize the above code. The programmer can do additional things too. If one will not be using the a, b, and c variables again, you can help the compiler out by having all variables become rvalues (which would move everything –no copies):

Matrix d = std::move(a) + std::move(b) + std::move(c);

Perfect Forwarding

If the type of a function parameter is an rvalue reference to a template type parameter for a function template then the type parameter is deduced to be: an lvalue reference if an lvalue is passed, or, a value type (which is an rvalue and, thus, can be treated as an rvalue reference by the compiler) assuming the overall call is valid.

Important: Recall that only with function templates, the compiler will deduce the (template) types of its arguments. This specific type of deduction mentioned here only works for function templates when an rvalue reference to a type that is a template parameter is used. If the argument is an rvalue to a non-template type this specific deduction will not occur.

Consider:

struct Silly { };

template <typename T>
void func(T&& t);

int main()
{
  Silly s;
  func(s);             //    Lvalue. Argument type: Silly&
  func(Silly());       // Temporary. Argument type: Silly&&
  func(std::move(s));  //  Rvalue forced. Arg type: Silly&&
}

By using this deductive property, C++ can preserve the argument types while forwarding them to another function. To do this, one needs to use std::forward<arg_type>(arg_variable_name) which takes care of lvalue/rvalue issues when doing such. All that is needed for one to exploit such is to simply use it.

Enough sidebar issues! Let's figure out how to efficiently pass a std::tuple to regular functions and then functors!


Writing apply() To Not Use std::tuple

Using variadic templates solving this problem is straight-forward:

#include <tuple>
#include <functional>
  // NOTE: g++ v4.5 needs <functional> for std::result_of
  //       g++ v4.7 does not

template <typename Op, typename... Args>
auto apply(Op&& op, Args&&... args)
  -> typename std::result_of<Op(Args...)>::type
{
  return op(args...);
}

It is straight-forward for this reasson: every argument from the second passed to apply is in the Args parameter pack in the order and form it needs to be. This allowed the arguments to be expanded in the return statement as-is. The only remaining item to address here is to use std::forward on those arguments:

#include <tuple>
#include <functional>
  // NOTE: g++ v4.5 needs <functional> for std::result_of
  //       g++ v4.7 does not

template <typename Op, typename... Args>
auto apply(Op&& op, Args&&... args)
  -> typename std::result_of<Op(Args...)>::type
{
  return op( std::forward<Args>(args)... );
}

to ensure perfect forwarding.


Writing apply_tuple() To Use std::tuple

A Problem

Unfortunately, the above solution cannot be trivially applied to std::tuple. When template parameter packs are expanded they don't have the ability to perform the arithmetic (e.g., N+1) needed to unroll the std::tuple elements for each function argument. The latter is needed since the std::get template function requires each index as a compile-time constant template argument, if parameter pack expansion is to be used then a way must be found to increment the value passed to std::get as each tuple element is extracted.

Although parameter pack expansions cannot do arithmetic, one can pre-compute the indices it will need. Why is this useful? If parameter pack expansions are done on expressions involving parameter pack types and variables, then they will all be expanded together as a sequence. For example, given the tuple:

std::tuple{1,2.2,'a'}

whose type is std::tuple<int,double,char> and this:

template <std::size_t...>
struct indices;
// ...
indices<0,1,2>

then it will be possible to expand both (i.e., indices and std::tuple types's template arguments, i.e., the type of each element in the tuple) at the same time using parameter pack expansion. Hopefully by solving this problem a solution similar to the non-tuple version will be possible!

Generating A List Of Indices

By defining a compile-time only container to hold the list of indices:

template <std::size_t...>
struct indices;

struct templates make_indices, make_indices, and make_indices_impl can be defined. What should they contain? For each type, a corresponding index needs to be generated that is zero-based. This implies that when processing each tuple element type from left to right the next highest index needs to be appended to the indices instance's template argument std::size_t constants. This will need to be defined recursively using make_indices as an easy-to-use type to start the indices' generation and hold the result in a typedef called type:

#include <cstddef> // for std::size_t

// Define holder for indices...
template <std::size_t...>
struct indices;

// Define adding the Nth index...
// Notice one argument, Type, is discarded each recursive call.
// Notice N is added to the end of Indices... .
// Notice N is incremented with the recursion.
template <std::size_t N, typename Indices, typename... Types>
struct make_indices_impl;

template <
  std::size_t N,
  std::size_t... Indices,
  typename Type,
  typename... Types
>
struct make_indices_impl<N, indices<Indices...>, Type, Types...>
{
  typedef
    typename make_indices_impl<
      N+1,
      indices<Indices...,N>,
      Types...
    >::type
    type
  ;
};

// Define adding the last index...
// Notice no Type or Types... are left.
// Notice the full solution is emitted into the container.
template <std::size_t N, std::size_t... Indices>
struct make_indices_impl<N, indices<Indices...>>
{
  typedef indices<Indices...> type;
};

// Compute the indices starting from zero...
// Notice indices container is empty.
// Notice Types... will be all of the tuple element types.
// Notice this refers to the full solution (i.e., via ::type).
template <std::size_t N, typename... Types>
struct make_indices
{
  typedef
    typename make_indices_impl<0, indices<>, Types...>::type
    type
  ;
};

Define apply_tuple To Work With std::tuple

Finally apply can be written but we need a helper function. Start by writing apply to do everything it can do:

template <
  typename Op,
  typename... OpArgs,
  typename Indices = typename make_indices<0, OpArgs...>::type,
  template <typename...> class T = std::tuple
>
auto apply_tuple(Op&& op, T<OpArgs...>&& t)
  -> typename std::result_of<Op(OpArgs...)>::type
{
  return apply_tuple_impl<Indices>::apply_tuple(
    std::forward<Op>(op),
    std::forward<T<OpArgs...>>(t)
  );
}

Unfortunately, Indices is not a parameter pack and the indices inside of Indices and OpArgs must both be parameter packs so that they can be expanded together:

[snip]
{
  return op( std::forward<OpArgs>(std::get<Indices>(t))... );
}

It is straight-forward to use partial template specialization with a helper class to lift the std::size_t… parameter pack out of its container and this is why apply_tuple_impl has been used above. Notice that all function arguments are simply forwarded. The "prototype" case for apply_tuple_impl must be mentioned before using the partial template specialization technique:

template <typename Indices>
struct apply_tuple_impl;

Now Indices can be lifted out of the template template parameter I<Indices...> which will be inferred to be indices –the container type:

template <
  template <std::size_t...> class I,
  std::size_t... Indices
>
struct apply_tuple_impl<I<Indices...>>
{
  template <
    typename Op,
    typename... OpArgs,
    template <typename...> class T = std::tuple
  >
  static auto apply_tuple(Op&& op, T<OpArgs...>&& t)
    -> typename std::result_of<Op(OpArgs...)>::type
  {
    return op( std::forward<OpArgs>(std::get<Indices>(t))... );
  }
};

which is the desired solution.


Testing The Solution

The following code will nicely (partially) test the solution:

//=================================================================

#include <cstddef>
#include <tuple>
#include <functional> // Only if needed

// Definition of indices
// Definition of make_indices_impl
// Definition of make_indices
// Definition of apply
// Definition of apply_tuple_impl
// Definition of apply_tuple

//=================================================================

#include <iostream>

struct add { int operator ()(int x, int y) { return x + y; } };
int add2(int x, int y) { return x + y; }

struct A1 { int operator ()() { return -1; } };
int A2() { return -2; }

struct B1 { int operator ()(int x) const { return x; } };
int B2(int x) { return x; }

int main()
{
  using namespace std;

  // Test apply() with no arguments...
  cout
    << "A1()() = "
    << apply(A1())
    << "."
    << endl
  ;
  cout
    << "A2() = "
    << apply(A2)
    << "."
    << endl
  ;

  // Test apply() with one argument...
  cout
    << "B1()(-3) = "
    << apply(B1(),-3)
    << "."
    << endl
  ;
  cout
    << "B2(-4) = "
    << apply(B2,-4)
    << "."
    << endl
  ;

/*
  // Define C1 and C2 elsewhere for two arguments and
  // test apply() with such...
  cout
    << "C1()(-3,-10) = "
    << apply(C1(),-3,-10)
    << "."
    << endl
  ;
  cout
    << "C2(-4,-10) = "
    << apply(C2,-4,-10)
    << "."
    << endl
  ;

  // etc.
*/



  // Test apply_tuple() with no arguments...
  cout
    << "A1()() = "
    << apply_tuple(A1(),std::tuple<>())
    << "."
    << endl
  ;
  cout
    << "A2() = "
    << apply_tuple(A2,std::tuple<>())
    << "."
    << endl
  ;

  // Test apply_tuple() with one argument...
  cout
    << "B1()(-3) = "
    << apply_tuple(B1(),make_tuple(-3))
    << "."
    << endl
  ;
  cout << "B2(-3) = "
    << apply_tuple(B2,make_tuple(-4))
    << "."
    << endl
  ;

/*
  // Define C1 and C2 elsewhere for two arguments and
  // test apply_tuple() with such...
  cout
    << "C1()(-3,-10) = "
    << apply_tuple(C1(),make_tuple(-3,-10))
    << "."
    << endl
  ;
  cout
    << "C2(-4,-10) = "
    << apply_tuple(C2,make_tuple(-4,-10))
    << "."
    << endl
  ;

  // etc.
*/


  return 0;
}

It is wise to use different names for apply and apply_tuple so that tuples themselves can be directly applied to functions/functors.


Closing Comments

As far as I can tell the C++11 standard did not include a make_indices utility. Fortunately, it is not hard to define such to provide a straight-forward way to extract tuple values while performing a parameter pack expansion. That said, the solution is not obvious given the norms of coding and therefore can be hard to figure out. For those of you thinking, "There just has to be a way," I wrote this post! 🙂

Should you notice any errors or issues, kindly let me know.