Using The C++ Literal Operator

Overview

The C++11 standard adds a new operator that can be overloaded: the literal operator. Literals in programming languages are hard-coded constants in programs. For example, 1.2L, "Hello World!", and 4096 are all literals (the first is a long double value, the second is a const char[13] value, and the third is an int). The C++11 standard allows one to define custom literal suffixes whose literals are transformed at compile-time or run-time into appropriate values. This post explores this feature using g++ v4.7 snapshot 20111029.
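
The types claimed for those three built-in literals can be checked by the compiler itself. The following is a small sketch using only decltype and std::is_same from <type_traits>; it is not part of the original program.

```cpp
#include <type_traits>

// 1.2L carries the L suffix, so its type is long double.
static_assert(std::is_same<decltype(1.2L), long double>::value,
  "1.2L is a long double");

// An unsuffixed decimal integer that fits in int has type int.
static_assert(std::is_same<decltype(4096), int>::value,
  "4096 is an int");

// A string literal is an lvalue, so decltype yields a reference to the
// array: 12 characters plus the terminating '\0' gives 13 elements.
static_assert(
  std::is_same<decltype("Hello World!"), const char (&)[13]>::value,
  "\"Hello World!\" is an array of 13 const char");
```

If any of the three claims were wrong, the translation unit would fail to compile.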

Update (Dec. 16, 2011): A small update was made to the general case definition of binary_literal_impl to provide more user-friendly compiler error messages if incorrect digits are used.


The Goal

I don’t know about you, but I’ve always wanted to be able to write binary numbers directly in my program code and have them stored optimally. Using C++11, we’ll be able to do exactly this with code like this:

int main()
{
  using namespace std;

  const unsigned long long bits =
    11011110101011011011111011101111_binary;

  cout << "The number is: " << hex << bits << endl;
}

provided we write a literal operator function whose suffix is _binary.

I believe I read somewhere that literal names not starting with an underscore are reserved by the C++ standard. If you know definitively or otherwise, kindly let me know. :-)

Update: In a reddit post’s comment, zootsuit notes, “From the N3291 draft/17.6.4.3.5: Literal suffix identifiers that do not start with an underscore are reserved for future standardization.” –which is likely where I read it: in one of the draft standards.

I am going to introduce an added twist: the conversion must be done at compile-time. Why? Efficiency! Any binary number hard-coded into the program should be encoded in the executable as a single integer value (to ensure minimum space usage and maximum efficiency), converted at compile time, with a compile error if it is malformed. Certainly, no programmer wants the binary number to be stored in the executable as a string that is converted at run-time into an integer! Ack! The latter wastes both space and time.


Let’s Do A Simple Example First

Before messing around with template metaprogramming (which is probably bewildering until you know how to read/understand it), let’s write a literal whose suffix is _square that computes the square of a long double number it is associated with and returns the result:

#include <iostream>

// Insert literal operator _square definition here. (See below.)

int main()
{
  using namespace std;

  const long double num = 25.5_square;

  cout << num << endl;
}

which would output 650.25. To do this, the following function needs to be written:

constexpr long double operator "" _square(long double num)
{
  return num*num;
}

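The literal operator’s name is operator"" and its suffix is given after it (i.e., _square). The return type can be anything; it is long double here because that is the computed value’s type. The constexpr keyword requires that the function be usable in constant expressions, so the compiler is able to compute the result as a compile-time constant when the argument is a constant (as every literal is). If the function body cannot be evaluated this way, a use that requires a constant expression will fail to compile. Note that the compiler is only obliged to evaluate at compile time where a constant expression is required (e.g., when initializing a constexpr variable or inside a static_assert); elsewhere it is permitted to do so. In general, if a literal operator overload does not use constexpr, the call to the literal operator is an ordinary function call made at run-time.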
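
To be certain the squaring happens during compilation, the literal can be used where a constant expression is required. The sketch below repeats the _square operator (written without a space before the suffix, a spelling newer compilers prefer) and uses a constexpr variable and a static_assert; the variable name area is only an illustration.

```cpp
// Same operator as in the article; operator""_square (no space before the
// suffix) is accepted by current compilers and avoids deprecation warnings.
constexpr long double operator""_square(long double num)
{
  return num * num;
}

// A constexpr variable must be initialized by a constant expression, so the
// multiplication has to be carried out by the compiler.
constexpr long double area = 25.5_square;

// 25.5 and 650.25 are both exactly representable in binary floating point,
// so an exact comparison is safe here.
static_assert(area == 650.25L, "25.5 squared is 650.25");
```

With a plain const long double, as in the main() above, compilers will normally fold the call as well, but only the constexpr form makes that a requirement.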

That’s it! Compile the above code and 650.25 can be stored in the executable binary as a hard-coded long double value, with no multiplication performed at run-time!


Literal Operator Function Parameters

Be aware that the literal operator only allows a fixed set of function arguments:

  • const char*
  • unsigned long long int
  • long double
  • char
  • wchar_t
  • char16_t
  • char32_t
  • const char*, std::size_t
  • const wchar_t*, std::size_t
  • const char16_t*, std::size_t
  • const char32_t*, std::size_t

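or, if there are no arguments at all, the literal operator must be defined as a template function whose template parameter is a char template parameter pack (this form is available for integer and floating-point literals only, not for character or string literals), i.e.,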

template <char... CS>
some_return_type operator "" _some_suffix_name();

Also notice that all of the function argument types, except for the character types, are the largest-range types of their kind (i.e., unsigned long long is the largest-range integer type, long double is the largest-range floating-point type) as the compiler can easily cast any value to a smaller type at compile-time. Since there are numerous character and string literal types (including the new Unicode and raw literals in C++11) the remaining parameters listed handle these special types of literals.
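
As an illustration of how the parameter list selects the kind of literal an operator applies to, here is a sketch with three suffixes. The names _kib, _len, _spelling_len, and count_chars are made up for this example and do not appear elsewhere in the article.

```cpp
#include <cstddef>

// Cooked integer form: the compiler converts the digits to an
// unsigned long long before calling the operator.
constexpr unsigned long long operator""_kib(unsigned long long n)
{
  return n * 1024ULL;
}

// String form: receives a pointer to the characters and their count
// (the count excludes the terminating '\0').
constexpr std::size_t operator""_len(const char*, std::size_t n)
{
  return n;
}

// Helper for the raw form: recursive strlen, since C++11 constexpr
// functions cannot contain loops.
constexpr std::size_t count_chars(const char* s)
{
  return *s ? 1 + count_chars(s + 1) : 0;
}

// Raw form: receives the numeric literal's spelling as a
// null-terminated string, exactly as written in the source.
constexpr std::size_t operator""_spelling_len(const char* s)
{
  return count_chars(s);
}

static_assert(4_kib == 4096, "4 * 1024");
static_assert("hello"_len == 5, "five characters");
static_assert(1.50_spelling_len == 4, "spelled as 1.50");
```

The last assertion shows that the raw form sees the trailing zero, which a cooked long double parameter would not preserve.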


Implementing The _binary Literal Operator

Recall the earlier code that permits one to write a binary number in C++ code:

#include <iostream>

// Insert definition of _binary literal and associated code here.

int main()
{
  using namespace std;

  const unsigned long long bits =
    11011110101011011011111011101111_binary;
  cout << "The number is: " << hex << bits << endl;
}

i.e., the number is 0xDEADBEEF, which is what the program will output.

To ensure that the conversion occurs at compile-time and to be able to easily implement it (as it is a non-trivial function), the implementation of the _binary literal will use a class template with partial template specialization. To understand this better, let’s first start by defining the _binary literal:

template <char... Digits>
constexpr unsigned long long operator "" _binary()
{
  return binary_literal_impl<Digits...>::to_ulonglong();
}

Notice that the _binary literal operator has no arguments. This is because the characters of the literal written before _binary are passed as a char template parameter pack.

Template parameter packs represent a sequence of template arguments. A pack is not itself a type or a value; to use its elements it must be expanded with the pack expansion operator, which is written as three dots (...).

Within the _binary literal operator definition, the char template parameter pack needs to be expanded and processed into an unsigned long long. To accomplish this, the work will be delegated to a static function inside the class template binary_literal_impl, as this allows writing clean, recursively defined code that processes the char template parameter pack, which should be composed only of '0' and '1' characters.
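
To make the pack concrete, the following sketch defines two throwaway suffixes: one reports how many characters the compiler passed, the other expands the pack into another template's argument list. The names _count, _first, and pack_head are hypothetical and exist only for this illustration.

```cpp
#include <cstddef>

// sizeof...(CS) is the number of characters in the pack.
template <char... CS>
constexpr std::size_t operator""_count()
{
  return sizeof...(CS);
}

// Hypothetical helper that names the first element of a non-empty pack.
template <char First, char... Rest>
struct pack_head
{
  static constexpr char get() { return First; }
};

// CS... expands the pack into pack_head's template argument list.
template <char... CS>
constexpr char operator""_first()
{
  return pack_head<CS...>::get();
}

static_assert(101_count == 3, "'1', '0', '1'");
static_assert(1.5_count == 3, "'1', '.', '5'");
static_assert(42_first == '4', "first character of the spelling");
```

Note that the characters are those of the literal's spelling, so a floating-point literal contributes its decimal point as well.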


Implementing The binary_literal_impl Class Template

The binary_literal_impl class template allows code to be written that recognizes the following properties about its template arguments:

  • when a '0' appears first, possibly followed by more characters,
  • when a '1' appears first, possibly followed by more characters, and,
  • when there are no characters.

Additionally, if any other (invalid) char values occur, then a compile-time error will be generated (as no definition of binary_literal_impl will exist for such arguments). To accomplish this, partial template specialization is needed, so the general case needs to be (forward) declared first and not defined (as we want errors if there are no matches!):

template <char... Digits>
struct binary_literal_impl;

This is needed first so the compiler knows template arguments for the binary_literal_impl class template must be a parameter pack of char values. It is very important that there are no braces used here: this avoids defining what is associated with binary_literal_impl if there are no partial matches with the code that is written below. (If the compiler cannot find a matching definition, a compile-time error will occur.)

If you are used to functional programming in Miranda or Haskell, C++ requires the reverse order of what would be done in those languages when using partial specialization: the general case is written first, then the specialized cases follow.

Even better, one can write the above general case of binary_literal_impl to use static_assert to trigger a very nice compiler error message when an incorrect digit is used. (If you are new to this style of programming, I encourage you to also try out the above definition to see the differences in compiler output.)

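// Alternative user-friendly general case
// (i.e., any digits other than '0' or '1')...
template <char... Digits>
struct binary_literal_impl
{
  static constexpr unsigned long long to_ulonglong()
  {
    // The condition must depend on Digits: a plain static_assert(false, ...)
    // may be rejected as soon as the template is parsed, even if it is
    // never used. This general case is only selected when the leading
    // character is not '0' or '1', so Digits is never empty here and the
    // assertion always fails when instantiated.
    static_assert(sizeof...(Digits) == 0,
      "Digit characters must either be '0' or '1'.");
    return 0;
  }
};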

If the first char (template argument) value is '0', then there is no 1 bit to contribute, so the result is simply the integer value computed from the rest of the characters in the char parameter pack:

// If the next digit is zero, then compute the rest...
template <char... Digits>
struct binary_literal_impl<'0', Digits...>
{
  static constexpr unsigned long long to_ulonglong()
  {
    return binary_literal_impl<Digits...>::to_ulonglong();
  }
};

Notice that the template argument list, however, is now one shorter than it was. If this is not obvious, know that with partial template specialization, what is inside < and > after the class template name is what is being matched. Thus, since '0', Digits... appears, that is what is being matched: a '0' character followed by a parameter pack called Digits, with Digits declared as char... in template <char... Digits>. Thus, Digits... represents the expansion of all char template argument values after the first one! :-)
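
The matching rule can be seen in isolation with a tiny trait. The sketch below is separate from binary_literal_impl, and the name starts_with_zero and its members are invented for the illustration: the partial specialization is chosen only when the first argument is '0', and Rest then holds everything after it.

```cpp
#include <cstddef>

// General case: chosen when no specialization matches.
template <char... CS>
struct starts_with_zero
{
  static constexpr bool value = false;
};

// Partial specialization: matches a '0' followed by any characters.
template <char... Rest>
struct starts_with_zero<'0', Rest...>
{
  static constexpr bool value = true;
  // Rest is one shorter than the full argument list.
  static constexpr std::size_t rest = sizeof...(Rest);
};

static_assert(starts_with_zero<'0', '1', '1'>::value, "leading '0' matches");
static_assert(starts_with_zero<'0', '1', '1'>::rest == 2, "two remain");
static_assert(!starts_with_zero<'1', '0'>::value, "leading '1' does not");
static_assert(!starts_with_zero<>::value, "empty list does not");
```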

If the first char (template argument) value is '1', then there is a one bit to shift left which must be bitwise-OR’d with the result computed on the remaining arguments. Since the binary digits are being processed from left to right, the one bit should be shifted left by the number of digits that remain to be processed. The C++11 sizeof... operator allows one to know the size of a parameter pack at compile time, so the definition of this case becomes:

// If the next digit is one, then shift 1 and compute the rest...
template <char... Digits>
struct binary_literal_impl<'1', Digits...>
{
  static constexpr unsigned long long to_ulonglong()
  {
    return (1ULL << sizeof...(Digits))
      | binary_literal_impl<Digits...>::to_ulonglong();
  }
};

Again notice that the number of characters remaining to process becomes one shorter when recursively calling to_ulonglong().
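
The shift amount is easiest to see with a small worked case. In the sketch below, leading_one_weight is a hypothetical helper (not part of the article's code) that returns what a '1' contributes when the given characters still follow it; the comments trace the literal 101.

```cpp
// Hypothetical helper: the value contributed by a '1' that is followed by
// the characters in Remaining, i.e. 1 shifted left by their count.
template <char... Remaining>
constexpr unsigned long long leading_one_weight()
{
  return 1ULL << sizeof...(Remaining);
}

// Trace for the digits 1 0 1:
//   '1' with "01" remaining contributes 1 << 2 = 4
//   '0' with "1"  remaining contributes nothing
//   '1' with ""   remaining contributes 1 << 0 = 1
//   4 | 1 = 5
static_assert(leading_one_weight<'0', '1'>() == 4, "1 << 2");
static_assert(leading_one_weight<>() == 1, "1 << 0");
static_assert((leading_one_weight<'0', '1'>() | leading_one_weight<>()) == 5,
  "binary 101 is 5");
```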

Finally, at some point there will be no digits left to process in the recursively defined code above. When this occurs, the computed answer should be zero:

// Base case: No digits, so return 0...
template <>
struct binary_literal_impl<>
{
  static constexpr unsigned long long to_ulonglong()
  {
    return 0;
  }
};

i.e., notice binary_literal_impl<> has no template arguments left to match.

That’s it!

Importantly, since we did not write any code to handle characters other than '0' or '1', using any other values (e.g., try putting a 2 or an a in the number) will cause compilation to fail. This is a good thing because a binary number should only contain '0's and '1's!
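
One error the code above does not trap is a literal with more digits than an unsigned long long has bits, a point raised in the discussion following this article. A minimal sketch of such a check, assuming a static_assert on the pack size is acceptable, might look like the following; the names checked_binary_impl and _checked_binary are made up so as not to clash with the article's code.

```cpp
#include <limits>

// Same recursive scheme as binary_literal_impl, under a different name.
template <char... Digits>
struct checked_binary_impl;

template <char... Digits>
struct checked_binary_impl<'0', Digits...>
{
  static constexpr unsigned long long value()
  {
    return checked_binary_impl<Digits...>::value();
  }
};

template <char... Digits>
struct checked_binary_impl<'1', Digits...>
{
  static constexpr unsigned long long value()
  {
    return (1ULL << sizeof...(Digits))
      | checked_binary_impl<Digits...>::value();
  }
};

template <>
struct checked_binary_impl<>
{
  static constexpr unsigned long long value() { return 0; }
};

template <char... Digits>
constexpr unsigned long long operator""_checked_binary()
{
  // Conservative check: leading zeros are counted as digits too.
  static_assert(
    sizeof...(Digits) <= std::numeric_limits<unsigned long long>::digits,
    "Too many binary digits for unsigned long long.");
  return checked_binary_impl<Digits...>::value();
}

static_assert(101_checked_binary == 5, "binary 101 is 5");
static_assert(
  11011110101011011011111011101111_checked_binary == 0xDEADBEEFULL,
  "the article's example value");
```

Because the assertion sits in the operator itself, an over-long literal is rejected with a readable message before any shift by an out-of-range amount is attempted.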


Closing Comments

As with many template metaprogramming techniques in C++, no matter how complicated the metaprogramming code is, using it is often very straightforward. With C++11 supporting user-defined literals, code will be easier to read and write since more meaningful values can now appear as literals in code instead of equivalent hard-to-understand, machine-specific, hard-coded character or integer arrays. Nicely, any literal definitions/prototypes can be hidden away in header files: the end user only needs to know how to use the literal operator, i.e., what is written in the documentation about it. In this instance, using the _binary literal is easy: it must be preceded by a valid binary number. One doesn’t need to see its definition at all to be able to use it. :-)

For your convenience, this is the entire program presented above:

//===============================================================

//
// binary_literal_impl represents a compile-time function that
// computes the unsigned long long int from a list of characters
// Digits that MUST be composed of '0' or '1'...
//
template <char... Digits>
struct binary_literal_impl;

// If the next digit is zero, then compute the rest...
template <char... Digits>
struct binary_literal_impl<'0', Digits...>
{
  static constexpr unsigned long long to_ulonglong()
  {
    return binary_literal_impl<Digits...>::to_ulonglong();
  }
};

// If the next digit is one, then shift 1 and compute the rest...
template <char... Digits>
struct binary_literal_impl<'1', Digits...>
{
  static constexpr unsigned long long to_ulonglong()
  {
    return (1ULL << sizeof...(Digits))
      | binary_literal_impl<Digits...>::to_ulonglong();
  }
};

// Base case: No digits, so return 0...
template <>
struct binary_literal_impl<>
{
  static constexpr unsigned long long to_ulonglong()
  {
    return 0;
  }
};

//===============================================================

template <char... Digits>
constexpr unsigned long long operator "" _binary()
{
  return binary_literal_impl<Digits...>::to_ulonglong();
}

//===============================================================

#include <iostream>

int main()
{
  using namespace std;

  const unsigned long long bits =
    11011110101011011011111011101111_binary;
  cout << "The number is: " << hex << bits << endl;
}

//===============================================================

13 thoughts on “Using The C++ Literal Operator”

  1. The requirement of beginning the literal name with an underscore seems to have been removed in the last draft before the definitive standard, N3242, 13.5.8

    1. FYI I’ve just checked my copy of the (final) standard, and 17.6.4.3.5 still states that “Literal suffix identifiers that do not start with an underscore are reserved for future standardization.”

  2. Not tried to compile it yet, but seems good! This looks very similar to some metaprogramming code I wrote (with the help of Google) to send an arbitrary (nested) tuple to an ostream.

    Just one thing, should that be “1ULL << sizeof...(Digits)”, not 1UL?

    1. 1ULL absolutely! My mistake (and an easy one to make) and I’ve fixed it. Thanks for pointing it out!

      (I do write, compile, and test the code before placing it in an article unless it is a code fragment, e.g., if it is short or a sidebar issue with respect to the article’s topic.)

      I should add that other improvements can be made to the code in the article. One could (and probably should) stop the recursion if the binary number given is larger than what can be represented as an unsigned long long. One could use std::numeric_limits<unsigned long long>::digits and sizeof... with std::enable_if or std::conditional to accomplish this. I did not address this in the article to keep its presentation clean and easy to understand (especially by those who’ve never used/seen metaprogramming code before), and knowing that most programmers would never write a literal number larger than std::numeric_limits<unsigned long long>::max() in base two in their code!

  3. Hey, thanks for the confirmation and detailed reply!

    I’ve successfully tried using static_assert, together with std::numeric_limits::digits as you describe – the logic being it’s compile time checkable, and is also a fatal error to want to trap.

    1. You’re welcome! I agree it is an error one wants to trap –often nothing is harder to find and fix in the long run than silent errors. (Hmmm, perhaps for that reason I should do a follow-up post!)

  4. You can also do this with functions alone

    #include <iostream>

    template<typename OUT>
    constexpr OUT bi2int()
    {
      return 0;
    }

    template<typename OUT, char I, char... J>
    constexpr OUT bi2int()
    {
      static_assert(I == '0' || I == '1', "not a valid binary number");
      return bi2int<OUT, J...>() + ((I=='1') ? 1 : 0) * (1ULL << sizeof...(J));
    }


    template<char... J>
    constexpr unsigned long long operator "" _b()
    {
      return bi2int<unsigned long long, J...>();
    }

    int main()
    {
      signed short a = 1111111111111111_b;
      std::cout << a << std::endl;
     
      return 0;
    }

    at least this compiles perfectly fine with clang 3.4 (trunk). Let me know if there are problems with other compilers

    1. Very nice solution! :-)

      Personally, my primary concern with using constexpr functions to perform compile-time-only evaluations is this: a compiler might defer evaluation until run-time. That said, your use of static_assert in this case ensures it is done at compile time, but I feel this introduces another issue: when should C++11 code be considered portable when using constexpr + static_assert, since a compiler might not be able to handle it at compile time? In practice, it may not be a significant issue in most cases, but in terms of software/program design I feel that this should be discussed in the C++ programming community. This is really the same debate that involves recursively deep uses of template metaprogramming (e.g., some parts of various Boost projects) that might also not compile.

      In the post, I used class templates with static constants because (i) I found it to be an interesting use of template parameter value recognition, and (ii) I wanted to be 100% certain that it would be computed at compile-time. At the time of writing, my concerns about item (ii) caused me to tend to refrain from using constexpr functions. I am still concerned although I am more comfortable now with this than I was then.

      My favourite way to use constexpr functions involves: (a) not using static_assert (if it would force compile-time evaluation) in order to (b) write code that I want evaluated either at compile-time (if possible, e.g., if provided arguments that are all literals) or at run-time. This is subtly elegant since only one function definition needs to be written and maintained, and it can be used by the compiler when optimizing at compile-time or as a normal call at run-time. Either way this results in excellent code generation by the compiler to the extent it is able to optimize such –which is all one can fairly ask of any compiler.
