
Faster meta-programs using gcc 4.5 and C++0x

One of the practical issues with C++ meta-programming is compilation speed. Programs that rely heavily on meta-programming can be notoriously slow to compile with contemporary compilers. Things are changing, however. Consider the following comparison of gcc 4.5 against gcc 4.4.3.
The first graph is obtained from a program that creates a binary tree of template instantiations. The x-axis shows the number of instantiations as the value of N goes from 8 to 17. I ran out of patience with gcc 4.4.3 beyond 16383 instantiations (N=13). gcc 4.5, on the other hand, does pretty well, and its compilation time does increase linearly, as mentioned here. Here is the program that creates a binary tree of template instantiations.
template <int Depth, int A, typename B>
struct Binary {
  // Each node instantiates two distinct children one level down.
  enum { value = 1 +
         Binary<Depth-1, 0, Binary>::value +
         Binary<Depth-1, 1, Binary>::value };
};

// Leaves of the tree end the recursion.
template <int A, typename B>
struct Binary<0, A, B> {
  enum { value = 1 };
};

int main(void) {
  static const int N = 10;
  const int instantiations = Binary<N,0,int>::value;
}
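As a sanity check on the instantiation counts, a full binary tree of depth N has 2^(N+1) - 1 nodes, so `Binary<N,0,int>::value` should equal that number. The sketch below restates the template and checks the closed form with `static_assert`. The helper `expected_nodes` is a hypothetical name introduced only for this illustration and is not part of the original program. Passing the enclosing `Binary` as the third argument is what keeps every node a distinct type, so the compiler cannot simply reuse an earlier instantiation.

```cpp
// Restated from the post so that this block compiles on its own.
template <int Depth, int A, typename B>
struct Binary {
  enum { value = 1 +
         Binary<Depth - 1, 0, Binary>::value +
         Binary<Depth - 1, 1, Binary>::value };
};

template <int A, typename B>
struct Binary<0, A, B> {
  enum { value = 1 };
};

// Hypothetical helper (an assumption, not in the post): number of nodes
// in a full binary tree of the given depth, i.e. 2^(depth+1) - 1.
constexpr int expected_nodes(int depth) {
  return (1 << (depth + 1)) - 1;
}

// The checks run entirely at compile time.
static_assert(Binary<0, 0, int>::value == 1, "a leaf counts as one node");
static_assert(Binary<3, 0, int>::value == 15, "depth 3 gives 15 nodes");
static_assert(Binary<10, 0, int>::value == expected_nodes(10),
              "depth 10 matches the closed form");
static_assert(Binary<13, 0, int>::value == 16383,
              "depth 13 gives 16383 nodes");
```

If any of the assertions were wrong, the translation unit would fail to compile, which makes this a cheap way to confirm how many instantiations a given N produces before timing the compiler.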
The second graph is obtained from a program that finds the intersection of two MPL vectors. Again, gcc 4.5 shows a linear increase in compilation time, unlike gcc 4.4.3. Here is the intersection program.
template <class V1, class V2>
struct Intersection {
  // copy_if keeps the elements of V1 that V2 also contains.
  typedef typename
     boost::mpl::copy_if<V1,
     boost::mpl::contains<V2, boost::mpl::placeholders::_1> >::type type;
};
While all that is already exciting, it pales in comparison to the performance of variadic templates in C++0x. The green line in the second graph shows that increasing the number of template parameters has a negligible effect on compilation time. Here is my intersection metaprogram using variadic templates.
#include <type_traits>  // std::conditional, std::is_same, std::true_type

struct null_type {};
template <typename... Arg> struct vector {};

template <typename V> struct front;
template <typename V> struct pop_front;

// front: first element of the vector, or null_type if it is empty.
template <typename Head, typename... Tail>
struct front <vector <Head, Tail...> > {
  typedef Head type;
};

template <>
struct front <vector <> > {
  typedef null_type type;
};

// pop_front: the vector without its first element.
template <typename Head, typename... Tail>
struct pop_front <vector <Head, Tail...> > {
  typedef vector<Tail...> type;
};

template <>
struct pop_front <vector <> > {
  typedef vector<> type;
};

template <typename Vector, typename T> struct push_back;

// push_back: append T with a single pack expansion.
template <typename T, typename... Args>
struct push_back < vector<Args...>, T> {
  typedef vector<Args..., T> type;
};

template <typename Vector> struct size;

// size: number of elements, computed with sizeof... (no recursion).
template <typename... Args>
struct size <vector <Args...> > {
  typedef size type;
  enum { value = sizeof...(Args) };
};

template <typename Vector, typename What> struct contains;

// Stop at the first match; otherwise keep searching the tail.
template <typename What, typename Head, typename... Tail>
struct contains < vector<Head, Tail...>, What> : 
  std::conditional < std::is_same<Head, What>::value,
                     std::true_type,
                     contains < vector<Tail...>, What> >::type
{
  typedef contains type;
};

template <typename What>
struct contains <vector<>, What> {
  typedef contains type;
  enum { value = 0 };
};

template <class V1, class V2>
struct Intersection;

// N is size<V1> * size<V2>; it is zero when either vector is empty.
template <class V1, class V2, unsigned int N>
struct Intersection_impl {
  typedef typename front<V2>::type Head;
  typedef typename pop_front<V2>::type Tail;
  typedef typename Intersection<V1, Tail>::type I;

  // Keep Head only if V1 contains it as well.
  typedef typename 
    std::conditional<contains<V1, Head>::value,
                     typename push_back<I, Head>::type,
                     I >::type type;
};

template <class V1, class V2>
struct Intersection_impl <V1, V2, 0> {
  typedef vector<> type;
};

template <class V1, class V2>
struct Intersection {
  typedef typename Intersection_impl<V1, V2, 
          size<V1>::value * size<V2>::value>::type type;
};
Long story short, it seems like better days are ahead for C++ meta-programming!


Roger said…
I did a similar test using my implementation of the 8 queens puzzle and saw that clang was way faster than gcc 4.4. It's nice to see that gcc 4.5 now has that optimization, let's hope that cl will follow... :)
Luis said…
Good to know. Good exercise
Anonymous said…
Are you sure that the Binary template in the post is correct?
Sumant said…
I fixed the Binary template. It just needed a definition of N.
Seo Sydney said…
I haven’t done any serious looking into the code generated in release builds. I haven’t decked out the optimization options yet with clang and GCC to see what the real runtime differences are in the produced binaries.
Sebastian said…
This is great, I immediately tried it after reading your post and it gave a nice performance boost at compile time for my projects.

Installing g++ 4.5 on Ubuntu (10.10) is pretty straightforward too:

sudo apt-get install g++-4.5
sudo rm -rf /usr/bin/g++
sudo ln -s /usr/bin/g++-4.5 /usr/bin/g++

It also supports the C++0x lambda features which I found to be a lot of fun to play with.
Anonymous said…
your parameter list of Binary seems to be broken:

template struct Binary{...};
