Tuesday, October 11, 2011

Multi-dimensional arrays in C++11

What new can be said about multi-dimensional arrays in C++? As it turns out, quite a bit! With the advent of C++11, we get new standard library class std::array. We also get new language features, such as template aliases and variadic templates. So I'll talk about interesting ways in which they come together.

It all started with a simple question of how to define a multi-dimensional std::array. It is a great example of deceptively simple things. Are the following the two arrays identical except that one is native and the other one is std::array?

int native[3][4];
std::array<std::array<int, 3>, 4> arr;

No! They are not. In fact, arr is more like an int[4][3]. Note the difference in the array subscripts. The native array is an array of 3 elements where every element is itself an array of 4 integers. 3 rows and 4 columns. If you want a std::array with the same layout, what you really need is:

std::array<std::array<int, 4>, 3> arr;

That's quite annoying for two reasons. First of all, it is quite verbose. Think of three or higher dimensional arrays. It's just ugly. Worse yet, the array bounds must be spelled out in reverse order.

Fear not, help is on its way! Meet template aliases! It is a way of defining synonyms for other types including partially defined templates. With C++03 typedef, you must use a fully specialized template to create an alias. C++11 still has that restriction on typedefs but template aliases will let you create "typedefs" for template partial specializations. It is a C++11 solution for the Templated Typedef idiom. Lets try to write one for a two dimensional std::array.

template <class T, size_t ROW, size_t COL<
using Matrix = std::array<std::array<T, COL>, ROW>;
Matrix<float, 3, 4> mat;

This one is much nicer. Matrix is a template alias for a partial specialization of std::array. With that, defining a two dimensional matrix (mat) is really easy. Note that order of ROW and COL. An alias for two dimensional native array can also be defined similarly.

template <class T, size_t ROW, size_t COL>
using NativeMatrix = T[ROW][COL];
NativeMatrix<float, 3, 4> mat;

So far so good. Lets move on to multi-dimensional arrays. Using template aliases for arbitrary dimension arrays would need variadic templates. Unfortunately, it is not straightforward at all.

template <class T, size_t... DIMS>
using MultiDimArray = std::array<??? ...DIMS... ???>;
MultiDimArray<float, 3, 4, 5 , 6, 7> floats;

DIMS parameter pack can be expanded where a comma separated list is expected. However, multi-dimensional std::array definitions are hierarchical, which prevents straight-forward parameter pack expansion.

One way I could make it work was using a little meta-program.

template <class T, size_t I, size_t... J>
struct MultiDimArray
{
using Nested = typename MultiDimArray<T, J...>::type;
// typedef typename MultiDimArray<T, J...>::type Nested;
using type = std::array<Nested, I>;
// typedef std::array<Nested, I> type;
};

template <class T, size_t I>
struct MultiDimArray<T, I>
{
using type = std::array<T, I>;
// typedef std::array<T, I> type;
};

MultiDimArray<float, 3, 4, 5, 6, 7>::type floats;

MultiDimArray is a pair of meta-functions to compute nested type for multi-dimensional std::array. The most general MultiDimArray is a variadic template of unsigned integers to pass an arbitrary number of dimensions. The terminating MultiDimArray specialization defines the simplest case of single dimensional std::array. I'm using the "using" syntax instead of typedefs. The equivalent typedef are written in comments.

Similarly, MultiDimNative can also defined to create a native array of arbitrary dimensions.

template <class T, size_t I, size_t... J>
struct MultiDimNative
{
using Nested = typename MultiDimNative<T, J...>::type;
using type = Nested [I];
};

template <class T, size_t I>
struct MultiDimNative<T, I>
{
using type = T [I];
};

MultiDimNative<double, 3, 4, 5, 6, 7>::type doubles;

To make sure that the layout of all the arrays is the same I wrote a small test program.

template <class T, size_t I, size_t J>
union data
{
T native[I][J];
NativeMatrix<T, I, J> native_matrix;
Matrix<T, I, J> matrix;
typename MultiDimArray<T, I, J>::type multidim_array;
typename MultiDimNative<T, I, J>::type multidim_native;
};

template <class T>
void print(T & t, size_t rows, size_t columns)
{
for(size_t i = 0;i < rows; ++i)
{
for(size_t j = 0;j < columns; ++j)
printf("%d ", t[i][j]);

printf("\n");
}
printf("\n");
}
int main()
{
constexpr size_t I = 3;
constexpr size_t J = 4;

data<int, I, J> d;
size_t val = 0;

for(size_t i = 0;i < I; ++i)
{
for(size_t j = 0;j < J; ++j)
d.native[i][j] = val++;
}

print(d.native, I, J);
print(d.native_matrix, I, J);
print(d.matrix, I, J);
print(d.multidim_array, I, J);
print(d.multidim_native, I, J);

return 0;
}

This program defines a union of 5 different arrays, all of which have the same dimensions and layout. It populates a native two dimensional array and prints the contents using all other multi-dimensional arrays defined in the same union. As of today, only clang accepts this program.

That's it for now!

Tuesday, July 26, 2011

Compile-time regex matcher using constexpr

With my growing constexpr fascination, I thought of using it for something that would be really hard using template meta-programs. How about implementing a compile-time regular expression matcher? Fortunately, a simple regular expression matcher has already been written by Rob Pike. I just rewrote it using constexpr: single return statement in functions, no modifications to the parameters, abundant ternery operators, and recursion. Here we go...

constexpr int match_c(const char *regexp, const char *text);
constexpr int matchhere_c(const char *regexp, const char *text);
constexpr int matchstar_c(int c, const char *regexp, const char *text);
constexpr int matchend_c(const char * regexp, const char * text);

constexpr int matchend_c(const char * regexp, const char * text)
{
return matchhere_c(regexp, text) ? 1 :
(*text == '\0') ? 0 : matchend_c(regexp, text+1);
}

constexpr int match_c(const char *regexp, const char *text)
{
return (regexp[0] == '^') ? matchhere_c(regexp+1, text) :
matchend_c(regexp, text);
}

/* matchhere: search for regexp at beginning of text */
constexpr int matchhere_c(const char *regexp, const char *text)
{
return (regexp[0] == '\0') ? 1 :
(regexp[1] == '*') ? matchstar_c(regexp[0], regexp+2, text) :
(regexp[0] == '$' && regexp[1] == '\0') ? (*text == '\0') :
(*text!='\0' && (regexp[0]=='.' || regexp[0]==*text)) ?
matchhere_c(regexp+1, text+1) : 0;
}

/* matchstar: search for c*regexp at beginning of text */
constexpr int matchstar_c(int c, const char * regexp, const char *text)
{
return matchhere_c(regexp, text) ? 1 :
(*text != '\0' && (*text == c || c == '.')) ?
matchstar_c(c, regexp, text+1) : 0;
}

#define TO_STR_IMPL(R) #R
#define TO_STR(R) TO_STR_IMPL(R)

int main(void)
{
static_assert(match_c(TO_STR(REGEX), TO_STR(TEXT)), "...");

return 0;
}


To compile it, as of today, you need g++ 4.6 or better. You've to pass REGEX and TEXT as #defines while compilation. For instance, -D REGEX=o$ -D TEXT=Foo It matches!

I used two macros TO_STR and To_STR_IMPL to convert the REGEX and TEXT into string literals. #R is basically using the preprocessor stringification technique. For some reason I need two separate TO_STR macros for TEXT substitution and stringification. Seems like the gcc preprocessor can't do those two things in a single macro.

Have fun!

Tuesday, July 19, 2011

Want speed? Use constexpr meta-programming!

It's official: C++11 has two meta-programming languages embedded in it! One is based on templates and other one using constexpr. Templates have been extensively used for meta-programming in C++03. C++11 now gives you one more option of writing compile-time meta-programs using constexpr. The capabilities differ, however.

The meta-programming language that uses templates was discovered accidently and since then countless techniques have been developed. It is a pure functional language which allows you to manipulate compile-time integral literals and types but not floating point literals. Most people find the syntax of template meta-programming quite abominable because meta-functions must be implemented as structures and nested typedefs. Compile-time performance is also a pain point for this language feature.

The generalized constant expressions (constexpr for short) feature allows C++11 compiler to peek into the implementation of a function (even classes) and perform optimizations if the function uses constants (literals) only. Constants can be integral, floating point, as well as string literals. The signature of constexpr functions is just like regular C++ functions but the body has several restrictions, such as only one return statement is allowed. Nevertheless, the syntax of constexpr functions is significantly friendlier than that of template-based meta-functions. Contrary to the design of templates, designers of generalized constant expressions are well-aware of its meta-programming capabilities.

In my view, the most interesting aspect of constexpr is its speed. constexpr functions can perform compile-time computations at lightening speed. To compare the performance I implemented an is_prime algorithm in 3 different ways. Here is the algorithm in regular C++:


static bool IsPrime(size_t number)
{
if (number <= 1)
return false;

for (size_t i = 2; i*i <= number; ++i)
if (number % i == 0)
return false;

return true;
}


Here is the a template version of the same algorithm. Obviously, it is implemented as a collection of meta-functions in terms of structures.


struct false_type
{
typedef false_type type;
enum { value = 0 };
};

struct true_type
{
typedef true_type type;
enum { value = 1 };
};

template<bool condition, class T, class U>
struct if_
{
typedef U type;
};

template <class T, class U>
struct if_<true, T, U>
{
typedef T type;
};

template<size_t N, size_t c>
struct is_prime_impl
{
typedef typename if_<(c*c > N),
true_type,
typename if_<(N % c == 0),
false_type,
is_prime_impl<N, c+1> >::type >::type type;
enum { value = type::value };
};

template<size_t N>
struct is_prime
{
enum { value = is_prime_impl<N, 2>::type::value };
};

template <>
struct is_prime<0>
{
enum { value = 0 };
};

template <>
struct is_prime<1>
{
enum { value = 0 };
};


The above meta-program is a recursive implementation because that's how pure functional languages work. The is_prime_impl meta-function does all the heavy lifting. To prevent infinite regression, lazy instantiation technique is used. All I can say at this point is you've to stare at the code to make sense out of it.

And that's precisely the point: C++03 meta-programs are often unreadable and hard to comprehend. Not anymore! Here is the constexpr version of the same algorithm.


constexpr bool is_prime_recursive(size_t number, size_t c)
{
return (c*c > number) ? true :
(number % c == 0) ? false :
is_prime_recursive(number, c+1);
}

constexpr bool is_prime_func(size_t number)
{
return (number <= 1) ? false : is_prime_recursive(number, 2);
}


Well, I agree, although this version uses friendlier syntax, is not quite easy to read. Just like the previous one, it is recursive. I would say just get used to it! You can't live without recursion in C++ meta-programming world. Secondly, every function has a single return statement and the codifying logic for detecting primality requires abundant use of the ternary operator.

As long as parameter number is an integral constant, this constexpr version will compute the result at compile-time (C++11 compilers only). And when the number is a run-time integer, this same function is perfectly capable of computing the result at run-time. So you don't need two different versions of the same program: one for compile-time and another for run-time. One implementation does it all!


int main(void)
{
static_assert(is_prime_func(7), "..."); // Computed at compile-time
int i = 11;
int j = is_prime_func(i)); // Computed at run-time
}


Now that we have the implementations, lets talk performance. Take 4256233. This is 30,000th prime number! How long does the template version take to check if it is prime? It cannot! Yes, g++ 4.7 fails to compile is_prime<4256233>::value because the computation exceeds the default (900) limit of the recursive template instantiations. So I used -ftemplate-depth-2100 allowing 2100 recursive instantiations! Does it work? Yes! How long does it take? About 1 second! That's not bad at all, you say. How fast is constexpr? About 0.154 seconds!

Seems like templates are not too far behind. Wrong! How about fifth million prime number! It is 86028121. I could not get it to work with g++ 4.7. Somewhere in the range of 9300 recursive instantiations g++ seg-faults. That's brutal! isn't it? 9000+ recursive instantiations? There has to be a better way.

So how long does the constexpr take for our fifth million prime number? 0.220 seconds! That's right! This large prime number makes almost no impact on constexpr compilation time. What could be the reason? There are no template instantiations. C++ compilers always did compile-time computations whenever it was possible. constexpr now allows user-defined abstractions to hide those computations behind functions and yet allow compilers to see through them.
By the way, I had to increase to depth of default allowed constexpr recursion using -fconstexpr-depth=9300. The compiler bails out after this limit and resorts to run-time computation. Remember, the function is perfectly good for run-time computation as well (unlike templates version).

I hope this little exercise will convince you that constexpr is the way to go for static meta-programming in C++ as long as you are dealing with literals (integral, floating point numbers, and strings). It is not clear whether constexpr functions are suitable for manipulating complex meta-programmable types, such as mpl::vector and related meta-functions, such as mpl::contains, mpl::push_back, and so on. If you are interested in playing with the above example more, here is the code: is_prime.cpp and prime-test.sh

Thursday, June 09, 2011

BoostCon'11 video on LEESA: Language for Embedded Query and Traversal

BoostCon'11, held in Aspen, Colorado, was a fantastic conference this year. Not only because I got a chance to present my work on LEESA but also because of the breadth and depth of the topics covered.

LEESA, as you may recall, is an embedded language in C++ to simplify XML programming. LEESA's programming model sits on top of the APIs generated by modern XML data binding tools. LEESA gives you XPath-like syntax (wildcards, child-axis, descendant-axis, tuples) to simplify data extraction from an XML object model.

I had a privilege to talk at length about LEESA in BoostCon'11. In the 1hr 41 minutes long video, I'm talking everything from why you need LEESA, how its implemented using cool C++ techniques, such as templates, meta-programming, compile-time and run-time performance, and what direction it may take in future. Here are the slides of the presentation.