Sunday, August 13, 2017

Binding std::function to member functions

I realized that std::function can be bound to member functions without requiring the *this object. Consider the following examples.
// std::string::empty is a const function. All variables from e1 to e5 are fine.
std::function<bool(std::string)> e1 = &std::string::empty;
std::function<bool(std::string &)> e2 = &std::string::empty;
std::function<bool(const std::string &)> e3 = &std::string::empty;
std::function<bool(std::string *)> e4 = &std::string::empty;
std::function<bool(const std::string *)> e5 = &std::string::empty;

// std::string::push_back is not a const function. p4 and p5 don't compile.
std::function<void(std::string, char)> p1 = &std::string::push_back;
std::function<void(std::string &, char)> p2 = &std::string::push_back;
std::function<void(std::string *, char)> p3 = &std::string::push_back;

// These two don't compile because push_back is a non-const function 
std::function<void(const std::string &, char)> p4 = &std::string::push_back;
std::function<void(const std::string *, char)> p5 = &std::string::push_back;
I thought I knew how to do that, but this time I found that the syntax is a little different from what I had in mind.

I used to think (incorrectly) that, just like function types for free-standing functions, one would create a member-function type. While it's straightforward to create function types for free-standing functions, I don't think it's possible to create member-function types. Don't get me wrong. One can create a pointer-to-member-function type just like a pointer-to-function type. Here's what I mean.
using F = int(const char *); // a function type of a free standing function.
F f = std::atoi; // error because there are no objects of a function type.
F* fnew = new F(); //error because new can't be applied to a function type.
F* fptr = &std::atoi; // OK. A pointer to a function type is initialized.

// However, there's no function type for member functions
using GPTR = bool (std::string::*)() const; // OK. This is a pointer-to-member-function-type. 
GPTR gptr = &std::string::empty; // OK. This is a pointer to a member function.
std::string s;
std::cout << (s.*gptr)(); // OK. prints 1
using H = decltype(*gptr); // error. Dereferencing makes no sense without a std::string object. Ill-formed.

bool (*x)(const std::string &) = gptr; // error. Incompatible types.
std::function<bool (const std::string &)> fobj = gptr; // OK! Neat!
Therefore, std::function uses the syntax of free-standing function types to bind to pointers to member functions. That's neat.
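For completeness, here's a quick sketch of how such bound member functions get invoked (assuming the e1-e5 and p1-p3 declarations above):
std::string s = "hello";
std::cout << e3(s) << "\n";  // prints 0; calls s.empty() through the const-reference signature
std::cout << e5(&s) << "\n"; // prints 0; the pointer form works too
p2(s, '!');                  // calls s.push_back('!')
std::cout << s << "\n";      // prints hello!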

Sunday, August 06, 2017

Avoiding intermediate copies in std::accumulate

std::accumulate makes a ton of copies internally. In fact, it makes roughly twice as many copies as there are elements in the iterator range. To avoid them, use std::ref and std::reference_wrapper for the initial state. std::shared_ptr is also a possibility if the accumulated state must be dynamically allocated for some reason. Live code on wandbox.

Update: Please see alternative solutions in the comments section.

#include <iostream>
#include <cstdlib>
#include <algorithm>
#include <vector>
#include <string>
#include <numeric>
#include <functional>

struct Vector : public std::vector<int> {
  Vector(std::initializer_list<int> il) : std::vector<int>(il){
    std::cout << "Vector(std::initializer_list)\n";
  }
  Vector() {
    std::cout << "Vector()\n";
  }
  Vector(const Vector &v) : std::vector<int>(v) {
    std::cout << "Vector(const Vector &)\n";
  }
  Vector & operator = (const Vector &v) {
    if (this != &v) {
      std::vector<int>::operator=(v);
      std::cout << "Vector& Vector::operator=(const Vector &)\n";
    }
    return *this;
  }
  Vector & operator = (Vector && v) {
    if (this != &v) {
      std::vector<int>::operator=(std::move(v));
      std::cout << "Vector& Vector::operator=(Vector&&)\n";
    }
    return *this;
  }
  Vector(Vector&& v) : std::vector<int>(std::move(v)) {
     std::cout << "Vector(Vector&&)\n";
  }
};

void copy_accumulate(Vector &v) {
    Vector v2 = std::accumulate(v.begin(), v.end(), Vector(), 
                    [](Vector & v, int i) {
                      v.push_back(i);
                      return v;
                    });
    std::cout << "size of v2 = " << v2.size() << "\n";
}

void nocopy_accumulate(Vector &v) {
    Vector init;
    Vector v2 = std::accumulate(v.begin(), v.end(), std::ref(init), 
                    [](std::reference_wrapper<Vector> v, int i) {
                      v.get().push_back(i);
                      return v;
                    });
    std::cout << "size of v2 = " << v2.size() << "\n";
}

int main()
{
    Vector v { 1, 2, 3, 4 };
    copy_accumulate(v);
    std::cout << "=============\n";
    nocopy_accumulate(v);
}

Sunday, July 09, 2017

Playing with C++ Coroutines

While looking for some old photos, I stumbled upon my own presentation on C++ coroutines, which I never posted online to a broader audience. I presented this material at the SF Bay ACCU meetup and at the DC Polyglot meetup in early 2016! Yeah, it's been a while. It's based on a much longer blog post about Asynchronous RPC using modern C++. So without further ado.


Friday, January 06, 2017

Folding Monadic Functions

In the previous two blog posts (Understanding Fold Expressions and Folding Functions) we looked at the basic usage of C++17 fold expressions and how simple functions can be folded to create a composite one. We’ll continue our stride and see how "embellished" functions may be composed in fold expressions.

First, let me define what I mean by embellished functions. Instead of just returning a simple value, these functions are going to return a generic container of the desired value. The choice of container is very broad but not arbitrary. There are some constraints on the container, and once you select a generic container, all functions must return values wrapped in that same container. Let's begin with std::vector.
// Hide the allocator template argument of std::vector. 
// It causes problems and is irrelevant here.
template <class T>
struct Vector : std::vector<T> {};

struct Continent { };
struct Country { };
struct State { };
struct City { }; 

auto get_countries = [](Continent& c) -> Vector<Country> { return ... } ;
auto get_states    = [](Country& c)   -> Vector<State>   { return ... };
auto get_cities    = [](State& s)     -> Vector<City>    { return ... };
The focus of this post is folding functions like get_countries, get_states, and get_cities. As you may have noted, these functions are different from what we saw in the previous post. They don’t readily compose. The return types and argument types don’t line up. But I bet you know how to get to the cities given a Continent. Don’t you? Hold on to that intuition. It will come in handy later. In essence, my attempt is going to formalize that intuition of yours while generalizing it by leaps and bounds. Possibly, beyond recognition.

Our target here is to enable folding (a.k.a. composition) of these embellished functions similar to the compose function we saw in the previous post. We’ll call it composeM though.
Continent c;
auto cities_from_continent = composeM(get_countries, get_states, get_cities);
Vector<City> cities = cities_from_continent(c);
Let’s begin by implementing composeM using a unary left fold. Let’s use the >>= operator because we can. Of course, we’ll also need to overload >>= for our embellished functions.
template <class... Args>
auto composeM(Args&&... args)
{
  return (... >>= args);
}

template <class F, class G>
auto operator >>= (F&& f, G&& g)
{
  using FArg = arg_type_t<F>;
  using GResult = result_type_t<G>;

  return [f,g] (FArg a) -> GResult {
      using FResult = result_type_t<F>;
      GResult finalvec;
      FResult const & fvec = f(a);
      for(const auto& o: fvec) {
        const GResult& gvec = g(o);
        finalvec.insert(finalvec.end(), gvec.begin(), gvec.end());
      }
      return finalvec;
  };
}
What a nasty function that is!?

Now would be a good time to recall the intuition you had about going from a Continent to its cities. Your intuition about implementing cities_from_continent is probably as follows: one function that goes all three levels deep, with three nested loops (a direct sketch of this follows the list).
  1. Get a Continent from somewhere. Initialize an empty vector of cities.
  2. Call get_countries (level 1)
  3. Call get_states for each country (level 2)
  4. Call get_cities for each state (level 3) and push_back in the vector of cities
  5. Return the vector of cities
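Here's that direct sketch (assuming the get_countries/get_states/get_cities lambdas above are implemented to return populated Vectors):
auto cities_from_continent_direct = [](Continent& c) -> Vector<City> {
  Vector<City> cities;
  for (Country& country : get_countries(c))      // level 1
    for (State& state : get_states(country))     // level 2
      for (City& city : get_cities(state))       // level 3
        cities.push_back(city);
  return cities;
};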
On the other hand, the overloaded >>= function implements the same intuition with only two levels at a time (as it accepts only two functions as parameters). Going three levels requires operator >>= to be called twice. The C++ compiler will do that for us because we're using the fold expression in composeM. Every time the compiler calls operator >>=, it returns a function that's just like the ones it received as arguments. Specifically, the returned closure is not generic. We find out the argument type of F (FArg) and the result type of G (GResult) and use them in the definition of the lambda. The returned closure is a composite because it accepts F's argument type and returns G's result type. It glues F and G together. All that ceremony is kinda important.

When operator >>= is called the first time, F is of type Continent→Vector<Country> (i.e., get_countries) and G is of type Country→Vector<State> (i.e., get_states). The first time around, the signature of the closure is going to be Continent→Vector<State>. It's just "like" the other two. Naturally, when this closure is composed with get_cities, operator >>= works smoothly and returns a closure of type Continent→Vector<City>, which is exactly the cities_from_continent you intuitively had in mind.

An important difference between the direct function and the function that's the result of composition is that the composed version materializes (creates) a temporary vector every time operator >>= is called. That's an overhead. There are ways to avoid that but we may have to consider something other than std::vector. I won't discuss that here. See Range-v3.

Another Example: boost::future

So far we've looked at composition of embellished functions that accept an argument and return a vector of some type. At the beginning of the article, I suggested that there is a broad set of generic containers we can choose from. Now would be a good time to select another one. How about boost::future? We'll go through a similar exercise of composing functions that return boost::future. We'll see if there's anything common between the composition of functions that return vectors and functions that return futures.

Let's imagine there's a database that contains information about the largest countries, states, and cities in the world. As database operations are i/o bound, using futures to communicate the possibility of long execution times is a good idea. So our functions look as follows.
auto get_largest_country = [](Continent const &) -> boost::future<Country> { ... };
auto get_largest_state   = [](Country const &)   -> boost::future<State>   { ... };
auto get_largest_city    = [](State const &)     -> boost::future<City>    { ... };
If we need the largest city in the largest state of the largest country of a given continent, we'll need all three functions. As before, you have an intuition about how to do that. Maybe it's something like below.
  1. Get a Continent from somewhere.
  2. Call get_largest_country (level 1). Wait to retrieve the result because db ops could take long.
  3. Call get_largest_state for that country (level 2). Wait again.
  4. Call get_largest_city for that state (level 3). Wait again.
  5. Return the city
The actual code is pretty short.
auto get_largest_city_from_continent = [](Continent const &c) {
  return get_largest_city(get_largest_state(get_largest_country(c).get()).get()).get();
};
This implementation is fully synchronous and isn't optimized to use the nice properties of boost::future (yet). It may remind you of the function composition we saw in the previous post. However, the .get() calls in between completely mess things up for us. We can't use simple function composition because it does not call .get().

Let's rewrite operator >>= so that it works with functions that return boost::future. We want to compare it with the Vector version.
template <class F, class G>
auto operator >>= (F&& f, G&& g)
{
  using FArg = arg_type_t<F>;
  using GResult = result_type_t<G>;
    
  return [f,g] (FArg a) -> GResult {
      return g(f(a).get());
  };
}
Just like before, this function operates only two levels at a time because we pass only two functions to it. The implementation is much simpler than before because the argument functions return a boost::future and it may contain at most one value. We retrieve it using .get() and pass it on to function g.

Needless to say, this implementation of >>= does not look much like the earlier (Vector) implementation of operator >>=. Is there anything common/reusable at all? Can we refactor these two functions into something reusable? Let's observe them closely under a special microscope--the generic-programming microscope.

Refactoring To a Generic operator >>=

Disclaimer: This section may appear a little too abstract and hand-wavy. As I already know what the refactoring is going to look like, I may skip ahead too fast and be too hasty about some generalizations. Let me know what you think. Here we go.

Here's what I see in the implementations of operator >>=:
  1. Both functions are higher-order. I.e. they take functions as arguments and return a new composite function.
  2. The type of the composite function depends on the types of the arguments. Specifically, the composite function is "just like" the argument functions.
  3. In both cases, the argument type of the composite function is the same as that of F and the return type is the same as that of G.
  4. Functions f and g accept an argument of a simple type and return a value "wrapped" in a generic container. The notable difference is the container type (Vector and boost::future, respectively). Hmm, template template parameter?
  5. There's a dependency between f and g. f is invoked before g. Always. That's generic. This is key.
  6. The argument passed to the composite function is passed to f (i.e., f(a)). As the return value of f is a generic container, the contained value(s) are somehow extracted and passed to g. In the case of Vector we use a for loop and in the case of boost::future, we use .get(). The specific way to get to the guts of the container is dependent on the container type. The composite function "collapses" two nested levels into one. In the case of Vector, two nested loops. In the case of boost::future, two calls to the database appear as one from outside. These things are very very specific to the container. There's no hope to generalize it. This is key too and the most hand-wavy observation.
What in the world does all this mean? And how might we express these observations in C++ in a reusable manner? Not all of it is reusable, as we observed. Items #1 to #5 appear to be something we can write reusable code for. #6 does not, as the details depend on the container type---a template (as opposed to an instantiation of it).

Sounds like we need a template template parameter. Fine. But a template template parameter of what?

The Genius of Monad

So what we are looking for is known as a monad. That's the name it gets because mathematicians reached here first--long before we programmers existed. Let's just accept the name and move on.

Based on the commonalities listed above, we need something with nearly the same capabilities except for #6. It should allow us to plug in an implementation that depends on the container type. We are in pursuit of a monad-aware implementation of operator >>=. As #1 through #5 are reusable pieces of code, let's just copy-paste them.
template <class F, class G>
auto operator >>= (F&& f, G&& g)
{
  using FArg = arg_type_t<F>;
  using GResult = result_type_t<G>;

  return [f,g] (FArg a) -> GResult {
      return get_monad<GResult>::bind(f(a), std::move(g));
  };
}
Hopefully, it does not look too unexpected.

As you can see, this implementation has the same shape but with more flexibility. It's higher-order. It returns a closure that accepts an argument of the same type as F and returns the same type as G. It invokes f(a). From this point on things look different. The pluggable "interface" we use is called bind. The result of f(a) is passed to bind, which is dependent on the type GResult. Further, function g is also passed to bind because a generic implementation won't know how to peek into the value of f(a) and invoke g on it. That's the job of bind. So there must be specializations of bind for each container template. But first, we need to identify the container template. We use a template template parameter.

Here's what get_monad looks like.
template <class T>
struct get_monad;

template <template <typename> class M, class T>
struct get_monad<M<T>> : monad<M> {};
It is a simple wrapper over the real deal: monad. The job of get_monad is to pattern-match on GResult and extract the template-name and template arguments out. It simply forwards the template-name to the monad template. I.e., in the case of get_monad<Vector<City>>, monad is instantiated with just Vector, as in monad<Vector>. Similarly, for the functions returning boost::future, monad<boost::future> is instantiated.

That brings us to the monad template, which accepts a template template parameter.
template<template <typename> class M>
struct monad 
{ 
    template <class T, class Func>
    static auto bind(M<T>& m, Func&& func) -> result_type_t<Func>;
};
This definition only serves as an "interface" and does no work. It's completely optional. Note the signature of bind though. It accepts an argument of type M<T>, whatever M is. We'll look at two examples shortly. It also accepts a function, which must also return a value wrapped in M. Bind calls one embellished function at a time. At this point, what we really need are the specializations of this template. Here's a monad specialization for Vector.
template<>
struct monad<Vector> 
{    
    template <class T, class Func>
    static auto bind(const Vector<T>& vec, Func&& func) -> result_type_t<Func>
    { 
      // Alternatively,
      // using Outer = decltype(func(std::declval<T>()));
      using Outer = result_type_t<Func>;
      Outer outer;
      for(const T& o: vec) {
        const Outer& inner = func(o);
        outer.insert(outer.end(), inner.begin(), inner.end());
      }
      return outer;
    }
};
The bind function accepts a Vector object and a function as arguments. The job of bind is to unwrap the Vector, get the values, and pass them to the function. That's just half the story. The function itself returns a Vector for every call. bind can return only one Vector. Therefore, bind needs to combine/flatten all the Vectors into one. The outer Vector is that flat Vector, which contains all the elements from the individual Vectors received from func. Note that if the input Vector is empty, the resulting Vector is also empty. Likewise, if the function func returns an empty Vector every time, the resulting Vector is also empty. This behavior is exactly the same as the commonly discussed list monad.
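For instance, here's a small usage sketch of this bind with made-up data (the expand lambda is hypothetical, purely for illustration):
Vector<int> v;
v.push_back(1);
v.push_back(2);
auto expand = [](int i) -> Vector<int> {
  Vector<int> r;
  r.push_back(i);
  r.push_back(i * 10);
  return r;
};
auto flat = monad<Vector>::bind(v, expand); // flat contains 1, 10, 2, 20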

And here's a possible implementation using boost::future.
template<>
struct monad<boost::future>
{
    template <class T, class Func>
    static auto bind(boost::future<T> fut, Func&& func) -> result_type_t<Func> { 
      return func(fut.get()); 
    }
};
Like the Vector version, the future version also accepts an object of the future type (i.e., the container type) and a function. The bind function extracts the value from the future by calling get. Only bind would know how to do that. It passes it on to the function. The function itself returns a future. We simply return it. This version is fully synchronous and waits for the result of the first function to arrive before calling the next function. Recall that fut is the result of calling f(a) in operator >>=.

The bind function is not unique and may be implemented in multiple ways. A better alternative with boost::future is to chain the result of one future to a subsequent function. In fact, that's what we've been doing all along. boost::future provides a nice API called .then to chain embellished functions that return futures. Therefore, an alternative implementation of monad<boost::future>::bind could be as follows.
#define BOOST_THREAD_PROVIDES_FUTURE_CONTINUATION
#define BOOST_THREAD_PROVIDES_FUTURE_UNWRAP
#define BOOST_THREAD_PROVIDES_FUTURE
#include <boost/thread/future.hpp>

template<>
struct monad<boost::future>
{
    template <class T, class Func>
    static auto bind(boost::future<T> fut, Func&& func) -> result_type_t<Func> { 
        return fut.then([func](boost::future<T> f) mutable { 
            return func(f.get()); // calling .get here does not block.
        }).unwrap();
    }
};
The boost::future::then function is very similar to bind. The main difference is that the argument lambda to .then accepts a boost::future object as opposed to the value in it. This complication is to support the possibility of futures that complete with an exception. If the first function (f) returns a future that resolves into an error, we may need to know about it. If a future resolves to an error, .get() rethrows the exception, skipping all the subsequent chained functions. That's exactly what we want.

There's one more wrinkle that isn't clear in the code. .then wraps the return type of the lambda in a future. In our case the lambda returns a future itself (e.g., future<State>). Therefore, the resulting type of .then is future<future<T>>. That does not match the return type of bind, which is just future<T>. There are two ways out provided by boost::future. Either you can call .unwrap on the future<future<T>>, or rely on the unwrapping constructor of boost::future that auto-converts future<future<T>> to future<T>. I opted for the former for clarity.

Putting It All Together

With this refactoring in place, our composeM function that does a left fold of embellished functions works just fine.
void test()
{
  auto get_cities_from_continent = composeM(get_countries, get_states, get_cities);
  auto get_largest_city_from_continent = composeM(get_largest_country, get_largest_state, get_largest_city);

  Continent c;
  auto cities = get_cities_from_continent(c);
  auto largest_city = get_largest_city_from_continent(c).get();

  std::cout << cities << "\n" << largest_city << "\n";
}
That's pretty neat. The type of the container does not matter. We successfully abstracted the details of the container type into its respective bind functions. For new generic containers, all we need to do is to implement a specialization of bind. Bind is the function that allows us to compose (a.k.a. chain) a set of embellished functions belonging to the same container type (monad). As long as an implementation of bind satisfying the signature exists for a container, we can use it in fold expressions. Clearly, C++ fold expressions are quite versatile. As an exercise, try implementing bind for std::optional. It's fun! Find out what monad that is.
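Here's one possible sketch of that exercise (assuming <optional> is included and the get_monad/result_type_t machinery from above); figuring out which monad it is stays your exercise.
template<>
struct monad<std::optional>
{
    template <class T, class Func>
    static auto bind(const std::optional<T>& opt, Func&& func) -> result_type_t<Func>
    {
      // An empty optional short-circuits the rest of the chain.
      if (!opt)
        return result_type_t<Func>{};
      // Otherwise, unwrap the value and pass it to func, which itself returns an optional.
      return func(*opt);
    }
};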

Something's Amiss?

If you are familiar with the concept of a monad from Haskell or other languages, you already know that the monad abstraction presented above is not complete. It's missing a function called return (Haskell lingo) that accepts an unwrapped value and puts it into the chosen generic container (monad). Obviously, it is highly specific to the container used. Putting a value in a future differs significantly from putting a value in a Vector.

I'm unsure if we really need a function like return because we just have constructors in C++ to do the job. For the sake of completeness, however, I will include it.
template<>
struct monad<Vector> 
{
    template <class T>
    static Vector<std::decay_t<T>> return_(T&& t) { 
        Vector<std::decay_t<T>> vec;
        vec.push_back(std::forward<T>(t));
        return vec;
        // return { t }; // gcc hates this line.
    }
    ....
};

template<>
struct monad<boost::future>
{
    template <class T>
    static boost::future<std::decay_t<T>> return_(T&& t) { 
        return boost::make_ready_future(std::forward<T>(t)); 
    }
    ....
};
Note that the type of return_ matches that of our embellished functions. As return_ does not do anything but "put a value into the monad", we can use it as the identity for monadic function composition. Let's use a binary left fold as a demonstration.
template <class Head, class... Tail>
struct head {
    using type = Head;   
};

template <class... Args>
auto composeM(Args&&... args)
{
  static_assert(sizeof...(Args) > 0);
  using Func = typename head<Args...>::type;
  using FArg = arg_type_t<Func>;
  auto identity = [](FArg a) { 
        return get_monad<result_type_t<Func>>::return_(a); 
  };
  return (identity >>= ... >>= args);
}
This implementation of composeM works as long as at least one function is passed to it. That's because composeM won't know which monad you are talking about without looking at the return type of at least one of the passed functions. identity simply forwards its argument to the return_ function. It cheats a little in the process. It finds out the argument type of Func (FArg). That's not strictly necessary---especially if we use a generic lambda and simply forward the argument to return_. If we do that, we need a few tweaks in operator >>=.

On a closer examination, it's clear that operator >>= does not depend on the argument type at all. It only needs to figure out the container type---the template template parameter to use to pull-in the right monad. So, GResult is all we need.

Here's the final revised version of operator >>= for folding monadic functions.
template <class F, class G>
auto operator >>= (F&& f, G&& g)
{
  //using FArg = arg_type_t<F>; not needed
  using GResult = result_type_t<G>;

  return [f=std::forward<F>(f), 
          g=std::forward<G>(g)] (auto&& a) mutable -> GResult {
      return get_monad<GResult>::bind(f(std::forward<decltype(a)>(a)), std::forward<G>(g));
  };
}
Oh, I almost forgot. The choice of operator >>= is not an accident. In Haskell, >>= works the same way as our bind function. Our operator >>=, on the other hand, is like Haskell's fish operators (>=> for the right fish and <=< for the left fish).

That's it! If you are still here, thank you for reading. Here's live code if you want to take it for a spin. This concludes the three-part series on C++17 Fold Expressions. See previous posts: #1 and #2.

Appendix

Here are the templates to extract the argument and result types of lambdas. They don't work with generic lambdas, though.
namespace detail {

template <typename Func>
struct function_traits {
  using arg_type = typename function_traits<decltype(&Func::operator())>::arg_type;
  using result_type = typename function_traits<decltype(&Func::operator())>::result_type;
};

template <typename Func>
struct function_traits<Func &> {
  using arg_type = typename function_traits<decltype(&Func::operator())>::arg_type;
  using result_type = typename function_traits<decltype(&Func::operator())>::result_type;
};

template <typename R, typename C, typename A>
struct function_traits<R (C::*)(A)> {
  using result_type = R;
  using arg_type = A;
};

template <typename R, typename C, typename A>
struct function_traits<R (C::*)(A) const> {
  using result_type = R;
  using arg_type = A;
};

template <typename R, typename A>
struct function_traits<R (*)(A)> {
  using result_type = R;
  using arg_type = A;
};

template <class T>
using result_type_t = typename function_traits<T>::result_type;

template <class T>
using arg_type_t = typename function_traits<T>::arg_type;

}
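Here's a quick sanity check of these traits on a non-generic lambda (a sketch; assumes <string>, <cstdlib>, and <type_traits> are included):
auto to_num = [](std::string s) { return std::atoi(s.c_str()); };
static_assert(std::is_same<detail::arg_type_t<decltype(to_num)>, std::string>::value, "");
static_assert(std::is_same<detail::result_type_t<decltype(to_num)>, int>::value, "");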

Tuesday, December 27, 2016

Folding Functions

In the last post we looked at basic usage of C++17 Fold Expressions. I found that many posts on this topic discuss simple types and ignore how folds may be applicable to more complex types as well. [Edit: Please see the comments section for some examples elsewhere in the blogosphere.] In this post I'm going to describe folding over functions.

Composing Functions

Function composition is a powerful way of creating complex functions from simple ones. Functions that accept a single argument and return a value are easily composable. Consider the following example to compose two std::functions.
template <class A, class B, class C>
std::function<C(A)> compose(std::function<B(A)> f, std::function<C(B)> g)
{
  return [=](A a) -> C { return g(f(a)); };
}

int main(void)
{
    std::function<int(std::string)> to_num = [](std::string s) { return atoi(s.c_str()); };
    std::function<bool(int)> is_even = [](int i) { return i%2==0; };
    auto is_str_even_num = compose(to_num, is_even);
    std::cout << std::boolalpha << is_str_even_num("1234"); // prints true
}
Function compose accepts two std::function arguments and returns another one. The types of these std::function arguments are important. f is a function from A->B whereas g is a function from B->C. Therefore, it makes sense that compose can generate another function of type A->C. The output of f goes to the input of g. The implementation of the lambda confirms that.

As std::function is kind of verbose and not very idiomatic in C++ when you want to pass functions around, I'll use C++11 lambdas initially. I want to stay away from generic lambdas because argument and return types are kinda important here. For generic lambdas, it's impossible to find the argument and return types without knowing an actual argument or its type. Note that in the compose function we have access to the functions only and no arguments.

Let's rewrite compose to accept lambdas.
template <class F, class G>
auto compose(F&& f, G&& g)
{
  using ArgType    = detail::arg_type_t<F>;
  using ResultType = detail::result_type_t<G>;  
  return [f,g](ArgType a) -> ResultType { return g(f(a)); };
}
F and G are generic arguments, which we expect to be non-generic lambdas. We extract the argument type of F and result type of G and return a composition of two lambdas satisfying the type signature.

This implementation is not very idiomatic. Extracting the argument and return types of functions in this style is falling out of favor. std::function::argument_type and std::function::result_type have been deprecated in C++17. A more idiomatic way would have been to return a generic lambda without bothering with the argument type. C++ clearly wants to favor duck-typing at compile-time. Until we have concepts in the language, of course.
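Here's a sketch of what that duck-typed variant would look like (essentially the generic operator >> we'll arrive at below):
template <class F, class G>
auto compose(F&& f, G&& g)
{
  // No attempt to name argument or result types; the compiler checks them at the call site.
  return [f,g](auto a) { return g(f(a)); };
}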

I'll skip the implementation of the detail namespace. It's in the same vein as this stackoverflow answer.

Folding Functions

Folding functions is a generalization of function composition applied to fold expressions. First, we need to pick up an operator to use fold expressions with. I like >> as it's quite intuitive. Here's the earlier function implemented as an overloaded operator.
template <class F, class G>
auto operator >>(F&& f, G&& g)
{
  using ArgType    = detail::arg_type_t<F>;
  using ResultType = detail::result_type_t<G>;  
  return [f,g](ArgType a) -> ResultType { return g(f(a)); };
}
We'll now write a new compose function that uses a fold expression over function types. Of course, it's going to use the overloaded >> operator.
template <class... Funcs>
auto compose(Funcs... funcs)
{
   return (... >> funcs);   
}
The earlier main program will work just fine with this new implementation of compose. It works with lambdas too.
auto to_num = [](std::string s) { return atoi(s.c_str()); };
auto is_even = [](int i) { return i%2==0; };
auto is_str_even_num = compose(to_num, is_even);
std::cout << std::boolalpha << is_str_even_num("1234") << "\n"; // prints true
Interestingly, this compose function works fine with a single argument, as it simply returns that argument, as discussed in the previous post. It does not work with an empty parameter pack, however. What could we return when we get an empty parameter pack? In other words, what would be the identity for the function type? Well, it's just a function that returns its argument. Let's see it in action using a binary fold.
struct Identity
{
  template <class T>
  T operator()(T&& t) { return std::forward<T>(t); }
};

template <class... Funcs>
auto compose(Funcs... funcs)
{
   return (Identity() >> ... >> funcs);   
}
The only problem, however, is that it does not compile. Not that anything is wrong with binary folds, but the overloaded >> above cannot digest Identity. Identity has a generic function call operator. There's no way to get its argument type and result type without knowing the type of the argument. The compose function does not have it.

We're therefore forced to use a generic implementation of operator >>.
template <class F, class G>
auto operator >>(F&& f, G&& g)
{
  return [f,g](auto a) { return g(f(a)); };
}
With this final variation, functions can be folded over in a binary fold expression.

I will conclude this blog post with a bit of monoid theory.

You might want to ask yourself whether function composition is another monoid. As it turns out, it is. It makes sense intuitively. Composition of two functions gives rise to another function. The composition is also associative. It does not matter whether we call compose(f, compose(g,h)) or compose(compose(f,g),h). The end result is the same. Squint a little and you will realize that they are just left and right folds. Finally, there's also an identity function, which when composed with any other function makes no observable difference. Therefore, we can say that functions form a monoid under composition.
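To see the associativity concretely, here's a quick sketch reusing the compose above with some made-up lambdas:
auto inc    = [](int i) { return i + 1; };
auto dbl    = [](int i) { return i * 2; };
auto square = [](int i) { return i * i; };

auto left  = compose(compose(inc, dbl), square);
auto right = compose(inc, compose(dbl, square));
std::cout << left(3) << " " << right(3) << "\n"; // both print 64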

Next time we'll look at even more interesting functions---those that return values wrapped in a generic "container".

Monday, December 26, 2016

Understanding Fold Expressions

C++17 has an interesting new feature called fold expressions. Fold expressions offer a compact syntax to apply a binary operation to the elements of a parameter pack. Here’s an example.
template <typename... Args>
auto addall(Args... args) 
{
  return (... + args);
}
addall(1,2,3,4,5); // returns 15.
This particular example is a unary left fold. It's equivalent to ((((1+2)+3)+4)+5). It reduces/folds the parameter pack of integers into a single integer by applying the binary operator successively. It's unary because it does not explicitly specify an init (a.k.a. identity) argument. So, let's add it.
template <typename... Args>
auto addall(Args... args) 
{
  return (0 + ... + args);
}
addall(1,2,3,4,5); // returns 15.
This version of addall is a binary left fold. The init argument is 0 and it's redundant (in this case). That's because this fold expression is equivalent to (((((0+1)+2)+3)+4)+5). Explicit identity elements will come in handy a little later---when we have empty parameter packs or if we use user-defined types in fold expressions.
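Here's a quick sketch of why the explicit identity matters for empty packs:
addall();      // now well-formed; returns 0, supplied by the identity element
addall(42);    // expands to (0 + 42); returns 42
// The earlier unary version, (... + args), is ill-formed for an empty pack.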

Fold expressions can be defined over a number of operators. 32 to be precise. They are + - * / % ^ & | = < > << >> += -= *= /= %= ^= &= |= <<= >>= == != <= >= && || , .* ->*.

In this post you will see an example of each and see how each one behaves. So here's the whole enchilada.
#include <iostream>
#include <iomanip>

#define UNARY_LEFT_FOLD(NAME, OP)   \
template<typename... Args>          \
auto NAME(Args&&... args) {         \
  return (... OP args);             \
}

UNARY_LEFT_FOLD(add,+);
UNARY_LEFT_FOLD(sub,-);
UNARY_LEFT_FOLD(mul,*);
UNARY_LEFT_FOLD(divide,/);
UNARY_LEFT_FOLD(mod,%);
UNARY_LEFT_FOLD(bxor,^);
UNARY_LEFT_FOLD(band,&);
UNARY_LEFT_FOLD(bor,|);
UNARY_LEFT_FOLD(assign,=);
UNARY_LEFT_FOLD(lt,<);
#ifndef __clang__ 
UNARY_LEFT_FOLD(gt,>); 
UNARY_LEFT_FOLD(rshift,>>); 
#endif
UNARY_LEFT_FOLD(lshift,<<);
UNARY_LEFT_FOLD(addassign,+=);
UNARY_LEFT_FOLD(subassign,-=);
UNARY_LEFT_FOLD(mulassign,*=);
UNARY_LEFT_FOLD(divassign,/=);
UNARY_LEFT_FOLD(modassign,%=);
UNARY_LEFT_FOLD(bxorassign,^=);
UNARY_LEFT_FOLD(bandassign,&=);
UNARY_LEFT_FOLD(borassign,|=);
UNARY_LEFT_FOLD(lshiftassign,<<=);
UNARY_LEFT_FOLD(rshiftassign,>>=);
UNARY_LEFT_FOLD(equals,==);
UNARY_LEFT_FOLD(nequals,!=);
UNARY_LEFT_FOLD(lte,<=);
UNARY_LEFT_FOLD(gte,>=);
UNARY_LEFT_FOLD(land,&&);
UNARY_LEFT_FOLD(lor,||);
UNARY_LEFT_FOLD(objptrmem,.*);
UNARY_LEFT_FOLD(ptrptrmem,->*);

template<typename... Args>
auto comma(Args&&... args) {
  return (... , args);
}

struct Phone  { int ext; };
struct Person { Phone phone;  };

int main(void) 
{
  std::cout << std::boolalpha;
  std::cout << "add "            << add(1)           << " " << add(1,2,3)        << "\n";// 1
  std::cout << "sub "            << sub(1)           << " " << sub(1,2,3)        << "\n";
  std::cout << "mul "            << mul(1)           << " " << mul(1,2,3)        << "\n";
  std::cout << "divide "         << divide(1)        << " " << divide(18,2,3)    << "\n";
  std::cout << "mod "            << mod(1)           << " " << mod(23, 3,2)      << "\n";
  std::cout << "bxor "           << bxor(1)          << " " << bxor(1,2,4)       << "\n";
  std::cout << "band "           << band(1)          << " " << band(1,3,7)       << "\n";
  std::cout << "assign "         << assign(1)        << " " << assign(1,2,4)     << "\n";
    
  auto a = 99; 
  std::cout << "assign-a "       << assign(a);
  std::cout << " "               << assign(a,2,4);
  std::cout << " "               << a << "\n";
    
  #ifndef __clang__ 
  std::cout << "gt "             << gt(1)          << " " << gt(3,2,0)         << "\n"; 
  std::cout << "rshift "         << rshift(1)        << " " << rshift(32,2,2)    << "\n"; 
  #endif

  std::cout << "lt "             << lt(1)            << " " << lt(1,2,-1)         << "\n"; 
  std::cout << "lshift "         << lshift(1)        << " " << lshift(1,2,3)     << "\n";
  std::cout << "addassign "      << addassign(1)     << " " << addassign(2,3,2)  << "\n";
  std::cout << "subassign "      << subassign(1)     << " " << subassign(7,2)    << "\n";
  std::cout << "mulassign "      << mulassign(1)     << " " << mulassign(2,3,2)  << "\n";
  std::cout << "divassign "      << divassign(1)     << " " << divassign(7,2)    << "\n";
  std::cout << "modassign "      << modassign(1)     << " " << modassign(23,3,2) << "\n";
  std::cout << "bxorassign "     << bxorassign(1)    << " " << bxorassign(7,2)   << "\n";
  std::cout << "bandassign "     << bandassign(1)    << " " << bandassign(7,6)   << "\n";
  std::cout << "borassign "      << borassign(1)     << " " << borassign(1,2,4,8) << "\n";
  std::cout << "lshiftassign "   << lshiftassign(1)  << " " << lshiftassign(8,2)  << "\n";
  std::cout << "rshiftassign "   << rshiftassign(1)  << " " << rshiftassign(16,1,2)   << "\n";
  std::cout << "equals "         << equals(1)        << " " << equals(8,3,2)     << "\n";
  std::cout << "nequals "        << nequals(1)       << " " << nequals(7,2,0)    << "\n";
  std::cout << "lte "            << lte(1)           << " " << lte(7,2,0)        << "\n";
  std::cout << "gte "            << gte(1)           << " " << gte(7,3,1)        << "\n";
  std::cout << "land "           << land()           << " " << land(7,2)         << "\n";
  std::cout << "lor "            << lor()            << " " << lor(7,2)          << "\n";
  std::cout << "comma "          << comma(1)         << " " << comma(8,3,2)      << "\n";
  
  auto phoneptr = &Person::phone;
  auto extptr = &Phone::ext;
  Person p { { 999 } };
  Person * pptr = &p;
  std::cout << "objptrmem "                   << objptrmem(p,phoneptr,extptr)       << "\n";
  std::cout << "p.*phoneptr.*extptr "         << p.*phoneptr.*extptr                << "\n";
  std::cout << "ptrptrmem(&p,phoneptr).ext "  << ptrptrmem(&p,phoneptr).ext         << "\n";  
  std::cout << "&(pptr->*phoneptr)->*extptr " << (&(pptr->*phoneptr))->*extptr      << "\n";

}
The output looks something like the following.
add 1 6
sub 1 -4
mul 1 6
divide 1 3
mod 1 0
bxor 1 7
band 1 1
assign 1 4
assign-a 99 4 4
gt 1 true
rshift 1 2
lt 1 false
lshift 1 32
addassign 1 7
subassign 1 5
mulassign 1 12
divassign 1 3
modassign 1 0
bxorassign 1 5
bandassign 1 6
borassign 1 15
lshiftassign 1 32
rshiftassign 1 2
equals 1 false
nequals 1 true
lte 1 true
gte 1 true
land true true
lor false true
comma 1 2
objptrmem 999
p.*phoneptr.*extptr 999
ptrptrmem(&p,phoneptr).ext 999
&(pptr->*phoneptr)->*extptr 999
There are a number of observations.
  1. Clang does not like > and >> operators for some reason. GCC is fine.
  2. Unary fold expressions do not like empty parameter packs except for the &&, ||, and comma operators. In fact, the P0036 document describes what happens when empty parameter packs are used with these operators and why it's illegal for the other operators. In short, empty parameter packs result in true, false, and void(), respectively. In that sense, binary folds appear significantly superior because you can specify the identity element for fundamental and user-defined types and for all the operators.
  3. Single-element parameter packs result in the value of the single element. This may be OK for some types and operators, but it's very confusing for operators such as > < == != <= >= && ||. These operators return a boolean result in general, but not when the parameter pack has only one element. The type of the expression changes when the size of the parameter pack is greater than 1. For example, lte(1) returns an int but lte(1,3) returns a boolean. That's bizarre.
  4. Multiple-element parameter packs work as expected, with a twist. Consider the gt example in main above. gt(3,2,0) expands to (3>2)>0, which is true>0, which is true. Similarly, lt(1,2,-1) is (1<2)<-1, which is true<-1, which is false. However, for such operators (that return a boolean result), the compiler spits out a copious amount of warnings saying that "comparisons like 'X<=Y<=Z' do not have their mathematical meaning". That makes sense.
  5. The assign function is curious too. Assigning to a variable makes sense. For example, assign(a,2,4) expands to (a=2)=4, which assigns 2 to a and later 4 to a. So there are two assignments. The result type is int&. The funny thing is that if you replace a with an rvalue, it still works. My guess is that the pack elements are named parameters inside assign, and named references are lvalues, so the assignment is accepted.
  6. Operator associativity has no consequence. For example, <<= and >>= are right-associative operators, but left folds still fold from left to right. I.e., nominally, a <<= b <<= c is equivalent to a <<= (b <<= c). With a unary left fold you get (a <<= b) <<= c. If you want the former, use a unary right fold (see the sketch after this list).
  7. Finally, consider the fold expressions containing pointers to members (the objptrmem and ptrptrmem lines at the end of main). A single, initialized pointer to member just converts to true in a boolean context (like any other pointer). The weird thing, though, is that there's no way to make sense of two or more pointers to members on their own. I can't think of a way in which they fold (a.k.a. compose) and return something meaningful. An object (of the same class as that of the member pointer) is required as the left-most element in the parameter pack to dereference a list of member pointers. For example, objptrmem(p,phoneptr,extptr) is the same as p.*phoneptr.*extptr. Without p, just phoneptr and extptr make no sense together.
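Here's the sketch promised in item 6: a unary right fold for <<=, mirroring the UNARY_LEFT_FOLD macro above:
template<typename... Args>
auto lshiftassign_right(Args&&... args) {
  return (args <<= ...); // unary right fold
}
// lshiftassign(1, 2, 3)       expands to (1 <<= 2) <<= 3, i.e., 4 <<= 3, which is 32
// lshiftassign_right(1, 2, 3) expands to 1 <<= (2 <<= 3), i.e., 1 <<= 16, which is 65536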


Binary Folds

This example uses a user-defined Int type in a left binary fold. We'll also specify our own identity for our Int-based binary folds.
#include <iostream>
#include <iomanip>

struct Int {
  int value;
  explicit Int(int v=0) : value(v) {}
  explicit operator int() const { return value; }
};

std::ostream& operator << (std::ostream& o, const Int& i) {
   o << i.value;
   return o;
}

Int operator + (Int const &i, Int const &j) {
  std::cout << "Adding " << i << " " << j << "\n";
  return Int(i.value + j.value);  
};

Int operator * (Int const &i, Int const &j) {
  std::cout << "Multiplying " << i << " " << j << "\n";
  return Int(i.value * j.value);  
};

template<typename... Args>
auto addInts(Args&&... args) 
{
  return (Int{0} + ... + args);
}

template<typename... Args>
auto mulInts(Args&&... args) 
{
  return (Int{1} * ... * args);
}

int main(void)
{
  std::cout << addInts(Int{1}, Int{2}, Int{3}) << "\n"; // prints 6
  std::cout << addInts() << "\n"; // prints 0
  std::cout << mulInts(Int{1}, Int{2}, Int{3}) << "\n"; // prints 6
  std::cout << mulInts() << "\n"; // prints 1
}
Things are very much as expected in this example. For user-defined types, the operator you wish to use in a fold expression must be overloaded. Int overloads binary + and *. addInts uses Int{0} as the identity element whereas mulInts uses Int{1}. The identity element is special: in Int addition, adding the identity element makes no difference; similarly, in Int multiplication, multiplying by the identity element makes no difference.

I'll wrap up with a quick bit of monoid theory.

Formally, (Int,+) is a monoid with Int{0} as the identity and (Int,*) is also a (different) monoid with Int{1} as the identity. Two instances of the same monoid can be combined to produce a third one. In fact, monoid values can be combined arbitrarily to produce other instances of the same monoid. Left and right folds provide just two possible ways in which any monoid may be combined.

In the following posts, we'll create more interesting monoids and see how well fold expressions can exploit their properties.

Saturday, November 05, 2016

Dependently-typed Curried printf in C++

Just a few days ago I came across an intriguing blog-post about type-safe printf using dependent typing. The blog-post has since become inaccessible and therefore, I've copied an excerpt here. I want to thank Zesen Qian for publishing this blog-post.

.... printf originated from the C programming language and has been a headache since then because a proper call of printf requires the number and types of arguments to match the format string; otherwise, it may visit a wild memory or a wrong register. In recent versions of GCC this issue is addressed by type checks hard coded into the compiler itself, which is ugly because the compiler should not be caring about the safety of applications of a specific function....

The key issue here is that, considering the curried version, the type of the returned function of applying the format string to printf, actually depends on the value of that format string. That is to say, printf "%s" : String -> String, and printf "%d%s%d" : Int -> String -> Int -> String, and so on. This is where dependent type comes in: in dependently typed language, the type of returned values of functions can be dependent on value of the arguments; .... ---- Zesen Qian (ICFP'2016)
I thought it might be possible to achieve the same effect in C++.

Currying

Currying is the technique of transforming a function that takes multiple arguments in such a way that it can be called as a chain of functions, each with a single argument. I've talked about currying from the very basics in a previous post. I'll jump straight to an example this time.

int multiply(int i, int j) { return i*j; }

auto curry = [](auto binary) {
  return [=](int i) {
    return [=](int j) { 
      return binary(i, j);
    };
  };
};

auto mul = curry(multiply);
auto a = mul(10);
auto b = a(20);
std:: cout << b << std::endl; // prints 200

Function multiply takes both arguments at the same time. Function mul, which is a curried version, takes one argument at a time. Intermediate results, such as a, are themselves functions that take one of the remaining arguments. When all arguments are available, the original function evaluates, producing a result.

Currying printf--dependently

Currying printf poses an extra challenge because (1) printf accepts a variable number of arguments and (2) the order of the types of the arguments is not fixed (past the first argument). More accurately, the order of the types of the arguments is determined by the format string. The format string of printf is a value---usually, a literal string. We want to make the types of the rest of the arguments dependent on the value of the first argument. That's pretty intriguing, imo. In effect, we need a way to codify the format string literal into a type and that's where the dependent-typing comes into play.

To codify a string literal into a type, we are going to use the C++ language feature proposed in N3599. This proposal includes an example of dependently-typed printf that accepts all arguments at once. We're going to twist it a little bit to accept one argument at a time.

The magic lies in the operator "" that converts a string literal into a type. Here's the code without further ado. Both clang and gcc support this extension. Perhaps it will be in the C++17 standard soon, or it already is.

#include <utility>

template <char... chars>
using CharSeq = std::integer_sequence<char, chars...>;

template <typename T, T... chars>
constexpr CharSeq<chars...> operator""_lift() { return { }; }
The CharSeq type is a synonym for std::integer_sequence<char, ...>. _lift is a function that uses the C++11 user-defined literal syntax to convert a string literal to an equivalent CharSeq at compile-time. For example, "cpptruths"_lift returns std::integer_sequence<char,'c','p','p','t','r','u','t','h','s'>. Check this code out.
#include <boost/core/demangle.hpp>

auto cpptruths = "cpptruths"_lift;
std::cout << boost::core::demangle(typeid(decltype(cpptruths)).name()) << "\n";
Once a string is encoded as a type, a lot of things begin to fall into place using some additional template meta-programming. First, we need to codify the type-level CharSeq into a tuple of types that directly specify the types expected by printf. For instance, "%d" expects an int and "%s" expects a const char *, etc. We implement a meta-function called StringToTuple.
template <class Head, class Tuple>
struct Append;

template <class Head, class... Args>
struct Append<Head, std::tuple<Args...>>
{
  using type = std::tuple<Head, Args...>;
};

template <class CharSeq>
struct StringToTuple;

template <>
struct StringToTuple<CharSeq<>>
{
    using type = std::tuple<>;
};

template <char Any, char... chars>
struct StringToTuple<CharSeq<Any, chars...>>
{
    using type = typename StringToTuple<CharSeq<chars...>>::type;
};

template <char... chars>
struct StringToTuple<CharSeq<'%', 's', chars...>>
{
    using tail = typename StringToTuple<CharSeq<chars...>>::type;
    using type = typename Append<const char *, tail>::type;
};

template <char... chars>
struct StringToTuple<CharSeq<'%', 'd', chars...>>
{
    using tail = typename StringToTuple<CharSeq<chars...>>::type;
    using type = typename Append<int, tail>::type;
};

template <char... chars>
struct StringToTuple<CharSeq<'%', 'f', chars...>>
{
    using tail = typename StringToTuple<CharSeq<chars...>>::type;
    using type = typename Append<double, tail>::type;
};

template <char... chars>
struct StringToTuple<CharSeq<'%', 'u', chars...>>
{
    using tail = typename StringToTuple<CharSeq<chars...>>::type;
    using type = typename Append<unsigned int, tail>::type;
};

auto format = "%s%d"_lift;
StringToTuple<decltype(format)>::type FormatTuple; // std::tuple<const char *, int> 

The StringToTuple meta-function uses pattern-matching. Consider the %s specialization. When the beginning of the CharSeq is '%' followed by 's', the specialization matches and recursively computes the type of the tail, which is tuple<int> in this case. The Append meta-function simply prepends the new type at the head of the tuple.

If the beginning of the CharSeq is not a '%', the most generic version with char Any matches, which simply ignores the leading character.

The fun does not end here, though. We still need to curry printf. All we have at this stage is a sequence of types, but that's a big leap forward.

Let's assume you have a function curried_printf_impl that accepts a format string and a CharSeq as follows.
template <class CharSeq>
auto curried_printf_impl(const char * fmt, CharSeq)
{
  using FormatType = typename StringToTuple<CharSeq>::type;
  std::cout << boost::core::demangle(typeid(FormatType).name()) << "\n";
  return curry<FormatType>::apply(fmt);
}

#define curried_printf(X) curried_printf_impl(X, X##_lift)
We've not talked about the curry template yet. Of course, it's going to use the FormatType tuple and turn it into a sequence of curried functions. The curried_printf macro helps us cleanly pass the string literal and its compile-time character sequence as two separate arguments. ## is the token-pasting operator in the C preprocessor.

The target really feels within reach now. The curry template is relatively straightforward.
template <class Tuple>
struct curry;

template <class Head, class... Tail>
struct curry<std::tuple<Head, Tail...>>
{
    template<class... Args>
    static auto apply(Args&&... args) 
    {   
      return [args...](Head h) {
          return curry<std::tuple<Tail...>>::apply(args..., h); 
      };  
    }   
};

template <class Head>
struct curry<std::tuple<Head>>
{
    template <class... Args>
    static auto apply(Args&&... args) {
        return [args...](Head h) { 
            return printf(args..., h); 
        };  
    }   
};

template <>
struct curry<std::tuple<>>
{
    static auto apply(const char * fmt) {
       return printf(fmt); 
    }   
};
The general case of the curry template has an apply function that accepts an arbitrary number of arguments and returns a closure that captures all those arguments (from apply) and takes exactly one more argument of the Head type. As soon as it has the Head argument, it forwards it along with all previous arguments to the subsequent curry<Tail...>::apply, which accepts and retains the remaining arguments one by one. The single-argument curry (the one with just Head) terminates the recursion and returns a lambda that, upon receiving the last argument, calls printf. Note that the format string literal is always at the beginning of args... as curried_printf_impl passes it along. If the format string is the only argument, curry::apply calls printf right away in the last, empty-tuple specialization.

Here's the main driver program. Also on github.
int main(void)
{
  curried_printf("C++ Rocks%s %d %f\n")("!!")(10)(20.30);
  curried_printf("C++ Rocks!!\n");

  return 0;
}
If you mess up the argument types, the error is short and relatively direct.

Avoiding Copying Arguments

The previous example makes a massive assumption: that all arguments are fundamental types and therefore cheap to copy. The lambda inside the apply function captures the arguments by value and passes them on by value. The arguments are copied approximately O(N*N) times. That's gonna hurt for large types that are expensive to copy.

The remedy is to std::move the arguments as much as possible. However, forwarding variadic arguments requires us to take some library help: std::tuple.

template <class Head, class... Tail>
struct curry<std::tuple<Head, Tail...>>
{
    template<class... Args>
    static auto apply(Args&&... args) 
    {   
      return [t=std::make_tuple(std::move(args)...)](Head h) {
          // Move each element of t and h to curry<std::tuple<Tail...>>::apply somehow.
      };  
    }   
};
It got complicated real fast. For each call, we have to wrap the arguments in a tuple and unwrap them before passing them to curry::apply. Wrapping is easy; that's the code above. Unwrapping is rather complicated because not all arguments are together in a tuple. Head comes separately. std::apply and std::invoke did not appear particularly useful in this case. We perhaps need a direct syntax to expand a tuple into function arguments. Secondly, there's at least one copy of each Head argument anyway, because the function should be type-safe and accept only a Head-type argument in the lambda. I thought this was more trouble than it's worth.

Currying Arbitrary Functions

To work around this problem, I'm simply going to use a dynamically allocated tuple that stores the arguments as they come in. As the curried function may be copied multiple times, this scheme should work out quite efficiently in such cases.
// In C++17, std::apply can replace the following execute function.

template <size_t... Indices, class Tuple, class Func>
auto execute(std::integer_sequence<size_t, Indices...>,
             Tuple&& tuple,
             Func&& func)
{
  return func(std::get<Indices>(std::forward<Tuple>(tuple))...);
}

template <int I, class AllArgs, class Tuple>
struct dyn_curry;

template <int I, class AllArgs, class Head, class... Tail>
struct dyn_curry<I, AllArgs, std::tuple<Head, Tail...>>
{
    enum { Index = std::tuple_size<AllArgs>::value - I };

    template <class Func>
    static auto apply(std::shared_ptr<AllArgs> shptr, Func&& func)
    {   
      return [shptr, func=std::move(func)](const Head &h) mutable {
        std::get<Index>(*shptr) = h;
        return dyn_curry<I-1, AllArgs, std::tuple<Tail...>>::apply(shptr, std::move(func));
      };  
    }    
};

template <class AllArgs, class Head>
struct dyn_curry<1, AllArgs, std::tuple<Head>>
{
    enum { Index = std::tuple_size<AllArgs>::value - 1 };
    using IntSeq = std::make_index_sequence<std::tuple_size<AllArgs>::value>;

    template <class Func>
    static auto apply(std::shared_ptr<AllArgs> shptr, Func&& func)
    {   
      return [shptr, func=std::move(func)](const Head &h) mutable {
        std::get<Index>(*shptr) = h;
        return execute(IntSeq(), std::move(*shptr), std::move(func));
      };  
    }
};

template <class Ret, class... Args>
auto arb_curry(Ret (&func) (Args...))
{
  using AllArgs = std::tuple<std::decay_t<Args>...>;
  std::cout << boost::core::demangle(typeid(AllArgs).name()) << "\n";
  std::shared_ptr<AllArgs> shptr(new AllArgs);

  return dyn_curry<std::tuple_size<AllArgs>::value, AllArgs, AllArgs>::apply(shptr, func);
}

template <class Ret>
Ret arb_curry(Ret (&func) ()) { return func(); }

int print_add(std::string &msg, int &j, int k) { std::cout << msg; return j+k;   }

int identity(int i) { return i; }

int foo() { return printf("foo\n"); }

int main(void)
{
  arb_curry(foo);
  std::cout << arb_curry(identity)(99) << std::endl;
  auto a = arb_curry(print_add);
  auto b = a("Adding two integers: ");
  auto c = b(20);
  auto d = c(30);
  std::cout << d << std::endl; // prints 50.

  return 0;
}
There are three main differences between this more general implementation and the previous example.
  1. This implementation uses an explicit compile-time index to copy arguments into the right slot in the tuple of arguments. 
  2. There's more type related noise here because each call to apply passes the shared_ptr of the tuple type to the inner lambda. 
  3. The final dispatch to the function is implemented in the execute function that expands all the arguments in the tuple as function arguments. In C++17, std::apply can replace the execute function.
Here's live code and also on github.

Conclusion

While currying C++ functions is fun, lifting C++ string literals to the type level opens up a whole new level of meta-programming in C++. constexpr functions can operate on string literals and compute integral results at compile-time. See this for an example. With constexpr functions, however, we can't construct new types at compile-time depending upon the argument value. N3599 allows us to cross the string-to-type barrier at compile-time. That's pretty neat. I can already think of some intriguing applications of N3599 in serialization/deserialization of user-defined types.