Saturday, November 05, 2016

Dependently-typed Curried printf in C++

Just a few days ago I came across an intriguing blog-post about type-safe printf using dependent typing. The blog-post has since become inaccessible and therefore, I've copied an excerpt here. I want to thank Zesen Qian for publishing this blog-post.

.... printf originated from the C programming language and has been a headache since then because a proper call of printf requires the number and types of arguments to match the format string; otherwise, it may visit a wild memory or a wrong register. In recent versions of GCC this issue is addressed by type checks hard coded into the compiler itself, which is ugly because the compiler should not be caring about the safety of applications of a specific function....

The key issue here is that, considering the curried version, the type of the returned function of applying the format string to printf, actually depends on the value of that format string. That is to say, printf "%s" : String -> String, and printf "%d%s%d" : Int -> String -> Int -> String, and so on. This is where dependent type comes in: in dependently typed language, the type of returned values of functions can be dependent on value of the arguments; .... ---- Zesen Qian (ICFP'2016)
I thought it might be possible to achieve the same effect in C++.

Currying

Currying is the technique of transforming a function that takes multiple arguments so that it can be called as a chain of functions, each taking a single argument. I've talked about currying from the very basics in a previous post. I'll jump straight to an example this time.

int multiply(int i, int j) { return i*j; }

auto curry = [](auto binary) {
  return [=](int i) {
    return [=](int j) { 
      return binary(i, j);
    };
  };
};

auto mul = curry(multiply);
auto a = mul(10);
auto b = a(20);
std::cout << b << std::endl; // prints 200

Function multiply takes both arguments at the same time. Function mul, which is a curried version, takes one argument at a time. Intermediate results, such as a, are themselves functions that take one of the remaining arguments. When all arguments are available, the original function is evaluated and produces a result.
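The curry lambda above fixes both parameters to int. As a minimal sketch, assuming C++14 generic lambdas, the same idea can be written so that the argument types are deduced. The names curry2, concat, and curry2_demo are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not from the post): curries any two-argument callable.
template <class Binary>
auto curry2(Binary binary) {
  return [=](auto first) {
    // The inner closure copies both the callable and the first argument.
    return [=](auto second) { return binary(first, second); };
  };
}

inline int multiply(int i, int j) { return i * j; }
inline std::string concat(std::string a, std::string b) { return a + b; }

// Small usage demo; call it from main().
inline void curry2_demo() {
  auto mul = curry2(multiply);
  assert(mul(10)(20) == 200);

  auto cat = curry2(concat);
  assert(cat(std::string("cpp"))(std::string("truths")) == "cpptruths");
}
```

Both closures capture by value, so this sketch shares the copying cost discussed later in the post.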

Currying printf--dependently

Currying printf poses an extra challenge because (1) printf accepts a variable number of arguments and (2) the order of the types of the arguments is not fixed (past the first argument). More accurately, the order of the types of the arguments is determined by the format string. The format string of printf is a value---usually, a literal string. We want to make the types of the rest of the arguments dependent on the value of the first argument. That's pretty intriguing, imo. In effect, we need a way to codify the format string literal into a type and that's where the dependent-typing comes into play.

To codify a string literal into a type, we are going to use the C++ language feature proposed in N3599. This proposal includes an example of dependently-typed printf that accepts all arguments at once. We're going to twist it a little bit to accept one argument at a time.

The magic lies in a templated literal operator (operator "") that converts a string literal into a type. Here's the code without further ado. Both clang and gcc support it as a compiler extension; it is not part of the C++17 standard.

#include <utility>

template <char... chars>
using CharSeq = std::integer_sequence<char, chars...>;

template <typename T, T... chars>
constexpr CharSeq<chars...> operator""_lift() { return { }; }
The CharSeq type is a synonym for std::integer_sequence<char, ...>. _lift is a function that uses the C++11 user-defined literal syntax to convert a string literal to an equivalent CharSeq at compile-time. For example, "cpptruths"_lift returns std::integer_sequence<char,'c','p','p','t','r','u','t','h','s'>. Check this code out.
#include <boost/core/demangle.hpp>

auto cpptruths = "cpptruths"_lift;
std::cout << boost::core::demangle(typeid(decltype(cpptruths)).name()) << "\n";
Once a string is encoded as a type, a lot of things begin to fall into place using some additional template meta-programming. First, we need to codify the type-level CharSeq into a tuple of types that directly specify the types expected by printf. For instance, "%d" expects an int, "%s" expects a const char *, and so on. We implement a meta-function called StringToTuple.
template <class Head, class Tuple>
struct Append;

template <class Head, class... Args>
struct Append<Head, std::tuple<Args...>>
{
  using type = std::tuple<Head, Args...>;
};

template <class CharSeq>
struct StringToTuple;

template <>
struct StringToTuple<CharSeq<>>
{
    using type = std::tuple<>;
};

template <char Any, char... chars>
struct StringToTuple<CharSeq<Any, chars...>>
{
    using type = typename StringToTuple<CharSeq<chars...>>::type;
};

template <char... chars>
struct StringToTuple<CharSeq<'%', 's', chars...>>
{
    using tail = typename StringToTuple<CharSeq<chars...>>::type;
    using type = typename Append<const char *, tail>::type;
};

template <char... chars>
struct StringToTuple<CharSeq<'%', 'd', chars...>>
{
    using tail = typename StringToTuple<CharSeq<chars...>>::type;
    using type = typename Append<int, tail>::type;
};

template <char... chars>
struct StringToTuple<CharSeq<'%', 'f', chars...>>
{
    using tail = typename StringToTuple<CharSeq<chars...>>::type;
    using type = typename Append<double, tail>::type;
};

template <char... chars>
struct StringToTuple<CharSeq<'%', 'u', chars...>>
{
    using tail = typename StringToTuple<CharSeq<chars...>>::type;
    using type = typename Append<unsigned int, tail>::type;
};

auto format = "%s%d"_lift;
StringToTuple<decltype(format)>::type FormatTuple; // std::tuple<const char *, int> 

The StringToTuple meta-function uses pattern matching on partial specializations. Consider the %s specialization. When the CharSeq begins with '%' followed by 's', the specialization matches and recursively computes the type of the tail, which is tuple<int> in this case. The Append meta-function then prepends Head to the types already in the tuple.

If the beginning of the CharSeq is not a '%', the first most generic version with char Any matches, which simply ignores the leading character.
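To watch the pattern matching at work without the literal-operator extension, here is a compact variant that spells the character sequence out by hand and checks the results with static_assert. It is a sketch, not the code above: ArgFor, Prepend, and FormatArgs are hypothetical names, and the handling of "%%" (a literal percent sign that consumes no argument) is an addition that the original specializations do not have.

```cpp
#include <tuple>
#include <type_traits>
#include <utility>

template <char... chars>
using CharSeq = std::integer_sequence<char, chars...>;

// Hypothetical table: conversion character -> expected argument type.
// An unlisted conversion character is a compile-time error in this sketch.
template <char C> struct ArgFor;
template <> struct ArgFor<'s'> { using type = const char*; };
template <> struct ArgFor<'d'> { using type = int; };
template <> struct ArgFor<'f'> { using type = double; };
template <> struct ArgFor<'u'> { using type = unsigned int; };

template <class Head, class Tuple> struct Prepend;
template <class Head, class... Args>
struct Prepend<Head, std::tuple<Args...>> { using type = std::tuple<Head, Args...>; };

template <class Seq> struct FormatArgs;

// End of the format string: no more arguments.
template <> struct FormatArgs<CharSeq<>> { using type = std::tuple<>; };

// Ordinary character: skip it.
template <char Any, char... rest>
struct FormatArgs<CharSeq<Any, rest...>> : FormatArgs<CharSeq<rest...>> {};

// '%' followed by a conversion character: prepend its argument type.
template <char Conv, char... rest>
struct FormatArgs<CharSeq<'%', Conv, rest...>> {
  using type = typename Prepend<typename ArgFor<Conv>::type,
                                typename FormatArgs<CharSeq<rest...>>::type>::type;
};

// "%%" prints a percent sign and takes no argument.
template <char... rest>
struct FormatArgs<CharSeq<'%', '%', rest...>> : FormatArgs<CharSeq<rest...>> {};

static_assert(std::is_same<FormatArgs<CharSeq<'%', 's', '%', 'd'>>::type,
                           std::tuple<const char*, int>>::value,
              "\"%s%d\" expects (const char*, int)");
static_assert(std::is_same<FormatArgs<CharSeq<'a', '%', 'f', 'b'>>::type,
                           std::tuple<double>>::value,
              "\"a%fb\" expects (double)");
static_assert(std::is_same<FormatArgs<CharSeq<'%', '%', 'd'>>::type,
                           std::tuple<>>::value,
              "\"%%d\" expects no arguments");
```

The compiler picks the most specialized matching pattern, which is why the '%' cases win over the generic skip-one-character case.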

The fun does not end here though. We still need to curry printf. All we have at this stage is a sequence of types, and that's a big leap forward.

Let's assume you have a function curried_printf_impl that accepts a format string and a CharSeq as follows.
template <class CharSeq>
auto curried_printf_impl(const char * fmt, CharSeq)
{
  using FormatType = typename StringToTuple<CharSeq>::type;
  std::cout << boost::core::demangle(typeid(FormatType).name()) << "\n";
  return curry<FormatType>::apply(fmt);
}

#define curried_printf(X) curried_printf_impl(X, X##_lift)
We've not talked about the curry template yet. Of course, it's going to use the FormatType tuple and turn it into a sequence of curried functions. The curried_printf macro helps us cleanly split the string literal and the compile-time character sequence into two separate arguments. ## is the token-pasting operator in the C preprocessor.

The target really feels within reach now. The curry template is relatively straightforward.
template <class Tuple>
struct curry;

template <class Head, class... Tail>
struct curry<std::tuple<Head, Tail...>>
{
    template<class... Args>
    static auto apply(Args&&... args) 
    {   
      return [args...](Head h) {
          return curry<std::tuple<Tail...>>::apply(args..., h); 
      };  
    }   
};

template <class Head>
struct curry<std::tuple<Head>>
{
    template <class... Args>
    static auto apply(Args&&... args) {
        return [args...](Head h) { 
            return printf(args..., h); 
        };  
    }   
};

template <>
struct curry<std::tuple<>>
{
    static auto apply(const char * fmt) {
       return printf(fmt); 
    }   
};
The general case of the curry template has an apply function that accepts an arbitrary number of arguments and returns a closure that captures all those arguments (from apply) and takes exactly one more argument of type Head. As soon as it has the Head argument, it forwards it with all previous arguments to the subsequent curry<std::tuple<Tail...>>::apply to accept and retain the remaining arguments one by one. The single-argument curry (the one with just Head) terminates the recursion and returns a lambda that calls printf upon receiving the last argument. Note that the format string literal is always at the beginning of args... as curried_printf_impl passes it along. If the format string is the only argument, curry::apply calls printf right away in the last, no-argument specialization.
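One way to watch the recursion unfold is a variant that formats into a std::string instead of printing, so the result can be checked. This is a sketch under assumptions, not the code above: curry_fmt and curry_fmt_demo are hypothetical names, the argument tuple is written out by hand rather than derived from the format string, and the output is assumed to fit in a fixed 128-byte buffer. It also folds the two terminating cases into a single empty-tuple specialization.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <tuple>

template <class Tuple>
struct curry_fmt;

// At least one argument is still expected: return a closure that takes it.
template <class Head, class... Tail>
struct curry_fmt<std::tuple<Head, Tail...>> {
  template <class... Args>
  static auto apply(Args... args) {
    return [args...](Head h) {
      return curry_fmt<std::tuple<Tail...>>::apply(args..., h);
    };
  }
};

// Nothing left to collect: the first argument is the format string.
template <>
struct curry_fmt<std::tuple<>> {
  template <class... Args>
  static std::string apply(const char* fmt, Args... args) {
    char buf[128];  // assumption: the formatted text fits in 128 bytes
    std::snprintf(buf, sizeof buf, fmt, args...);
    return buf;
  }
};

// Small usage demo; call it from main().
inline void curry_fmt_demo() {
  using Expected = std::tuple<const char*, int, double>;
  auto step1 = curry_fmt<Expected>::apply("%s %d %.1f");
  auto step2 = step1("!!");   // accepts only a const char*
  auto step3 = step2(10);     // accepts only an int
  assert(step3(20.3) == "!! 10 20.3");
}
```

Each intermediate closure has a single parameter of exactly the type the tuple prescribes, which is where the type safety comes from.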

Here's the main driver program. Also on github.
int main(void)
{
  curried_printf("C++ Rocks%s %d %f\n")("!!")(10)(20.30);
  curried_printf("C++ Rocks!!\n");

  return 0;
}
If you mess up with the argument types, the error is short and relatively direct.

Avoiding Copying Arguments

The previous example makes a massive assumption that all arguments are fundamental types. That they are cheap to copy. The lambda inside the apply function captures the arguments by value and passes them on by value. The arguments are copied O(N*N) times approximately. That's gonna hurt for large types that are expensive to copy.

The remedy is to std::move the arguments as much as possible. However, forwarding variadic arguments requires us to take some library help: std::tuple.

template <class Head, class... Tail>
struct curry<std::tuple<Head, Tail...>>
{
    template<class... Args>
    static auto apply(Args&&... args) 
    {   
      return [t=std::make_tuple(std::move(args)...)](Head h) {
          // Move each element of t and h to curry<std::tuple<Tail...>>::apply somehow.
      };  
    }   
};
It got complicated real fast. For each argument, we'll have to wrap them in a tuple and unwrap them before passing to curry::apply. Wrapping is easy; the code above does it. Unwrapping is rather complicated because the arguments are not all together in a tuple. Head comes separately. std::apply and std::invoke did not appear particularly useful in this case. We perhaps need a direct syntax to expand a tuple into function arguments. Secondly, there's at least one copy of each Head argument anyway because the function should be type-safe and accept only a Head type argument in the lambda. I thought this was more trouble than it's worth.
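For completeness, one possible way to do the unwrapping with a C++17 standard library is to glue the separate last argument onto the stored tuple with std::tuple_cat and then expand the result with std::apply. This is only a sketch of that idea and not the route the post takes. The names call_with_extra, last_step, join, and last_step_demo are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical helper: calls f with every element of `stored` followed by `last`.
// Requires C++17 for std::apply.
template <class F, class Tuple, class Last>
decltype(auto) call_with_extra(F&& f, Tuple&& stored, Last&& last) {
  return std::apply(
      std::forward<F>(f),
      std::tuple_cat(std::forward<Tuple>(stored),
                     std::forward_as_tuple(std::forward<Last>(last))));
}

// One curried step: keeps the earlier arguments in a tuple and moves them
// out when the final argument arrives. The closure is single-shot because
// the stored tuple is moved from on the first call.
template <class F, class... Stored>
auto last_step(F f, Stored... stored) {
  return [f, t = std::make_tuple(std::move(stored)...)](auto&& last) mutable {
    return call_with_extra(f, std::move(t), std::forward<decltype(last)>(last));
  };
}

inline std::string join(std::string a, int n, std::string b) {
  return a + std::to_string(n) + b;
}

// Small usage demo; call it from main().
inline void last_step_demo() {
  auto step = last_step(join, std::string("a"), 1);
  assert(step(std::string("b")) == "a1b");
}
```

std::tuple_cat still moves each stored element once into the combined tuple, so this reduces copies to moves rather than eliminating the transfers.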

Currying Arbitrary Functions

To work around this problem I'm simply going to use a dynamically allocated tuple that will store the arguments as they come in. As curried function may be copied multiple times, this scheme should work out quite efficiently in such cases.
// In C++17, std::apply can replace the following execute function.

template <size_t... Indices, class Tuple, class Func>
auto execute(std::integer_sequence<size_t, Indices...>,
             Tuple&& tuple,
             Func&& func)
{
  return func(std::get<Indices>(std::forward<Tuple>(tuple))...);
}

template <int I, class AllArgs, class Tuple>
struct dyn_curry;

template <int I, class AllArgs, class Head, class... Tail>
struct dyn_curry<I, AllArgs, std::tuple<Head, Tail...>>
{
    enum { Index = std::tuple_size<AllArgs>::value - I };

    template <class Func>
    static auto apply(std::shared_ptr<AllArgs> shptr, Func&& func)
    {   
      return [shptr, func=std::move(func)](const Head &h) mutable {
        std::get<Index>(*shptr) = h;
        return dyn_curry<I-1, AllArgs, std::tuple<Tail...>>::apply(shptr, std::move(func));
      };  
    }    
};

template <class AllArgs, class Head>
struct dyn_curry<1, AllArgs, std::tuple<Head>>
{
    enum { Index = std::tuple_size<AllArgs>::value - 1 };
    using IntSeq = std::make_index_sequence<std::tuple_size<AllArgs>::value>;

    template <class Func>
    static auto apply(std::shared_ptr<AllArgs> shptr, Func&& func)
    {   
      return [shptr, func=std::move(func)](const Head &h) mutable {
        std::get<Index>(*shptr) = h;
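        // Pass the stored tuple as an lvalue so that reference parameters
        // such as std::string& can bind to its elements.
        return execute(IntSeq(), *shptr, std::move(func));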
      };  
    }
};

template <class Ret, class... Args>
auto arb_curry(Ret (&func) (Args...))
{
  using AllArgs = std::tuple<std::decay_t<Args>...>;
  std::cout << boost::core::demangle(typeid(AllArgs).name()) << "\n";
  std::shared_ptr<AllArgs> shptr(new AllArgs);

  return dyn_curry<std::tuple_size<AllArgs>::value, AllArgs, AllArgs>::apply(shptr, func);
}

template <class Ret>
Ret arb_curry(Ret (&func) ()) { return func(); }

int print_add(std::string &msg, int &j, int k) { std::cout << msg; return j+k;   }

int identity(int i) { return i; }

int foo() { return printf("foo\n"); }

int main(void)
{
  arb_curry(foo);
  std::cout << arb_curry(identity)(99) << std::endl;
  auto a = arb_curry(print_add);
  auto b = a("Adding two integers: ");
  auto c = b(20);
  auto d = c(30);
  std::cout << d << std::endl; // prints 50 (20 + 30)

  return 0;
}
There are three main differences between this more general implementation and the previous example.
  1. This implementation uses an explicit compile-time index to copy arguments into the right slot in the tuple of arguments. 
  2. There's more type-related noise here because each call to apply passes the shared_ptr of the tuple type to the inner lambda. 
  3. The final dispatch to the function is implemented in the execute function that expands all the arguments in the tuple as function arguments. In C++17, std::apply can replace the execute function.
Here's live code and also on github.

Conclusion

While currying C++ functions is fun, lifting C++ string literals to the type level opens up a whole new level of meta-programming in C++. constexpr functions can operate on string literals and compute integral results at compile-time. See this for an example. With a constexpr function, however, we can't construct new types at compile-time depending upon the argument value. N3599 allows us to cross the string-to-type barrier at compile-time. That's pretty neat. I can already think of some intriguing applications of N3599 in serialization/deserialization of user-defined types.
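To make the contrast concrete, here is a small sketch, assuming C++14 relaxed constexpr, of a function that computes an integral result from a string literal at compile-time. The name count_conversions is hypothetical. It can count how many arguments a format string needs, but its return type is always int; it cannot hand back a type that depends on the string's contents.

```cpp
// Hypothetical helper: counts the conversions in a printf-style format string.
constexpr int count_conversions(const char* fmt) {
  int count = 0;
  for (int i = 0; fmt[i] != '\0'; ++i) {
    if (fmt[i] == '%') {
      if (fmt[i + 1] == '%') {
        ++i;                        // "%%" is a literal percent sign
      } else if (fmt[i + 1] != '\0') {
        ++count;                    // '%' followed by a conversion character
        ++i;
      }
    }
  }
  return count;
}

static_assert(count_conversions("%s%d") == 2, "two conversions");
static_assert(count_conversions("C++ Rocks%s %d %f\n") == 3, "three conversions");
static_assert(count_conversions("100%%") == 0, "escaped percent sign");
```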

Saturday, November 14, 2015

Covariance and Contravariance in C++ Standard Library

Covariance and Contravariance are concepts that come up often as you go deeper into generic programming. While designing a language that supports parametric polymorphism (e.g., templates in C++, generics in Java, C#), the language designer has a choice between Invariance, Covariance, and Contravariance when dealing with generic types. C++'s choice is "invariance". Let's look at an example.
struct Vehicle {};
struct Car : Vehicle {};

std::vector<Vehicle *> vehicles;
std::vector<Car *> cars;

vehicles = cars; // Does not compile
The above program does not compile because C++ templates are invariant. Each time a C++ template is instantiated, the compiler creates a brand new type that uniquely represents that instantiation. Instantiating the same template with any other type creates another unique type that has nothing to do with the earlier one. Two unrelated user-defined types in C++ can't be assigned to each other by default. You have to provide a converting constructor or an assignment operator.

However, the fun starts when you realize that it's just a choice and there are other valid choices. In fact, C++ makes a different choice for pointers and references. For example, it's common knowledge that a pointer to Car is assignable to a pointer to Vehicle. That's because Car is a subtype of Vehicle. More accurately, the Car struct inherits from the Vehicle struct and the compiler allows us to use Car pointers in places where a Vehicle pointer is expected. I.e., subtyping is activated through inheritance. Later in the post we will use subtyping without using inheritance.

If you think about pointers as a shortcut for the Pointer template below, it becomes apparent that the language has some special rules for them. Don't let the special * syntax confuse you. It is just a shortcut to avoid the ceremony below.
template <class T>
using Pointer = T *;

Pointer<Vehicle> vehicle;
Pointer<Car> car;

vehicle = car; // Works!
So what choices are available? The question we want to ask ourselves is, "What relationship do I expect between instantiations of a template with two different types that happen to have a subtype relationship?"
  • The first choice is no relationship. I.e., the template instantiations completely ignore the relationship between parameter types. This is C++ default. It's called invariance. (a.k.a. C++ templates are invariant)
  • The second choice is covariance. I.e., the template instantiations have the same subtype relationship as the parameter types. This is seen in C++ pointers and also in std::shared_ptr and std::unique_ptr because they want to behave as much like pointers as possible. For library types like these, you have to write special code to enable that because the language does not give it to you by default.
  • The third choice is contravariance. I.e., the template instantiations have the opposite subtype relationship to that of the parameter types. I.e., TEMPLATE<base> is subtype of TEMPLATE<derived>. We'll come back to contravariance in much more detail later in the post.
All C++ standard library containers are invariant (even if they contain pointers).
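These three choices can be observed directly with std::is_convertible. The following sketch redeclares Vehicle and Car so that it stands alone and checks an invariant container next to the covariant pointer types at compile-time.

```cpp
#include <memory>
#include <type_traits>
#include <vector>

struct Vehicle {};
struct Car : Vehicle {};

// Raw pointers: covariance is built into the language.
static_assert(std::is_convertible<Car*, Vehicle*>::value,
              "Car* converts to Vehicle*");
static_assert(!std::is_convertible<Vehicle*, Car*>::value,
              "Vehicle* does not implicitly convert to Car*");

// Containers: invariant, even when they hold pointers.
static_assert(!std::is_convertible<std::vector<Car*>, std::vector<Vehicle*>>::value,
              "std::vector is invariant");

// Smart pointers: covariance provided by library code.
static_assert(std::is_convertible<std::shared_ptr<Car>, std::shared_ptr<Vehicle>>::value,
              "shared_ptr<Car> converts to shared_ptr<Vehicle>");
static_assert(!std::is_convertible<std::shared_ptr<Vehicle>, std::shared_ptr<Car>>::value,
              "the reverse conversion is rejected");
```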

Covariance

As said earlier, with covariance, the templated type maintains the relationship between argument types. I.e., if argument types are unrelated, the templated types shall be unrelated. If derived is a sub-type of base (expressed as inheritance) then TEMPLATE<derived> shall be sub-type of TEMPLATE<base>. I.e., any place where TEMPLATE<base> is expected, TEMPLATE<derived> can be substituted and everything will work just fine. The other way around is not allowed.

There are some common examples of covariance in C++ standard library.
std::shared_ptr<Vehicle> shptr_vehicle;
std::shared_ptr<Car> shptr_car;
shptr_vehicle = shptr_car; // Works
shptr_car = shptr_vehicle; // Does not work.

std::unique_ptr<Vehicle> unique_vehicle;
std::unique_ptr<Car> unique_car;
unique_vehicle = std::move(unique_car); // Works
unique_car = std::move(unique_vehicle); // Does not work
One (formal) way to think about covariance is that "the type is allowed to get bigger upon assignment". I.e., Vehicle is broader/bigger type than Car. Here's a quick rundown of some of the commonly used C++ standard library types and their covariance/contravariance properties.

Type                          Covariant             Contravariant
STL containers                No                    No
std::initializer_list<T *>    No                    No
std::future<T>                No                    No
boost::optional<T>            No (see note below)   No
std::shared_ptr<T>            Yes                   No
std::unique_ptr<T>            Yes                   No
std::pair<T *, U *>           Yes                   No
std::tuple<T *, U *>          Yes                   No
std::atomic<T *>              Yes                   No
std::function<R *(T *)>       Yes (in return)       Yes (in arguments)

boost::optional<T> appears to be covariant but it really isn't because it slices the object underneath. The same thing happens with std::pair and std::tuple. Therefore, they behave covariantly only when the parameter type itself behaves covariantly.

Finally, combining one covariant type with another (e.g., std::shared_ptr<std::tuple<T *>>) does not necessarily preserve covariance because it is not built into the language. It is often implemented as a single-level direct convertibility. I.e., std::tuple<Car *> * is not directly convertible to std::tuple<Vehicle *> *. It would have been if the language itself enforced subtyping between std::tuple<Car*> and std::tuple<Vehicle *> but it does not. On the other hand, std::tuple<std::shared_ptr<T>> behaves covariantly.
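The claims in this paragraph can be checked at compile-time. The sketch below redeclares Vehicle and Car so that it stands alone.

```cpp
#include <memory>
#include <tuple>
#include <type_traits>

struct Vehicle {};
struct Car : Vehicle {};

// One level: the tuple's converting constructor makes it behave covariantly.
static_assert(std::is_convertible<std::tuple<Car*>, std::tuple<Vehicle*>>::value,
              "tuple<Car*> converts to tuple<Vehicle*>");

// Pointers to those tuples are unrelated types as far as the language knows.
static_assert(!std::is_convertible<std::tuple<Car*>*, std::tuple<Vehicle*>*>::value,
              "tuple<Car*>* does not convert to tuple<Vehicle*>*");

// So wrapping the tuple in a shared_ptr loses covariance...
static_assert(!std::is_convertible<std::shared_ptr<std::tuple<Car*>>,
                                   std::shared_ptr<std::tuple<Vehicle*>>>::value,
              "shared_ptr<tuple<Car*>> does not convert");

// ...while a tuple of shared_ptr keeps it.
static_assert(std::is_convertible<std::tuple<std::shared_ptr<Car>>,
                                  std::tuple<std::shared_ptr<Vehicle>>>::value,
              "tuple<shared_ptr<Car>> converts to tuple<shared_ptr<Vehicle>>");
```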

By "single-level direct convertibility", I mean the following conversion of U* to T*. Convertibility is a poor man's test for subtyping in C++.

A covariant SmartPointer might be implemented as follows.

template <class T>
class SmartPointer
{
public:
    template <typename U>
    SmartPointer(U* p) : p_(p) {}

    template <typename U>
    SmartPointer(const SmartPointer<U>& sp,
                 typename std::enable_if<std::is_convertible<U*, T*>::value, void>::type * = 0) 
      : p_(sp.p_) {}

    template <typename U>
    typename std::enable_if<std::is_convertible<U*, T*>::value, SmartPointer<T>&>::type 
    operator=(const SmartPointer<U> & sp)
    {
        p_ = sp.p_;
        return *this;
    }

   T* p_;
};

Contravariance

Contravariance, as it turns out, is quite counter-intuitive and messes with your brain. But it is a perfectly valid choice when it comes to selecting how generic types behave. Before we deal with contravariance, let's quickly revisit a very old C++ feature: covariant return types.

Consider the following class hierarchy.
class VehicleFactory {
  public:
    virtual Vehicle * create() const { return new Vehicle(); }
    virtual ~VehicleFactory() {}
};

class CarFactory : public VehicleFactory {
public:
    virtual Car * create() const override { return new Car(); }
};
Note that the return type of the VehicleFactory::create function is Vehicle * whereas that of CarFactory::create is Car *. This is allowed. The CarFactory::create function overrides its parent's virtual function. This feature is called overriding with covariant return types.

What happens when you change the raw pointers to std::shared_ptr? Is it still a valid program?....

As it turns out, it's not. std::shared_ptr (or any simulated covariant type for that matter) can't fool the compiler into believing that the two functions have covariant return types. The compiler rejects the code because as far as it knows, only the pointer types (and references too) have built-in covariance and nothing else.

Let's look at these two factories from the substitutability perspective. The client of VehicleFactory (which has no knowledge of CarFactory) can use VehicleFactory safely even if the create function gets dispatched to CarFactory at run-time. After all, the create function returns something that can be treated like a vehicle. No concrete details about Car are necessary for the client to work correctly. That's just classic object-oriented programming.

Covariance appears to work fine for return types of overridden functions. How about the argument? Is there some sort of variance possible? Does C++ support it? Does it make sense outside C++?

Let's change the create function to accept Iron * as raw material. Obviously, the CarFactory::create must also accept an argument of type Iron *. It is supposed to work and it does. That's old hat.

What if CarFactory is so advanced that it takes any Metal and creates a Car? Consider the following.
struct Vehicle {};
struct Car : Vehicle {};

struct Metal {};
struct Iron : Metal {};

class VehicleFactory {
  public:
    virtual Vehicle * create(Iron *) const { return new Vehicle(); }
    virtual ~VehicleFactory() {}
};

class CarFactory : public VehicleFactory {
public:
    virtual Car * create(Metal *) const override { return new Car(); }
};
The above program is illegal C++. CarFactory::create does not override anything in its base class, and therefore the compiler rejects the code because of the override keyword. Without override, the program compiles but you are looking at two completely separate functions marked virtual that won't do what you expect.

A more interesting question is whether it makes sense to override a function in a way that the argument in the derived function is broader/larger than that of the base's?...

Welcome to Contravariance...

It totally does make sense and this language feature is called contravariant argument types. From the perspective of the client of VehicleFactory, the client needs to provide some Iron. The CarFactory not only accepts Iron but any Metal to make a Car. So the Client works just fine.

Note the reversed relationship in the argument types. The derived create function accepts the broader type because it must do at least as much as the base's function is able to do. This reverse relationship is the crux of contravariance.

C++ does not have built-in support for contravariant argument types. So that's how it ends for C++? Of course not!

Covariant Return Types and Contravariant Argument Types in std::function

OK, the heading gives it away so let's get right down to an example.
template <class T>
using Sink = std::function<void (T *)>;

Sink<Vehicle> vehicle_sink = [](Vehicle *){ std::cout << "Got some vehicle\n"; };
Sink<Car> car_sink = vehicle_sink; // Works!
car_sink(new Car());

vehicle_sink = car_sink; // Fails to compile
Sink is a function type that accepts any pointer of type T and returns nothing. car_sink is a function that accepts only cars and vehicle_sink is a function that accepts any vehicle. Intuitively, it makes sense that if the client needs a car_sink, a vehicle_sink will work just fine because it is more general. Therefore, substitutability works in the reverse direction of parameter types. As a result, Sink is contravariant in its argument type.

std::function is covariant in return type too.
std::function<Car * (Metal *)> car_factory = 
  [](Metal *){ std::cout << "Got some Metal\n"; return new Car(); };

std::function<Vehicle * (Iron *)> vehicle_factory = car_factory;

Vehicle * some_vehicle = vehicle_factory(new Iron()); // Works
Covariance and contravariance of std::function work with smart pointers too. I.e., std::function taking a shared_ptr of base type is convertible to std::function taking a shared_ptr of derived type.

std::cout << std::is_convertible<std::function<void (std::shared_ptr<Vehicle>)>, 
                                 std::function<void (std::shared_ptr<Car>)>>::value 
          << "\n"; // prints 1.


Sink of a Sink is a Source!

I hope the examples so far have helped build an intuition behind covariance and contravariance. So far it looks like types that appear in argument position should behave contravariantly and types that appear in return position, should behave covariantly. It's a good intuition only until it breaks!
template <class T>
using Source = std::function<void (Sink<T>)>;

Source<Car> source_car = [](Sink<Car> sink_car){ sink_car(new Car()); };

source_car([](Car *){ std::cout << "Got a Car!!\n"; });

Source<Vehicle> source_vehicle = source_car; // covariance!

Type T occurs at argument position in Source. So is Source contravariant in T?...

It's not! It's still covariant in T.

However, Source<T> is contravariant in Sink<T> though.... After all, Source is a Sink of a Sink<T>!

OK, still with me?

Let's get this *&%$# straight!

Source<Car> does not really take Car as an argument. It takes Sink<Car> as an argument. The only thing you can really do with it is sink/pass a car into it. Therefore, the lambda passes a new car pointer to sink_car. Again on the next line, calling source_car you have to pass a Sink<Car>. That of course is a lambda that accepts Car pointer as input and simply prints a happy message.

Source<Car> indeed works like a factory of Cars. It does not "return" it. It uses a callback to give you your new car. It's equivalent to returning a new Car. After all, the direction of dataflow is outward. From Callee to the Caller. As the data is flowing outwards, it's covariant.

More formally, the type of Source is (T->())->(): a function that takes a callback as input and returns nothing (read () as void). As T appears on the left-hand side of an even number of arrows, it's covariant with respect to the entire type. As simple as that!
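The arrow-counting rule can be checked mechanically with std::is_convertible. In the sketch below, Sink and Source are the aliases from above, while SinkOfSource is a hypothetical third level added only to show the direction flipping once more. It relies on std::function's converting constructor being constrained on callability, which is the case in current standard libraries.

```cpp
#include <functional>
#include <type_traits>

struct Vehicle {};
struct Car : Vehicle {};

template <class T> using Sink = std::function<void(T*)>;             // T -> ()
template <class T> using Source = std::function<void(Sink<T>)>;      // (T -> ()) -> ()
template <class T>
using SinkOfSource = std::function<void(Source<T>)>;                 // ((T -> ()) -> ()) -> ()

// One arrow to the right of T: contravariant.
static_assert(std::is_convertible<Sink<Vehicle>, Sink<Car>>::value,
              "a vehicle sink can stand in for a car sink");
static_assert(!std::is_convertible<Sink<Car>, Sink<Vehicle>>::value,
              "a car sink cannot accept every vehicle");

// Two arrows: covariant.
static_assert(std::is_convertible<Source<Car>, Source<Vehicle>>::value,
              "a car source can stand in for a vehicle source");
static_assert(!std::is_convertible<Source<Vehicle>, Source<Car>>::value,
              "a vehicle source may not produce cars");

// Three arrows: contravariant again.
static_assert(std::is_convertible<SinkOfSource<Vehicle>, SinkOfSource<Car>>::value,
              "the direction flips with every extra arrow");
```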

Generalizing with Multiple Arguments and Currying

The covariance and contravariance of std::function works seamlessly with multiple argument functions as well as when they are curried.
struct Metal {};
struct Iron : Metal {};
struct Copper : Metal {};

// multiple contravariant position arguments
std::function<Vehicle * (Iron *, Copper *)> vehicle_ic; 
std::function<Car * (Metal *, Metal *)> car_mm = [](Metal *, Metal *) { return new Car(); };
vehicle_ic = car_mm;
vehicle_ic(new Iron(), new Copper());

// Curried versions
std::function<std::function<Vehicle * (Copper *)> (Iron *)> curried_vehicle;
std::function<std::function<Car * (Metal *)> (Metal *)> curried_car;
curried_car = [](Metal *m) { 
  return std::function<Car * (Metal *)>([m](Metal *) { return new Car(); }); 
};  
curried_vehicle = curried_car;
curried_vehicle(new Iron())(new Copper());

The car_mm function can be substituted where vehicle_ic is expected because it accepts wider types and returns narrower types (subtypes). The difference is that these are two argument functions. Each argument type must be at least the same as what's expected by the client or broader.

As every multi-argument function can be represented in curried form, we don't want to throw away the nice co-/contra-variant capabilities of the function type while currying. Of course, we don't lose them, as the curried versions in the example above show.

The curried_vehicle function accepts a single argument and returns a std::function. curried_car is a subtype of curried_vehicle only if it accepts equal-or-broader type and returns equal-or-narrower type. Clearly, curried_car accepts Metal*, which is broader than Iron*. On the return side, it must return a function-type that is a subtype of the return type of curried_vehicle. Applying the rules of function subtyping again, we see that the returned function type is also a proper subtype. Hence currying is oblivious to co-/contra-variance of argument/return types.

So that's it for now on co-/contra-variance. CIAO until next time!

Live code tested on latest gcc, clang, and vs2015.

For comments see reddit/r/cpp and Hacker News.

Sunday, November 08, 2015

CppCon'15 and Silicon Valley Code Camp Presentations

In the last couple of months I did a couple of presentations about my recent projects in C++. Session videos, slides, and code for all the presentations are now available online. Both projects have functional programming at their heart. I've found exploring functional programming in modern C++ quite a fun ride. Without further ado, here's the content.

CppCon'15: Reactive Stream Processing in Industrial IoT using DDS and RxCpp


Topic: 50 billion devices will be connected to the Internet by 2020. Many of them will belong to national critical infrastructure (smart power grids, smart roads, smart hospitals, smart cities) – forming the Industrial Internet of Things (IIoT). These devices will generate data streams that will need to be correlated, merged, filtered, and analyzed in real-time at the edge. This talk will explore an elegant solution to this problem that is productive, composable, concurrency-friendly, and scales well. We utilize OMG’s Data Distribution Service for Real-Time Systems (DDS) standard for connectivity, and Reactive Extensions (Rx) for functional-style composable asynchronous data processing in modern C++.

Rx is a generalization of futures and can be thought of as the async equivalent of C++ ranges. It helps create asynchronous data processing pipelines by chaining reusable higher-order functions (map, filter, flatmap, zip etc.) that rely on a common abstraction called an Observable (a continuation monad). RxCpp makes wonderful use of functional programming features in modern C++ including generic lambdas, type inference, variadic templates, and more. Rx is one of the best libraries that truly highlights the power of functional design principles applied in a (primarily) object-oriented programming language.

DDS and Rx work great together because they are both reactive, use the publish-subscribe paradigm, and facilitate loose coupling between components. This presentation will discuss Rx4DDS, which is a research library that integrates Rx with RTI Connext DDS. Rx4DDS enables a clean, distributed, asynchronous dataflow architecture for stream processing and is available in C#, C++, and JavaScript.

Slides



More reading

  • Data-Centric Stream Processing in the Fog is an RTI blog post with detailed description of one of the demonstrations and code I showed at CppCon'15. If you know what I mean by "The finalization actions are baked into each data pipeline at the time of creation" you can skip right ahead.

  • Rx4DDS home page includes all the demonstrations and code I showed at CppCon. The description is somewhat sparse and assumes that you have seen the earlier resources listed here.


Silicon Valley Code Camp: Composable Generators and Property-based Testing in C++14  


Topic: C++14 has an enviable collection of functional programming features such as generic lambdas, type inference, variadic templates, function types with co-/contra-variance and so on. With mature compiler support, designing and implementing performant functional-style libraries has become very pleasant in modern C++. Tools and techniques (e.g., property-based testing) once enjoyed only by programmers in elite functional languages (Haskell, Scala) now appear to be within C++'s reach.

This presentation will discuss two classic techniques from the functional domain -- composable data generators and property-based testing -- implemented in C++14 for testing a generic serialization and deserialization library (RefleX). We will look at techniques of constructing complex generators using a random number generator and a tolerable dose of monoids, functors, and of course, monads. We won't stop there though! We will look at automatic type generators using C++ TMP. Equipped with data and type generators, we'll take property-based testing to a whole new level where lazy programmers don't have to do anything to test their programs beyond just compilation and running the test over and over.

Code on github: generators

Slides 




Bonus Content: Channel9 Interview at CppCon'15

Here's my really short interview recorded at CppCon'15 by Channel9. Yes, it's about functional programming! Skip ahead to 45m36s into the video to check out my segment. Alternatively, click here.


Sunday, June 28, 2015

Fun with Lambdas: C++14 Style (part 4)

This is part 4 in the series of Fun with Lambdas: C++14 Style. The previous posts are part 3, part 2, and part 1.

C++14 has a number of features that support functional-style design. By "functional-style" I mean heavy use of higher-order functions (functions that take other functions as arguments). Quite often arguments to the higher-order functions are lambdas (closures, to be precise). With automatic return type deduction for normal functions, writing higher-order functions becomes very easy and seamless in C++14.

This time, I have chosen a "text-book" example to show you the power of C++14: Composable Data Generators

What is a Generator?

A Generator<T> produces values of type T randomly. There is already a random number generator defined in the C library: random(). It produces long ints.

We can use this basic generator to create higher-level generators, such as bool, character, floating point numbers, etc. Even random sequence and structure generators are possible.

But first, let's add some structure around the C library function so that we can compose generators.

#include <cstdlib>

struct RootRandomGen
{
  long int operator () () const 
  {
    return random();
  }
};

RootRandomGen is a very simple function-object that, when called, produces a random number between 0 and 2^31 - 1 (the range POSIX specifies for random(); on glibc this equals RAND_MAX).

Let's create a Generator template from which we can create other generators.
template <class T, class GenFunc>
class Gen 
{
    GenFunc genfunc;

  public:
    explicit Gen(GenFunc func) 
      : genfunc(std::move(func)) 
    { } 
    
    T generate() 
    {   
      return genfunc();
    }   
};

The Gen class template allows us to pass any function-object or closure and make a "generator" out of it. Of course, the function must not take any arguments and must produce a value.

To simplify creation of Generators from just lambdas, we create a helper factory function. This is where the power of C++14 starts becoming apparent.
template <class GenFunc>
auto make_gen_from(GenFunc&& func)
{
  return Gen<decltype(func()), GenFunc>(std::forward<GenFunc>(func));
}

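make_gen_from is a higher-order function that takes a closure as an argument and creates a Gen<T> object. GenFunc is the type of the closure. The type T is deduced using decltype(func()), which yields the type of the value returned by func. decltype itself dates from C++11; the C++14 part is the auto return type, which spares us from spelling out the Gen<...> type. The rest is perfect-forwarding of the func argument to the Gen<T> object. Note that when an lvalue is passed, GenFunc is deduced as a reference type, so this version is meant to be called with temporaries.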
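
To make the deduction concrete, here is a minimal, self-contained sketch. The Gen template is the one shown above. Two things are assumptions that do not appear in the original: make_gen_from uses std::decay_t so that lvalue closures are copied, and make_counter_gen is a hypothetical deterministic generator used only so the result can be checked.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

template <class T, class GenFunc>
class Gen
{
    GenFunc genfunc;

  public:
    explicit Gen(GenFunc func)
      : genfunc(std::move(func))
    { }

    T generate()
    {
      return genfunc();
    }
};

// Assumption (not in the original): std::decay_t strips references, so an
// lvalue closure is copied into Gen rather than stored as a reference.
template <class GenFunc>
auto make_gen_from(GenFunc&& func)
{
  return Gen<decltype(func()), std::decay_t<GenFunc>>(std::forward<GenFunc>(func));
}

// Hypothetical deterministic generator: yields 0, 1, 2, ...
inline auto make_counter_gen()
{
  return make_gen_from([n = 0]() mutable { return n++; });
}

// T is deduced as int because the closure returns int.
static_assert(std::is_same<decltype(make_counter_gen().generate()), int>::value,
              "value type comes from the closure's return type");

inline void demo_counter_gen()
{
  auto gen = make_counter_gen();
  assert(gen.generate() == 0);
  assert(gen.generate() == 1);
}
```

Nothing in the call site names int or the closure type; both are picked up from the lambda.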

To create many more generators, such as for bool, char, string, etc, a function like make_gen<T> might be quite useful. So, let's add one.
template <class T>
auto make_gen();

template <>  
auto make_gen<long int>()
{
  return make_gen_from(RootRandomGen()); 
  //return make_gen_from([]() { return random(); }); 
}

The long int generator simply uses the "Root" generator. Alternatively, RootRandomGen can be defined in-place using a lambda as shown above. I.e., RootRandomGen is superfluous.

Let's test what we have so far.

#include <ctime>
#include <iostream>

void init_random() 
{
  time_t t;
  time(&t);
  srandom(t);
}

int main(void)
{
  init_random();
  auto gen = make_gen<long int>();
  std::cout << gen.generate(); // expect a random value.
}

We can create many more generators by explicitly specializing make_gen for a number of types. But before we do that let's observe the core properties of Gen<T>.

The Generator<T> Functor

In functional programming literature, Gen<T> is a functor, which means you can "map over it". I.e., you can write a function named map that takes a generator and a function and returns another generator that applies the function to the values generated by the argument generator. It's much easier to look at code.
template <class Gen, class Func>
auto map (Gen gt, Func func)
{
  return make_gen_from([gt, func]() { 
                          return func(gt.generate()); 
                      });
}

First, the lambda captures gt and func by value. When called, it first generates a value from gt and passes it to the function and simply returns the value produced by the function. We've already seen that make_gen_from converts any lambda (with right signature) to a generator. So we now have a very general-purpose facility to create arbitrarily many generators simply by passing functions to map.

Let's look at an example.
int main(void)
{
  init_random();
  auto gen = make_gen<long int>();
  auto boolgen = map(gen, [](long int i) { return bool(i % 2); });
  std::cout << std::boolalpha << boolgen.generate(); // expect a random boolean.
}

The only problem, however, is that it does not work.

The problem is that Gen<T> is designed to support stateful generators that might mutate state between two successive calls to generate. That's why the generate function is not const. But the function-call operator of the lambda in the map function is const by default. Therefore, the captured copy gt is also const inside the lambda body, which prevents us from calling gt.generate() because Gen<T>::generate() is a non-const function.

The solution is to make the lambda in map function mutable. With that, the program compiles but there are more things that can be improved about map.
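
The effect of mutable is easy to see in isolation. The sketch below does not use Gen at all; Counter is a hypothetical stand-in for any object with a non-const member function, and make_stepper is likewise only an illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for a stateful generator: next() is non-const.
struct Counter
{
  int n = 0;
  int next() { return n++; }
};

inline auto make_stepper()
{
  // Without 'mutable' the closure's operator() is const, the captured
  // copy 'c' is const inside the body, and c.next() does not compile.
  // 'mutable' makes operator() non-const.
  return [c = Counter{}]() mutable { return c.next(); };
}

inline void demo_mutable()
{
  auto step = make_stepper();
  assert(step() == 0);
  assert(step() == 1);   // state persists inside the closure
  auto copy = step;      // copying the closure copies its state
  assert(copy() == 2);
  assert(step() == 2);   // the original is unaffected by the copy's call
}
```

The last two assertions also show that each copy of a mutable closure carries its own state, which matters once generators are copied into other generators.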

First, gt and func arguments are passed by value and the lambda captures them by value. That may be potentially quite wasteful. We can improve efficiency by using perfect forwarding. Adding perfect forwarding, however, adds a lot of noise to the otherwise simple map function. This noise has become my pet peeve regarding functional-style programming in C++14.
template <class Gen, class Func>
auto map (Gen&& gt, Func&& func)
{
  return make_gen_from([gt=std::forward<Gen>(gt), 
                        func=std::forward<Func>(func)]() mutable { 
                          return func(gt.generate()); 
                      });
}

I think this map function is a well-behaved citizen of the C++14 world. It's using the generalized lambda capture syntax and perfect-forwarding in combination.

Using this map function is slightly awkward because it's a free function. To support a more fluent style of API, I would like to "upgrade" the map function to the Gen<T> class. As I said before, every generator supports mapping. So here's the new Gen<T> template.
template <class T, class GenFunc>
class Gen 
{
    GenFunc genfunc;

  public:
    explicit Gen(GenFunc func) 
      : genfunc(std::move(func)) 
    { } 
    
    T generate() 
    {   
      return genfunc();
    }  
 
    template <class Func>
    auto map (Func&& func)
    {
      return make_gen_from([gt=*this, 
                            func=std::forward<Func>(func)]() mutable { 
                              return func(gt.generate()); 
                          });
    }
};

Note that map makes a full copy of this in the lambda so that every generator becomes self-sufficient.

We can create a number of other generators using the built-in map function. For instance, consider Gen<int> below.
template <>  
auto make_gen<int>()
{
  return make_gen<long int>().map([](long int i) { return static_cast<int>(i); });
}

A range generator that produces a random value in the specified range may be created as follows. As with iterator ranges, hi is one past the last desired value.
template <class Integer>
auto make_range_gen(Integer lo, Integer hi) 
{
  return make_gen<long int>().map( 
          [lo, hi](long int x) { return static_cast<Integer>(lo + x % (hi - lo)); });
}
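
The arithmetic inside that lambda can be checked on its own. In the sketch below, to_range is a hypothetical free function (not part of the original code) that performs the same computation on a raw value passed in by hand instead of one coming from random().

```cpp
#include <cassert>

// Same arithmetic as the lambda inside make_range_gen: map a raw
// non-negative value x into the half-open range [lo, hi).
template <class Integer>
Integer to_range(Integer lo, Integer hi, long int x)
{
  assert(lo < hi && x >= 0);
  return static_cast<Integer>(lo + x % (hi - lo));
}

inline void demo_to_range()
{
  assert(to_range(10, 20, 0L) == 10);   // smallest raw value maps to lo
  assert(to_range(10, 20, 9L) == 19);   // largest result is hi - 1
  assert(to_range(10, 20, 10L) == 10);  // wraps around; hi is never produced
  assert(to_range<char>('A', 'Z' + 1, 25L) == 'Z');
}
```

Keep in mind that reducing with % is only uniform when (hi - lo) evenly divides the number of distinct raw values; otherwise the low end of the range is slightly favored.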

Using the range generator, a generator for uppercase characters is quite simple.
auto uppercase_gen = make_range_gen<char>('A', 'Z'+1); // <char> is needed: 'Z'+1 is an int
std::cout << uppercase_gen.generate(); // expect a random uppercase character.

Combinators

Many more helper functions can be added to the Gen<T> class that produce new generators from argument generators. In functional literature they are called combinators.

Here's the zip2 combinator: Zip works just like a zipper. It takes 2 generators and produces another generator that combines the values generated by the argument generators. To combine the values, it needs a function that accepts two arguments and returns a value. The user must provide the function.

template <class T, class GenFunc>
class Gen 
{
    // ....

    template <class UGen, class Zipper2>
    auto zip2(UGen&& ugen, Zipper2&& func)
    {
      return this->map(
                [ugen=std::forward<UGen>(ugen),
                 func=std::forward<Zipper2>(func)](auto&& t) mutable {
                    return func(std::forward<decltype(t)>(t), ugen.generate());
                });
    }
};

auto uppergen = make_range_gen<char>('A', 'Z'+1);
auto lowergen = make_range_gen<char>('a', 'z'+1);
auto pairgen  = 
       uppergen.zip2(lowergen, 
                     [](char up, char low) { return std::make_pair(up, low); });

The example above shows how a pair of random characters can be produced by zipping an uppercase generator with a lowercase generator. The zipper function simply constructs the pair from two characters. Alternatively, &std::make_pair<char, char> would have been sufficient.

The zip2 function looks significantly more verbose than a comparable implementation in most other languages that support lambdas. A lot of code is devoted to perfect-forwarding of arguments, which is quite necessary for highly composable libraries such as this one. We'll see later that C++ compilers are smart enough to inline the call-chain completely.
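
Because the random generators make the result hard to check, here is a self-contained sketch of zip2 driven by deterministic sources. Gen, map, and zip2 follow the code above; make_counter_gen is a hypothetical helper, and the std::decay_t in make_gen_from is an assumption added so that closures are always stored by value.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

template <class T, class GenFunc>
class Gen;

template <class GenFunc>
auto make_gen_from(GenFunc&& func)
{
  return Gen<decltype(func()), std::decay_t<GenFunc>>(std::forward<GenFunc>(func));
}

template <class T, class GenFunc>
class Gen
{
    GenFunc genfunc;

  public:
    explicit Gen(GenFunc func) : genfunc(std::move(func)) { }

    T generate() { return genfunc(); }

    template <class Func>
    auto map(Func&& func)
    {
      return make_gen_from([gt = *this,
                            func = std::forward<Func>(func)]() mutable {
                              return func(gt.generate());
                           });
    }

    template <class UGen, class Zipper2>
    auto zip2(UGen&& ugen, Zipper2&& func)
    {
      return this->map(
                [ugen = std::forward<UGen>(ugen),
                 func = std::forward<Zipper2>(func)](auto&& t) mutable {
                    return func(std::forward<decltype(t)>(t), ugen.generate());
                });
    }
};

// Hypothetical deterministic source: yields start, start + 1, ...
inline auto make_counter_gen(int start)
{
  return make_gen_from([n = start]() mutable { return n++; });
}

inline void demo_zip2()
{
  auto pairs = make_counter_gen(0).zip2(
      make_counter_gen(100),
      [](int a, int b) { return std::make_pair(a, b); });

  assert(pairs.generate() == std::make_pair(0, 100));
  assert(pairs.generate() == std::make_pair(1, 101));
}
```

Both input generators advance in lockstep on every call, which is the "zipper" behavior described above.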

Another example of zip is a string generator. A string generator zips a bool generator and an int generator, where the bool value indicates whether the string is empty and the int value determines the length of the string. Of course, the string generator also needs a char generator to populate the string. The code assumes a make_gen<bool> specialization, which can be built with map just like boolgen earlier. Here's one way of doing it.
template <>
auto make_gen<std::string>()
{
  auto char_gen = make_range_gen(32, 127); // printable characters.
  auto length_gen = make_range_gen(1, 256);

  return make_gen<bool>().zip2(
                      length_gen,
                      [char_gen](bool empty, int length) mutable {
                        std::string str;
                        if(!empty)
                        {
                          str.reserve(length);
                          for(int i = 0; i < length; ++i)
                            str.push_back(char_gen.generate());
                        }
                        return str;
                      });
}

There are many more combinators. The single generator always produces the same value. The oneOf generator selects one of the elements from a given array non-deterministically. Finally, the amb combinator uses one of the two input generators to produce a value. Here are a couple of them.
template <class T>
auto make_single_gen(T&& t)
{
    return make_gen_from([t=std::forward<T>(t)]() { return t; });
}

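template <class T>
auto make_oneof_gen(std::initializer_list<T> list)
{
    // Copy the elements into a std::vector (requires <vector>): a copied
    // std::initializer_list still refers to the caller's backing array,
    // which may be gone by the time the generator is used.
    return make_range_gen<std::size_t>(0, list.size())
             .map([values = std::vector<T>(list)](std::size_t idx) { return values[idx]; });
}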
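
The amb combinator is mentioned above but not shown. The following is only one possible reading of it, under stated assumptions: make_amb_gen is a hypothetical name, both inputs must produce the same value type, and the choice is delegated to a third, bool-producing generator (in real use that could be make_gen<bool>()). The demo uses an alternating chooser so the outcome is predictable.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

template <class T, class GenFunc>
class Gen
{
    GenFunc genfunc;

  public:
    explicit Gen(GenFunc func) : genfunc(std::move(func)) { }

    T generate() { return genfunc(); }
};

template <class GenFunc>
auto make_gen_from(GenFunc&& func)
{
  return Gen<decltype(func()), std::decay_t<GenFunc>>(std::forward<GenFunc>(func));
}

// Hypothetical amb combinator: on every call, the bool-producing 'chooser'
// decides which of the two input generators supplies the value.
template <class Chooser, class Gen1, class Gen2>
auto make_amb_gen(Chooser chooser, Gen1 g1, Gen2 g2)
{
  return make_gen_from(
      [chooser = std::move(chooser),
       g1 = std::move(g1),
       g2 = std::move(g2)]() mutable {
        return chooser.generate() ? g1.generate() : g2.generate();
      });
}

inline void demo_amb()
{
  // Deterministic chooser for the demo: true, false, true, ...
  auto alternate = make_gen_from([flag = false]() mutable { flag = !flag; return flag; });
  auto ones = make_gen_from([] { return 1; });
  auto twos = make_gen_from([] { return 2; });

  auto amb = make_amb_gen(alternate, ones, twos);
  assert(amb.generate() == 1);  // chooser said true
  assert(amb.generate() == 2);  // chooser said false
  assert(amb.generate() == 1);
}
```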

Stateful Generators

The examples we've seen so far are stateless generators. I.e., between two successive calls to generate, no state is updated. Let's look at a stateful generator: fiboGen. This generator must maintain at least two integers (a and b) for its computation.
auto fiboGen()
{
  int a = 0;
  int b = 1;
  return make_gen_from([a, b]() mutable {
                          int c = a;
                          a = b;
                          b = c+b;
                          return c;
                       });
}
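
A quick way to convince yourself that the state really lives inside the generator is to pull a few values out of it. This sketch repeats Gen and fiboGen so that it stands alone; take is a hypothetical helper, and std::decay_t in make_gen_from is an assumption that does not appear in the original.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>
#include <vector>

template <class T, class GenFunc>
class Gen
{
    GenFunc genfunc;

  public:
    explicit Gen(GenFunc func) : genfunc(std::move(func)) { }

    T generate() { return genfunc(); }
};

template <class GenFunc>
auto make_gen_from(GenFunc&& func)
{
  return Gen<decltype(func()), std::decay_t<GenFunc>>(std::forward<GenFunc>(func));
}

inline auto fiboGen()
{
  int a = 0;
  int b = 1;
  return make_gen_from([a, b]() mutable {
                          int c = a;
                          a = b;
                          b = c + b;
                          return c;
                       });
}

// Hypothetical helper: collects the next 'count' values from a copy of gen.
template <class G>
std::vector<int> take(G gen, int count)
{
  std::vector<int> out;
  for (int i = 0; i < count; ++i)
    out.push_back(gen.generate());
  return out;
}

inline void demo_fibo()
{
  const std::vector<int> expected{0, 1, 1, 2, 3, 5};
  auto fib = fiboGen();
  assert(take(fib, 6) == expected);  // take() advanced only its own copy
  assert(fib.generate() == 0);       // so the original still starts at 0
}
```

Since take receives the generator by value, it also shows that copying a stateful generator forks its state.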

The Cost of Functional Design

It is quite interesting how complex generators can be created from simple generators. But is there a cost to this high level of abstraction? Is the code as fast as it can be?

Here are two algorithmically identical implementations of a bool generator. I chose this algorithm because I wanted to make use of zip2, which in turn uses map. I wanted to include multiple levels of indirection.
extern "C" bool random_bool1()
{
  return (random()-random()) > 0;
}

extern "C" bool random_bool2()
{
  auto boolgen = 
    make_gen<long int>()
           .zip2(make_gen<long int>(),
                 [](long int i, long int j) { return (i-j) > 0; });

  return boolgen.generate();
}

The screenshot below shows the compiler's assembly output for both the functions. The amazing fact is that it is exactly identical! The compiler is able to see through the layers and layers of indirections (invocations of lambdas) and is able to produce optimal code for the random_bool functions. That's quite a remarkable feat achieved by g++ 5.1 in this case. Perhaps it is the same with other major C++ compilers.

Generator size

The performance story does not end here though. Note that producing a random boolean does not need any state. I.e., it is just a function. However, RootRandomGen takes one byte because it's a class. Every object in C++ must have a unique identity (address). To ensure that's the case, the compiler gives even an empty class a non-zero size, typically one byte. As we compose higher-level generators from smaller generators, we are clearly creating objects, which have non-zero sizes. But how much memory do they need exactly? What is the size of boolgen in random_bool2?

The size of boolgen is 3 bytes on my machine. The reason for the state is lambda captures. Both map and zip combinators use lambdas with one or more captures. As higher-level generators are built from lower level generators, the state adds up. The problem is that in most generators we've seen so far, there is no real reason to maintain state between two successive calls to the generate function. I.e., the next value is completely unrelated to the previous values. In fact, as we saw before, the compiler did not refer to any state in the implementation of random_bool2. Of course, for truly stateful generators such as the fibonacci generator, maintaining state from the prior computation is necessary.
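
One plausible accounting for those 3 bytes (closure layout is up to the compiler, so treat this as an illustration): the root generator is an empty class of 1 byte; the closure inside zip2 captures the second generator (1 byte) and the empty user lambda (1 byte), giving 2; the closure inside map captures a copy of the first generator (1 byte) plus that 2-byte closure, giving 3. The sketch below checks only the parts the language guarantees, using hypothetical types Stateless and TwoStateless.

```cpp
#include <cassert>

// Like RootRandomGen: a function object with no data members.
struct Stateless
{
  long int operator()() const { return 0; }
};

// Two members of the same empty type must still have distinct addresses,
// so this struct needs at least two bytes even though it holds no data.
struct TwoStateless
{
  Stateless first;
  Stateless second;
};

static_assert(sizeof(Stateless) >= 1, "an empty class still has a size");
static_assert(sizeof(TwoStateless) >= 2, "empty members do not share storage");

inline void demo_sizes()
{
  TwoStateless t;
  assert(static_cast<void*>(&t.first) != static_cast<void*>(&t.second));
}
```

Captured-by-value empty objects behave like these members, which is why every layer of composition adds at least a byte.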

The build-up of unnecessary state is quite fast though. For instance, the size of the string generator is a whopping 28 bytes! The compiler maintains 28 bytes of state that serve no obvious purpose to the user! A generator of printable strings implemented as a simple function would require no persistent state at all. As the size of the generators gets larger and larger, pretty soon they won't fit in a cache line and will start to degrade performance, especially if truly stateful generators are mixed with only accidentally stateful generators. I hope compiler writers will figure something out about this problem.

This concludes the part 4 in the series of Fun with Lambdas: C++14 Style. I hope you enjoyed it. See Live Example.

Sunday, September 28, 2014

Fun with C++14 Lambdas at Silicon Valley Code Camp

Believe it or not, the 9th Silicon Valley Code Camp is less than 2 weeks away and I can't wait to be at the largest software technology conference set up by developers for developers---and here is the best part---at no cost to the attendees. So far, there are 234 registered sessions, 7 technical tracks, and over 3100 registrations. So mark your calendar--it's October 11th and 12th, Saturday and Sunday, as always.



C++ is hot again at SVCC and, for the third year in a row, there is a dedicated track for modern C++. There are 11 sessions covering a wide variety of topics related to modern C++ programming.

I wanna thank SVCC organizers who generously allowed me to present two sessions: The first one is titled: Fun with Lambdas: C++14 Style[video]. You may be following the Fun with Lambdas series on this blog and hopefully having some fun too! I'll present a sampling of the content discussed here with new insights. Check out part 1, part 2, and part 3 if you haven't already. Come see how functional programming techniques are going to change the face of C++ programming beyond recognition.

Fun with Lambdas: C++14 Style from Sumant Tambe on Vimeo.


The second session is about Reactive Programming with DDS and Rx[video]. It's about functional programming again but this time it's going to be C#. Reactive Extensions (Rx) is a fascinating new technique to compose asynchronous and event-based programs using observables and LINQ-style query operators. It fits extremely well with DDS--a data distribution technology for networked real-time systems. I'll demo commonly used Rx operators with real data coming off of a toy DDS example. More on that here.

Reactive Stream Processing Using DDS and Rx from Sumant Tambe on Vimeo.

All in all, I'm anticipating the SVCC'14 to be a pretty busy weekend once again with a lot of learning and sharing. If you are in the area and decide to attend, stop by and say hi!

Saturday, September 20, 2014

Short-circuiting overloaded && and || using expression templates

This blog post is just a quick note that C++ offers (at least) two distinct ways to represent a lazy computation, i.e., one that is written lexically in the current scope but may execute at a later time. In doing so, the computation must capture the local context (i.e., variables) so that it can be used later when needed. Clearly, lambda expressions are a direct language-supported mechanism for that. Closures that come out of a lambda expression often capture the context and of course some behavior to be run later. The second mechanism is about 20 years old (as of this writing): Expression Templates.

Let's take an example of short-circuiting overloaded && and || operators. Regular overloaded && and || do not short circuit in C++. The reason is that before calling the overloaded operator &&, both the left-hand-side and the right-hand-side arguments of the overloaded function are evaluated. A function call is a sequence-point and therefore all the computations and the side-effects are complete before making the function call. This is the eager strategy.

Expression Templates is a library-only approach to defer computation to a later time while keeping the context of the original expression around. Sounds a lot like lambda expressions.

Consider the struct S below. I would like to implement short-circuiting && and || for this type.

struct S
{
  bool val;
  explicit S(bool b) : val(b) {}

  bool is_true () const 
  {
    return val;
  }
};

S operator && (const S & s1, const S & s2)
{
  return s1.is_true()? S{s1.val && s2.val} : s1;
}

int main(void)
{
  S s1{false}, s2{true}, s3{true};
  S s4 = s1 && s2 && s3; // false
}
There is hardly any optimization at all. The overloaded && operator is called twice no matter what, although the result of the expression s1 && s2 && s3 is known just by looking at s1. An opportunity for optimization is wasted (if you ever wanted to optimize that way!).
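
The difference from the built-in operator shows up as soon as the operands have side effects. In this sketch, make_counted and the two count_* functions are hypothetical helpers that count how many operands get evaluated; S is a trimmed-down version of the struct above.

```cpp
#include <cassert>

struct S
{
  bool val;
  explicit S(bool b) : val(b) {}
};

// A regular overloaded &&: both arguments exist before the call is made.
inline S operator && (const S & s1, const S & s2)
{
  return S{s1.val && s2.val};
}

// Hypothetical helper: builds an S and counts how often it is called.
inline S make_counted(bool b, int & calls)
{
  ++calls;
  return S{b};
}

inline int count_overloaded_evaluations()
{
  int calls = 0;
  S r = make_counted(false, calls) && make_counted(true, calls) && make_counted(true, calls);
  (void) r;
  return calls;   // every operand was evaluated
}

inline int count_builtin_evaluations()
{
  int calls = 0;
  bool r = make_counted(false, calls).val && make_counted(true, calls).val
                                          && make_counted(true, calls).val;
  (void) r;
  return calls;   // built-in && stops after the first false operand
}

inline void demo_eager()
{
  assert(count_overloaded_evaluations() == 3);
  assert(count_builtin_evaluations() == 1);
}
```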

So let's use expression templates. The trick is to convert the expression into a tree of recursively nested instantiations of the Expr template. The tree is evaluated separately after construction.

The following code implements short-circuited && and || operators for S as long as it provides logical_and and logical_or free functions and an is_true member function. The code is in C++14 but the idea is applicable in C++98 also.

#include <iostream>

struct S
{
  bool val;

  explicit S(int i) : val(i) {}  
  explicit S(bool b) : val(b) {}

  template <class Expr>
  S (const Expr & expr)
   : val(evaluate(expr).val)
  { }

  template <class Expr>
  S & operator = (const Expr & expr)
  {
    val = evaluate(expr).val;
    return *this;
  }

  bool is_true () const 
  {
    return val;
  }
};

S logical_and (const S & lhs, const S & rhs)
{
    std::cout << "&& ";
    return S{lhs.val && rhs.val};
}

S logical_or (const S & lhs, const S & rhs)
{
    std::cout << "|| ";
    return S{lhs.val || rhs.val};
}


const S & evaluate(const S &s) 
{
  return s;
}

template <class Expr>
S evaluate(const Expr & expr) 
{
  return expr.eval();
}

struct LazyAnd 
{
  template <class LExpr, class RExpr>
  auto operator ()(const LExpr & l, const RExpr & r) const
  {
    const auto & temp = evaluate(l);
    return temp.is_true()? logical_and(temp, evaluate(r)) : temp;
  }
};

struct LazyOr 
{
  template <class LExpr, class RExpr>
  auto operator ()(const LExpr & l, const RExpr & r) const
  {
    const auto & temp = evaluate(l);
    return temp.is_true()? temp : logical_or(temp, evaluate(r));
  }
};


template <class Op, class LExpr, class RExpr>
struct Expr
{
  Op op;
  const LExpr &lhs;
  const RExpr &rhs;

  Expr(const LExpr& l, const RExpr & r)
   : lhs(l),
     rhs(r)
  {}

  auto eval() const 
  {
    return op(lhs, rhs);
  }
};

template <class LExpr>
auto operator && (const LExpr & lhs, const S & rhs)
{
  return Expr<LazyAnd, LExpr, S> (lhs, rhs);
}

template <class LExpr, class Op, class L, class R>
auto operator && (const LExpr & lhs, const Expr<Op,L,R> & rhs)
{
  return Expr<LazyAnd, LExpr, Expr<Op,L,R>> (lhs, rhs);
}

template <class LExpr>
auto operator || (const LExpr & lhs, const S & rhs)
{
  return Expr<LazyOr, LExpr, S> (lhs, rhs);
}

template <class LExpr, class Op, class L, class R>
auto operator || (const LExpr & lhs, const Expr<Op,L,R> & rhs)
{
  return Expr<LazyOr, LExpr, Expr<Op,L,R>> (lhs, rhs);
}

std::ostream & operator << (std::ostream & o, const S & s)
{
  o << s.val;
  return o;
}

S and_result(S s1, S s2, S s3)
{
  return s1 && s2 && s3;
}

S or_result(S s1, S s2, S s3)
{
  return s1 || s2 || s3;
}

int main(void) 
{
  for(int i=0; i<= 1; ++i)
    for(int j=0; j<= 1; ++j)
      for(int k=0; k<= 1; ++k)
        std::cout << i << j << k << " " << and_result(S{i}, S{j}, S{k}) << std::endl;

  for(int i=0; i<= 1; ++i)
    for(int j=0; j<= 1; ++j)
      for(int k=0; k<= 1; ++k)
        std::cout << i << j << k << " " << or_result(S{i}, S{j}, S{k}) << std::endl;

  return 0;
}
Let's break it apart piece by piece.

Type S has a new converting constructor and a new assignment operator, both templates, that accept a generic Expr argument. The expression is not evaluated until it is actually converted or assigned to an S. We just call evaluate on the expression to begin execution of the computation wrapped inside Expr. logical_and and logical_or are free functions that perform the non-short-circuiting logical operations because we're going to hijack the overloaded && and || for short-circuiting.

The evaluate free functions take care of the trivial base case when Expr happens to be just another S and all other cases when Expr is a compound expression.

struct LazyAnd and LazyOr are the short-circuiting && and ||. They always evaluate the left-hand-side but may not evaluate the right-hand-side if it is not required.

The Expr template enables construction of so-called expression templates. It is meant to be instantiated recursively. For example, an expression template for (s1 && s2) looks like Expr<LazyAnd, S, S> whereas for (s1 && s2 && s3) it is Expr<LazyAnd, Expr<LazyAnd, S, S>, S>. One last example: (s1 && (s2 && s3)) becomes Expr<LazyAnd, S, Expr<LazyAnd, S, S>>.
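
These types can be verified by the compiler itself. The sketch below is a stripped-down version of the code above, kept only to check the shape of the tree: S is a plain aggregate, LazyAnd is an empty tag, and no evaluation logic is included. The alias names And2, LeftNest, and RightNest are hypothetical.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

struct S { bool val; };
struct LazyAnd {};   // tag only; evaluation is omitted in this sketch

template <class Op, class LExpr, class RExpr>
struct Expr
{
  const LExpr & lhs;
  const RExpr & rhs;
};

template <class LExpr>
Expr<LazyAnd, LExpr, S> operator && (const LExpr & lhs, const S & rhs)
{
  return {lhs, rhs};
}

template <class LExpr, class Op, class L, class R>
Expr<LazyAnd, LExpr, Expr<Op, L, R>>
operator && (const LExpr & lhs, const Expr<Op, L, R> & rhs)
{
  return {lhs, rhs};
}

using And2      = decltype(std::declval<S>() && std::declval<S>());
using LeftNest  = decltype(std::declval<S>() && std::declval<S>() && std::declval<S>());
using RightNest = decltype(std::declval<S>() && (std::declval<S>() && std::declval<S>()));

static_assert(std::is_same<And2, Expr<LazyAnd, S, S>>::value, "");
static_assert(std::is_same<LeftNest, Expr<LazyAnd, Expr<LazyAnd, S, S>, S>>::value, "");
static_assert(std::is_same<RightNest, Expr<LazyAnd, S, Expr<LazyAnd, S, S>>>::value, "");

inline void demo_tree()
{
  S s1{true}, s2{false};
  auto e = s1 && s2;   // builds a node that refers to s1 and s2; computes nothing
  assert(&e.lhs == &s1);
  assert(&e.rhs == &s2);
}
```

Because the nodes hold references, such a tree has to be consumed within the full expression that created it whenever temporaries are involved.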

Of course, creating the nested Expr instantiations manually is berserk. So we use overloaded && and || operators that, instead of computing the result eagerly, produce an expression that we can evaluate later. I've avoided writing overly generic && and || operators by requiring the second argument to be either an S or an Expr. So the operators do not match types outside those. Take a look at the examples above. It is fairly straightforward to see how an expression turns into a tree. Note that construction of the tree does not involve calling the logical_and and logical_or functions.

Finally, the assignment operator and the converting constructor of S take care of executing the expression. LazyAnd and LazyOr do the least possible work while ensuring that the left-hand-side is always evaluated. Here is the output of the program. Check out the live example here.
000 0
001 0
010 0
011 0
100 && 0
101 && 0
110 && && 0
111 && && 1
000 || || 0
001 || || 1
010 || 1
011 || 1
100 1
101 1
110 1
111 1
Bottom line: Expression templates and lambdas are both suitable for passing lazy computations to functions. They both can capture local context (variables) and don't extend the lifetime of the captured arguments. Their type is not meant to be observed (it is often unpronounceable). Expression templates, however, are very specific because they appear only in the context of overloaded operators and as a result they may be a lot more expressive.
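
For comparison, here is a minimal sketch of the lambda-based alternative. lazy_and and count_rhs_evaluations are hypothetical names, not part of the code above: the right-hand side is passed as a callable and is invoked only when the left-hand side is true.

```cpp
#include <cassert>

struct S
{
  bool val;
  explicit S(bool b) : val(b) {}
  bool is_true() const { return val; }
};

// Hypothetical helper: the right-hand side arrives as a callable (a thunk)
// and is invoked only if the left-hand side is true.
template <class RhsThunk>
S lazy_and(const S & lhs, RhsThunk && rhs)
{
  return lhs.is_true() ? S{lhs.val && rhs().val} : lhs;
}

// Returns how many times the right-hand side was actually computed.
inline int count_rhs_evaluations(bool left)
{
  int calls = 0;
  S lhs{left};
  S result = lazy_and(lhs, [&calls] { ++calls; return S{true}; });
  (void) result;
  return calls;
}

inline void demo_lazy_and()
{
  assert(count_rhs_evaluations(false) == 0);  // short-circuited
  assert(count_rhs_evaluations(true) == 1);   // evaluated exactly once
}
```

The trade-off is at the call site: with lambdas the caller must wrap the operand explicitly, whereas expression templates keep the natural s1 && s2 syntax.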

This blog post is motivated by this question on Stackoverflow. Also see comments on reddit/r/cpp.

Tuesday, August 26, 2014

Fun with Lambdas: C++14 Style (part 3)

Now that we have C++14, it has opened up doors for truly mind-bending uses of lambdas--more specifically--generic lambdas. This blog post is the third installment in the series of "Fun with Lambdas: C++14 Style". Check out part 1 and part 2 if you have not already.

This post is about "monadic tuples".

A monad is a simple but powerful abstraction that is nevertheless considered quite difficult to understand in imperative circles. We will look into what's known as the "continuation monad". As it turns out, in C++14, you need just a couple of lines of code to create an instance of a continuation monad.

I'm fairly new to the world of monads. So, things did not begin with great clarity for me. It all started with an intriguing question on Stackoverflow. As it turns out the same "trick" is also used in Boost.Hana and discussed on boost mailing list here.

What you see below is more or less how I came to understand the idiom as an instance of a monad. Some background in functional programming may be helpful in reading this post. A good understanding of nested generic lambdas is a must. If you are wondering if you should read the part 1 first, then you probably should.

Ok, let's cut to the chase.
auto List = [](auto ...xs) { 
    return [=](auto access) { return access(xs...); }; 
}; 

auto head = [](auto xs) { 
    return xs([](auto first, auto ...rest) { return first; }); 
}; 

auto tail = [](auto xs) { 
    return xs([](auto first, auto ...rest) { return List(rest...); }); 
}; 

auto length = [](auto xs) { 
    return xs([](auto ...z) { return sizeof...(z); }); 
}; 

int len = length(List(1, '2', "3"));  // 3

List is a generic lambda that accepts a variable number of arguments and returns a closure (an instance of the inner lambda) that captures the arguments by value. The inner lambda accepts a parameter (called access) that must be callable with an arbitrary number of arguments. The inner lambda simply expands the parameter pack while calling the callable. That way it provides "access" to the captured parameter pack.

If you squint a little, you will probably realize that List is like a constructor of a tuple. As a matter of fact, if you were to implement the inner lambda using a good old class template, you would most likely resort to using a std::tuple member.

head, tail, and length are examples of operations that you may perform on a list. head returns the first element, tail returns the list excluding the first element and length returns the size of the parameter pack. For example, a three element list is passed to the length lambda. As every list itself is a closure, it is called with an "accessor" function. The accessor simply does a sizeof... and returns the result, which propagates all the way out.
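
Here is a small, self-contained sketch that exercises the three operations together. The lambdas are the ones defined above; demo_list is a hypothetical function added only to hold the checks.

```cpp
#include <cassert>
#include <cstddef>

auto List = [](auto ...xs) {
    return [=](auto access) { return access(xs...); };
};

auto head = [](auto xs) {
    return xs([](auto first, auto ...rest) { return first; });
};

auto tail = [](auto xs) {
    return xs([](auto first, auto ...rest) { return List(rest...); });
};

auto length = [](auto xs) {
    return xs([](auto ...z) { return sizeof...(z); });
};

inline void demo_list()
{
    auto xs = List(1, '2', 3.5);       // a heterogeneous three-element list
    assert(head(xs) == 1);
    assert(length(xs) == 3u);
    assert(head(tail(xs)) == '2');     // tail yields another list
    assert(length(tail(tail(tail(xs)))) == 0u);  // an empty list is fine too
}
```

Note that head and tail require at least one element: their accessors have a mandatory first parameter, so applying them to an empty list is a compile-time error.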

It is probably immediately apparent that this idiom adds life to otherwise drab variadic parameter packs. Don't get me wrong, variadic parameter packs are cool and we wouldn't have other cool things like std::tuple without them. However, the point is that the language allows very few operations on a parameter pack. In general, you can't "store" them. Pretty much, you can expand a parameter pack, ask for its size, and unwind it using the car/cdr recursive style. And that's about it. Until now, to store a parameter pack you had to put it in a std::tuple.

But now there is an alternative. You can capture it using a lambda and provide access to it as done in the list lambda. As it turns out, this seemingly innocuous and perhaps needlessly convoluted approach to "accessing" parameter packs is phenomenally powerful.

WHY? ... the list lambda and the closure inside are special. Together, they form an implementation of a Continuation Monad.

A great introduction to the continuation monad for C++ programmers is here. In essence, the list lambda above takes a value (a variadic parameter-pack) and returns a simple "continuator" (the inner closure). This continuator, when given a callable (called access), passes the parameter pack into it and returns whatever that callable returns.

Borrowing from the FPComplete blogpost, a continuator is more or less like the following.
template<class R, class A>
struct Continuator {
   virtual ~Continuator() {}
   virtual R andThen(std::function<R(A)> access) = 0;
};
The Continuator above is abstract--does not provide an implementation. So, here is a simple one.
template<class R, class A>
struct SimpleContinuator : Continuator<R, A> 
{
   SimpleContinuator(A x) : _x(x) {}
   R andThen(std::function<R(A)> access) {
       return access(_x);
   }
   A _x;
};
The SimpleContinuator accepts one value of type A and passes it on to access when andThen is called. The closure returned by the list lambda is conceptually the same. It is more general. Instead of a single value, the inner closure captures a parameter-pack and passes it to the access function. Neat!
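
To see the correspondence side by side, the sketch below calls andThen on a SimpleContinuator and then does the same thing with a one-value lambda. The class templates are the ones above (with explicit and override added); unit and demo_continuator are hypothetical names introduced for this illustration.

```cpp
#include <cassert>
#include <functional>
#include <string>

template<class R, class A>
struct Continuator {
   virtual ~Continuator() {}
   virtual R andThen(std::function<R(A)> access) = 0;
};

template<class R, class A>
struct SimpleContinuator : Continuator<R, A>
{
   explicit SimpleContinuator(A x) : _x(x) {}
   R andThen(std::function<R(A)> access) override {
       return access(_x);
   }
   A _x;
};

// Hypothetical single-value counterpart of the list lambda: it captures one
// value and hands it to whatever callable it is given.
auto unit = [](auto x) {
    return [=](auto access) { return access(x); };
};

inline void demo_continuator()
{
   SimpleContinuator<std::string, int> c(42);
   std::string s = c.andThen([](int x) { return std::to_string(x); });
   assert(s == "42");

   // Same idea without the class hierarchy or std::function.
   assert(unit(42)([](int x) { return x + 1; }) == 43);
}
```

The lambda version does not fix R and A up front; the result type is whatever the accessor returns.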

Hopefully that explains what it means to be a continuator. But what does it mean to be a monad? Here is a good introduction using pictures.

The inner closure returned by the list lambda is also a list monad, which is implemented as a continuation monad. Note that continuation monad is the mother of all monads. I.e., you can implement any monad with a continuation monad. Of course, list monad is not out of reach.

As a parameter-pack is quite naturally a "list" (often of heterogeneous types), it makes sense for it to work like a list/sequence monad, where operations can be chained one after another. The list lambda above is a very interesting way of converting C++ parameter-packs to a monadic structure.

The head and length lambdas above, however, are a bit disappointing because they break the monad and the nested lambda inside simply returns a non-monadic value (something you can't chain more operations to). There is arguably a better way to write a chain of "processing" operations as shown below.

Functor

Before we can say that the list lambda is a monad constructor, we have to show that it is a functor. I.e., fmap must be written for the inner closure. Note that "functor" is a category-theoretic term. It has no direct correlation with a C++ functor (i.e., a function object).

The list lambda above serves as the creator of the functor from a parameter pack---essentially it serves as the "return" in Haskell. That created functor keeps the parameter-pack with itself (capture) and it allows access to it provided you give a callable that accepts a variable number of arguments. Note that the callable is called EXACTLY-ONCE.

Let's write fmap for such a functor.
    
auto fmap = [](auto func) {
    return [func] (auto alist) {
        return alist([func](auto... xs) { return List(func(xs)...); });
    };
};
The type of the func must be (a -> b). I.e., in C++ speak,
    template <class a, class b>
    b func(a);
The type of fmap is "fmap: (a -> b) -> list[a] -> list[b]". I.e., in C++ speak,
    template <class Func, class a, class b>
    list<b> fmap(Func, list<a>);
I.e., when fmap is given a function from a to b, it simply returns another function that maps list-of-a to a list-of-b.
Now you can do
    
    auto twice = [](auto i) { return 2*i; };
    auto print = [](auto i) { std::cout << i << " "; return i; };
    auto l1 = List(1, 2, 3, 4);
    auto l2 = fmap(twice)(l1);
    auto l3 = fmap(print)(l2); // prints 2 4 6 8 on clang (g++ in reverse)
Therefore, it is a functor.
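Strictly speaking, a functor should also satisfy the identity and composition laws. The following is a minimal sketch that checks both for a list of ints. It assumes List is defined as the list lambda shown earlier; to_vector, identity, inc, and inc_then_twice are hypothetical helpers introduced here.

```cpp
#include <cassert>
#include <vector>

// Assumed definition of the List lambda.
auto List = [](auto... xs) { return [=](auto access) { return access(xs...); }; };

auto fmap = [](auto func) {
    return [func](auto alist) {
        return alist([func](auto... xs) { return List(func(xs)...); });
    };
};

// Hypothetical helper: unpack an all-int list into a vector for comparison.
auto to_vector = [](auto... xs) { return std::vector<int>{xs...}; };

auto identity       = [](auto x) { return x; };
auto twice          = [](auto i) { return 2 * i; };
auto inc            = [](auto i) { return i + 1; };
auto inc_then_twice = [](auto i) { return twice(inc(i)); };

void check_functor_laws() {
    auto l = List(1, 2, 3);
    // Identity: mapping the identity function changes nothing.
    assert(fmap(identity)(l)(to_vector) == l(to_vector));
    // Composition: mapping inc and then twice equals mapping their composition.
    assert(fmap(twice)(fmap(inc)(l))(to_vector) ==
           fmap(inc_then_twice)(l)(to_vector));
}
```

Because the mapped functions here have no side effects, the results do not depend on the order in which the compiler evaluates the calls to func.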

Monad

Now, let's try to write a flatmap (a.k.a. bind, selectmany).

Type of flatmap is "flatmap: (a -> list[b]) -> list[a] -> list[b]"

I.e., given a function that maps a to a list-of-b and a list-of-a, flatmap returns a list-of-b. Essentially, it takes each element from list-of-a, calls func on it, receives a (potentially empty) list-of-b for each element, then concatenates all of those lists, and finally returns the concatenated list-of-b.
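To make the signature concrete, here is a rough run-time analogue over std::vector. It is only a sketch for comparison, not the List version: flatmap_vector, pair_vec, and evens_only are hypothetical names, and every element is assumed to be an int.

```cpp
#include <cassert>
#include <vector>

// Hypothetical run-time analogue of flatmap: (int -> vector<B>) -> vector<int> -> vector<B>.
template <class Func>
auto flatmap_vector(Func func, const std::vector<int>& as) {
    decltype(func(0)) out;  // Func is assumed to return some std::vector<B>
    for (int a : as) {
        auto bs = func(a);                            // one list-of-b per element
        out.insert(out.end(), bs.begin(), bs.end());  // concatenate in order
    }
    return out;
}

// Maps i to the two-element list {-i, i}.
auto pair_vec = [](int i) { return std::vector<int>{-i, i}; };

// May return an empty list, which simply contributes nothing to the result.
auto evens_only = [](int i) {
    return i % 2 == 0 ? std::vector<int>{i} : std::vector<int>{};
};
```

With these definitions, flatmap_vector(pair_vec, {1, 2, 3}) yields {-1, 1, -2, 2, -3, 3}, and flatmap_vector(evens_only, {1, 2, 3, 4}) yields {2, 4}.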

Here's an implementation of flatmap for List.
 
    auto concat = [](auto l1, auto l2) {
        auto access1 = [=](auto... p) {
          auto access2 = [=](auto... q) {
            return List(p..., q...);
          };
          return l2(access2);
        };
        return l1(access1);
    };

    template <class Func>
    auto flatten(Func)
    {
      return List(); 
    }
    
    template <class Func, class A, class... B>
    auto flatten(Func f, A a, B... b)
    {
      return concat(f(a), flatten(f, b...));
    }
    
    auto flatmap = [](auto func) {
       return [func](auto alist) {
           return alist([func](auto... xs) { return flatten(func, xs...);  });
       };
    };
Now you can do a lot of powerful things with a list. For example,
 
    auto pair     = [](auto i) { return List(-i, i); };
    auto count    = [](auto... a) { return List(sizeof...(a)); };
    
    auto l1 = List(1, 2, 3);
    auto l2 = flatmap(pair)(l1);
    auto l3 = fmap(print)(l2); // prints -1, 1, -2, 2, -3, 3 on clang (g++ in reverse)
    auto l4 = l3(count);    
    auto l5 = fmap(print)(l4); // prints 6.
The count function is a monad-preserving operation because it returns a List of a single element. If you really want to get the length (not wrapped in a List), you have to terminate the monadic chain and get the value as follows.
 
    auto len = [](auto ...z) { return sizeof...(z); }; 

    auto l1 = List(10, 20, 30);
    auto l2 = flatmap(pair)(l1);
    std::cout << l2(len); // prints 6
If done right, the collection pipeline pattern (e.g., filter, reduce) can now be applied to C++ parameter-packs. So let's try to do that.

You might have noticed that we're doing only one operation per line and giving names to each intermediate result (i.e., l1, l2, l3 etc). Naming the intermediate results is unnecessary but if we don't, readability of code goes out the window.

Let's try to rewrite the previous program where we print -1, 1, -2, 2, -3, 3.
    
    auto l3 = 
      fmap(print)(flatmap(pair)(List(1, 2, 3))); 
    // prints -1, 1, -2, 2, -3, 3 on clang (g++ in reverse)
The above code is pretty much incomprehensible and at this point you probably want to click away. But bear with me for just one moment. There's a pattern here and we can factor that out. I'm going to use C++ operator overloading so that the code looks significantly more readable.
 
template <class LIST, class Func>
auto operator > (LIST l, Func f)
{
  return fmap(f)(l);   
}

template <class LIST, class Func>
auto operator >= (LIST l, Func f)
{
  return flatmap(f)(l);   
}
Operator > accepts our special list as the left-hand-side argument and a function from a -> b as the right-hand-side argument. It uses fmap internally. Operator >= is similar, but it takes a function from a -> List[b] and uses flatmap internally. Remember, both operators return the special list (monadic tuple).

And now it's show time!
 
  auto l3 = 
     List(1, 2, 3) >= pair > print;  
  // prints -1, 1, -2, 2, -3, 3 on clang (g++ in reverse)
Suddenly, you can read the program from left to right and all the fmap/flatmap boilerplate is hidden inside the overloaded operators. You are looking at a tiny Domain-Specific Language (DSL) for piping operations on collections. The chain can be arbitrarily extended to the right.
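As a sketch of extending the chain to the right, the example below adds two more fmap stages and then terminates the pipeline by unpacking into a std::vector. It repeats the definitions from this post so that it stands alone; List is assumed to be the list lambda shown earlier, and twice, inc, to_vector, and pipeline_result are hypothetical names. Pure functions are used instead of print so that the result does not depend on argument evaluation order.

```cpp
#include <cassert>
#include <vector>

auto List = [](auto... xs) { return [=](auto access) { return access(xs...); }; };

auto fmap = [](auto func) {
    return [func](auto alist) {
        return alist([func](auto... xs) { return List(func(xs)...); });
    };
};

auto concat = [](auto l1, auto l2) {
    return l1([=](auto... p) {
        return l2([=](auto... q) { return List(p..., q...); });
    });
};

template <class Func>
auto flatten(Func) { return List(); }

template <class Func, class A, class... B>
auto flatten(Func f, A a, B... b) { return concat(f(a), flatten(f, b...)); }

auto flatmap = [](auto func) {
    return [func](auto alist) {
        return alist([func](auto... xs) { return flatten(func, xs...); });
    };
};

template <class LIST, class Func>
auto operator>(LIST l, Func f) { return fmap(f)(l); }

template <class LIST, class Func>
auto operator>=(LIST l, Func f) { return flatmap(f)(l); }

auto pair      = [](auto i) { return List(-i, i); };
auto twice     = [](auto i) { return 2 * i; };
auto inc       = [](auto i) { return i + 1; };
auto to_vector = [](auto... xs) { return std::vector<int>{xs...}; };

// > and >= share the same precedence and associate left to right,
// so the stages run in the order they are written.
auto pipeline_result() {
    return (List(1, 2, 3) >= pair > twice > inc)(to_vector);
}
```

Here pair expands (1, 2, 3) to (-1, 1, -2, 2, -3, 3), twice doubles each element, and inc adds one, so the resulting vector holds -1, 3, -3, 5, -5, 7.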

Before we celebrate though, let's verify the monad laws.

Monad Laws

Let's make sure the list monad satisfies all three monad laws.
    
  template <class M1, class M2>
  void assert_equal(M1 m1, M2 m2)
  {
    auto to_vector = [](auto... a) { return std::vector<int> { a... }; };
    assert(m1(to_vector) == m2(to_vector));   
  }
  
  auto triplet(int i)  { return List(-i, 0, i); }

  {
    auto M = List(11);
    std::cout << "Monad law (left identity)\n";
    assert_equal(flatmap(pair)(M), pair(11));
    assert_equal(M >= pair, pair(11));
    
    std::cout << "Monad law (right identity)\n";
    assert_equal(flatmap(List)(M), M);
    assert_equal(M >= List, M);
     
    std::cout << "Monad law (associativity)\n";
    assert_equal(flatmap(triplet)(flatmap(pair)(M)),
                 flatmap([=](auto x) { return flatmap(triplet)(pair(x)); })(M));
    assert_equal(M >= pair >= triplet, 
                 M >= [=](auto x) { return pair(x) >= triplet; });
  }
All assertions are satisfied.

Collection Pipeline

Although the above list lambda is provably a monad and shares characteristics of the proverbial list-monad, it is quite unpleasant to work with as a collection pipeline, especially because the behavior of a common collection pipeline combinator, filter (a.k.a. where), does not meet common expectations.

The reason is just how C++ lambdas work. Each lambda expression produces a function object of a unique type. Therefore, List(1,2,3) produces a type that has nothing to do with List(1) or with an empty list, which in this case would be List().

The straightforward implementation of `where` fails to compile because in C++ a function cannot return two different types.
    
   auto where_broken = [](auto func) {
      return flatmap([func](auto i) { 
          return func(i)? List(i) : List(); // broken :-(
      }); 
    };
In the above implementation, func returns a boolean. It's a predicate that says true or false for each element. The ?: operator does not compile because the types of List(i) and List() (empty list) are different.
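The type mismatch can be observed directly with std::is_same. This is a small sketch that assumes List is the list lambda defined earlier; the two constant names are made up for illustration.

```cpp
#include <cassert>
#include <type_traits>

// Assumed definition of the List lambda.
auto List = [](auto... xs) { return [=](auto access) { return access(xs...); }; };

// Same number and types of elements: the same closure type.
constexpr bool same_when_types_match =
    std::is_same<decltype(List(1)), decltype(List(2))>::value;

// Different number of elements: unrelated closure types,
// so `cond ? List(i) : List()` has no common type.
constexpr bool same_when_sizes_differ =
    std::is_same<decltype(List(1)), decltype(List())>::value;

static_assert(same_when_types_match, "List(1) and List(2) share a type");
static_assert(!same_when_sizes_differ, "List(1) and List() are different types");
```

The length of the list is part of its static type, so a predicate evaluated at run time cannot choose between a one-element list and an empty one.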

So, a different trick can be used to allow continuation of the collection pipeline. Instead of actually filtering the elements, they are simply flagged as such---and that's what makes it unpleasant.
    
  auto where_unpleasant = [](auto func) {
    return [=](auto i) { 
        return std::make_pair(func(i), i);
    }; 
  };
The where_unpleasant gets the job done but unpleasantly... For example, this is how you can filter out negative elements.
    
    auto positive = [](auto i) { return i >= 0; };
    auto pair_print = [](auto pair) { 
      if(pair.first) 
         std::cout << pair.second << " "; 
      return pair; 
    };
    List(10, 20) >= pair > where_unpleasant(positive) > pair_print; 
    // prints 10 and 20 in some order


Heterogeneous Tuples

So far the discussion was about homogeneous tuples. Now let's generalize it to true tuples. Note that fmap, flatmap, and where take only one callback lambda. To provide multiple lambdas, each working on one type, we can overload them. For example,
    template <class A, class... B>
    struct overload : overload<A>, overload<B...> {
      overload(A a, B... b) 
          : overload<A>(a), overload<B...>(b...) 
      {}  
      using overload<A>::operator ();
      using overload<B...>::operator ();
    };
     
    template <class A>
    struct overload<A> : A {
      overload(A a) 
          : A(a) {} 
      using A::operator();
    };
    
    template <class... F>
    auto make_overload(F... f) {
      return overload<F...>(f...);   
    }
    
    auto test = 
       make_overload([](int i) { std::cout << "int = " << i << std::endl; },
                     [](double d) { std::cout << "double = " << d << std::endl; });
    test(10); // int 
    test(9.99); // double    
Let's use the overloaded lambda technique to process a heterogeneous tuple continuator.
    
        auto int_or_string = 
            make_overload([](int i) { return 5*i; },
                          [](std::string s) { return s+s; });
        List(10, "ab") > int_or_string >  print; // prints 50 and abab (gcc in reverse)
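To inspect the mapped values without relying on print (and its compiler-dependent ordering), the heterogeneous result can be unpacked into a std::tuple. The sketch below repeats the overload machinery so that it stands alone; List is assumed to be the list lambda shown earlier, and to_tuple and check_heterogeneous are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <tuple>

auto List = [](auto... xs) { return [=](auto access) { return access(xs...); }; };

auto fmap = [](auto func) {
    return [func](auto alist) {
        return alist([func](auto... xs) { return List(func(xs)...); });
    };
};

template <class A, class... B>
struct overload : overload<A>, overload<B...> {
    overload(A a, B... b) : overload<A>(a), overload<B...>(b...) {}
    using overload<A>::operator();
    using overload<B...>::operator();
};

template <class A>
struct overload<A> : A {
    overload(A a) : A(a) {}
    using A::operator();
};

template <class... F>
auto make_overload(F... f) { return overload<F...>(f...); }

auto int_or_string = make_overload([](int i) { return 5 * i; },
                                   [](std::string s) { return s + s; });

// Hypothetical access function: keeps each element's own type.
auto to_tuple = [](auto... xs) { return std::make_tuple(xs...); };

void check_heterogeneous() {
    // "ab" is a const char*; only the std::string overload is viable for it.
    auto t = fmap(int_or_string)(List(10, "ab"))(to_tuple);
    assert(std::get<0>(t) == 50);
    assert(std::get<1>(t) == "abab");
}
```

The element positions in the tuple follow the order of the pack, even though the individual calls to int_or_string may be evaluated in either order.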

Finally, here is the complete live example. For more relevant reading, also see the lambda-over-lambda.

P.S. Why is the order of output not the same across compilers? The order of variadic pack expansion is defined by the standard and corresponds to the original order of the pack. The order of evaluating function argument expressions is, however, not standardized. For example, check out the implementation of fmap. func(xs) is called as many times as there are arguments. However, the order in which the multiple calls to func are evaluated is not guaranteed. As the calls to func print the values to the console, the order of the output varies across compilers. See more discussion on reddit/r/cpp.
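If a deterministic order is needed for side effects, one option is to expand the pack inside a braced-init-list, whose initializers are evaluated strictly left to right. The following is a minimal sketch under that assumption. for_each_in_order and render are hypothetical helpers, and List is assumed to be the list lambda shown earlier. Note that this only visits the elements; it does not build a new list the way fmap does.

```cpp
#include <cassert>
#include <initializer_list>
#include <sstream>
#include <string>

auto List = [](auto... xs) { return [=](auto access) { return access(xs...); }; };

// Hypothetical access-function factory: calls func on each element, left to right.
auto for_each_in_order = [](auto func) {
    return [func](auto... xs) {
        // Initializer clauses in a braced-init-list are evaluated in order,
        // unlike the arguments of an ordinary function call.
        (void)std::initializer_list<int>{ (static_cast<void>(func(xs)), 0)... };
    };
};

// Hypothetical demo: renders the elements into a string in pack order.
template <class LIST>
std::string render(LIST l) {
    std::ostringstream out;
    l(for_each_in_order([&out](auto x) { out << x << ' '; }));
    return out.str();
}
```

For example, render(List(1, 2, 3)) returns "1 2 3 " regardless of how the compiler orders function-argument evaluation.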