19 Dec 2012

Perfect Forwarding to Async Lambdas (Part 2)

In Part 1, the “value of rvalues” was discussed, as well as their use in the context of std::async. A problem was observed with how lvalues are handled, so in this part I will introduce perfect forwarding to deal with rvalues and lvalues generically and try to provide an optimal approach in that context.

Perfect Forwarding

When writing a library, or very generic functions, all bases should be covered, and you may want to consider the case when a temporary object is passed in. With C++11, you may now follow a “recipe” that accepts any argument, be it lvalue or rvalue, const or non-const:

template<typename T>
void wrapper(T&& arg) {
    foo(std::forward<T>(arg));
}

This is useful if you want to forward arguments exactly as they were passed in to another function, which is accomplished using std::forward.
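To make the effect of std::forward concrete, here is a minimal sketch. The foo overloads and wrapper_no_forward are hypothetical stand-ins added for illustration; each overload simply reports which kind of argument reached it.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical overload set standing in for foo: each overload
// reports which kind of argument it received.
inline std::string foo(int&)       { return "lvalue"; }
inline std::string foo(int const&) { return "const lvalue"; }
inline std::string foo(int&&)      { return "rvalue"; }

// The recipe from above, returning foo's result so it can be inspected.
template <typename T>
std::string wrapper(T&& arg) {
    return foo(std::forward<T>(arg));
}

// Same wrapper without std::forward: arg is a named variable,
// so it is always passed on as an lvalue.
template <typename T>
std::string wrapper_no_forward(T&& arg) {
    return foo(arg);
}
```

Calling wrapper(3) reaches the rvalue overload, while wrapper_no_forward(3) ends up in the lvalue overload, which is exactly the loss of information that std::forward prevents.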

You may be asking, “Wait a minute… Isn’t that an rvalue reference? Why does this recipe work?” Well, without opening a whole can of worms, here is the summary of an enlightening talk on universal references by Scott Meyers:

When type deduction is involved and the deduced type is followed by &&, as in auto&& or T&& in a function template, the reference can be interpreted as a “universal reference” that binds to everything.

The truth behind why this happens is beyond the scope of this article, and Scott’s talk does a much better job of explaining it than I can. Knowing this new information, we can look at the different combinations of T and && in function templates, and what they imply:

// pros: accepts both lvalues and rvalues,
// cons: but makes copies
template <typename T>
void printContents(T val);

// pros: no extra copies
// cons: does not accept non-const rvalues
template <typename T>
void printContents(T& val);

// pros: no extra copies, also accepts rvalues
// cons: cannot mutate val
template <typename T>
void printContents(T const& val);

// pros: accepts everything!
template <typename T>
void printContents(T&& val);

As we will see below, the difficulty of writing function templates that forward their arguments is that one has to keep track of all the different variations that T can be.
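As a rough illustration of those variations, the sketch below inspects what T&& turns into for different arguments. The helper names binds_as_lvalue_ref and deduced_const are made up for this example and are not part of any library.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical helper: true when T&& collapsed to an lvalue reference,
// i.e. when the argument was an lvalue.
template <typename T>
bool binds_as_lvalue_ref(T&&) {
    return std::is_lvalue_reference<T&&>::value;
}

// Hypothetical helper: true when the deduced T refers to a const object.
template <typename T>
bool deduced_const(T&&) {
    return std::is_const<typename std::remove_reference<T>::type>::value;
}
```

For an int lvalue, T is deduced as int& and T&& collapses to int&; for a const int lvalue, T is int const&; for a temporary such as 42, T is plain int and the parameter is a genuine rvalue reference.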

Lambdas and perfect forwarding

Before handling perfect forwarding in the context of std::async, let’s consider lambdas. Lambdas allow definitions of anonymous closures that can capture anything in lexical scope. Can we capture a forwarded argument? Unfortunately, the C++11 standard does not include capture by move or capture by forwarding, so we have to pass forwarded values manually as arguments:

// forward our move_checker to count moves/copies
template <typename T>
int forwardToLambda(T&& checker)
{
    auto lambda =
        // T&& here is the same type
        // as was deduced above
        [](T&& checker) mutable
        {
            return checker.payload[0];
        };

    // forward to the lambda
    return lambda(std::forward<T>(checker));
}

Taking it for a quick test drive with an lvalue:

move_checker checker;

assert( checker.copies() == 0 );
assert( checker.moves() == 0 );

forwardToLambda(checker);

// no copies or moves!
assert( checker.copies() == 0 );
assert( checker.moves() == 0 );

I’ll spare you all the other combinations, but rest assured all of them have the optimal number of copies/moves. In summary:

Perfect forwarding to lambdas can be accomplished by explicitly passing the forwarded arguments, as they cannot be captured optimally. The type for the lambda parameter must match that of the wrapping function template, i.e. T&&.
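Since move_checker comes from Part 1 and is not shown here, the following is a self-contained sketch with a hypothetical stand-in for it. It assumes the counters are shared between copies, and the real class may well differ. passByValueToLambda is likewise an invented contrast case, showing what a by-value lambda parameter costs.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for the move_checker from Part 1: the counters
// are shared, so every copy or move of an object updates one tally.
struct move_checker {
    std::vector<int> payload;
    std::shared_ptr<int> copies_;
    std::shared_ptr<int> moves_;

    move_checker()
        : payload(4, 7),
          copies_(std::make_shared<int>(0)),
          moves_(std::make_shared<int>(0)) { }

    move_checker(move_checker const& o)
        : payload(o.payload), copies_(o.copies_), moves_(o.moves_)
    { ++*copies_; }

    move_checker(move_checker&& o)
        : payload(std::move(o.payload)), copies_(o.copies_), moves_(o.moves_)
    { ++*moves_; }

    int copies() const { return *copies_; }
    int moves()  const { return *moves_; }
};

// Lambda parameter is T&&: nothing is copied or moved.
template <typename T>
int forwardToLambda(T&& checker) {
    auto lambda = [](T&& c) mutable { return c.payload[0]; };
    return lambda(std::forward<T>(checker));
}

// Contrast: a by-value lambda parameter costs one copy (lvalue argument)
// or one move (rvalue argument).
template <typename T>
int passByValueToLambda(T&& checker) {
    auto lambda = [](typename std::decay<T>::type c) { return c.payload[0]; };
    return lambda(std::forward<T>(checker));
}
```

With this stand-in, forwardToLambda leaves both counters at zero for lvalues and rvalues alike, whereas passByValueToLambda records one copy for an lvalue and one move for an rvalue.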

The Problem

Alright! Phew! We covered rvalues, how std::async snuck an extra copy in without us noticing, perfect forwarding, and lambdas. Now all concepts conspire together to create a problem: How do we forward arguments optimally, through a function template, then through an async call, and finally through a lambda, and write only one function definition to cover all cases? Also, why the hell would anyone want to do that?

During the development of my library Plumbing++ I needed to apply an arbitrary function to an iterable on a separate thread. It amounted to the above problem, so here’s a rough skeleton for the implementation:

template <typename InputIterable, typename Func>
std::future<void> connect(InputIterable&& input, Func func)
{
    // launch async, and apply func to every element in input
    return std::async(std::launch::async,
            // ??? What do we pass to async?
            // Want to forward input, and capture func
            // to use in a for loop:
            //
            // for (auto&& e : input) {
            //     func(e);
            // }
    );
}

The first part of our requirements is already satisfied: Use T&& to forward arguments through a function template. Let’s take an initial stab at the whole problem, just by using lambda capture:

// SUB-OPTIMAL. lambda makes a copy
template <typename InputIterable, typename Func>
std::future<void> connect(InputIterable&& input, Func func)
{
    return std::async(std::launch::async,
            // input gets copied into lambda
            [func, input]() mutable
            {
                for (auto&& e : input) {
                    func(e);
                }
            }
    );
}

The above works, but introduces an unnecessary copy. We cannot capture by reference, since a reference to an rvalue argument would dangle once connect returns. Let’s use what we learnt from the lambda section, and try to forward to the lambda. Of course, that would require forwarding through the async call:

// DOES NOT COMPILE for lvalues. see reasoning below.
template <typename InputIterable, typename Func>
std::future<void> connect(InputIterable&& input, Func func)
{
    return std::async(std::launch::async,
            // trying to forward to lambda through async
            [func](InputIterable&& input) mutable
            {
                for (auto&& e : input) {
                    func(e);
                }
            },
            std::forward<InputIterable>(input)
    );
}

The extra copy made by std::async rears its head. This is the problem that got me started on this journey. Here is a breakdown of what happens to an lvalue that gets passed in:

  • The type InputIterable gets deduced as an lvalue reference, so input binds to the lvalue.
  • We forward input to std::async, passing in an lvalue reference.
  • Then, std::async makes a copy internally, creating a temporary rvalue.
  • Finally, std::async forwards this temporary rvalue to the lambda, and compilation fails because an rvalue cannot bind to the lvalue reference parameter that the lambda expects.

So what can we do? If you still remember part one, we have to wrap the value using std::ref to bypass the extra copy by std::async. However, std::ref cannot be constructed from an rvalue, and thus cannot be used to do perfect forwarding.
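The two behaviours described here can be observed with a small experiment. This is only a sketch: probe is a hypothetical type invented for the example, and it counts copy constructions in a static tally.

```cpp
#include <cassert>
#include <functional>
#include <future>

// Hypothetical probe type: counts copy constructions in a static tally.
struct probe {
    static int copies;
    int value;
    explicit probe(int v) : value(v) { }
    probe(probe const& o) : value(o.value) { ++copies; }
    probe(probe&& o) : value(o.value) { }
};
int probe::copies = 0;

// Pass an lvalue straight to std::async: it is copied into the task.
inline int copies_when_passed_directly() {
    probe p(1);
    probe::copies = 0;
    std::async(std::launch::async,
               [](probe const& q) { return q.value; }, p).get();
    return probe::copies;
}

// Wrap the lvalue in std::ref: the task sees the original object.
inline int copies_when_wrapped_in_ref() {
    probe p(1);
    probe::copies = 0;
    std::async(std::launch::async,
               [](probe& q) { return q.value; }, std::ref(p)).get();
    return probe::copies;
}

// std::ref(probe(1)); // would not compile: std::ref rejects rvalues
```

The first function reports at least one copy, the second reports none. The commented-out line marks the limitation: there is no way to hand a temporary to std::ref.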

We need a unified solution, so we write our function template once.

Solution

The solution I came up with is a thin wrapper that forwards arguments through a std::async call. It wraps the underlying type: in some cases it holds only a reference, in others a full-blown value. Let’s call this structure async_forwarder:

/**
 * If T, move it inside the forwarder.
 * If T&, just wrap it like std::ref
 */
template <typename T> class async_forwarder;

To give you an idea of its use, let’s implement the connect function that we’ve been struggling with above.

template <typename InputIterable, typename Func>
std::future<void> connect(InputIterable&& input, Func func)
{
    return std::async(std::launch::async,
            // the forwarder will automatically convert
            // to the appropriate type.
            [func](InputIterable&& input) mutable
            {
                for (auto&& e : input) {
                    func(e);
                }
            },
            async_forwarder<InputIterable>(std::forward<InputIterable>(input))
    );
}

Voila! So knowing that we always want to implicitly convert back to what it was constructed from, let’s specialize async_forwarder for lvalue references:

// This particular specialization
// is essentially std::ref
template <typename T>
class async_forwarder<T&>
{
    T& val_;

public:
    /**
     * Wrap the reference when passed an lvalue reference,
     * to fool std::async
     */
    async_forwarder(T& t) : val_(t) { }

    // ensure no copies are made
    async_forwarder(async_forwarder const& other) = delete;

    // move constructor
    async_forwarder(async_forwarder&& other)
        : val_(other.val_) { }

    // User-defined conversion that automatically
    // converts to the appropriate type
    operator T&       ()       { return val_; }
    operator T const& () const { return val_; }
};

This gets rid of the extra copy incurred by std::async by wrapping the reference just like std::ref! To wrap things up, let’s define the primary template, which handles rvalues, to complete the solution:

template <typename T>
class async_forwarder
{
    // Store value directly
    T val_;

public:
    /**
     * Move an rvalue of T into the wrapper,
     * incurring no copies.
     */
    async_forwarder(T&& t) : val_(std::move(t)) { }

    // ensure no copies are made
    async_forwarder(async_forwarder const& other) = delete;

    // move constructor
    async_forwarder(async_forwarder&& other)
        : val_(std::move(other.val_)) { }

    // Move the value out.
    // Note: can only occur once!
    // No const overload: a const wrapper cannot move its value out.
    operator T&& () { return std::move(val_); }
};

In effect, async_forwarder allows us to forward optimally through a std::async call while retaining a single interface:

  • Lvalues are stored as wrapped references
  • Rvalues are moved inside
  • The value is automatically converted back

Further Work

There might be some savings possible if the constructors and conversion operators of async_forwarder are declared constexpr, but I am not well versed in that feature of C++11, so perhaps others can suggest its effectiveness.

Conclusion

When trying to optimize performance for a particular problem, the standard solution is not always the best, requiring one to “open the hood” and muck about with the internals. In the case where an lvalue is guaranteed or expected to outlive a thread’s lifetime, copying the object into the thread is unnecessary, and the object can instead be wrapped using std::ref. A unified interface in the form of an async_forwarder is provided to handle perfect forwarding to these async functions.
