In Part 1, the “value of rvalues” was discussed, as well as their use in the context of std::async. A problem was observed with how lvalues are handled, so in this part I will introduce perfect forwarding to deal with rvalues and lvalues generically and try to provide an optimal approach in that context.
Perfect Forwarding
When writing a library, or very generic functions, all bases should be covered, and you may want to consider the case when a temporary object is passed in. With C++11, you may now follow a “recipe” that accepts any argument, be it lvalue or rvalue, const or non-const:
#include <utility> // std::forward

template <typename T>
void wrapper(T&& arg) {
    foo(std::forward<T>(arg));
}
This is useful if you want to forward arguments to another function exactly as they were passed in, which is accomplished using std::forward.
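A small sketch makes the difference visible. The two foo overloads and wrapper_without_forward are hypothetical, introduced only for illustration, and wrapper here returns foo’s result so the chosen overload can be checked:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical overloads of foo that report which one was selected.
std::string foo(const std::string&) { return "lvalue"; }
std::string foo(std::string&&) { return "rvalue"; }

// The recipe from above; it returns foo's result so the choice is visible.
template <typename T>
std::string wrapper(T&& arg) {
    return foo(std::forward<T>(arg));
}

// For contrast: a named parameter is always an lvalue,
// so without std::forward the rvalue-ness of the argument is lost.
template <typename T>
std::string wrapper_without_forward(T&& arg) {
    return foo(arg);
}
```

Calling wrapper on a named std::string selects the lvalue overload, while wrapper(std::string("temp")) selects the rvalue one; wrapper_without_forward picks the lvalue overload even for the temporary.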
You may be asking, “Wait a minute… Isn’t that an rvalue reference? Why does this recipe work?” Well, without opening a whole can of worms, here is the summary of an enlightening talk on universal references by Scott Meyers:
When automatic type deduction is involved (followed by &&), such as auto&& or T&& in a function template, it can be interpreted as a “universal reference” and binds to everything.
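As a quick illustration of the deduction rule (a sketch; deduced_as_lvalue_ref and the namespace-scope variables are hypothetical helpers), T is deduced as an lvalue reference for lvalue arguments and as a plain type for rvalues, and auto&& behaves the same way:

```cpp
#include <cassert>
#include <type_traits>

// T&& with deduced T: T becomes an lvalue reference for lvalue arguments
// and a plain (non-reference) type for rvalue arguments.
template <typename T>
constexpr bool deduced_as_lvalue_ref(T&&) {
    return std::is_lvalue_reference<T>::value;
}

// auto&& follows the same deduction rules.
int some_int = 0;
auto&& from_lvalue = some_int; // int&
auto&& from_rvalue = 42;       // int&& (the temporary's lifetime is extended)
static_assert(std::is_same<decltype(from_lvalue), int&>::value, "lvalue binds as int&");
static_assert(std::is_same<decltype(from_rvalue), int&&>::value, "rvalue binds as int&&");
```

For an int variable x, deduced_as_lvalue_ref(x) returns true (T is int&, and reference collapsing turns T&& into int&), while deduced_as_lvalue_ref(42) returns false (T is int, so T&& is int&&).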
The truth behind why this happens is beyond the scope of this article, and Scott’s talk does a much better job of explaining it than I can. With this in mind, we can look at the different combinations of T, const, & and && in function template parameters, and what they imply:
// pros: accepts both lvalues and rvalues
// cons: makes a copy of every lvalue argument
template <typename T>
void printContents(T val);
// pros: no extra copies
// cons: does not accept rvalues (const lvalues do bind,
//       with T deduced as a const type)
template <typename T>
void printContents(T& val);
// pros: no extra copies, also accepts rvalues
// cons: cannot mutate val
template <typename T>
void printContents(T const& val);
// pros: accepts everything, with no extra copies!
template <typename T>
void printContents(T&& val);
As we will see below, the difficulty in writing function templates that forward their arguments is keeping track of all the different types that T can be deduced as.
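The trade-offs listed above can be checked with a counting type. This is a sketch under assumptions: counted and the four function names are hypothetical, and the counters are static, so they accumulate across calls:

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical type that counts copies and moves (shared, static counters).
struct counted {
    static int copies;
    static int moves;
    counted() = default;
    counted(const counted&) { ++copies; }
    counted(counted&&) noexcept { ++moves; }
};
int counted::copies = 0;
int counted::moves = 0;

// The four signatures from above, with distinct names so they can coexist.
template <typename T> void by_value(T) {}
template <typename T> bool by_lvalue_ref(T&) { return std::is_const<T>::value; }
template <typename T> void by_const_ref(T const&) {}
template <typename T> void by_forwarding_ref(T&&) {}
```

Calling by_value once on an lvalue, plus the other three with lvalues, const lvalues and temporaries (where they accept them), leaves counted::copies at 1 and counted::moves at 0. by_lvalue_ref on a const counted compiles and reports T as const, whereas by_lvalue_ref(counted{}) would be rejected by the compiler.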
Lambdas and perfect forwarding
Before handling perfect forwarding in the context of std::async, let’s consider lambdas. Lambdas define anonymous closures that can capture anything in their lexical scope. Can we capture a forwarded argument?
Unfortunately, the C++11 standard does not include capture by move or capture
by forwarding, so we have to manually pass them as arguments:
// forward our move_checker to count moves/copies
template <typename T>
int forwardToLambda(T&& checker)
{
    auto lambda =
        // T&& here is the same type
        // as was deduced above
        [](T&& checker) mutable
        {
            return checker.payload[0];
        };
    // forward to the lambda
    return lambda(std::forward<T>(checker));
}
Taking it for a quick test drive with an lvalue:
move_checker checker;
assert( checker.copies() == 0 );
assert( checker.moves() == 0 );
forwardToLambda(checker);
// no copies or moves!
assert( checker.copies() == 0 );
assert( checker.moves() == 0 );
I’ll spare you all the other combinations, but rest assured all of them have the optimal number of copies/moves. In summary:
Perfect forwarding to lambdas can be accomplished by explicitly passing the forwarded arguments, as C++11 lambdas cannot capture them optimally. The type of the lambda parameter must match that of the wrapping function template’s parameter, i.e. T&&.
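For completeness, here is a self-contained version of the lambda example. Part 1’s move_checker is not reproduced in this section, so the move_checker below is a hypothetical stand-in with only the members the example needs (static counters shared by all instances, and a payload):

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for Part 1's move_checker, reduced to what the
// example uses. The counters are static, so all instances share them.
struct move_checker {
    static int copy_count;
    static int move_count;
    std::vector<int> payload{1, 2, 3};

    move_checker() = default;
    move_checker(const move_checker& other) : payload(other.payload) { ++copy_count; }
    move_checker(move_checker&& other) noexcept : payload(std::move(other.payload)) { ++move_count; }

    static int copies() { return copy_count; }
    static int moves() { return move_count; }
};
int move_checker::copy_count = 0;
int move_checker::move_count = 0;

// forwardToLambda as in the text: the lambda parameter reuses T&&.
template <typename T>
int forwardToLambda(T&& checker)
{
    auto lambda = [](T&& forwarded) { return forwarded.payload[0]; };
    return lambda(std::forward<T>(checker));
}
```

With this stand-in, calling forwardToLambda on a non-const lvalue, a const lvalue and a temporary returns payload[0] (1 here) each time, and both counters stay at 0.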
The Problem
Alright! Phew! We covered rvalues, how std::async snuck in an extra copy without us noticing, perfect forwarding, and lambdas. Now all of these concepts conspire to create a problem: how do we forward arguments optimally through a function template, then through an async call, and finally through a lambda, while writing only one function definition that covers all cases? Also, why the hell would anyone want to do that?
During the development of my library Plumbing++ I needed to apply an arbitrary function to an iterable on a separate thread. It amounted to the above problem, so here’s a rough skeleton for the implementation:
template <typename InputIterable, typename Func>
std::future<void> connect(InputIterable&& input, Func func)
{
    // launch async, and apply func to every element in input
    return std::async(std::launch::async,
        // ??? What do we pass to async?
        // Want to forward input, and capture func
        // to use in a for loop:
        //
        // for (auto&& e : input) {
        //     func(e);
        // }
    );
}
The first part of our requirements is already satisfied: use T&& to forward arguments through a function template. Let’s take an initial stab at the whole problem, just by using lambda capture:
// SUB-OPTIMAL. lambda makes a copy
template <typename InputIterable, typename Func>
std::future<void> connect(InputIterable&& input, Func func)
{
    return std::async(std::launch::async,
        // input gets copied into lambda
        [func, input]() mutable
        {
            for (auto&& e : input) {
                func(e);
            }
        }
    );
}
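To make that copy visible, here is a sketch of the capture-by-copy version with a hypothetical counted_vec iterable (its copy counter is static) and the function renamed connect_by_copy:

```cpp
#include <cassert>
#include <future>
#include <initializer_list>
#include <vector>

// Hypothetical iterable that counts copies (a static counter shared by all instances).
struct counted_vec {
    static int copies;
    std::vector<int> data;
    counted_vec(std::initializer_list<int> values) : data(values) {}
    counted_vec(const counted_vec& other) : data(other.data) { ++copies; }
    counted_vec(counted_vec&&) = default;
    std::vector<int>::const_iterator begin() const { return data.begin(); }
    std::vector<int>::const_iterator end() const { return data.end(); }
};
int counted_vec::copies = 0;

// The capture-by-copy version from the text, renamed to keep it apart.
template <typename InputIterable, typename Func>
std::future<void> connect_by_copy(InputIterable&& input, Func func)
{
    return std::async(std::launch::async,
        [func, input]() mutable {
            for (auto&& e : input) {
                func(e);
            }
        });
}
```

Passing an lvalue counted_vec{1, 2, 3} and then a temporary counted_vec{4, 5} to a summing lambda adds exactly one copy each time, because the lambda copies input from the named parameter in both cases; the closure itself is only moved into std::async.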
The above works, but introduces an unnecessary copy. We cannot capture by reference either: for an rvalue argument, the temporary may be destroyed while the thread is still iterating over it. Let’s use what we learnt from the lambda section and try to forward to the lambda instead. Of course, that requires forwarding through the async call:
// DOES NOT COMPILE for lvalues. see reasoning below.
template <typename InputIterable, typename Func>
std::future<void> connect(InputIterable&& input, Func func)
{
    return std::async(std::launch::async,
        // trying to forward to lambda through async
        [func](InputIterable&& input) mutable
        {
            for (auto&& e : input) {
                func(e);
            }
        },
        std::forward<InputIterable>(input)
    );
}
The extra copy made by std::async rears its head. This is the problem that got me started on this journey. Here is a breakdown of what happens to an lvalue that gets passed in:
- The type of input is correctly deduced as an lvalue reference (InputIterable&), so input binds to the lvalue.
- We forward input to std::async, passing in an lvalue reference.
- Then, std::async makes a copy internally, creating a temporary rvalue.
- Finally, std::async passes this temporary rvalue to the lambda, and compilation fails because an rvalue cannot bind to the lvalue reference parameter the lambda expects.
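The third and fourth steps can be observed directly. In this sketch, tracked and category_probe are hypothetical: the probe has one overload for lvalues and one for rvalues, so its return value reveals what std::async hands to the callable:

```cpp
#include <cassert>
#include <future>
#include <string>

// Hypothetical type that counts copies (static counter).
struct tracked {
    static int copies;
    tracked() = default;
    tracked(const tracked&) { ++copies; }
    tracked(tracked&&) noexcept {}
};
int tracked::copies = 0;

// A callable that reports the value category std::async hands it.
struct category_probe {
    std::string operator()(tracked&) const { return "lvalue"; }
    std::string operator()(tracked&&) const { return "rvalue"; }
};

// Passes an lvalue argument through std::async, as connect does.
std::string pass_through_async(tracked& t) {
    return std::async(std::launch::async, category_probe{}, t).get();
}
```

Passing an lvalue t through pass_through_async returns "rvalue" and bumps tracked::copies to 1: the argument was copied, and the copy arrives as an rvalue. A callable with only the tracked& overload would fail to compile, just like the lambda above.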
So what can we do? If you remember Part 1, we have to wrap the value using std::ref to bypass the extra copy made by std::async. However, std::ref cannot be constructed from an rvalue (that overload is deleted), and thus cannot be used for perfect forwarding. We need a unified solution, so that we only have to write our function template once.
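For the lvalue case alone, std::ref does the job. This sketch (ref_tracked and same_object_through_async are hypothetical names) checks that the task receives the caller’s object rather than a copy:

```cpp
#include <cassert>
#include <functional>
#include <future>

// Hypothetical type that counts copies (static counter).
struct ref_tracked {
    static int copies;
    ref_tracked() = default;
    ref_tracked(const ref_tracked&) { ++copies; }
};
int ref_tracked::copies = 0;

// Returns true if the asynchronous task saw the very same object.
bool same_object_through_async(ref_tracked& t) {
    // Like connect's lambda when InputIterable is deduced as an lvalue reference.
    auto lambda = [](ref_tracked& x) { return &x; };
    // std::async copies the std::reference_wrapper (cheap), and the wrapper
    // converts back to ref_tracked& when the lambda is invoked.
    ref_tracked* seen = std::async(std::launch::async, lambda, std::ref(t)).get();
    return seen == &t;
}

// Note: std::ref(ref_tracked{}) does not compile, because the rvalue
// overload of std::ref is deleted.
```

same_object_through_async(t) returns true and ref_tracked::copies stays at 0. The catch is in the final comment: the same approach cannot be written for a temporary.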
Solution
The solution I came up with is a thin wrapper for forwarding arguments through a std::async call. It wraps the underlying type, but in some cases it holds a reference, and in others it holds a full-blown value. Let’s call this structure async_forwarder:
/**
 * If T is a non-reference type (an rvalue was passed),
 * move the value inside the forwarder.
 * If T is an lvalue reference T&, just wrap it like std::ref.
 */
template <typename T> class async_forwarder;
To give you an idea of its use, let’s implement the connect
function that we’ve been struggling with above.
template <typename InputIterable, typename Func>
std::future<void> connect(InputIterable&& input, Func func)
{
    return std::async(std::launch::async,
        // the forwarder will automatically convert
        // to the appropriate type.
        [func](InputIterable&& input) mutable
        {
            for (auto&& e : input) {
                func(e);
            }
        },
        async_forwarder<InputIterable>(std::forward<InputIterable>(input))
    );
}
Voila! Since we always want the forwarder to implicitly convert back to whatever it was constructed from, let’s first specialize async_forwarder for lvalue references:
// This particular specialization
// is essentially std::ref
template <typename T>
class async_forwarder<T&>
{
    T& val_;
public:
    /**
     * Wrap the reference when passed an lvalue reference,
     * to fool std::async
     */
    async_forwarder(T& t) : val_(t) { }
    // ensure no copies are made
    async_forwarder(async_forwarder const& other) = delete;
    // move constructor
    async_forwarder(async_forwarder&& other)
        : val_(other.val_) { }
    // User-defined conversion that automatically
    // converts to the appropriate type
    operator T& () { return val_; }
    operator T const& () const { return val_; }
};
This gets rid of the extra copy incurred by std::async by wrapping the reference, just like std::ref! To wrap things up, let’s define the primary template, which handles rvalues (T is a non-reference type), to complete the solution:
template <typename T>
class async_forwarder
{
    // Store value directly
    T val_;
public:
    /**
     * Move an rvalue of T into the wrapper,
     * incurring no copies.
     */
    async_forwarder(T&& t) : val_(std::move(t)) { }
    // ensure no copies are made
    async_forwarder(async_forwarder const& other) = delete;
    // move constructor
    async_forwarder(async_forwarder&& other)
        : val_(std::move(other.val_)) { }
    // Move the value out.
    // Note: can only occur once!
    operator T&& () { return std::move(val_); }
    // A const forwarder cannot give up its value, so expose it read-only
    // (std::move on a const member yields T const&&, which cannot bind to T&&)
    operator T const& () const { return val_; }
};
In effect, async_forwarder allows us to forward optimally through a std::async call while retaining a single interface:
- Lvalues are stored as wrapped references
- Rvalues are moved inside
- The value is automatically converted back
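Putting the pieces together, here is one self-contained sketch for checking the approach. It follows the two async_forwarder definitions and connect from above, with the const conversion operator of the rvalue version returning T const& and the lambda parameter renamed to in; tracked_vec is a hypothetical iterable whose static counter records copies:

```cpp
#include <cassert>
#include <future>
#include <initializer_list>
#include <utility>
#include <vector>

template <typename T> class async_forwarder;

// Lvalues: wrap the reference, like std::ref.
template <typename T>
class async_forwarder<T&>
{
    T& val_;
public:
    async_forwarder(T& t) : val_(t) { }
    async_forwarder(async_forwarder const&) = delete;
    async_forwarder(async_forwarder&& other) : val_(other.val_) { }
    operator T& () { return val_; }
    operator T const& () const { return val_; }
};

// Rvalues: move the value inside the forwarder.
template <typename T>
class async_forwarder
{
    T val_;
public:
    async_forwarder(T&& t) : val_(std::move(t)) { }
    async_forwarder(async_forwarder const&) = delete;
    async_forwarder(async_forwarder&& other) : val_(std::move(other.val_)) { }
    operator T&& () { return std::move(val_); }
    operator T const& () const { return val_; }
};

template <typename InputIterable, typename Func>
std::future<void> connect(InputIterable&& input, Func func)
{
    return std::async(std::launch::async,
        [func](InputIterable&& in) mutable {
            for (auto&& e : in) {
                func(e);
            }
        },
        async_forwarder<InputIterable>(std::forward<InputIterable>(input)));
}

// Hypothetical iterable that counts copies (static counter).
struct tracked_vec {
    static int copies;
    std::vector<int> data;
    tracked_vec(std::initializer_list<int> values) : data(values) { }
    tracked_vec(const tracked_vec& other) : data(other.data) { ++copies; }
    tracked_vec(tracked_vec&&) = default;
    std::vector<int>::const_iterator begin() const { return data.begin(); }
    std::vector<int>::const_iterator end() const { return data.end(); }
};
int tracked_vec::copies = 0;
```

For example, connecting an lvalue tracked_vec{1, 2, 3}, a temporary tracked_vec{4, 5} and a const tracked_vec{6} to a summing lambda, calling get() on each returned future, yields a sum of 21 while tracked_vec::copies stays at 0: lvalues are only referenced and the temporary is only moved.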
Further Work
There might be some savings possible if the constructors and conversion operators of async_forwarder were declared constexpr, but I am not well versed in that feature of C++11, so perhaps others can comment on its effectiveness.
Conclusion
When trying to optimize performance for a particular problem, the standard solution is not always the best, requiring one to “open the hood” and muck about with the internals. When an lvalue is guaranteed or expected to outlive the thread that uses it, copying the object into the thread is unnecessary, and the object can be wrapped using std::ref instead. A unified interface in the form of async_forwarder is provided to handle perfect forwarding through these async functions.