In Part 1, the “value of rvalues” was discussed, as well as their use in the context of std::async. A problem was observed with how lvalues are handled, so in this part I will introduce perfect forwarding to deal with rvalues and lvalues generically and try to provide an optimal approach in that context.
Perfect Forwarding
When writing a library, or very generic functions, all bases should be covered, and you may want to consider the case when a temporary object is passed in. With C++11, you may now follow a “recipe” that accepts any argument, be it lvalue or rvalue, const or non-const:
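A minimal sketch of the recipe, assuming a hypothetical pair of sink overloads (not part of any library) so that we can see which value category arrives at the other end:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical callee with one overload per value category, used only to
// observe which overload the forwarded argument selects.
inline std::string sink(const std::string&) { return "lvalue"; }
inline std::string sink(std::string&&)      { return "rvalue"; }

// The recipe: T&& on a deduced template parameter accepts lvalues and
// rvalues, const or non-const, and std::forward<T> passes the argument on
// with the value category it originally had.
template <typename T>
std::string relay(T&& value) {
    return sink(std::forward<T>(value));
}
```

Calling relay with a named std::string selects the const std::string& overload, while calling it with a temporary or with std::move(s) selects the std::string&& overload.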
This is useful if you want to forward arguments exactly as they were passed in to another function, which is accomplished using std::forward.
You may be asking, “Wait a minute… Isn’t that an rvalue reference? Why does this recipe work?” Well, without opening a whole can of worms, here is the summary of an enlightening talk on universal references by Scott Meyers:
When automatic type deduction is involved (followed by &&), such as auto&& or T&& in a function template, it can be interpreted as a “universal reference” and binds to everything.
The truth behind why this happens is beyond the scope of this article, and Scott’s talk does a much better job of explaining it than I can. Knowing this new information, we can look at the different combinations of T and && in function templates, and what they imply:
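As a rough illustration of what T is deduced as for each kind of argument, here is a small sketch. The helper describe is hypothetical and exists only to report the deduction:

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical helper: reports how T was deduced for a T&& parameter.
//   argument         deduced T     T&& collapses to
//   lvalue X         X&            X&
//   const lvalue X   const X&      const X&
//   rvalue X         X             X&&
//   const rvalue X   const X       const X&&
template <typename T>
std::string describe(T&&) {
    typedef typename std::remove_reference<T>::type bare;
    const std::string constness = std::is_const<bare>::value ? "const " : "";
    const std::string category =
        std::is_lvalue_reference<T>::value ? "lvalue" : "rvalue";
    return constness + category;
}
```

Only a plain T&& on a deduced template parameter behaves this way. Adding const, as in const T&&, turns the parameter back into an ordinary rvalue reference.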
As we will see below, the difficulty of writing function templates that forward their arguments is that one has to keep track of all the different variations that T can be.
Lambdas and perfect forwarding
Before handling perfect forwarding in the context of std::async, let’s consider lambdas. Lambdas allow definitions of anonymous closures that can capture anything in lexical scope. Can we capture a forwarded argument? Unfortunately, the C++11 standard does not include capture by move or capture by forwarding, so we have to manually pass them as arguments:
Taking it for a quick test drive with an lvalue:
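A minimal sketch of both the wrapper and the test drive. Tracker, Counts, through_lambda and test_drive_with_lvalue are hypothetical names introduced here purely to count copies and moves:

```cpp
#include <cassert>
#include <utility>

// Hypothetical type that records how many times it was copied or moved.
struct Tracker {
    int copies;
    int moves;
    Tracker() : copies(0), moves(0) {}
    Tracker(const Tracker& o) : copies(o.copies + 1), moves(o.moves) {}
    Tracker(Tracker&& o) : copies(o.copies), moves(o.moves + 1) {}
};

struct Counts { int copies; int moves; };

template <typename T>
Counts through_lambda(T&& value) {
    // The lambda parameter repeats T&& so that reference collapsing gives it
    // the same type as the parameter of the enclosing function template.
    auto consume = [](T&& arg) -> Counts {
        Tracker local(std::forward<T>(arg));  // the single copy or move
        Counts c = {local.copies, local.moves};
        return c;
    };
    return consume(std::forward<T>(value));
}

// Test drive with an lvalue: the only operation is the final copy.
inline Counts test_drive_with_lvalue() {
    Tracker t;
    return through_lambda(t);
}
```

With an lvalue the consumer makes exactly one copy and no moves. Passing a temporary instead results in a single move.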
I’ll spare you all the other combinations, but rest assured all of them have the optimal number of copies/moves. In summary:
Perfect forwarding to lambdas can be accomplished by explicitly passing the forwarded arguments, as they cannot be captured optimally. The type for the lambda parameter must match that of the wrapping function template, i.e. T&&.
The Problem
Alright! Phew! We covered rvalues, how std::async snuck an extra copy in without us noticing, perfect forwarding, and lambdas. Now all concepts conspire together to create a problem: How do we forward arguments optimally, through a function template, then through an async call, and finally through a lambda, and write only one function definition to cover all cases? Also, why the hell would anyone want to do that?
During the development of my library Plumbing++ I needed to apply an arbitrary function to an iterable on a separate thread. It amounted to the above problem, so here’s a rough skeleton for the implementation:
The first part of our requirements is already satisfied: Use T&& to forward arguments through a function template. Let’s take an initial stab at the whole problem, just by using lambda capture:
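A sketch of what such a first attempt could look like. The exact signature of connect is an assumption made for illustration and is not taken from Plumbing++:

```cpp
#include <cassert>
#include <future>
#include <utility>
#include <vector>

// Assumed signature: apply func to every element of input on another thread.
template <typename InputIterable, typename Func>
std::future<void> connect(InputIterable&& input, Func func) {
    // Capturing input by value copies the iterable into the closure. That
    // keeps a temporary alive for the task, but it also copies an lvalue.
    return std::async(std::launch::async, [input, func]() {
        for (const auto& element : input) {
            func(element);
        }
    });
}
```

The task iterates over the closure's own copy, so the elements it visits live at different addresses than the elements of the caller's container.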
The above works, but introduces an unnecessary copy. We cannot capture by reference, since that would not work with rvalues. Let’s use what we learnt from the lambda section, and try to forward to the lambda. Of course, that would require forwarding through the async call:
The extra copy made by std::async rears its head. This is the problem that got me started on this journey. Here is a breakdown of what happens to an lvalue that gets passed in:
- The type of input gets deduced correctly as an lvalue reference, InputIterable&, and binds to an lvalue.
- We forward input to std::async, passing in an lvalue reference.
- Then, std::async makes a copy internally, creating a temporary rvalue.
- Finally, std::async forwards this temporary rvalue to the lambda, and the compilation fails because it cannot be bound to an lvalue reference as we expect.
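The last two steps can be observed in a small sketch. The helper name async_sees_original is hypothetical. The callable here takes an rvalue reference, which compiles even though the caller passes an lvalue, because std::async hands over its internal copy as an rvalue:

```cpp
#include <cassert>
#include <future>
#include <vector>

// Returns true only if the callable received the caller's own object.
inline bool async_sees_original(std::vector<int>& original) {
    const std::vector<int>* caller = &original;
    return std::async(
        std::launch::async,
        // std::async decay-copies `original` and invokes the callable with
        // that copy as an rvalue, so a std::vector<int>&& parameter binds.
        // Declaring the parameter as std::vector<int>& would not compile.
        [caller](std::vector<int>&& received) { return &received == caller; },
        original).get();
}
```

The function reports false: the lambda saw a copy, and the caller's vector is left untouched.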
So what can we do? If you still remember part one, we have to wrap the value using std::ref to bypass the extra copy by std::async. However, std::ref cannot be constructed from an rvalue, and thus cannot be used to do perfect forwarding.
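To make the effect of std::ref concrete, here is a sketch that counts copies with and without it. Counted and the two helper functions are hypothetical names used only for this measurement:

```cpp
#include <cassert>
#include <functional>
#include <future>

// Hypothetical type that counts copy constructions in a static counter.
struct Counted {
    static int copies;
    Counted() {}
    Counted(const Counted&) { ++copies; }
    Counted(Counted&&) {}
};
int Counted::copies = 0;

// Passing the lvalue directly: std::async copies it into its own storage.
inline int copies_without_ref() {
    Counted c;
    Counted::copies = 0;
    std::async(std::launch::async, [](const Counted&) {}, c).get();
    return Counted::copies;
}

// Wrapping in std::ref: only the reference wrapper is copied. The caller
// must keep `c` alive until the task has finished.
inline int copies_with_ref() {
    Counted c;
    Counted::copies = 0;
    std::async(std::launch::async, [](Counted&) {}, std::ref(c)).get();
    return Counted::copies;
    // std::ref(Counted()) would not compile: the overload taking an rvalue
    // is deleted, which is why std::ref cannot cover the rvalue case.
}
```

The wrapped call makes no copies of Counted, while the direct call makes at least one.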
We need a unified solution, so that we only have to write our function template once.
Solution
The solution I came up with is a thin wrapper around the underlying type that I can use to forward arguments through a std::async call. In some cases it wraps a reference; in others it holds a full-blown value. Let’s call this structure async_forwarder.
To give you an idea of its use, let’s implement the connect function that we’ve been struggling with above.
Voila! So knowing that we always want to implicitly convert back to what it was constructed from, let’s specialize async_forwarder for lvalue references:
This gets rid of the extra copy incurred by std::async by wrapping the reference just like std::ref! To wrap things up, let’s specialize for rvalues, to complete the solution:
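Putting the pieces together, here is one possible sketch of async_forwarder and of connect built on top of it. It is a reconstruction under assumptions rather than the Plumbing++ source: the rvalue case is handled by the primary template, the member names are invented, and the signature of connect is assumed.

```cpp
#include <cassert>
#include <future>
#include <utility>
#include <vector>

// Rvalue case (T is not a reference): move the value in and own it.
template <typename T>
class async_forwarder {
    T val_;
public:
    explicit async_forwarder(T&& t) : val_(std::move(t)) {}
    async_forwarder(const async_forwarder&) = delete;
    async_forwarder(async_forwarder&& other) : val_(std::move(other.val_)) {}
    operator T&&() { return std::move(val_); }  // convert back to an rvalue
};

// Lvalue case: hold only a reference, much like std::ref does.
template <typename T>
class async_forwarder<T&> {
    T& val_;
public:
    explicit async_forwarder(T& t) : val_(t) {}
    async_forwarder(const async_forwarder&) = delete;
    async_forwarder(async_forwarder&& other) : val_(other.val_) {}
    operator T&() { return val_; }              // convert back to the lvalue
};

// Assumed signature: apply func to every element of input on another thread.
template <typename InputIterable, typename Func>
std::future<void> connect(InputIterable&& input, Func func) {
    typedef async_forwarder<InputIterable> forwarder;
    return std::async(
        std::launch::async,
        // std::async moves the wrapper into its storage and then into fwd.
        [func](forwarder fwd) {
            InputIterable&& in = fwd;  // implicit conversion back
            for (auto&& element : in) {
                func(element);
            }
        },
        forwarder(std::forward<InputIterable>(input)));
}
```

With an lvalue, the task iterates over the caller's own container, so the caller has to keep it alive until the future is ready. With a temporary, the container is moved into the wrapper and lives there for the duration of the task.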
In effect, async_forwarder allows us to forward optimally through a std::async call while retaining a single interface:
- Lvalues are stored as wrapped references
- Rvalues are moved inside
- The value is automatically converted back
Further Work
There might be some savings possible if async_forwarder is declared constexpr, but I am not well versed in that feature of C++11, so perhaps others can suggest its effectiveness.
Conclusion
When trying to optimize performance for a particular problem, the standard solution is not always the best, requiring one to “open the hood” and muck about with the internals. In the case where an lvalue is guaranteed or expected to outlive a thread’s lifetime, copying the object into the thread is unnecessary, and the object can instead be wrapped using std::ref. A unified interface in the form of an async_forwarder is provided to handle perfect forwarding to these async functions.