Perfect forwarding allows template functions to forward the arguments “as is” to any other function they call. This helps minimize the number of unnecessary copies and conversions when delegating information to other functions. In a quest to get rid of copying completely in a library I was writing, I came across the problem of perfect forwarding to functions launched on a separate thread.
Overview
Before tackling the problem of perfect forwarding in the next part, I will
quickly review rvalues, present a way to measure copies/moves, and finally
observe a “problem” with std::async. In the second part of the
article we will cover perfect forwarding and revisit the problem in that
context. Finally, a generic solution will be presented.
This article is geared towards library writers and those who write generic template code in C++11. Most of the code from both parts of the article can be found in this gist.
Lvalues and Rvalues
Let’s start with a quick recap of lvalues and rvalues. (You can skip this section if you have a firm grasp of the concept.)
With C++11 we have to be able to distinguish lvalues from rvalues. The names hint at the fact that lvalues tend to appear on the left-hand side of an assignment expression, whereas rvalues appear on the right. In other words, lvalues are values that you can freely refer to in their scope because they are bound to an identifier, such as variables and functions.
std::vector<int> createVector(std::string param);
// createVector is an lvalue. It can be referred to
std::vector<int> myVec = createVector("hello world");
// myVec is also an lvalue, it is on the left side of the expression
// and can be referred to
Rvalues cannot be directly referred to because they are temporaries that are not bound to an identifier, such as values returned by a function or the direct result of an inline construction:
std::vector<int> createVector(std::string param);
std::vector<int> myVec = createVector("hello world");
//                       ^            ^^^^^^^^^^^^^
//                       |            temporary std::string
//                       |
//                       temporary std::vector<int>
//
// There are two rvalues in the expression above.
With C++11, one can now create functions that accept lvalue references
(T&) as well as rvalue references (T&&). Since rvalues are temporaries,
this allows us to transfer ownership of that temporary instead of
performing a needless copy. The new notation allows a function to
distinguish between lvalues (T&) that already exist in an outer scope and
rvalues (T&&) that are yet to be bound to a scope.
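To make the distinction concrete, here is a minimal sketch of overloading on the two reference kinds. The names describe and makeGreeting are hypothetical helpers invented for this illustration; they do not appear elsewhere in the article.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helpers: each overload reports which one was selected.
std::string describe(std::string const&) { return "lvalue"; }
std::string describe(std::string&&)      { return "rvalue"; }

// Hypothetical function whose return value is a temporary.
std::string makeGreeting() { return "hello world"; }

// Collects the overload chosen for three kinds of argument.
std::string describeAll()
{
    std::string name = "hello";
    std::string out;
    out += describe(name);             // named variable: lvalue overload
    out += ",";
    out += describe(makeGreeting());   // temporary: rvalue overload
    out += ",";
    out += describe(std::move(name));  // std::move casts to an rvalue reference
    return out;
}
```

With these definitions, describeAll() returns "lvalue,rvalue,rvalue". Note that std::move by itself moves nothing; it only casts its argument so that the rvalue overload is selected.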
To facilitate this transfer of ownership, the function std::move
is available. Here is a quick example of its usage in the context of
object construction:
class vector_holder
{
    std::vector<int> vec_;
public:
    // copy a vector into this object.
    vector_holder(std::vector<int> const& vec)
      : vec_(vec)
    {
        std::cout << "copied!" << std::endl;
    }
    // move a vector into this object.
    // essentially a transfer of ownership
    vector_holder(std::vector<int>&& vec)
      : vec_(std::move(vec))
    {
        std::cout << "moved!" << std::endl;
    }
    // move constructor.
    // instead of copying, we transfer
    // the internals into this object
    vector_holder(vector_holder&& other)
      : vec_(std::move(other.vec_))
    { }
};
Given this quick overview, it should be apparent that moves help save unnecessary copies, which is essential if you want to write an efficient library. This is the “value of rvalues”. For a more focused overview, you could also look at a larger article by Alex Allain about move semantics.
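As a rough illustration of which constructor gets picked, here is a simplified variant of the class above. vector_holder_probe is a hypothetical name, and it records the chosen constructor in a string instead of printing, purely so the result can be checked.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified variant of vector_holder that remembers
// which constructor ran instead of printing a message.
class vector_holder_probe
{
    std::vector<int> vec_;
    std::string how_;
public:
    // chosen for lvalue arguments: the caller keeps its vector
    vector_holder_probe(std::vector<int> const& vec)
      : vec_(vec), how_("copied")
    { }
    // chosen for rvalue arguments. Inside the constructor, vec is a
    // named variable and therefore an lvalue, so std::move is still
    // needed to move it into vec_ rather than copy it.
    vector_holder_probe(std::vector<int>&& vec)
      : vec_(std::move(vec)), how_("moved")
    { }
    std::string const& how() const { return how_; }
    std::size_t size() const { return vec_.size(); }
};
```

Constructing a probe from a named vector reports "copied" and leaves the source untouched, while constructing it from std::move(v) or from a temporary such as std::vector<int>{4, 5} reports "moved".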
Profiling copies/moves
To ensure optimal performance, I wrote tests to count how many copies or moves occurred during the invocations of various API calls. To carry out these tests, I created a simple class that keeps a shared count of the number of moves and copies that were performed on it:
class move_checker
{
    // shared counters of copies and moves
    std::shared_ptr<int> copies_;
    std::shared_ptr<int> moves_;
public:
    // expensive payload
    std::vector<int> payload;
    typedef std::vector<int>::const_iterator const_iterator;

    // construct a new checker, with counters reset to 0,
    // and a non-empty payload.
    move_checker()
      : copies_(new int(0)),
        moves_(new int(0)),
        payload({1, 2, 3, 4, 5, 6, 7})
    { }

    // copy constructor. counts copy operations
    move_checker(move_checker const& other)
      : copies_(other.copies_),
        moves_(other.moves_),
        payload(other.payload)
    {
        *copies_ += 1;
    }
    // copy assignment is similar to copy constructor

    // move constructor. counts move operations
    move_checker(move_checker&& other)
      : copies_(other.copies_),
        moves_(other.moves_),
        payload(std::move(other.payload))
    {
        *moves_ += 1;
    }
    // move assignment is similar to move constructor

    const_iterator begin() const { return payload.begin(); }
    const_iterator end() const { return payload.end(); }

    // methods to report on the number of copies/moves
    int copies() const { return *copies_; }
    int moves() const { return *moves_; }
};
In case you just skipped the above block of code, the usage is simple:
move_checker checker;
assert( checker.copies() == 0 );
assert( checker.moves() == 0 );
move_checker copy(checker);
assert( copy.copies() == 1 ); // increased
assert( copy.moves() == 0 );
move_checker moved(std::move(checker));
assert( moved.copies() == 1 );
assert( moved.moves() == 1 ); // increased
Armed with the move_checker, I was able to profile my code and make
sure there were no extraneous copies. During the rest of the article, I will
provide asserts with the actual copy/move counts a particular piece of
code produces.
std::thread and std::async
The next piece of the puzzle is launching functions on other threads. Thankfully
C++11 comes with its own standard implementation of threads, allowing for easy
execution of functions, and passing of arguments to other threads. Here I run a
function that prints the contents of an iterable on another thread using std::async:
// prints the contents of an iterable
template <typename Iterable>
void printContents(Iterable const& iterable)
{
    for (auto e : iterable)
    {
        std::cout << e << std::endl;
    }
}

move_checker checker;
assert( checker.copies() == 0 );
assert( checker.moves() == 0 );

// print the contents on another thread
std::future<void> task =
    std::async(
        std::launch::async,
        printContents<move_checker>,
        std::move(checker) // rvalue
    );

// wait for the task to complete
task.wait();

// two moves are performed
assert( checker.copies() == 0 );
assert( checker.moves() == 2 );
Notice that to run the function on another thread, the arguments have to be
available on the other thread. I can move the checker into another
thread if I don’t need it. As expected, no copies are performed. The two moves
are accounted for:
- One move into the std::async function itself
- Another move into the newly created thread.
Since our printContents function takes an object by const&, it is
normal to expect that no copies or moves are performed: we just access the
object through a reference from another thread.
Let’s try it:
move_checker checker;
assert( checker.copies() == 0 );
assert( checker.moves() == 0 );

// (hopefully) pass by reference
std::future<void> task =
    std::async(
        std::launch::async,
        printContents<move_checker>,
        checker // lvalue
    );

// wait for the task to complete
task.wait();

// a copy occurred!
assert( checker.copies() == 1 );
assert( checker.moves() == 1 );
Whoops! Where did that copy come from? This manifested as a cryptic compilation error when I was doing perfect forwarding (we’ll get to that shortly). As a result I posted a question on Stack Overflow, and the answer is relevant here.
… async will always make a copy of [ non-const lvalue references ] internally … to ensure they exist and are valid throughout the running time of the thread created. jogojapan
To ensure users don’t shoot themselves in the foot, the async function preemptively copies an lvalue argument in the event that the lvalue goes out of scope, and is destroyed before the thread completes its function. The breakdown of the numbers above is thus:
- A local copy is created in the std::async function
- The copy is then moved into the new thread.
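The copy-then-move behaviour can be approximated outside of std::async. The sketch below is not the library's actual implementation; store_args and counted are hypothetical names, and the only assumption is that arguments are stored by value using their decayed types, which is what produces a copy for an lvalue and a move for an rvalue.

```cpp
#include <cassert>
#include <functional>
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical counting type, a stripped-down cousin of move_checker.
struct counted
{
    static int copies;
    static int moves;
    counted() { }
    counted(counted const&) { ++copies; }
    counted(counted&&) { ++moves; }
};
int counted::copies = 0;
int counted::moves = 0;

// Hypothetical stand-in for the argument-storing step: every argument
// is stored by value, using its decayed type. An lvalue is therefore
// copied into the tuple and an rvalue is moved into it.
template <typename... Args>
std::tuple<typename std::decay<Args>::type...> store_args(Args&&... args)
{
    return std::tuple<typename std::decay<Args>::type...>(
        std::forward<Args>(args)...);
}
```

Calling store_args(c) with a named counted object increments counted::copies once, whereas store_args(counted()) leaves the copy count alone and only moves. Passing std::ref(c) stores a std::reference_wrapper by value, so the counted object itself is neither copied nor moved.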
This is the safe route and minimizes unintended errors for the average user of
the async API. However, if you’re a library writer, you may want to avoid
making an expensive copy, and in that case you can either pass a pointer or
wrap the reference with std::ref
(as suggested in a comment by tshino below).
move_checker checker;
assert( checker.copies() == 0 );
assert( checker.moves() == 0 );

// pass using std::ref wrapper
std::future<void> task =
    std::async(
        std::launch::async,
        printContents<move_checker>,
        std::ref(checker)
    );

// wait for the task to complete
task.wait();

// no copies or moves!
assert( checker.copies() == 0 );
assert( checker.moves() == 0 );
Note that this would not work for rvalues, as std::ref cannot hold an
rvalue reference. To summarize the local solution:
To avoid an extra copy when passing lvalue arguments through std::async, you can wrap them with std::ref, provided you know they will outlive the thread.
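One way to convince yourself that std::ref really passes a reference is to let the task modify the caller's object. The following is a minimal sketch; appendValue and appendOnOtherThread are hypothetical names made up for this example.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <vector>

// Hypothetical task: appends a value to the vector it is given.
void appendValue(std::vector<int>& values, int value)
{
    values.push_back(value);
}

// Runs appendValue on another thread against the caller's own vector.
// std::ref makes std::async store a reference wrapper instead of a copy.
// The vector must outlive the task; that holds here because we block
// on the future before returning.
void appendOnOtherThread(std::vector<int>& values, int value)
{
    std::future<void> task =
        std::async(
            std::launch::async,
            appendValue,
            std::ref(values),
            value
        );
    // wait for completion (get() would also rethrow any exception)
    task.get();
}
```

After appendOnOtherThread(v, 3), the caller's v contains the new element, which would not be the case if the task had worked on an internal copy. As far as I can tell, dropping the std::ref here does not even compile, because the internally stored copy is handed to the function as an rvalue and cannot bind to a non-const lvalue reference.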
Continued in Part 2
Now that we have come upon the problem and seen a simple local solution, we’ll consider it in a more generic context of perfect forwarding in Part 2 of the article.