Perfect Forwarding to Async Lambdas (Part 1)

18 Dec 2012


Perfect forwarding allows template functions to forward their arguments “as is” to any other function they call. This helps minimize unnecessary copies and conversions when passing arguments on to other functions. In a quest to get rid of copying completely in a library I was writing, I came across the problem of perfect forwarding to functions launched on a separate thread.

Overview

Before tackling perfect forwarding in the next part, I will quickly review rvalues, present a way to measure copies and moves, and finally examine a “problem” with std::async. In the second part of the article we will cover perfect forwarding and revisit the problem in that context. Finally, a generic solution will be presented.

This article is geared towards library writers and those who write generic template code in C++11. Most of the code from both parts of the article can be found in this gist.

Lvalues and Rvalues

Let’s start with a quick recap of lvalues and rvalues. (You can skip this section if you have a firm grasp of the concept.)

In C++11 it becomes important to distinguish lvalues from rvalues. The names hint at the fact that lvalues tend to appear on the left-hand side of an assignment expression, whereas rvalues appear on the right. A more reliable way to think about it is that lvalues can be freely referred to within their scope because they are bound to an identifier, such as variables and functions.

std::vector<int> createVector(std::string param);
// createVector is an lvalue. It can be referred to

std::vector<int> myVec = createVector("hello world");
// myVec is also an lvalue, it is on the left side of the expression
// and can be referred to

Rvalues cannot be referred to directly because they are temporaries that are not bound to an identifier, such as the value returned by a function or the direct result of an inline construction:

std::vector<int> createVector(std::string param);

std::vector<int> myVec = createVector("hello world");
//                      ^             ^^^^^^^^^^^^^
//                      |         temporary std::string
//                      |
//            temporary std::vector<int>
// 
// There are two rvalues in the expression above.

C++11 lets you write functions that accept lvalue references (T&) as well as rvalue references (T&&). Since an rvalue is a temporary that nothing else can refer to, a function taking it by T&& can safely take over its resources instead of performing a needless copy.

The new notation allows a function to distinguish between lvalues (bound by T&), which already exist in an outer scope, and rvalues (bound by T&&), which are not yet bound to any name.
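As a minimal sketch of how overload resolution tells the two apart, the hypothetical overloads category below report which reference type an argument binds to; make_vector is likewise a hypothetical helper returning a temporary. The last case shows a subtle point: a named rvalue reference is itself an lvalue, because it has a name.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical overloads for illustration only: overload resolution
// selects the T& version for (non-const) lvalues and the T&& version
// for rvalues such as temporaries.
inline std::string category(std::vector<int>&)  { return "lvalue"; }
inline std::string category(std::vector<int>&&) { return "rvalue"; }

// Hypothetical helper returning a temporary std::vector<int>.
inline std::vector<int> make_vector() { return {1, 2, 3}; }

inline void demo_categories()
{
    std::vector<int> named{4, 5};
    assert(category(named) == "lvalue");               // bound to an identifier
    assert(category(make_vector()) == "rvalue");       // value returned by a function
    assert(category(std::vector<int>{6}) == "rvalue"); // inline construction

    // A named rvalue reference is an lvalue: it has a name.
    std::vector<int>&& ref = make_vector();
    assert(category(ref) == "lvalue");
}
```

Note that a const lvalue would match neither overload here; real code usually pairs T&& with a const T& overload, as the next example does.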

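To transfer ownership out of an lvalue, C++11 provides the function std::move. Despite its name, std::move does not move anything by itself: it casts its argument to an rvalue reference, so that an overload taking T&& (such as a move constructor) is selected, and that overload performs the actual transfer. Here is a quick example of its usage in the context of object construction: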

#include <iostream>
#include <utility>
#include <vector>

class vector_holder
{
    std::vector<int> vec_;

public:
    // copy a vector into this object.
    vector_holder(std::vector<int> const& vec)
        : vec_(vec)
    {
        std::cout << "copied!" << std::endl;
    }

    // move a vector into this object.
    // essentially a transfer of ownership
    vector_holder(std::vector<int>&& vec)
        : vec_(std::move(vec))
    {
        std::cout << "moved!" << std::endl;
    }

    // move constructor.
    // instead of copying, we transfer
    // the internals into this object
    vector_holder(vector_holder&& other)
        : vec_(std::move(other.vec_))
    { }
};
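Since vector_holder prints to std::cout, here is a minimal sketch of a hypothetical variant, recording_holder, that stores a label instead, so that the constructor chosen for each kind of argument can be checked. It is not part of the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical variant of vector_holder that records which
// constructor ran (instead of printing), so the choice can be checked.
class recording_holder
{
    std::vector<int> vec_;
    std::string how_;

public:
    // chosen for lvalues: the caller keeps its vector
    recording_holder(std::vector<int> const& vec)
        : vec_(vec), how_("copied") { }

    // chosen for rvalues: the buffer is taken over.
    // vec has a name, so it is an lvalue inside this constructor;
    // std::move is needed again to select vector's move constructor.
    recording_holder(std::vector<int>&& vec)
        : vec_(std::move(vec)), how_("moved") { }

    std::string const& how() const { return how_; }
    std::size_t size() const { return vec_.size(); }
};
```

Passing a named vector selects the first constructor, while std::move(data) or a temporary such as std::vector<int>{4, 5} selects the second. After a move, the source vector is left in a valid but unspecified state and should not be relied upon until it is assigned a new value.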

Given this quick overview, it should be apparent that moves help avoid unnecessary copies, which is essential if you want to write an efficient library. This is the “value of rvalues”. For a more thorough treatment, see the article by Alex Allain about move semantics.

Profiling copies/moves

In order to ensure optimal performance, I wrote tests that count how many copies or moves occur when various API calls are invoked. To carry out these tests, I created a simple class that keeps a shared count of the number of copies and moves performed on it:

#include <memory>
#include <utility>
#include <vector>

class move_checker
{
    // shared counters of copies and moves
    std::shared_ptr<int> copies_;
    std::shared_ptr<int> moves_;

public:
    // expensive payload
    std::vector<int> payload; 

    typedef std::vector<int>::const_iterator const_iterator;

    // construct a new checker, with counters reset to 0,
    // and a non-empty payload.
    move_checker()
        : copies_(new int(0)),
          moves_(new int(0)),
          payload({1, 2, 3, 4, 5, 6, 7})
    { }

    // copy constructor. counts copy operations
    move_checker(move_checker const& other)
        : copies_(other.copies_),
          moves_(other.moves_),
          payload(other.payload)
    {
        *copies_ += 1;
    }
    // copy assignment is similar to copy constructor

    // move constructor. counts move operations.
    // the counters are shared rather than moved, so the
    // moved-from object can still report them.
    move_checker(move_checker&& other)
        : copies_(other.copies_),
          moves_(other.moves_),
          payload(std::move(other.payload))
    {
        *moves_ += 1;
    }
    // move assignment is similar to move constructor

    const_iterator begin() const { return payload.begin(); }
    const_iterator end()   const { return payload.end(); }

    // methods to report on the number of copies/moves
    int copies() const { return *copies_; }
    int moves()  const { return *moves_; }
};

In case you just skipped the above block of code, the usage is simple:

move_checker checker;

assert( checker.copies() == 0 );
assert( checker.moves() == 0 );

move_checker copy(checker);

assert( copy.copies() == 1 ); // increased
assert( copy.moves() == 0 );

move_checker moved(std::move(checker));

assert( moved.copies() == 1 );
assert( moved.moves() == 1 ); // increased

Armed with move_checker, I was able to profile my code and make sure there were no extraneous copies. Throughout the rest of the article, I will annotate code with asserts showing the number of copies and moves that a particular piece of code produces.
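To give a feel for the numbers such asserts report, here is a minimal sketch comparing a function that takes its argument by const& with one that takes it by value. It uses op_counter, a hypothetical stripped-down version of move_checker that keeps only the shared counters, and two hypothetical functions, take_by_cref and take_by_value.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical, stripped-down move_checker: no payload, only the
// shared copy/move counters.
class op_counter
{
    std::shared_ptr<int> copies_;
    std::shared_ptr<int> moves_;

public:
    op_counter()
        : copies_(std::make_shared<int>(0)),
          moves_(std::make_shared<int>(0))
    { }

    op_counter(op_counter const& other)
        : copies_(other.copies_), moves_(other.moves_)
    { ++*copies_; }

    op_counter(op_counter&& other)
        : copies_(other.copies_), moves_(other.moves_)
    { ++*moves_; }

    int copies() const { return *copies_; }
    int moves()  const { return *moves_; }
};

// Hypothetical functions that differ only in how they take their argument.
inline void take_by_cref(op_counter const&) { }
inline void take_by_value(op_counter) { }
```

Binding to a const& touches nothing, passing an lvalue by value costs one copy, and passing std::move(c) by value costs one move instead.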

std::thread and std::async

The next piece of the puzzle is launching functions on other threads. Thankfully, C++11 comes with a standard threading library, which makes it easy to execute functions on other threads and to pass arguments to them. Here I run a function that prints the contents of an iterable on another thread using std::async:

#include <cassert>
#include <future>
#include <iostream>

// prints the contents of an iterable
template <typename Iterable>
void printContents(Iterable const& iterable)
{
    for (auto e : iterable)
    {
        std::cout << e << std::endl;
    }
}

move_checker checker;

assert( checker.copies() == 0 );
assert( checker.moves() == 0 );

// print the contents on another thread
std::future<void> task =
    std::async(
        std::launch::async,
        printContents<move_checker>,
        std::move(checker) // rvalue
    );

// wait for the task to complete
task.wait();

// two moves are performed here; note that the exact
// number of moves is implementation-specific
assert( checker.copies() == 0 );
assert( checker.moves() == 2 );

Notice that to run the function on another thread, its arguments have to be available on that thread. Since I no longer need the checker, I can move it into the other thread. As expected, no copies are performed, and the two moves are accounted for as follows:

One move into the std::async function itself
Another move into the newly created thread.
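Because only moves are involved when the argument is an rvalue, the same mechanism also works for move-only types. A minimal sketch, assuming a hypothetical consume function that takes a std::unique_ptr<int> by value:

```cpp
#include <cassert>
#include <future>
#include <memory>
#include <utility>

// Hypothetical task that takes ownership of its argument.
inline int consume(std::unique_ptr<int> value)
{
    return *value;
}

inline int run_with_move_only_argument()
{
    std::unique_ptr<int> value(new int(42));

    // std::unique_ptr cannot be copied, so the argument must be an rvalue:
    // std::async moves it into storage owned by the task, and from there
    // into consume's parameter. Passing plain 'value' would not compile.
    std::future<int> task =
        std::async(std::launch::async, consume, std::move(value));

    return task.get(); // waits for the task and returns its result
}
```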

Since our printContents function takes an object by const&, it is natural to expect that no copies or moves are performed, and that we just access the object through a reference from another thread. Let’s try it:

move_checker checker;

assert( checker.copies() == 0 );
assert( checker.moves() == 0 );

// (hopefully) pass by reference
std::future<void> task =
    std::async(
        std::launch::async,
        printContents<move_checker>,
        checker // lvalue
    );

// wait for the task to complete
task.wait();

// a copy occurred! (the number of moves is implementation-specific)
assert( checker.copies() == 1 );
assert( checker.moves() == 1 );

Whoops! Where did that copy come from? This manifested as a cryptic compilation error when I was doing perfect forwarding (we’ll get to that in Part 2). As a result, I posted a question on Stack Overflow, and the answer is relevant here.

“… async will always make a copy of [non-const lvalue references] internally … to ensure they exist and are valid throughout the running time of the thread created.” (jogojapan)

To ensure users don’t shoot themselves in the foot, std::async preemptively copies an lvalue argument, in case the original goes out of scope and is destroyed before the thread finishes running the function. The breakdown of the numbers above is thus:

A local copy is created in the std::async function
The copy is then moved into the new thread.
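The scenario this copy protects against can be sketched as follows: a task launched with a local variable, where the future is returned and the local is destroyed while the task may still be running. The functions count_elements and launch_with_local are hypothetical, written only to illustrate the lifetime issue.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical task: reports the size of the vector it is given.
inline std::size_t count_elements(std::vector<int> const& v)
{
    return v.size();
}

// Launches a task using a local vector and returns the future, so the
// task may still be running after 'local' has been destroyed.
inline std::future<std::size_t> launch_with_local()
{
    std::vector<int> local{1, 2, 3};

    // Safe: std::async stores its own copy of 'local', so the task never
    // refers to the destroyed local variable. Passing std::cref(local)
    // instead would leave the task with a dangling reference.
    return std::async(std::launch::async, count_elements, local);
}
```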

This is the safe route, and it minimizes unintended errors for the average user of the async API. However, if you’re a library writer and can guarantee that the object outlives the task, you may want to avoid making an expensive copy. In that case you can either pass a pointer, or wrap the reference with std::ref (as suggested in a comment by tshino below):

move_checker checker;

assert( checker.copies() == 0 );
assert( checker.moves() == 0 );

// pass using std::ref wrapper
std::future<void> task =
    std::async(
        std::launch::async,
        printContents<move_checker>,
        std::ref(checker)
    );

// wait for the task to complete
task.wait();

// no copies or moves!
assert( checker.copies() == 0 );
assert( checker.moves() == 0 );

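Note that this would not work for rvalues, as std::ref cannot hold an rvalue reference: its overload for rvalue arguments is deleted, since the wrapper would dangle once the temporary is destroyed. To summarize the local solution: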

When passing an lvalue through std::async that you know will outlive the thread, you can avoid the extra copy by wrapping it with std::ref (or std::cref if the function only needs const access).
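The difference between a plain lvalue and a wrapped reference can also be observed without move_checker, by asking the task whether it received the caller’s object. A minimal sketch, with hypothetical helpers is_same_object, task_sees_original_by_value, task_sees_original_by_cref, append_one and append_in_thread. It uses std::cref, the const counterpart of std::ref, and also shows std::thread, which treats its arguments the same way and therefore needs std::ref when the function takes a non-const reference.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <thread>
#include <vector>

// Hypothetical helper: true if the task received the caller's object itself.
inline bool is_same_object(std::vector<int> const& seen,
                           std::vector<int> const* original)
{
    return &seen == original;
}

// Plain lvalue: std::async stores and passes on its own copy.
inline bool task_sees_original_by_value(std::vector<int> const& data)
{
    return std::async(std::launch::async, is_same_object, data, &data).get();
}

// std::cref: the task refers to the caller's object, which must stay
// alive until the task has finished.
inline bool task_sees_original_by_cref(std::vector<int> const& data)
{
    return std::async(std::launch::async, is_same_object,
                      std::cref(data), &data).get();
}

// Hypothetical helper that modifies its argument in place.
inline void append_one(std::vector<int>& v)
{
    v.push_back(1);
}

inline void append_in_thread(std::vector<int>& data)
{
    // std::thread t(append_one, data); is rejected at compile time:
    // the thread stores a copy and passes it on as an rvalue, which
    // cannot bind to std::vector<int>&. std::ref avoids the copy.
    std::thread t(append_one, std::ref(data));
    t.join(); // 'data' must stay alive until the thread has finished
}
```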

Continued in Part 2

Now that we have come across the problem and seen a simple local solution, we’ll consider it in the more generic context of perfect forwarding in Part 2 of the article.
