Multithreading in C++11 - part 1

Concurrency and multithreading are all about running multiple pieces of code in parallel. If you have the hardware for it, in the form of a nice shiny multi-core CPU or a multi-processor system, this code can run truly in parallel; otherwise the operating system interleaves it: a bit of one task, then a bit of another. This is all very well, but somehow you have to specify what code to run on all these threads. Let’s get started with std::thread:

#include <thread>
#include <iostream>

void function()
{
    std::cout << "From thread 1" << std::endl;
}

int main()
{
    std::thread t(function);
    t.join();

    std::cout << "From main thread" << std::endl;

    std::cin.ignore();
    return 0;
}

The first two lines include the thread and iostream header files. Then we define a function named function that will be executed by the thread; for now it just prints something to the shell. In main we first create a thread object t and pass it the function to be executed by this thread, in this case function. Constructing the thread object launches a thread that executes the function, while the main thread continues with the next statement, irrespective of the progress of the thread it spawned. If main returned while t was still joinable, the destructor of t would call std::terminate and the application would abort. That’s where the statement t.join() comes into the picture: it ensures that the main thread does not progress further until thread t returns. The statement std::cin.ignore(); makes the program wait for keyboard input before it exits, so the shell window stays open.
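To make the role of join() a bit more concrete, here is a minimal sketch. The helper name joinable_before_and_after is made up for illustration; joinable() and join() are standard std::thread members. A thread object stays joinable from construction until join() (or detach()) is called on it, and anything the thread wrote is visible to the caller once join() has returned.

```cpp
#include <cassert>
#include <thread>
#include <utility>

// Hypothetical helper, for illustration only: reports whether the thread
// object is joinable before and after the call to join().
std::pair<bool, bool> joinable_before_and_after()
{
    int result = 0;
    std::thread t([&result] { result = 42; });

    bool before = t.joinable();   // t owns a thread that has not been joined yet
    t.join();                     // blocks until the lambda has returned
    assert(result == 42);         // the write made by the thread is visible now
    bool after = t.joinable();    // t no longer owns a thread

    return std::make_pair(before, after);
}
```

Calling this from main should give a pair of true and false. Leaving out the join() would mean t is destroyed while still joinable, which ends the program through std::terminate.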

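Here is how the application progresses: the main thread constructs t, the new thread runs function and prints its message, the main thread waits in t.join() until that is done, and only then prints its own message and waits for keyboard input.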


In the previous example the thread was created with the function to be executed as a parameter. What std::thread really wants is a callable entity, and any object that can be invoked through operator(), such as a functor or a lambda, qualifies just as well as a plain function:

#include <thread>
#include <iostream>

struct Functor
{
    void operator()() const
    {
        std::cout << "From functor" << std::endl;
    }
};

void function()
{
    std::cout << "From thread 1" << std::endl;
}

int main()
{
    std::thread t1(function);

    Functor functor;
    std::thread t2(functor);

    auto lambdaFunc = []()->void{std::cout << "From lambda function" << std::endl;};
    std::thread t3(lambdaFunc);

    t1.join();
    t2.join();
    t3.join();

    std::cin.ignore();
    return 0;
}

Here in this code snippet we are creating three threads:

  1. Thread t1 is passed a function to execute
  2. Thread t2 is passed a functor to execute
  3. Thread t3 is passed a lambda to execute
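A member function can be handed to a thread as well. The following is only a sketch: Accumulator and add_on_thread are hypothetical names that do not appear elsewhere in this post. The std::thread constructor takes the pointer to member first, then a pointer to the object to call it on, then the remaining arguments.

```cpp
#include <cassert>
#include <thread>

// Hypothetical type, used only to have a member function to call.
struct Accumulator
{
    int total;

    Accumulator() : total(0) {}

    void add(int amount) { total += amount; }
};

// Runs acc.add(amount) on a separate thread and returns the new total.
int add_on_thread(int amount)
{
    Accumulator acc;

    // pointer to member function, then the object, then the arguments
    std::thread t(&Accumulator::add, &acc, amount);
    t.join();
    assert(!t.joinable());   // the thread has finished and has been joined

    return acc.total;
}
```

Passing &acc rather than acc matters here: the thread then works on the original object instead of a copy.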

All of these work fine, as they are callable entities. The examples so far have been simple because the functions being executed are not passed any data. Here is another example, in which parameters are passed to the executing thread:

#include <thread>
#include <iostream>

struct Functor
{
    void operator()(double x, double y) const
    {
        std::cout << "From functor -- sum of x & y is:" << (x+y) << std::endl;
    }
};

int main()
{

    Functor functor;
    std::thread t(functor, 10,12);
    t.join();

    std::cin.ignore();
    return 0;
}
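One detail worth knowing: the arguments given to the std::thread constructor are copied into the new thread, so a parameter that should refer to a variable of the caller has to be wrapped in std::ref. Below is a small sketch of that; add_into and sum_on_thread are hypothetical names chosen for this illustration.

```cpp
#include <cassert>
#include <functional>   // std::ref
#include <thread>

// Hypothetical worker: writes the sum of x and y into result.
void add_into(double x, double y, double& result)
{
    result = x + y;
}

// Computes x + y on a separate thread and returns the value.
double sum_on_thread(double x, double y)
{
    double result = 0.0;

    // x and y are copied into the new thread;
    // std::ref makes the thread write into our local variable
    std::thread t(add_into, x, y, std::ref(result));
    t.join();

    // the worker wrote through the reference, not into a copy
    assert(result == x + y);
    return result;
}
```

Without std::ref this should not even compile, because the copied argument cannot bind to the double& parameter.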

All of this has been just demonstration code; let’s write something a bit more useful, like computing a dot product:

Update: The threading problem described after this listing has been resolved in the latest version of Visual Studio (VS 2011 Beta at the time of writing).

#include <thread>
#include <iostream>

struct DotProduct
{
    double* dp;
    double* a;
    double* b;
    size_t numElems;

    DotProduct(double* result, double* a, double* b, size_t elems) :
                dp(result), a(a), b(b), numElems(elems)
    {}

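    void operator()() const
    {
        //  accumulate in a local variable and write the result once,
        //  so the threads do not keep writing to neighbouring memory
        double sum = 0;
        for(decltype(numElems) idx = 0; idx < numElems; ++idx )
        {
            sum += a[idx] * b[idx];
        }
        *dp += sum;
    }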
};

int main()
{
    static const size_t NumElems = 100000;
    double* a = new double[NumElems];
    double* b = new double[NumElems];

    for( size_t idx = 0; idx < NumElems; ++idx )
    {
        a[idx] = idx;
        b[idx] = NumElems - idx;
    }

    //  for now we split the work into 4 equal ranges
    //  (NumElems is divisible by 4, so no elements are left over)
    size_t increment = NumElems/4;

    //  we ensure that each DotProduct object holds onto separate range
    //  so these can be executed in parallel
    //  as the computation in each thread is fairly predictable
    //  we can go with equal distribution
    //
    double dp1_sum = 0;
    DotProduct dp1(&dp1_sum, a+0*increment,b+0*increment,increment);

    double dp2_sum = 0;
    DotProduct dp2(&dp2_sum, a+1*increment,b+1*increment,increment);

    double dp3_sum = 0;
    DotProduct dp3(&dp3_sum, a+2*increment,b+2*increment,increment);

    double dp4_sum = 0;
    DotProduct dp4(&dp4_sum, a+3*increment,b+3*increment,increment);

    //  create three threads for the first three dotproduct
    //  evaluation jobs; the fourth job runs on the main thread
    //
    std::thread t1(dp1);
    std::thread t2(dp2);
    std::thread t3(dp3);
    //std::thread t4(dp4);
    dp4();

    //  ensure that main thread does not proceed further
    //  till all the threads have completed execution
    //
    t1.join();
    t2.join();
    t3.join();
    //t4.join();

    //  at the end just add-up all the dot-products computed by
    //  each thread
    //
    double dotprod = dp1_sum + dp2_sum + dp3_sum + dp4_sum;

    std::cout << "Dotproduct is " << dotprod << std::endl;

    delete[] a;
    delete[] b;

    std::cin.ignore();
    return 0;
}

Microsoft’s thread library is not entirely bug-free: with the above code I ran into various threading issues with mutex lock and unlock. Here is the bug report, in case you are interested.
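The four hand-written DotProduct objects in the listing above can be generalized to any number of ranges. What follows is only a sketch under my own assumptions, not a tuned implementation: dot_chunk and parallel_dot are hypothetical names, the input is taken as std::vector instead of raw arrays, and the last range (including any leftover elements) is computed on the calling thread, just as dp4 is in the listing.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: dot product of one range, accumulated locally
// and written to *out once at the end.
void dot_chunk(const double* a, const double* b, std::size_t count, double* out)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < count; ++i)
        sum += a[i] * b[i];
    *out = sum;
}

// Splits the vectors into numParts ranges. The first numParts - 1 ranges
// run on their own threads, the last one runs on the calling thread.
double parallel_dot(const std::vector<double>& a,
                    const std::vector<double>& b,
                    std::size_t numParts)
{
    assert(a.size() == b.size());
    assert(numParts > 0);

    const std::size_t n = a.size();
    const std::size_t chunk = n / numParts;
    const std::size_t last = numParts - 1;

    std::vector<double> partial(numParts, 0.0);
    std::vector<std::thread> workers;
    workers.reserve(last);

    for (std::size_t t = 0; t < last; ++t)
        workers.emplace_back(dot_chunk, a.data() + t * chunk,
                             b.data() + t * chunk, chunk, &partial[t]);

    // the last range also picks up the remainder of n / numParts
    dot_chunk(a.data() + last * chunk, b.data() + last * chunk,
              n - last * chunk, &partial[last]);

    // wait for all workers before reading their partial sums
    for (std::thread& w : workers)
        w.join();

    double total = 0.0;
    for (double p : partial)
        total += p;
    return total;
}
```

A reasonable value for numParts can be taken from std::thread::hardware_concurrency(), keeping in mind that it is only a hint and may return 0 when the number of hardware threads cannot be determined.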

I intend to explore some more thread enhancements with Visual Studio 2011 Developer Preview. Stay tuned…