Implicit Threading | Sahithyan's Notes

Implicit threading means the language runtime, library, or OS creates and manages threads automatically, so the programmer only identifies the tasks that can run in parallel. There are many approaches; 5 of them are covered below.

Thread Pools

A fixed number of worker threads is created up front and waits for tasks. This is faster and more memory-efficient than creating a thread per task, because threads are reused instead of being re-created.
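The idea can be sketched in standard C++ as below. This is a minimal illustration, not how any particular library implements its pool: the names `ThreadPool`, `submit`, and `count_completed_tasks` are hypothetical, and the sketch assumes tasks take no arguments and return nothing.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal pool: a fixed set of workers pulls tasks from a shared queue.
class ThreadPool {
public:
    explicit ThreadPool(std::size_t thread_count) {
        for (std::size_t i = 0; i < thread_count; ++i) {
            workers_.emplace_back([this] { worker_loop(); });
        }
    }

    // Lets the workers finish every queued task, then joins them.
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        task_available_.notify_all();
        for (std::thread& worker : workers_) worker.join();
    }

    ThreadPool(const ThreadPool&) = delete;
    ThreadPool& operator=(const ThreadPool&) = delete;

    // Queues a task; an idle worker picks it up. No thread is created here.
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        task_available_.notify_one();
    }

private:
    void worker_loop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                task_available_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and nothing left to run
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock so other workers can dequeue
        }
    }

    std::mutex mutex_;
    std::condition_variable task_available_;
    std::queue<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};

// Example use: run task_count small tasks on thread_count reused workers.
int count_completed_tasks(std::size_t thread_count, int task_count) {
    std::atomic<int> completed{0};
    {
        ThreadPool pool(thread_count);
        for (int i = 0; i < task_count; ++i) {
            pool.submit([&completed] { ++completed; });
        }
    }  // the pool's destructor drains the queue and joins the workers
    return completed.load();
}
```

Calling `count_completed_tasks(4, 1000)` runs 1000 tasks while only ever creating 4 threads, which is where the savings over one thread per task come from.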

Fork–Join

A task is recursively split (forked) into subtasks, and the results of the subtasks are combined (joined).
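A small sketch of the pattern, using `std::async` from the C++ standard library to fork and `std::future::get` to join. The function name `parallel_sum` and the `cutoff` parameter are assumptions made for this illustration; real fork-join frameworks typically run the subtasks on a thread pool rather than starting a new thread per fork.

```cpp
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Sums data[first, last). Ranges no larger than `cutoff` are summed directly.
long long parallel_sum(const std::vector<int>& data,
                       std::size_t first, std::size_t last,
                       std::size_t cutoff = 10000) {
    if (last - first <= cutoff) {
        auto begin = data.begin() + static_cast<std::ptrdiff_t>(first);
        auto end = data.begin() + static_cast<std::ptrdiff_t>(last);
        return std::accumulate(begin, end, 0LL);
    }

    std::size_t mid = first + (last - first) / 2;

    // Fork: the left half runs as a separate asynchronous task.
    std::future<long long> left = std::async(std::launch::async, [&data, first, mid, cutoff] {
        return parallel_sum(data, first, mid, cutoff);
    });

    // The current thread handles the right half itself.
    long long right = parallel_sum(data, mid, last, cutoff);

    // Join: wait for the forked task and combine the two results.
    return left.get() + right;
}
```

The cutoff keeps the recursion from forking tasks that are too small to be worth the overhead.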

OpenMP

A set of compiler directives and a runtime library. Directives are used to define parallel regions. The compiler translates them into runtime library calls, and threads are created automatically.

#pragma omp parallel
{
    // parallel code
}

At runtime, the above region is executed in parallel by a team of threads. By default, the team usually has as many threads as there are cores.
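One way to observe the team size is to count how many times the region body runs. The sketch below is an illustration under assumptions: the function name `count_region_threads` is made up here, and the code must be compiled with OpenMP enabled (for example `-fopenmp` with GCC or Clang) for the pragmas to have any effect. Without that flag the pragmas are ignored and the body runs once.

```cpp
// Returns how many threads executed the parallel region.
int count_region_threads() {
    int executions = 0;

    #pragma omp parallel
    {
        // Every thread in the team runs this block once.
        // The atomic directive makes the shared increment safe.
        #pragma omp atomic
        ++executions;
    }

    return executions;
}
```

The default team size is implementation-defined and can be overridden, for instance with the `OMP_NUM_THREADS` environment variable.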

#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
#include <omp.h>

int main(int argc, char** argv) {
    long n = 1000000;
    if (argc > 1) n = std::stol(argv[1]);

    std::vector<double> a(n), b(n), c(n);
    for (long i = 0; i < n; ++i) {
        a[i] = static_cast<double>(i);
        b[i] = static_cast<double>(n - i);
    }

    double t0 = omp_get_wtime();

    // iterations of the loop below are divided among the OpenMP threads
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        c[i] = a[i] + b[i];
    }

    double t1 = omp_get_wtime();

    std::cout << "Added " << n << " elements in " << (t1 - t0) << " seconds\n";

    long m = std::min(n, 10L);
    for (long i = 0; i < m; ++i) {
        std::cout << "c[" << i << "] = " << c[i] << '\n';
    }

    return 0;
}

Grand Central Dispatch

Also known as GCD. Apple’s system-level concurrency framework. Uses task-based dispatch queues. Tasks (blocks defined by ^{...} or functions) are submitted to queues, and GCD decides how to schedule them on the available CPU cores.

2 types of queues:

Serial
Blocks are removed in FIFO order, one at a time. A block must finish before the next one is removed. Each process has its own serial queue, known as the main queue.
Concurrent
Blocks are removed in FIFO order, but several may be removed at a time and run in parallel. There are system-wide concurrent queues, divided into 4 quality-of-service classes:
- QOS_CLASS_USER_INTERACTIVE
- QOS_CLASS_USER_INITIATED
- QOS_CLASS_UTILITY
- QOS_CLASS_BACKGROUND

The main queue is per-process. More queues of either type can be created programmatically.

Used in macOS and iOS.

Intel TBB

Short for Intel Threading Building Blocks. A C++ template+runtime library. TBB schedules tasks automatically based on available cores. Provides high-level parallel constructs such as parallel_for, parallel_reduce, and task-based decomposition. The runtime system manages the thread pool, load balancing, and work stealing.