Implicit threading means that the language runtime, a library, or the OS creates and manages threads automatically, so the programmer identifies tasks rather than threads. Several approaches exist; five are covered below.
Thread Pools
A fixed number of worker threads is created up front and waits for tasks. This is faster and more memory-efficient than creating a thread per task, because threads are reused instead of being re-created.
Fork–Join
A task is recursively split (forked) into subtasks that run in parallel; when they finish, their results are combined (joined).
OpenMP
A set of compiler directives plus a runtime library for C, C++, and Fortran. Directives mark parallel regions; the compiler translates them into runtime library calls, and threads are created automatically.
#pragma omp parallel
{
// parallel code
}
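At run time, the block above is executed in parallel by a team of threads. By default the team usually has as many threads as there are cores; the count can be changed, for example with the OMP_NUM_THREADS environment variable.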
#include <iostream>
#include <string>     // std::stol
#include <vector>
#include <algorithm>
#include <omp.h>

int main(int argc, char** argv) {
    // Vector length; can be overridden by the first command-line argument.
    long n = 1000000;
    if (argc > 1) n = std::stol(argv[1]);

    std::vector<double> a(n), b(n), c(n);
    for (long i = 0; i < n; ++i) {
        a[i] = static_cast<double>(i);
        b[i] = static_cast<double>(n - i);
    }

    double t0 = omp_get_wtime();
    // The loop below is parallelized: iterations are divided among the threads.
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        c[i] = a[i] + b[i];
    }
    double t1 = omp_get_wtime();

    std::cout << "Added " << n << " elements in " << (t1 - t0) << " seconds\n";

    // Print the first few results as a sanity check.
    long m = std::min(n, 10L);
    for (long i = 0; i < m; ++i) {
        std::cout << "c[" << i << "] = " << c[i] << '\n';
    }
    return 0;
}
Grand Central Dispatch
Also known as GCD. Apple’s system-level concurrency framework, built on task-based dispatch queues. Tasks (blocks written as ^{ ... }, or functions) are submitted to queues, and GCD decides how to schedule them on the available CPU cores.
Two types of queues:
- Serial
Blocks are removed in FIFO order, one at a time; the next block is removed only after the previous one has finished. Each process has its own serial queue, known as its main queue.
- Concurrent
Blocks are removed in FIFO order, but several may be removed at a time and run in parallel. There are four system-wide concurrent queues, divided by quality-of-service class:
- QOS_CLASS_USER_INTERACTIVE
- QOS_CLASS_USER_INITIATED
- QOS_CLASS_UTILITY
- QOS_CLASS_BACKGROUND
More queues of either type can be created programmatically.
Used in macOS and iOS.
Intel TBB
Short for Intel Threading Building Blocks (now distributed as oneTBB). A C++ template library with a runtime scheduler. It provides high-level parallel constructs such as parallel_for and parallel_reduce, as well as task-based decomposition. TBB schedules tasks automatically based on the available cores; the runtime manages the thread pool, load balancing, and work stealing.