Compiler Optimizations

3 min read · Updated Fri, 24 Apr 2026 (UTC)

Optimization techniques are employed to improve the translated code, typically in terms of performance or size.

Flags

Options provided by compilers to control the optimization level. These flags instruct the compiler how aggressively, and in what manner, it should optimise, to potentially achieve better runtime performance, reduced code size, or a balance between the two.

For example, gcc provides a variety of optimization flags that can be used to control the optimization process. Some of the most commonly used flags include:

  • -O0: No optimization (the default). Fastest compilation. Generates the most straightforward, often least performant, machine code.
  • -O1: Basic optimization. Reduces code size and execution time without taking an excessive amount of compilation time.
  • -O2: Further optimization. Enables almost all recommended optimizations that do not involve a space-speed trade-off.
  • -O3: Full optimization, including optimizations that might increase the generated code size.
  • -Os: Optimise for size. Prioritises reducing code size over execution speed.
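A quick way to see these flags at work is to compare the assembly gcc emits at different levels. As a minimal sketch (file and function names are illustrative), the loop below is typically folded into a single constant at -O2, while -O0 keeps the loop intact:

```c
/* fold.c - a toy function whose loop gcc can fold into a constant at -O2.
   Compare the generated assembly: gcc -O0 -S fold.c  vs  gcc -O2 -S fold.c */
int sum_to_100(void) {
    int s = 0;
    for (int i = 1; i <= 100; i++)
        s += i;     /* at -O2 the whole loop is replaced by the constant 5050 */
    return s;
}
```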

Instruction Compression

In many modern architectures, including RISC designs, there’s often a fixed instruction length, ensuring simplicity in fetching and decoding operations. However, not all instructions need the full width provided, leading to potential inefficiencies in memory usage.

Instruction compression aims to address this by:

  • Identifying Common Patterns
    By analysing frequently used instruction sequences or patterns, these can be represented in a compressed form.
  • Variable-length Encoding
    Instead of having a fixed length for every instruction, compressed instructions might use variable-length encoding, where frequent instructions are represented using fewer bits.
  • Decompression Mechanism
    For execution, compressed instructions need to be decompressed. This decompression happens either in hardware (before the instruction is executed) or via specialised software routines.
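The ideas above can be sketched in software. The toy encoder below is purely illustrative (it is not a real ISA's compressed format, and `is_compressible` is an invented predicate): frequent instructions matching a known pattern get a short 16-bit form, everything else keeps its full 32-bit form, and a one-byte tag tells the decompressor which form follows:

```c
/* Toy sketch of instruction compression (hypothetical encoding, not a real
   ISA): instructions whose upper 16 bits are zero are treated as "frequent"
   and stored in 2 bytes; all others are stored in their full 4 bytes. */
#include <stdint.h>
#include <stddef.h>

static int is_compressible(uint32_t insn) { return (insn >> 16) == 0; }

/* Returns the compressed size in bytes. Each instruction is prefixed by a
   1-byte tag: 1 = short form follows, 0 = full form follows. */
size_t compress(const uint32_t *in, size_t n, uint8_t *out) {
    size_t len = 0;
    for (size_t i = 0; i < n; i++) {
        if (is_compressible(in[i])) {
            out[len++] = 1;                        /* tag: short form */
            out[len++] = (uint8_t)(in[i] & 0xFF);
            out[len++] = (uint8_t)(in[i] >> 8);
        } else {
            out[len++] = 0;                        /* tag: full form */
            for (int b = 0; b < 4; b++)
                out[len++] = (uint8_t)(in[i] >> (8 * b));
        }
    }
    return len;
}
```

A hardware decompressor would invert this mapping in the fetch/decode stage, expanding each short form back into its full-width equivalent before execution.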

Instruction Level Optimization

The process of enhancing the efficiency and performance of individual instructions in a program, often within the context of a particular ISA. It directly impacts the speed, power consumption, and overall efficiency of code execution on a given hardware platform.

Multiple techniques can be used for this.

Static Scheduling

Reordering instructions at compile-time to reduce pipeline hazards.

What it tries to achieve:

  • No two instructions fight for the same resource in the same cycle.
  • No instruction reads a value before it’s produced.
  • Pipeline bubbles are minimized.
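A source-level sketch of the idea (the compiler normally does this on the instruction stream, and array names here are illustrative): moving an independent load between a load and its use gives the pipeline useful work to do instead of stalling.

```c
/* Unscheduled: b[i] is used immediately after it is loaded, which can
   force a load-use stall on an in-order pipeline. */
void add_naive(int *a, const int *b, const int *c, int n) {
    for (int i = 0; i < n; i++) {
        int t = b[i];      /* load          */
        a[i] = t + c[i];   /* immediate use */
    }
}

/* Scheduled: the independent load of c[i] is issued between the load of
   b[i] and its use, filling the latency gap. Same result, fewer bubbles. */
void add_scheduled(int *a, const int *b, const int *c, int n) {
    for (int i = 0; i < n; i++) {
        int tb = b[i];     /* load b[i]                      */
        int tc = c[i];     /* independent load fills the gap */
        a[i] = tb + tc;    /* tb is available by now         */
    }
}
```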

Loop Unrolling

Increasing the loop body’s size by replicating its content multiple times, reducing the overhead of loop control. For smaller loops, the loop control can be removed entirely. Can be paired with pipeline scheduling for better results.

Reduces branch frequency and stalls, but increases register pressure.
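As a minimal sketch (the unroll factor of 4 and the function name are illustrative), the loop-control overhead of the increment, comparison, and branch is paid once per four elements instead of once per element, with a cleanup loop for the leftovers:

```c
/* Loop unrolling by a factor of 4. */
long sum_unrolled(const int *a, int n) {
    long s = 0;
    int i = 0;
    for (; i + 3 < n; i += 4) {   /* main unrolled body */
        s += a[i];
        s += a[i + 1];
        s += a[i + 2];
        s += a[i + 3];
    }
    for (; i < n; i++)            /* cleanup for the remaining n mod 4 elements */
        s += a[i];
    return s;
}
```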

Strip Mining

Dividing a loop (often with an iteration count unknown at compile time) into smaller loops, each operating on a smaller subset of the data. Improves cache locality and, when combined with unrolling, reduces the number of loop-control iterations.

Split loop into:

  • A small leftover loop handling the remaining n mod k elements.
  • A main loop over the n/k full strips, which can be unrolled.
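The split above can be sketched as follows (the strip size K = 4 and the function name are illustrative): the outer loop walks the array one strip at a time, and the inner loop works on a single strip, whose data can stay resident in cache; the final strip naturally handles the n mod K leftover elements.

```c
#define K 4   /* strip size, chosen for illustration */

/* Strip-mined loop: process the array in strips of K elements. */
void scale(double *a, int n, double f) {
    for (int s = 0; s < n; s += K) {          /* outer: one strip at a time */
        int end = (s + K < n) ? s + K : n;    /* last strip has n mod K elements */
        for (int i = s; i < end; i++)         /* inner: within one strip */
            a[i] *= f;
    }
}
```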

Function Inlining

Replacing a function call with the actual body of the function. Avoids the overhead of the function call. Causes duplicated code and a bigger executable file.
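In C, `static inline` is one way to suggest this to the compiler (the function names here are illustrative): the body is pasted at each call site, so no call/return sequence is emitted, at the cost of repeating the body wherever it is used.

```c
/* A small function the compiler can inline at each call site. */
static inline int square(int x) { return x * x; }

int sum_of_squares(int n) {
    int s = 0;
    for (int i = 1; i <= n; i++)
        s += square(i);   /* with inlining this becomes s += i * i, no call made */
    return s;
}
```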

Precompute values

Replacing computations with their results at compile time. Beneficial when dealing with invariant values inside loops or frequently called functions.
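Both cases can be sketched in one example (constants and names are illustrative): `WIDTH * HEIGHT` is built from compile-time constants, so the compiler folds it into a single constant, and `y * WIDTH` is invariant with respect to the inner index, so it is computed once rather than per element.

```c
#define WIDTH  640
#define HEIGHT 480

/* y * WIDTH does not depend on x, so the compiler precomputes it once. */
int pixel_index(int x, int y) {
    return y * WIDTH + x;
}

long checksum(const unsigned char *img) {
    long sum = 0;
    const int npixels = WIDTH * HEIGHT;   /* folded to 307200 at compile time */
    for (int i = 0; i < npixels; i++)
        sum += img[i];
    return sum;
}
```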