Today’s processors can do a lot of work in parallel on a single core. Programs need to be specifically designed for that, as this functionality is exposed via specialized instructions such as SSE or AVX instructions on x86 processors. By operating on wide registers storing vectors instead of scalars, a single SSE or AVX vector instruction operates on 4 or 8 numbers simultaneously.
Most programming languages do not expose vector types and functions to the programmer. Instead, compilers try to convert a scalar program into a vectorized program automatically in a process called automatic vectorization. Unfortunately, compilers often fail to do so, and it is often unclear when and why this happens.
The Intel SPMD program compiler (ispc) solves this issue via some small extensions to the C programming language. Motivated by a recent series of blog posts about ispc, I ported smallpt to this compiler to get a first impression of how much path tracing can profit from vectorization.
The following table shows the time for rendering one sample at a resolution of 640×480 pixels on an Intel Core i7-4770K using AVX2 (1 core):
|1 bounce||106 ms||16 ms||6.6x|
|2 bounces||154 ms||28 ms||5.5x|
|6 bounces||302 ms||80 ms||3.8x|
|6 bounces with RR||404 ms||338 ms||2.3x|
In theory, the maximum speed-up that could be achieved is 8x in case of AVX2. Unsurprisingly, the biggest improvements can be observed in camera rays, where the workload is mostly coherent.
Have a look at the code for smallpt-ispc on Github to find out more.