Vectorizing smallpt with ispc

Today’s processors can do a lot of work in parallel on a single core. Programs have to be written specifically to take advantage of this, because the functionality is exposed through specialized instructions, such as the SSE and AVX instruction sets on x86 processors. These instructions operate on wide registers that hold vectors instead of scalars: a single SSE or AVX instruction processes 4 or 8 single-precision floating-point numbers simultaneously.

Most programming languages do not expose vector types and functions to the programmer. Instead, compilers try to turn scalar code into vectorized code on their own, a process called automatic vectorization. Unfortunately, compilers frequently fail to do so, and it is often unclear when and why this happens.

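The Intel SPMD Program Compiler (ispc), where SPMD stands for "single program, multiple data", addresses this issue with a few small extensions to the C programming language: the programmer writes code for a single data element, and the compiler runs a group ("gang") of program instances in parallel across the SIMD lanes. Motivated by a recent series of blog posts about ispc, I ported smallpt to this compiler to get a first impression of how much path tracing can profit from vectorization.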

Result image of smallpt-ispc

The following table shows the time for rendering one sample per pixel at a resolution of 640×480 pixels on an Intel Core i7-4770K using AVX2 on a single core:

Configuration	C	ISPC	Speed-up
1 bounce	106 ms	16 ms	6.6x
2 bounces	154 ms	28 ms	5.5x
6 bounces	302 ms	80 ms	3.8x
6 bounces with Russian roulette (RR)	404 ms	338 ms	2.3x

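In theory, the maximum achievable speed-up with AVX2 is 8x, since its 256-bit registers hold eight single-precision floats. Unsurprisingly, the biggest improvement is observed for camera rays (1 bounce), where the workload is mostly coherent. With every additional bounce, rays that are processed together scatter in different directions and hit different surfaces, so the SIMD lanes diverge and the speed-up shrinks.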

Have a look at the code for smallpt-ispc on GitHub to find out more.
