Vectorizing smallpt with ispc

Today’s processors can do a lot of work in parallel on a single core. Programs need to be specifically designed to exploit this, because the functionality is exposed via specialized instructions, such as the SSE and AVX instructions on x86 processors. These instructions operate on wide registers that hold vectors instead of scalars, so a single SSE or AVX instruction processes 4 or 8 single-precision floats simultaneously (a 128-bit SSE register holds 4 of them, a 256-bit AVX register holds 8).
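To make this concrete, here is a minimal C sketch (not taken from smallpt-ispc) that adds four pairs of floats twice: once with a scalar loop and once with a single SSE addition, using the standard x86 intrinsics from `<xmmintrin.h>`. The function names `add4_scalar` and `add4_sse` are hypothetical.

```c
#include <xmmintrin.h> /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps, ... */

/* Scalar version: four separate additions. */
static void add4_scalar(const float *a, const float *b, float *out) {
    for (int i = 0; i < 4; ++i)
        out[i] = a[i] + b[i];
}

/* SSE version: one vector addition on a 128-bit register holding 4 floats. */
static void add4_sse(const float *a, const float *b, float *out) {
    __m128 va = _mm_loadu_ps(a);      /* load 4 floats (unaligned load is fine) */
    __m128 vb = _mm_loadu_ps(b);
    __m128 vsum = _mm_add_ps(va, vb); /* a single instruction, 4 additions */
    _mm_storeu_ps(out, vsum);         /* store the 4 results */
}
```

Both functions produce the same four sums; the difference is that the second one issues one arithmetic instruction instead of four.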

Most programming languages do not expose vector types and operations to the programmer. Instead, compilers try to convert a scalar program into a vectorized one automatically, a process called automatic vectorization. Unfortunately, compilers often fail to do so, and it is often unclear when and why this happens.
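A classic example of such a silent failure is a floating-point sum. Since float addition is not associative, GCC and Clang generally keep the naive loop below scalar unless reassociation is explicitly allowed (for example with `-ffast-math`). The second version writes the reordering into the source with four independent accumulators, which gives the compiler a much easier target for the four lanes of an SSE register. This is an illustrative sketch; `sum_naive` and `sum_4acc` are hypothetical names.

```c
#include <stddef.h>

/* Straightforward sum. Vectorizing it would change the order of the
   additions, so compilers usually leave it scalar without -ffast-math. */
float sum_naive(const float *x, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; ++i)
        s += x[i];
    return s;
}

/* Same sum with four independent partial sums, one per SSE lane.
   For general inputs the result may differ from sum_naive in the
   last bits, because the additions happen in a different order. */
float sum_4acc(const float *x, size_t n) {
    float acc[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        for (int k = 0; k < 4; ++k)
            acc[k] += x[i + k];
    float s = (acc[0] + acc[1]) + (acc[2] + acc[3]);
    for (; i < n; ++i) /* leftover elements that do not fill a vector */
        s += x[i];
    return s;
}
```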

The Intel SPMD Program Compiler (ispc) addresses this issue with a few small extensions to the C programming language. Motivated by a recent series of blog posts about ispc, I ported smallpt to this compiler to get a first impression of how much path tracing can profit from vectorization.
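The key idea behind ispc is the SPMD (single program, multiple data) model: you write code for a single program instance, and the compiler runs a gang of `programCount` instances side by side, one per SIMD lane (typically 8 for 32-bit values with AVX2). A `foreach (i = 0 ... n)` loop in ispc is split into chunks of one gang each, and lanes that fall past the end of the range are masked off. The following plain C code only emulates that execution model to illustrate it; it is not how ispc actually generates code, and `scale_spmd` and `GANG_SIZE` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

#define GANG_SIZE 8 /* assumption: 8 lanes, as for 32-bit floats with AVX2 */

/* Scalar emulation of an SPMD loop over [0, n): each outer iteration
   stands for one pass of the gang, each lane for one program instance.
   Lanes beyond n are disabled by the mask instead of being executed. */
void scale_spmd(float *data, size_t n, float factor) {
    for (size_t base = 0; base < n; base += GANG_SIZE) {
        bool mask[GANG_SIZE];
        for (int lane = 0; lane < GANG_SIZE; ++lane)
            mask[lane] = base + (size_t)lane < n; /* which lanes are active */
        for (int lane = 0; lane < GANG_SIZE; ++lane)
            if (mask[lane])
                data[base + lane] *= factor;
    }
}
```

With n = 11, the first pass runs all 8 lanes, and the second pass has only 3 active lanes while the other 5 are masked off.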

Result image of smallpt-ispc

The following table shows the time for rendering one sample per pixel at a resolution of 640×480 pixels on an Intel Core i7-4770K, using AVX2 on a single core:

Configuration        C         ispc      Speed-up
1 bounce             106 ms    16 ms     6.6x
2 bounces            154 ms    28 ms     5.5x
6 bounces            302 ms    80 ms     3.8x
6 bounces with RR    404 ms    338 ms    1.2x

(RR = Russian roulette)
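The speed-up column is simply the ratio of the two render times, and dividing it by the number of SIMD lanes gives the fraction of the ideal vector speed-up that was reached. A small helper sketch; `speedup` and `simd_efficiency` are hypothetical names, not part of smallpt-ispc:

```c
/* Speed-up of the ispc build over the scalar C build, computed as the
   ratio of the per-sample render times (both in milliseconds). */
double speedup(double t_c_ms, double t_ispc_ms) {
    return t_c_ms / t_ispc_ms;
}

/* Fraction of the ideal speed-up reached, where lanes is the number of
   SIMD lanes (e.g. 8 for 32-bit floats with AVX2). */
double simd_efficiency(double t_c_ms, double t_ispc_ms, int lanes) {
    return speedup(t_c_ms, t_ispc_ms) / (double)lanes;
}
```

For the 1-bounce row, 106 ms / 16 ms gives about 6.6x, which is roughly 83% of the ideal 8x.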

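In theory, the maximum speed-up with AVX2 is 8x, since its 256-bit registers hold eight single-precision floats. Unsurprisingly, the biggest improvement is observed for camera rays (the 1-bounce case), where the workload is mostly coherent: neighboring pixels shoot rays in similar directions that tend to hit the same surfaces. With more bounces, and especially with Russian roulette, the rays processed together diverge and the speed-up drops.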
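One way to reason about this drop is to count wasted lanes. In a simplified model (an assumption for illustration, not a measurement from smallpt-ispc), a gang keeps running the bounce loop as long as any of its lanes still has an active path, so it executes as many iterations as its longest path, while lanes whose paths have already terminated sit idle. Russian roulette makes path lengths random, which widens the gap between the shortest and the longest path in a gang. The function `lane_utilization` below is hypothetical:

```c
/* Simplified lane-utilization model: the gang runs max(depth) loop
   iterations, but only sum(depth) lane-iterations do useful work.
   Returns the useful fraction, a value in (0, 1]. */
double lane_utilization(const int *depth, int lanes) {
    int max_depth = 0;
    long total = 0;
    for (int i = 0; i < lanes; ++i) {
        if (depth[i] > max_depth)
            max_depth = depth[i];
        total += depth[i];
    }
    if (max_depth == 0)
        return 1.0; /* no work at all, so nothing is wasted */
    return (double)total / ((double)lanes * max_depth);
}
```

If all 8 lanes bounce 6 times, utilization is 1.0; if seven paths end after 1 bounce and one continues for 9, only 16 of 72 lane-iterations are useful.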

Have a look at the code for smallpt-ispc on GitHub to find out more.
