The SSE and AVX2 extensions of the x86 instruction set can dramatically improve the speed of a program if you can optimize it for SIMD instructions.
I made a small test with a fractal generator, which lends itself very well to this kind of optimization. I was impressed by the results: in very little time, thanks to the Intel Intrinsics Guide, I was able to reduce the rendering time by a factor of about 6.6 using AVX2 extensions (from 200 ms to 30 ms). This is pretty close to the maximum gain of 8x you can expect when working on 8 pixels at the same time.
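To give an idea of what "8 pixels at the same time" looks like in code, here is a minimal sketch. It is not the generateImageAVX function from the repository: it assumes the fractal is a Mandelbrot-style escape-time iteration (the post does not say which fractal is used), and the names mandel_scalar and mandel_avx2 are made up for illustration. It also assumes GCC or Clang on x86, since it relies on the target attribute to enable AVX2 for a single function.

```c
#include <immintrin.h>
#include <stdint.h>

/* Scalar reference: number of iterations of z = z^2 + c performed
   while |z|^2 <= 4, capped at max_iter. */
static int mandel_scalar(float cx, float cy, int max_iter) {
    float x = 0.0f, y = 0.0f;
    int n = 0;
    while (n < max_iter && x * x + y * y <= 4.0f) {
        float xt = x * x - y * y + cx;
        y = 2.0f * x * y + cy;
        x = xt;
        ++n;
    }
    return n;
}

/* Hypothetical AVX2 kernel: same iteration for 8 pixels at once.
   cx and cy each point to 8 floats, out receives 8 iteration counts.
   The target attribute (GCC/Clang) enables AVX2 for this function only. */
__attribute__((target("avx2")))
static void mandel_avx2(const float *cx, const float *cy,
                        int max_iter, int32_t *out) {
    const __m256 four = _mm256_set1_ps(4.0f);
    const __m256 vcx = _mm256_loadu_ps(cx);
    const __m256 vcy = _mm256_loadu_ps(cy);
    __m256 x = _mm256_setzero_ps();
    __m256 y = _mm256_setzero_ps();
    __m256i count = _mm256_setzero_si256();

    for (int n = 0; n < max_iter; ++n) {
        __m256 x2 = _mm256_mul_ps(x, x);
        __m256 y2 = _mm256_mul_ps(y, y);
        /* Lane mask: all bits set while that pixel has not escaped. */
        __m256 active = _mm256_cmp_ps(_mm256_add_ps(x2, y2), four, _CMP_LE_OQ);
        if (_mm256_movemask_ps(active) == 0)
            break; /* every pixel has escaped */
        /* An active lane holds -1 as an integer, so subtracting adds 1. */
        count = _mm256_sub_epi32(count, _mm256_castps_si256(active));
        __m256 xt = _mm256_add_ps(_mm256_sub_ps(x2, y2), vcx);
        __m256 yt = _mm256_add_ps(_mm256_mul_ps(_mm256_add_ps(x, x), y), vcy);
        /* Escaped lanes keep their old value, so they stay escaped. */
        x = _mm256_blendv_ps(x, xt, active);
        y = _mm256_blendv_ps(y, yt, active);
    }
    _mm256_storeu_si256((__m256i *)out, count);
}
```

The loop runs until the slowest of the 8 pixels has escaped, so lanes that finish early do wasted work. That is one plausible reason a measured gain lands below the ideal 8x.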
As a test I also made an SSE version of the algorithm, which doesn’t seem to work on computers without AVX2, even though they should support SSE. I’ll try to debug this once I have access to a computer with older hardware.
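The cause of this failure is not established here. One common pitfall, offered only as a guess, is that when a whole source file is compiled with AVX2 enabled (for example with -mavx2 on GCC or Clang), the compiler may emit VEX-encoded instructions even for code written with SSE intrinsics, and those fault on a CPU without AVX. A usual way around it is to pick the code path at run time. The sketch below assumes GCC or Clang on x86; the names simd_level and detect_simd are hypothetical and not taken from the repository.

```c
/* Hypothetical helper: report the best SIMD level the running CPU offers. */
enum simd_level { SIMD_SCALAR = 0, SIMD_SSE2 = 1, SIMD_AVX2 = 2 };

static enum simd_level detect_simd(void) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* GCC/Clang builtins that query the CPU's feature flags at run time. */
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return SIMD_AVX2;
    if (__builtin_cpu_supports("sse2"))
        return SIMD_SSE2;
#endif
    /* Other compilers or architectures: use the plain scalar path. */
    return SIMD_SCALAR;
}
```

A caller would switch on the returned level and call the matching variant. For this to help, each variant has to be built with only its own instruction set enabled, for example in a separate source file with its own flags or with a per-function target attribute.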
The source code is available on GitHub; see the function generateImageAVX.