You should thank your compiler for `vectorizing` your loops
Vectorization (in the CPU sense) means performing an operation on multiple elements of a vector at once, so the work is done chunk-wise instead of element-wise.
It’s either done automagically by your compiler, or explicitly by you, if you’re writing SIMD instructions in assembly by hand.
I’m going to talk about the former case.
Let’s say that you have a loop that adds two integer arrays. In the simplest case, your code has to walk over both arrays one element at a time: add a pair, then move on to the next one.
If you could vectorize that loop, your CPU would be able to process multiple elements from both of the arrays at the same time.
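That can reduce the number of CPU cycles spent in the loop. Don’t count on a smaller binary, though: vectorized code is often larger, because the compiler usually also emits a scalar loop for the leftover elements.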
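To make that concrete, here is a minimal sketch of the kind of loop I have in mind. The function name `add_arrays` and the use of `restrict` are my own assumptions for illustration; whether your compiler actually vectorizes it depends on the optimization level and the target CPU.

```c
#include <stddef.h>

/* Hypothetical helper for illustration: element-wise sum of two int arrays.
 * `restrict` promises the compiler that the three arrays do not overlap,
 * which removes one common obstacle to auto-vectorization. */
void add_arrays(const int *restrict a, const int *restrict b,
                int *restrict out, size_t n)
{
    /* As written, this does one scalar add per iteration.
     * A vectorizing compiler may turn it into one SIMD add per chunk. */
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}
```

With 128-bit SSE registers, a vectorized version of this loop can add four 32-bit ints per instruction; with 256-bit AVX2 registers, eight.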
How can I see that?
If you’re using gcc, you can dump loop vectorization info with these flags: