Comment (by eschnett):

 A SIMD {{{sum}}} reduction is quite expensive, perhaps ten times more so
 than a vectorized {{{add}}}. We can certainly copy the code over (I wrote
 it, so I can re-licence it), or we can switch Cactus to Vecmathlib in the
 long term.
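
 To illustrate the cost difference, here is a rough, portable sketch. It is not Vecmathlib or Cactus code. The names {{{Vec}}}, {{{vload}}}, {{{vadd}}}, {{{hsum}}} and {{{sum_deferred}}} are hypothetical, and a plain 4-element array stands in for a hardware vector. The idea is to accumulate with lane-wise adds inside the loop and to pay for the horizontal reduction only once, at the end.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical 4-lane "vector"; a plain array stands in for a SIMD register.
constexpr std::size_t kLanes = 4;
using Vec = std::array<double, kLanes>;

// Load kLanes consecutive values starting at p.
inline Vec vload(const double* p) {
    Vec r{};
    for (std::size_t i = 0; i < kLanes; ++i) r[i] = p[i];
    return r;
}

// Lane-wise add: on real hardware this is typically one instruction.
inline Vec vadd(const Vec& a, const Vec& b) {
    Vec r{};
    for (std::size_t i = 0; i < kLanes; ++i) r[i] = a[i] + b[i];
    return r;
}

// Horizontal sum across lanes: on real hardware this usually needs
// several shuffle/add steps, which is why it is the expensive part.
inline double hsum(const Vec& a) {
    double s = 0.0;
    for (std::size_t i = 0; i < kLanes; ++i) s += a[i];
    return s;
}

// Sum an array using lane-wise adds in the loop and a single
// horizontal reduction after the loop.
inline double sum_deferred(const std::vector<double>& x) {
    Vec acc{};  // all lanes start at zero
    std::size_t i = 0;
    for (; i + kLanes <= x.size(); i += kLanes) {
        acc = vadd(acc, vload(&x[i]));
    }
    double s = hsum(acc);                 // the one reduction
    for (; i < x.size(); ++i) s += x[i];  // scalar tail
    return s;
}
```

 Note that summing per lane changes the order of the floating-point additions, so the result can differ in the last bits from a strictly sequential sum.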

 There might be another way to speed up an interpolation by preferring
 certain directions. I don't know whether this is already implemented --
 did you think about a particular thorn?

