[Users] cactus performance

Wed Mar 21 07:08:04 CDT 2012

On Tue, Mar 20, 2012 at 10:45 PM, Frank Loeffler <knarf at cct.lsu.edu> wrote:
> Hi,
>
> On Tue, Mar 20, 2012 at 05:14:38PM -0700, Jose Fiestas Iquira wrote:
>> Is there documentation about performance of Cactus ETK in large machines. I
>> have some questions regarding best performance according to initial
>> conditions, calculation time required, etc.
>
> Performance very much depends on the specific setup. One poorly scaling
> function can ruin the otherwise best run.
>
>> If there are performance plots like Flops vs. Number of nodes would help me
>> as well.
>
> Flops are very problem-dependent. There isn't such thing as flops/s for
> Cactus, not even for one given machine. If we talk about the Einstein
> equations and a typical production run I would expect a few percent of
> the peak performance of any given CPU, as we are most of the time bound by
> memory bandwidth.

I would like to add some more numbers to Frank's description:

On some problems (e.g. evaluating the BSSN equations with a
higher-order stencil), I have measured more than 20% of the
theoretical peak performance. The bottleneck seems to be L1 data cache
accesses, because the BSSN equation kernels require a large number of
local (temporary) variables.
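
To make "fraction of peak" concrete, here is a minimal sketch of the
arithmetic. All machine numbers below (core count, clock rate, flops
per cycle, bandwidth, flops per byte) are made-up assumptions for
illustration, not measurements of Cactus or of any real machine. The
last function is a simple roofline-style bound, which is one common
way to see why a memory-bound code reaches only a few percent of peak.

```python
def peak_gflops(cores, clock_ghz, flops_per_cycle):
    """Theoretical peak in GFlop/s: cores * clock * flops per cycle per core."""
    return cores * clock_ghz * flops_per_cycle


def fraction_of_peak(measured_gflops, cores, clock_ghz, flops_per_cycle):
    """Measured rate divided by the theoretical peak (1.0 means 100%)."""
    return measured_gflops / peak_gflops(cores, clock_ghz, flops_per_cycle)


def roofline_gflops(peak, bandwidth_gbs, flops_per_byte):
    """Roofline-style upper bound: the rate is limited either by the
    compute peak or by memory bandwidth times arithmetic intensity."""
    return min(peak, bandwidth_gbs * flops_per_byte)


if __name__ == "__main__":
    # Hypothetical node: 8 cores, 2.5 GHz, 4 flops per cycle per core.
    peak = peak_gflops(cores=8, clock_ghz=2.5, flops_per_cycle=4)

    # Hypothetical kernel measured at 16 GFlop/s on that node.
    frac = fraction_of_peak(16.0, cores=8, clock_ghz=2.5, flops_per_cycle=4)
    print(f"peak = {peak:.1f} GFlop/s, measured fraction = {frac:.0%}")

    # Hypothetical memory-bound case: 40 GB/s and 0.1 flops per byte.
    bound = roofline_gflops(peak, bandwidth_gbs=40.0, flops_per_byte=0.1)
    print(f"memory-bound limit = {bound:.1f} GFlop/s ({bound / peak:.0%} of peak)")
```

With these assumed numbers the memory-bound limit comes out far below
the compute peak, while a cache-friendly kernel can get much closer.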

If you are looking for parallel scaling results, then e.g.
<http://arxiv.org/abs/1111.3344> contains a scaling graph for the BSSN
equations evolved with mesh refinement. This shows that, for this
benchmark, the Einstein Toolkit scales well to more than 12k cores.
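
If you want to produce such a scaling graph for your own setup, the
usual quantity to plot is the parallel efficiency. The sketch below
shows the two standard definitions. The timings and core counts in
the example are invented for illustration and are not taken from the
paper above; the function names are my own.

```python
def weak_scaling_efficiency(t_ref, t_n):
    """Weak scaling: the problem size grows with the core count, so the
    ideal run time stays constant. Efficiency = t_ref / t_n."""
    return t_ref / t_n


def strong_scaling_efficiency(t_ref, n_ref, t_n, n):
    """Strong scaling: the problem size is fixed, so the ideal run time
    shrinks as n_ref / n. Efficiency = (t_ref * n_ref) / (t_n * n)."""
    return (t_ref * n_ref) / (t_n * n)


if __name__ == "__main__":
    # Hypothetical weak-scaling pair: 100 s per step on the reference
    # core count, 125 s per step on the largest core count.
    print(f"weak:   {weak_scaling_efficiency(100.0, 125.0):.2f}")

    # Hypothetical strong-scaling pair: 100 s on 128 cores, 30 s on 512.
    print(f"strong: {strong_scaling_efficiency(100.0, 128, 30.0, 512):.2f}")
```

An efficiency close to 1 means near-ideal scaling; where it starts to
drop tells you the core count beyond which adding cores stops paying
off for that particular setup.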

-erik

-- 
Erik Schnetter <schnetter at cct.lsu.edu>
http://www.perimeterinstitute.ca/personal/eschnetter/