[Users] cactus performance
Ian Hinder
ian.hinder at aei.mpg.de
Wed Mar 21 07:15:05 CDT 2012
On 21 Mar 2012, at 03:45, Frank Loeffler wrote:
> Hi,
>
> On Tue, Mar 20, 2012 at 05:14:38PM -0700, Jose Fiestas Iquira wrote:
>> Is there documentation about the performance of Cactus/ETK on large
>> machines? I have some questions regarding the best performance for
>> given initial conditions, the required calculation time, etc.
>
> Performance very much depends on the specific setup. One poorly scaling
> function can ruin an otherwise excellent run.
>
>> If there are performance plots, e.g. flops vs. number of nodes, those
>> would help me as well.
>
> Flops are very problem-dependent. There is no such thing as a single
> flop/s figure for Cactus, not even for one given machine. If we talk
> about the Einstein equations and a typical production run, I would
> expect a few percent of the peak performance of any given CPU, since
> most of the time we are bound by memory bandwidth.
Hi Frank,
Do you have some tests/numbers that support this? My recollection is that we get ~30% of peak performance, though this wouldn't have been in production runs. I'm also a bit surprised by the memory-bandwidth statement, though it is certainly possible.
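As a back-of-the-envelope roofline estimate (illustrative numbers, not
measurements from any particular machine): a stencil update doing ~100
flops per grid point while touching ~50 double-precision grid-function
values (~400 bytes) has an arithmetic intensity of 100/400 = 0.25
flop/byte, so on a node with 40 GB/s of memory bandwidth and 100 Gflop/s
peak the bandwidth-limited rate is 0.25 * 40 = 10 Gflop/s, i.e. ~10% of
peak. The BSSN right-hand side does many more flops per point than that,
which would raise the bound; perhaps that reconciles the two figures.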
> Calculation time for initial data very much depends on the type of
> initial data. Some initial data are set up in a few seconds, some may
> need a day. Scaling of initial data computation also might be quite
> different from that of the evolution, which is why sometimes it makes
> sense to checkpoint right after initial data setup and restart using a
> different number of cores.
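The checkpoint-after-initial-data trick Frank describes needs only the
standard IO parameters; here is a sketch, assuming CarpetIOHDF5 is used
for checkpointing (do check the parameter names against your thorn list):
  # write a checkpoint as soon as the initial data are set up
  IO::checkpoint_ID  = yes
  IOHDF5::checkpoint = yes
  IO::checkpoint_dir = "checkpoints"
  # on the restart, possibly on a different number of cores:
  IO::recover     = "autoprobe"
  IO::recover_dir = "checkpoints"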
>
>> The other main question I have is whether Cactus scales on large machines.
>
> I am afraid that this question is too general. 'Cactus' itself doesn't
> even deal directly with MPI, so it would be better to ask, e.g., how
> well Carpet scales. Even that is still quite a general question, but it
> can be said that Carpet scales to at least 100k cores - however, that
> again very much depends on your setup. Using a couple of levels of mesh
> refinement brings down the scaling limit to maybe 10k cores; add
> re-gridding and a few common analysis routines and we are talking about
> a practical limit of 2k-4k cores. But again, these numbers very much
> depend on
> your setup. You might be able to perform much better for certain
> problems, and much worse for others.
I was able to scale the first few iterations of a production BBH simulation up to 2400 cores of our cluster (strong scaling) without losing too much performance. If you need firm numbers I can look them up.
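(To make "without losing too much performance" precise: the usual measure
is the strong-scaling parallel efficiency
E(N) = N_ref * T(N_ref) / (N * T(N)), with T(N) the wall time per
iteration on N cores; it stayed reasonably close to 1 up to 2400 cores in
that test.)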
>> I am using McLachlan, but any other application would give me an idea
>> of what I should expect for my runs on big machines.
>
> Do the numbers I gave help?
You can refer to the report of the XiRel project, which investigated scaling of our production Einstein codes up to large numbers of cores:
Jian Tao, Gabrielle Allen, Ian Hinder, Erik Schnetter, and Yosef Zlochower.
XiRel: Standard benchmarks for numerical relativity codes using Cactus and Carpet.
Technical Report CCT-TR-2008-5, Louisiana State University, 2008.
Unfortunately, the web is full of dead links to this project, and I can't find anything on the CCT web pages which works.
Frank, do you know what the following links have changed into?
http://www.cct.lsu.edu/xirel/
http://www.cct.lsu.edu/CCT-TR/CCT-TR-2008-5
--
Ian Hinder
http://numrel.aei.mpg.de/people/hinder