[Users] cactus performance
Ian Hinder
ian.hinder at aei.mpg.de
Wed Mar 21 07:15:05 CDT 2012
On 21 Mar 2012, at 03:45, Frank Loeffler wrote:
> Hi,
>
> On Tue, Mar 20, 2012 at 05:14:38PM -0700, Jose Fiestas Iquira wrote:
>> Is there documentation about the performance of Cactus/ETK on large
>> machines? I have some questions regarding the best performance for
>> given initial conditions, the required calculation time, etc.
>
> Performance very much depends on the specific setup. One poorly scaling
> function can ruin an otherwise excellent run.
>
>> If there are performance plots, e.g. flops vs. number of nodes, those
>> would help me as well.
>
> Flops are very problem-dependent. There is no such thing as a single
> flop/s figure for Cactus, not even for one given machine. If we talk
> about the Einstein equations and a typical production run, I would
> expect a few percent of the peak performance of any given CPU, since
> most of the time we are bound by memory bandwidth.
Hi Frank,
Do you have some tests/numbers that support this? My recollection is that we get ~30% of peak performance, though this wouldn't have been in production runs. I'm also a bit surprised by the memory-bandwidth statement, though it is certainly possible.
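As a back-of-the-envelope roofline estimate (illustrative numbers, not
measurements from any particular machine): a stencil update doing ~100
flops per grid point while touching ~50 double-precision grid-function
values (~400 bytes) has an arithmetic intensity of 100/400 = 0.25
flop/byte, so on a node with 40 GB/s of memory bandwidth and 100 Gflop/s
peak the bandwidth-limited rate is 0.25 * 40 = 10 Gflop/s, i.e. ~10% of
peak. The BSSN right-hand side does many more flops per point than that,
which would raise the bound; perhaps that reconciles the two figures.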
> Calculation time for initial data very much depends on the type of
> initial data. Some initial data are set up in a few seconds, some may
> need a day. Scaling of initial data computation also might be quite
> different from that of the evolution, which is why sometimes it makes
> sense to checkpoint right after initial data setup and restart using a
> different number of cores.
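The checkpoint-after-initial-data trick Frank describes needs only the
standard IO parameters; here is a sketch, assuming CarpetIOHDF5 is used
for checkpointing (do check the parameter names against your thorn list):
  # write a checkpoint as soon as the initial data are set up
  IO::checkpoint_ID  = yes
  IOHDF5::checkpoint = yes
  IO::checkpoint_dir = "checkpoints"
  # on the restart, possibly on a different number of cores:
  IO::recover     = "autoprobe"
  IO::recover_dir = "checkpoints"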
>
>> The other main question I have is whether Cactus scales on large machines.
>
> I am afraid that this question is too general. 'Cactus' itself doesn't
> even deal directly with MPI, so it would be better to ask, e.g., how
> well Carpet scales. Even that is still quite a general question, but it
> can be said that Carpet scales to at least 100k cores - however, that
> again very much depends on your setup. Using a couple of levels of mesh
> refinement brings down the scaling limit to maybe 10k cores; add
> re-gridding and a few common analysis routines and we are talking about
> a practical limit of 2k-4k cores. But again, these numbers very much
> depend on
> your setup. You might be able to perform much better for certain
> problems, and much worse for others.
I was able to scale the first few iterations of a production BBH simulation up to 2400 cores of our cluster (strong scaling) without losing too much performance. If you need firm numbers I can look them up.
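(To make "without losing too much performance" precise: the usual measure
is the strong-scaling parallel efficiency
E(N) = N_ref * T(N_ref) / (N * T(N)), with T(N) the wall time per
iteration on N cores; it stayed reasonably close to 1 up to 2400 cores in
that test.)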
>> I am using McLachlan, but any other application would give me an idea
>> of what I should expect for my runs on big machines.
>
> Do the numbers I gave help?
You can refer to the report of the XiRel project, which investigated scaling of our production Einstein codes up to large numbers of cores:
Jian Tao, Gabrielle Allen, Ian Hinder, Erik Schnetter, and Yosef Zlochower.
XiRel: Standard benchmarks for numerical relativity codes using Cactus and Carpet.
Technical Report CCT-TR-2008-5, Louisiana State University, 2008.
Unfortunately, the web is full of dead links to this project, and I can't find anything on the CCT web pages which works.
Frank, do you know what the following links have changed into?
http://www.cct.lsu.edu/xirel/
http://www.cct.lsu.edu/CCT-TR/CCT-TR-2008-5
--
Ian Hinder
http://numrel.aei.mpg.de/people/hinder