[Users] cactus performance

Christian D. Ott cott at tapir.caltech.edu
Thu Mar 29 10:01:17 CDT 2012


Hi Jose,

Look, the Einstein Toolkit team is very happy to help new users like
you get started and to sort out specific questions about parts of
the toolkit.

What we really can't do is provide very basic high-performance
computing training via the mailing list. Many, if not most, people on
this list volunteer their help in their spare time and are not paid
consultants for general HPC questions. You are at Berkeley Lab, where
many experts can help you with basic HPC questions, and there are also
plenty of resources available online that I would kindly ask you to
consult first.

Regarding your scaling question:

https://support.scinet.utoronto.ca/wiki/index.php/Introduction_To_Performance

gives a good introduction to performance measurements. There are many
more webpages like this available on the internet. The plot shown in
the Einstein Toolkit paper (arXiv:1111.3344) is a weak scaling test.
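For context: in a strong scaling test the total problem size is fixed
and you hope the wall-clock time drops as cores are added; in a weak
scaling test the problem size grows with the core count and you hope
the time stays roughly constant. A minimal sketch of how the two
efficiencies are computed (the timing numbers below are made up for
illustration, not Einstein Toolkit measurements):

```python
# Sketch: strong vs. weak scaling efficiency from wall-clock timings.
# All numbers here are hypothetical, for illustration only.

def strong_scaling_efficiency(t1, tn, n):
    """Fixed total problem size: ideal time on n cores is t1 / n."""
    return t1 / (n * tn)

def weak_scaling_efficiency(t1, tn):
    """Problem size grows with n: ideal time stays constant at t1."""
    return t1 / tn

# Strong scaling: 1000 s on 1 core, 150 s on 8 cores, same problem.
eff_strong = strong_scaling_efficiency(1000.0, 150.0, 8)  # ~0.83

# Weak scaling: 1000 s on 1 core, 1100 s on 8 cores, 8x larger problem.
eff_weak = weak_scaling_efficiency(1000.0, 1100.0)        # ~0.91
```

An efficiency near 1.0 means near-ideal scaling; the weak-scaling plot
in the paper reports timings of this second kind.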

Best,
 
 - Christian Ott




On Wed, Mar 28, 2012 at 11:55:30PM -0700, Jose Fiestas Iquira wrote:
> Hello,
> 
> I reduced the simulation time by setting Cactus::cctk_final_time = .01 in
> order to measure performance with CrayPat. It ran only 8 iterations. I used
> 16 and 24 cores for testing and obtained almost the same performance
> (~1310 s simulation time and ~16 MFlop/s).
> 
> It reminds me of Fig. 2 in the reference you sent:
> http://arxiv.org/abs/1111.3344
> 
> which I don't really understand. I would expect shorter times with a larger
> number of cores. Why does that not happen here?
> 
> I am using McLachlan to simulate a binary system, so all my questions
> concern this specific application. Do you think it will scale, in the sense
> that the simulation time will be shorter the larger the number of cores I
> use?
> 
> Thanks,
> Jose
> 
> 
> 
> On Wed, Mar 21, 2012 at 5:08 AM, Erik Schnetter <schnetter at cct.lsu.edu> wrote:
> 
> > On Tue, Mar 20, 2012 at 10:45 PM, Frank Loeffler <knarf at cct.lsu.edu>
> > wrote:
> > > Hi,
> > >
> > > On Tue, Mar 20, 2012 at 05:14:38PM -0700, Jose Fiestas Iquira wrote:
> > >> Is there documentation about the performance of Cactus ETK on large
> > >> machines? I have some questions regarding best performance according to
> > >> initial conditions, required computation time, etc.
> > >
> > > Performance depends very much on the specific setup. A single poorly
> > > scaling function can ruin an otherwise excellent run.
> > >
> > >> Performance plots, e.g. flop/s vs. number of nodes, would also help
> > >> me.
> > >
> > > Flop rates are very problem-dependent. There is no such thing as a
> > > flop/s rating for Cactus, not even for a given machine. For the
> > > Einstein equations and a typical production run I would expect a few
> > > percent of the peak performance of any given CPU, since we are bound
> > > by memory bandwidth most of the time.
> >
> > I would like to add some more numbers to Frank's description:
> >
> > On some problems (e.g. evaluating the BSSN equations with a
> > higher-order stencil), I have measured more than 20% of the
> > theoretical peak performance. The bottleneck seems to be L1 data cache
> > accesses, because the BSSN equation kernels require a large number of
> > local (temporary) variables.
> >
> > If you are looking for parallel scaling, then e.g.
> > <http://arxiv.org/abs/1111.3344> contains a scaling graph for the BSSN
> > equations evolved with mesh refinement. It shows that, for this
> > benchmark, the Einstein Toolkit scales well to more than 12k cores.
> >
> > -erik
> >
> > --
> > Erik Schnetter <schnetter at cct.lsu.edu>
> > http://www.perimeterinstitute.ca/personal/eschnetter/
> >

> _______________________________________________
> Users mailing list
> Users at einsteintoolkit.org
> http://lists.einsteintoolkit.org/mailman/listinfo/users


