Dear Christian,<div>Thank you. I understand what you are saying. My questions mainly concern McLachlan; sorry if it appeared I wanted to learn HPC basics from you.</div><div>I am working with people in the lab, for sure. They are just not familiar with Cactus, and I am learning as well.</div>
<div>I apologize for that, and will try to avoid it in the future.</div><div>Sincerely,</div><div>Jose</div><div><br><br><div class="gmail_quote">On Thu, Mar 29, 2012 at 8:01 AM, Christian D. Ott <span dir="ltr"><<a href="mailto:cott@tapir.caltech.edu">cott@tapir.caltech.edu</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
Hi Jose,<br>
<br>
look, the Einstein Toolkit team is very happy to help new users like<br>
you get started and to sort out specific questions regarding parts<br>
of the toolkit.<br>
<br>
What we really can't do is provide you with very basic<br>
high-performance computing training via the mailing list. This is<br>
because many if not most people on this list actually volunteer to<br>
help in their spare time and are not paid as consultants for general<br>
HPC questions. You are at Berkeley Lab, where there are many experts who<br>
can help you with basic HPC questions, plus there are plenty of resources<br>
available online that I would kindly ask you to consult first.<br>
<br>
Regarding your scaling question:<br>
<br>
<a href="https://support.scinet.utoronto.ca/wiki/index.php/Introduction_To_Performance" target="_blank">https://support.scinet.utoronto.ca/wiki/index.php/Introduction_To_Performance</a><br>
<br>
gives a good introduction to performance measurements. There are many<br>
more webpages like this available on the internet. The plot shown in<br>
the Einstein Toolkit paper (arXiv:1111.3344) is a weak scaling test.<br>
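[Editor's note: a minimal sketch of the two notions of scaling discussed above, with made-up timings; the function names and numbers are illustrative, not Einstein Toolkit measurements.]

```python
# Illustrative only: compute parallel efficiency from (cores, wall-time)
# measurements for the two kinds of scaling tests.
#
# Strong scaling: total problem size is fixed, so ideally the wall time
# halves when the core count doubles.
# Weak scaling: problem size grows with the core count, so ideally the
# wall time stays constant.

def strong_scaling_efficiency(cores, times):
    """Efficiency relative to the smallest run, fixed total problem size."""
    c0, t0 = cores[0], times[0]
    return [t0 * c0 / (c * t) for c, t in zip(cores, times)]

def weak_scaling_efficiency(times):
    """Efficiency relative to the smallest run, problem size grows with cores."""
    t0 = times[0]
    return [t0 / t for t in times]

if __name__ == "__main__":
    cores = [16, 32, 64]
    strong_times = [1000.0, 520.0, 280.0]   # hypothetical wall times (s)
    weak_times = [1000.0, 1030.0, 1100.0]   # hypothetical wall times (s)
    print(strong_scaling_efficiency(cores, strong_times))
    print(weak_scaling_efficiency(weak_times))
```

In a weak scaling plot like Fig. 2 of the paper, a flat wall-time curve (efficiency near 1) is the good outcome; the time per step is not expected to drop as cores are added.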
<br>
Best,<br>
<br>
- Christian Ott<br>
<div class="HOEnZb"><div class="h5"><br>
<br>
<br>
<br>
On Wed, Mar 28, 2012 at 11:55:30PM -0700, Jose Fiestas Iquira wrote:<br>
> Hello,<br>
><br>
> I reduced the simulation time by setting Cactus::cctk_final_time = .01 in<br>
> order to measure performance with CrayPat. It ran only 8 iterations. I used<br>
> 16 and 24 cores for testing, and obtained almost the same performance<br>
> (~1310 sec. simulation time, and ~16 MFlops).<br>
><br>
> It reminds me of Fig. 2 in the reference you sent<br>
> <a href="http://arxiv.org/abs/1111.3344" target="_blank">http://arxiv.org/abs/1111.3344</a><br>
><br>
> which I don't really understand. I would expect shorter times with a larger<br>
> number of cores. Why does that not happen here?<br>
><br>
> I am using McLachlan to simulate a binary system, so all my questions<br>
> concern this specific application. Do you think it will scale, in the<br>
> sense that the simulation time will be shorter the larger the number of<br>
> cores I use?<br>
><br>
> Thanks,<br>
> Jose<br>
><br>
><br>
><br>
> On Wed, Mar 21, 2012 at 5:08 AM, Erik Schnetter <<a href="mailto:schnetter@cct.lsu.edu">schnetter@cct.lsu.edu</a>>wrote:<br>
><br>
> > On Tue, Mar 20, 2012 at 10:45 PM, Frank Loeffler <<a href="mailto:knarf@cct.lsu.edu">knarf@cct.lsu.edu</a>><br>
> > wrote:<br>
> > > Hi,<br>
> > ><br>
> > > On Tue, Mar 20, 2012 at 05:14:38PM -0700, Jose Fiestas Iquira wrote:<br>
> > >> Is there documentation about performance of Cactus ETK in large<br>
> > machines. I<br>
> > >> have some questions regarding best performance according to initial<br>
> > >> conditions, calculation time required, etc.<br>
> > ><br>
> > > Performance very much depends on the specific setup. One poorly scaling<br>
> > > function can ruin an otherwise excellent run.<br>
> > ><br>
> > >> If there are performance plots like Flops vs. Number of nodes would<br>
> > help me<br>
> > >> as well.<br>
> > ><br>
> > > Flops are very problem-dependent. There is no such thing as a flop/s<br>
> > > rating for Cactus, not even for one given machine. If we talk about the<br>
> > > Einstein equations and a typical production run, I would expect a few<br>
> > > percent of the peak performance of any given CPU, as we are most of the<br>
> > > time bound by memory bandwidth.<br>
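[Editor's note: a back-of-the-envelope roofline-style sketch of why a memory-bandwidth-bound kernel reaches only a few percent of peak; all numbers below are hypothetical, not measurements of any specific machine or Cactus thorn.]

```python
# Roofline-style estimate: a kernel limited by memory bandwidth can
# attain at most (bandwidth * arithmetic intensity) flop/s, capped at
# the CPU's peak. Illustrative numbers only.

def attainable_fraction_of_peak(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Fraction of peak flop/s reachable given sustained memory bandwidth
    and the kernel's arithmetic intensity (flops per byte moved)."""
    attainable = min(peak_gflops, bandwidth_gbs * flops_per_byte)
    return attainable / peak_gflops

if __name__ == "__main__":
    # Hypothetical node: 100 GFlop/s peak, 10 GB/s sustained bandwidth.
    # A stencil kernel doing ~0.5 flops per byte moved would top out at
    # 5 GFlop/s, i.e. 5% of peak, regardless of how fast the CPU is.
    print(attainable_fraction_of_peak(100.0, 10.0, 0.5))  # 0.05
```

The same formula shows why a compute-dense kernel (high flops per byte) can approach peak while a bandwidth-starved one cannot.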
> ><br>
> > I would like to add some more numbers to Frank's description:<br>
> ><br>
> > On some problems (e.g. evaluating the BSSN equations with a<br>
> > higher-order stencil), I have measured more than 20% of the<br>
> > theoretical peak performance. The bottleneck seems to be L1 data cache<br>
> > accesses, because the BSSN equation kernels require a large number of<br>
> > local (temporary) variables.<br>
> ><br>
> > If you look for parallel scaling, then e.g.<br>
> > <<a href="http://arxiv.org/abs/1111.3344" target="_blank">http://arxiv.org/abs/1111.3344</a>> contains a scaling graph for the BSSN<br>
> > equations evolved with mesh refinement. This shows that, for this<br>
> > benchmark, the Einstein Toolkit scales well to more than 12k cores.<br>
> ><br>
> > -erik<br>
> ><br>
> > --<br>
> > Erik Schnetter <<a href="mailto:schnetter@cct.lsu.edu">schnetter@cct.lsu.edu</a>><br>
> > <a href="http://www.perimeterinstitute.ca/personal/eschnetter/" target="_blank">http://www.perimeterinstitute.ca/personal/eschnetter/</a><br>
> ><br>
<br>
</div></div><div class="HOEnZb"><div class="h5">> _______________________________________________<br>
> Users mailing list<br>
> <a href="mailto:Users@einsteintoolkit.org">Users@einsteintoolkit.org</a><br>
> <a href="http://lists.einsteintoolkit.org/mailman/listinfo/users" target="_blank">http://lists.einsteintoolkit.org/mailman/listinfo/users</a><br>
<br>
</div></div></blockquote></div><br></div>