[Users] scaling test for the einstein toolkit

Erik Schnetter schnetter at cct.lsu.edu
Mon May 15 15:45:08 CDT 2017


Helvi, Miguel

I find that many (most?) users of the Einstein Toolkit define their problem
size based on physics and accuracy requirements, and then adapt the number
of nodes they use to make this problem run most efficiently. This usually
corresponds to a certain (fixed) amount of work per core. Correspondingly,
an informative benchmark is a "weak scaling test". Strong scalability is
also interesting, but is more difficult to interpret in the presence of
adaptive mesh refinement with a complex system of equations: using too few
nodes leads to out-of-memory situations, and using too many nodes quickly
leads to inefficiencies if you use higher-order derivatives (their wider
stencils increase the ghost-zone overhead).
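
To make this concrete, here is a minimal sketch (not from the original
discussion; all numbers are illustrative assumptions, not Einstein Toolkit
defaults) of how one might size such a weak scaling series, keeping the
work per core fixed as the node count grows:

    # Weak-scaling sizing sketch; points_per_core and cores_per_node
    # are assumed values chosen only for illustration.
    points_per_core = 50**3      # fixed amount of work per core
    cores_per_node = 16          # depends on the machine

    for nodes in (1, 4, 16, 64, 256):
        total_points = points_per_core * cores_per_node * nodes
        # For a roughly cubic domain, the linear number of points grows
        # like the cube root of the total.
        linear = round(total_points ** (1.0 / 3.0))
        print(f"{nodes:4d} nodes -> ~{linear}^3 grid points total")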

Efficiency also depends on the number of OpenMP threads vs. the number of
MPI processes, and (obviously) on the amount of I/O you're doing, which is
governed by very different characteristics (the file system used, the
number of file servers, etc.). We thus typically benchmark weak scalability
on setups that perform very little I/O and that exclude initial data
generation.
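
As an illustration only (the core count below is an assumption, not a
recommendation), one might enumerate the MPI-rank/OpenMP-thread
combinations to benchmark on a single node like this:

    # Enumerate hybrid MPI/OpenMP decompositions of one node; the core
    # count is an assumed value for illustration.
    cores_per_node = 16

    for threads in (1, 2, 4, 8, 16):
        if cores_per_node % threads == 0:
            ranks = cores_per_node // threads
            print(f"{ranks:2d} MPI ranks/node x {threads:2d} OpenMP threads each")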

When setting up a benchmark, it is important to monitor as many performance
characteristics as possible to ensure that one isn't limited by I/O, or by
any other "accidental" feature that could easily be removed in a production
simulation, such as horizon finding, constraint evaluation, too-frequent
regridding, etc.

It is surprisingly easy to accidentally enable a feature in the Einstein
Toolkit that adversely affects performance, in particular when running on
1000+ cores. Correspondingly, one has to take great care when defining a
benchmark to ensure that no such features are present. Timer output is
valuable here to understand how much time is spent in which parts of the
code.
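
For illustration, a small script along these lines can rank the most
expensive timers; the file name and the simple "name value" per-line
format are assumptions, not the actual Carpet/Cactus timer layout, so the
parsing would need to be adapted to whatever output you enable:

    # Rank timers by accumulated time from a hypothetical plain-text
    # dump containing "timer-name seconds" pairs.
    from collections import defaultdict

    totals = defaultdict(float)
    with open("timers.txt") as f:        # file name is an assumption
        for line in f:
            parts = line.split()
            if len(parts) >= 2:
                try:
                    totals[parts[0]] += float(parts[-1])
                except ValueError:
                    continue             # skip header or comment lines

    for name, seconds in sorted(totals.items(), key=lambda kv: -kv[1])[:10]:
        print(f"{seconds:12.2f} s  {name}")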

A QC-0 setup should be a good test case if you want to study BBH scenarios.
Or maybe GW150914 <http://einsteintoolkit.org/gallery/bbh/index.html> would
be more interesting and relevant? You would obviously crank up the
resolution along with the node count to turn this into a weak scaling test.
In my experience, running
for a few minutes (less than ten) after initial data setup will suffice to
get good numbers.

The benchmarks I am running are usually simpler, since I tend to focus on a
particular feature that I want to optimize (e.g. RHS evaluation, grid
structure management). I think it has been some time since we ran
production-scale BBH benchmarks (with puncture tracking, regridding that
tracks the black holes, etc.); I believe we have focused on hydrodynamics
benchmarks recently.

The final quantity against which I report performance is usually "grid
point updates per second", or rather its inverse, plotted against the
number of nodes used: the time required to evaluate the RHS for a single
grid point, including all overhead such as scheduling, regridding,
prolongation, and synchronization, amortized over all grid point updates.
(Since this measures time, smaller numbers are better.) On a fast machine,
and if the overhead is low, this number can be as low as a few
microseconds. Mesh refinement and parallelization add non-negligible
overhead, and a weak scaling test will show when the parallelization
overhead becomes prohibitive.
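
For illustration, the figure of merit could be computed along these lines;
all input numbers are placeholders, and normalizing by the core count (so
that the value is comparable across a weak scaling series) is an assumption
of this sketch:

    # Time per grid point update, amortized over all overhead.
    wall_time_s = 300.0        # wall-clock time of the measured segment
    time_steps = 256           # time steps taken in that segment
    grid_points = 100_000_000  # grid points updated per time step
    cores = 256                # total cores used

    seconds_per_update = wall_time_s * cores / (grid_points * time_steps)
    print(f"{seconds_per_update * 1e6:.2f} core-microseconds per grid point update")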

-erik



On Fri, May 12, 2017 at 11:41 AM, helvi witek <hwitek at icc.ub.edu> wrote:

> Hi Ian, Carlos,
>
> thanks a lot for your feedback.
> Together with Miguel Zilhao we are planning to perform these scaling tests
> using the Einstein Toolkit together with the McLachlan and/or our own
> evolution thorns on the Cosmos cluster at the University of Cambridge.
> If you are interested, we would be happy to make the results available to
> the community and to the public, e.g., on the wiki.
>
> We were thinking about evolving a head-on collision of two black holes to
> avoid contamination by the initial data construction. We would restrict the
> output to "carpet-timing..asc" and "AllTimers*". Would you recommend any
> other output to monitor the performance during these tests?
> Please let us know if you have any other comments.
>
> Best wishes,
> Helvi & Miguel
>
> ===========================================
> Dr. Helvi Witek
> Marie-Curie Research Fellow
> Dep. Fisica Quantica i Astrofisica & ICCUB
> Universitat de Barcelona
> ===========================================
>
> On Wed, May 10, 2017 at 4:40 PM, Carlos Lousto <colsma at rit.edu> wrote:
>
>> Agreed, but the available machines in XSEDE change every 3 years or so.
>> It would be nice if we had a way to update/add to such a "live" paper with
>> a supplementary repo(?)
>>
>> Carlos Lousto
>>
>> On May 10, 2017, at 10:29 AM, Ian Hinder <ian.hinder at aei.mpg.de> wrote:
>>
>>
>> On 10 May 2017, at 15:31, helvi witek <hwitek at icc.ub.edu> wrote:
>>
>> Hi everyone,
>>
>> we are going to apply for HPC time using the Lean code which is largely
>> based on the Einstein Toolkit. Among the required technical information are
>> scaling tests. While we will perform our own tests I also checked for
>> "official" information for the ET. I noticed that there is very little
>> public information, e.g. on the wiki, about recent (say, within the last
>> five years) scaling tests aside from Eloisa's recent paper
>> http://inspirehep.net/record/1492289
>> and this tracker
>> http://lists.einsteintoolkit.org/pipermail/users/2013-February/002815.html
>>
>> Did I miss anything? It might be a good idea to add a standardized test
>> or references to the wiki page.
>>
>>
>> It would probably be very useful for many groups if we were to write a
>> short paper describing the scaling of the ET in various cases on current
>> HPC machines.  For someone running simulations very similar to those used,
>> referring to such a paper may be sufficient to demonstrate scaling of the
>> code for a proposal.  If the code used were quite different, then, provided
>> the parameter files and any required scripts from such a paper were made
>> public, it would be easier for each group to adapt them to their own code.
>>
>> --
>> Ian Hinder
>> http://members.aei.mpg.de/ianhin
>>
>>
>>
>
>
>


-- 
Erik Schnetter <schnetter at cct.lsu.edu>
http://www.perimeterinstitute.ca/personal/eschnetter/