[Users] Possible performance issue

Haas, Roland rhaas at illinois.edu
Fri Oct 4 09:35:53 CDT 2019


Hello Vaishak,

I do not see anything obviously wrong with the setup.

It uses 128 MPI ranks for the 4 nodes which fits with there being 2x16
cores per node. 

Looking at the timer tree output at iteration 1024 (search for
"gettimeof " and you will find the spot), out of 5977s spent during
Evolve, about 2143s were spent in "syncs", which is communication, and
about the same amount of time in "thorns", which is doing computation.
While this ratio is not great (spending more time sending data than
doing computation), it is also not unheard of.
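
To put those timer numbers in perspective, here is a small Python sketch
that turns them into shares of the Evolve time. The thorns value is an
assumption: it is only described as "about the same" as syncs, so 2143s
is used as a stand-in.

```python
# Timings read off the timer tree at iteration 1024, in seconds.
evolve_s = 5977.0
syncs_s = 2143.0   # communication
thorns_s = 2143.0  # ASSUMPTION: "about the same" as syncs, exact value not quoted


def share(part, total):
    """Return part as a fraction of total."""
    return part / total


syncs_share = share(syncs_s, evolve_s)
thorns_share = share(thorns_s, evolve_s)
# Whatever is left of Evolve after syncs and thorns.
other_share = 1.0 - syncs_share - thorns_share

print(f"syncs:  {syncs_share:.1%} of Evolve")
print(f"thorns: {thorns_share:.1%} of Evolve")
print(f"other:  {other_share:.1%} of Evolve")
```

With these inputs, communication alone accounts for a bit more than a
third of the Evolve time.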

Getting the original output files for the gallery data from Zenodo
(link is on the gallery page):

wget https://zenodo.org/record/155394/files/GW150914_28.tar.xz?download=1

you can see (in GW150914_28/output-0000/GW150914_28.out) that the
gallery run took about 137s for syncs and 198s for thorns, so a roughly
similar ratio but about a factor of 10 faster.
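
A rough way to quantify that comparison is sketched below in Python. It
assumes both pairs of timings refer to the same point in the run, and it
again uses 2143s as a stand-in for the thorns time of the slow run, which
is only known to be "about the same" as its syncs time.

```python
# Timer values in seconds, as (syncs, thorns).
this_run = (2143.0, 2143.0)   # ASSUMPTION: thorns taken equal to syncs
gallery_run = (137.0, 198.0)  # from GW150914_28/output-0000/GW150914_28.out


def slowdown(slow_s, fast_s):
    """Return how many times longer slow_s is than fast_s."""
    return slow_s / fast_s


sync_slowdown = slowdown(this_run[0], gallery_run[0])
thorn_slowdown = slowdown(this_run[1], gallery_run[1])

# Communication-to-computation ratio for each run.
this_ratio = this_run[0] / this_run[1]
gallery_ratio = gallery_run[0] / gallery_run[1]

print(f"syncs slowdown:  {sync_slowdown:.1f}x")
print(f"thorns slowdown: {thorn_slowdown:.1f}x")
print(f"syncs/thorns: this run {this_ratio:.2f}, gallery {gallery_ratio:.2f}")
```

Under these assumptions both categories come out slower by a factor of
ten or more, with communication hit somewhat harder than computation.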

I am grasping at straws here, but sometimes having too many MPI ranks
can be detrimental if there is not enough work to split up (OpenMP can
be a bit more forgiving in that respect; the original gallery run
used 120 cores on 10 nodes with 6 OpenMP threads per MPI rank).
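
To make the difference in decomposition concrete, here is a minimal
Python sketch that works out the number of MPI ranks for both setups.
The layout() helper is hypothetical and only does the arithmetic; it
assumes cores are spread evenly over nodes and that the run without
OpenMP uses one thread per rank.

```python
def layout(total_cores, nodes, threads_per_rank):
    """Return (MPI ranks, ranks per node, cores used per node)."""
    if total_cores % nodes:
        raise ValueError("cores do not divide evenly over nodes")
    cores_per_node = total_cores // nodes
    if cores_per_node % threads_per_rank:
        raise ValueError("threads per rank do not divide cores per node")
    ranks_per_node = cores_per_node // threads_per_rank
    return ranks_per_node * nodes, ranks_per_node, cores_per_node


# This run: 128 cores on 4 nodes, no OpenMP (one thread per rank).
this_run = layout(128, 4, 1)
# Gallery run: 120 cores on 10 nodes, 6 OpenMP threads per rank.
gallery = layout(120, 10, 6)

print("this run (ranks, ranks/node, cores/node):", this_run)
print("gallery  (ranks, ranks/node, cores/node):", gallery)
```

So the same grid is being split into 128 pieces here versus 20 pieces in
the gallery run, which is why too little work per rank is a plausible
suspect.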

Since each node has lots of RAM (more than the 96GB required to run the
simulation), can you try and see what would happen if you were to run
on only a single node?

Also if you could add the parameter:

Carpet::output_timers_every = 1024

and then provide the carpet-timing-statistics*.asc files, that would let
us know in even more detail where the time is spent.

Running for a short time (2048 iterations) is enough to get data to
compare.

Yours,
Roland

> Dear All,
> 
> I am running the simulation GW150914 using the parameter file available at
> the ETK gallery at (GW150914-ETK gallery
> <https://einsteintoolkit.org/gallery/bbh/index.html>) using 128 cores.
> 
> Each compute node consists of 2 x 16 cores Intel Skylake (Intel(R) Xeon(R)
> Gold 6142 CPU @ 2.60GHz) and 384 GB RAM. I have compiled and am running
> Einstein Toolkit without OpenMP and using mpich-3.3.1.
> 
> 
> The issue is that the simulation seems to be running at a very slow pace.
> The amount of physical time it completes per hour is only about
> 1.3 units. At this rate, to complete 1700 units would take about 54
> days, in contrast to 2.8 days on (Intel(R) Xeon(R) CPU E5-2630 v3 @
> 2.40GHz) as per the details at the example run of GW150914 available at the
> gallery (GW150914-ETK gallery
> <https://einsteintoolkit.org/gallery/bbh/index.html>).
> 
> I have also tried using Intel MPI (impi) but with similar results.
> 
> I am also attaching the out file from the simulation.
> 
> Looking forward to your inputs.
> 
> 
> Thanks and regards,
> 
> 
> 
> 
> 
> Vaishak P
> 
> PhD Scholar,
> Shyama Prasad Mukherjee Fellow
> Inter-University Center for Astronomy and Astrophysics (IUCAA)
> Pune, India



-- 
My email is as private as my paper mail. I therefore support encrypting
and signing email messages. Get my PGP key from http://pgp.mit.edu .