[Users] Benchmarking

Wed May 3 15:59:16 CDT 2017

Bhavesh

To be exact, the remedy for this particular Slab error is not to use more
cores, but to use more MPI processes. You can keep the number of cores
constant if you reduce the number of OpenMP threads per MPI process.
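
A minimal sketch of that trade-off, assuming the only constraint is that the number of MPI processes times the number of OpenMP threads per process equals the core count. The function name process_thread_layouts is hypothetical and is not part of Cactus or simfactory; 64 is the core count of the Stampede run quoted below.

```python
def process_thread_layouts(total_cores):
    """Return every (mpi_processes, omp_threads) pair whose product is total_cores."""
    layouts = []
    for threads in range(1, total_cores + 1):
        # Only thread counts that divide the core count evenly use all cores.
        if total_cores % threads == 0:
            layouts.append((total_cores // threads, threads))
    return layouts


if __name__ == "__main__":
    # Layouts are listed from most processes (fewest threads) to fewest processes.
    for processes, threads in process_thread_layouts(64):
        print(f"{processes:3d} MPI processes x {threads:3d} OpenMP threads")
```

Which of these layouts avoids the Slab error, and which one runs fastest, still has to be found by trying them on the machine in question.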

Since you are benchmarking, you should experiment with these parameters
anyway, as performance can depend crucially on them. For small core counts,
using fewer threads and more processes is usually more efficient.

Finally, comparing only the overall run time is not sufficient to make a
statement about performance. Each run has several "tuning knobs", and
choosing the right values for these is important to achieve good
performance. The default settings will often lead to quite poor
performance. Cactus timer output, as well as experience with performing runs
on HPC systems, is indispensable for getting good performance.

-erik

On Tue, May 2, 2017 at 5:09 PM, Khamesra, Bhavesh <
bhaveshkhamesra at gatech.edu> wrote:

> Hi, I have sent a pull request with the optionlist for Stampede KNL to the
> simfactory repo on Bitbucket. I have tested it with a couple of thornlists,
> including einsteintoolkit.th and GW150914.th. It is still at an
> experimental stage, so it would be great if someone else could also test it.
>
>
> While benchmarking the performance on Stampede KNL, I was able to do
> some test runs using the GW150914 simulation. However, I have been
> running into some issues with it.
>
>
> 1. I tried running the QC0 simulation on both Stampede SandyBridge and KNL.
> It runs fine on Stampede, but it crashes on KNL with this error:
>
> while executing schedule bin BoundaryConditions, routine
> RotatingSymmetry180::Rot180_ApplyBC in thorn RotatingSymmetry180, file
> /work/04082/tg833814/Cactus_ETK_dev/arrangements/CactusNumerical/RotatingSymmetry180/src/rotatingsymmetry180.c:460:
>
>   -> TAT/Slab can only be used if there is a single local component per MPI process
> TACC: MPI job exited with code: 134
>
> I looked through previous tickets and found that the suggested solution is
> to increase the number of cores. But if the same simulation can run on
> Stampede on 64 cores, why does it require a higher number of cores on KNL?
> Or is it some other issue?
>
> 2. I was able to run GW150914 on the development queue (68 cores), and the
> speed on Stampede was around 12.9M, while that on KNL was around 2.4M. To
> understand the reason for such a low speed, I tried running on a higher
> number of cores on Stampede (128), where it runs at a speed of around 20.9M
> (tested for 12 hours). However, when doing the same in the normal queue on
> KNL, the simulation crashes after a couple of iterations with a
> segmentation fault. Before crashing, the speed on KNL is around
> 4.2M. I have attached the error file of the simulation.
>
>
> Could someone please look at this? Let me know if you need any other
> information.
>
>
> Thanks
>
> .............................
>
> Bhavesh Khamesra
>
> Graduate Student
>
> Center for Relativistic Astrophysics
>
> Georgia Institute of Technology
>

-- 
Erik Schnetter <schnetter at cct.lsu.edu>
http://www.perimeterinstitute.ca/personal/eschnetter/