[Users] Benchmarking
Khamesra, Bhavesh
bhaveshkhamesra at gatech.edu
Fri May 5 10:37:09 CDT 2017
Hi Erik, Thanks for the reply. I tried playing with the num-threads option in the machine files and was able to run QC0 on a development node. Reducing num-threads to 17 while keeping the number of cores at 64 gave some increase in speed, but it is still quite low: around 13-16 M/hour, compared to 55-65 M/hour on Stampede. For GW150914, the speed on KNL is around 3.5-4 M/hour, compared to 12 M/hour on Stampede. I also briefly looked at TimerReport, but no particular thorn stood out. I will study it in more detail.
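To put the speeds quoted above on a common footing, here is a minimal Python sketch that turns them into slowdown factors of KNL relative to Stampede. The function name `slowdown_range` is purely illustrative and not part of any Einstein Toolkit or simfactory tool; the numbers are the ones quoted in this message.

```python
def slowdown_range(reference, candidate):
    """Return (min, max) slowdown of `candidate` relative to `reference`.

    Both arguments are (low, high) speed ranges in M/hour.
    The smallest slowdown pairs the slowest reference with the fastest
    candidate; the largest pairs the fastest reference with the slowest
    candidate.
    """
    ref_lo, ref_hi = reference
    cand_lo, cand_hi = candidate
    return ref_lo / cand_hi, ref_hi / cand_lo


# Speeds in M/hour as quoted in this message.
qc0 = slowdown_range(reference=(55.0, 65.0), candidate=(13.0, 16.0))
gw150914 = slowdown_range(reference=(12.0, 12.0), candidate=(3.5, 4.0))

print(f"QC0:      KNL is {qc0[0]:.1f}x to {qc0[1]:.1f}x slower than Stampede")
print(f"GW150914: KNL is {gw150914[0]:.1f}x to {gw150914[1]:.1f}x slower than Stampede")
```

These factors only compare overall speed; they say nothing about which part of the run is responsible for the difference.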
In general, how can I find the optimal values of the 'tuning knobs' (other than by trial and error), and what are the constraints on them? What general options/parameters can I change to boost performance? I also have several questions about various options in the machine files and about optimization and MPI in general. Can you suggest some references where I can read more about this?
Lastly, the crash of GW150914 in the normal queue doesn't seem to be due to this reason (but I may be wrong). The error file shows segmentation faults. I was browsing through past tickets and found that you had also encountered a similar segfault issue on KNL. Were you able to resolve it? I am attaching the error file; could you please look at it?
Thanks
.............................
Bhavesh Khamesra
Graduate Student
Centre of Relativistic Astrophysics
Georgia Institute of Technology
________________________________
From: schnetter at gmail.com <schnetter at gmail.com> on behalf of Erik Schnetter <schnetter at cct.lsu.edu>
Sent: Wednesday, May 3, 2017 4:59:16 PM
To: Khamesra, Bhavesh
Cc: users at einsteintoolkit.org
Subject: Re: [Users] Benchmarking
Bhavesh
To be exact, the remedy for this particular Slab error is not to use more cores, but to use more MPI processes. You can keep the number of cores constant if you reduce the number of OpenMP threads per MPI process.
Given that you are benchmarking, you should anyway experiment with these parameters, as performance can crucially depend on them. Usually, using fewer threads and more processes is more efficient for small core counts.
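As a rough illustration of the trade-off described above, the following Python sketch lists the ways a fixed number of cores can be split into MPI processes and OpenMP threads per process. It assumes one thread per core and an even split, so that processes times threads equals the core count; the function name `decompositions` is hypothetical and not part of Cactus or simfactory.

```python
def decompositions(total_cores, max_threads=None):
    """List (mpi_processes, omp_threads) pairs with processes * threads == total_cores.

    Assumes one OpenMP thread per core. `max_threads` optionally caps the
    number of threads per process.
    """
    pairs = []
    for threads in range(1, total_cores + 1):
        if total_cores % threads != 0:
            continue  # threads must divide the core count evenly
        if max_threads is not None and threads > max_threads:
            continue
        pairs.append((total_cores // threads, threads))
    return pairs


# Candidate layouts for a 64-core run, from many processes to few.
for procs, threads in decompositions(64):
    print(f"{procs:3d} MPI processes x {threads:2d} OpenMP threads")
```

Each pair is a candidate layout to benchmark; moving toward fewer threads per process gives more MPI processes on the same cores.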
Finally, only comparing the overall run time is not sufficient to make a statement about performance. Each run has several "tuning knobs", and choosing the right values for these is important to achieve good performance. Using the default settings will often lead to quite poor performance. Cactus timer output as well as experience with performing runs on HPC systems is indispensable to get good performance.
-erik
On Tue, May 2, 2017 at 5:09 PM, Khamesra, Bhavesh <bhaveshkhamesra at gatech.edu> wrote:
Hi, I have sent a pull request with the optionlist for Stampede - KNL to the simfactory repo on Bitbucket. I have tested it with a couple of thornlists, including einsteintoolkit.th and GW150914.th. This is still at an experimental stage, so it would be great if someone could also test it.
Working on benchmarking the performance on Stampede KNL, I was able to do some test runs using the GW150914 simulation. However, I have been running into some issues with it.
1. I tried running the QC0 simulation on both Stampede SandyBridge and KNL. It runs fine on Stampede, but it crashes on KNL with this error -
while executing schedule bin BoundaryConditions, routine RotatingSymmetry180::Rot180_ApplyBC in thorn RotatingSymmetry180, file /work/04082/tg833814/Cactus_ETK_dev/arrangements/CactusNumerical/RotatingSymmetry180/src/rotatingsymmetry180.c:460:
-> TAT/Slab can only be used if there is a single local component per MPI process
TACC: MPI job exited with code: 134
I looked through previous tickets and found that the suggested solution is to increase the number of cores. But if the same simulation can be run on Stampede on 64 cores, why does it require a higher number of cores on KNL? Or is it some other issue?
2. I was able to run GW150914 on the development queue (68 cores); the speed on Stampede was around 12.9M, while that on KNL was around 2.4M. To understand the reason for such low speeds, I tried running on a higher number of cores on Stampede (128), where it runs at a speed of around 20.9M (I tested the run for 12 hours). However, on doing the same in the normal queue on KNL, the simulation crashes after a couple of iterations with a segmentation fault. Before crashing, the speed on KNL is around 4.2M. I have attached the error file of the simulation.
Could someone please look at this? Let me know if you need any other information.
Thanks
.............................
Bhavesh Khamesra
Graduate Student
Centre of Relativistic Astrophysics
Georgia Institute of Technology
--
Erik Schnetter <schnetter at cct.lsu.edu>
http://www.perimeterinstitute.ca/personal/eschnetter/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Benchmarking_ETK_GW150914_KNL.err
Type: application/octet-stream
Size: 25582 bytes
Desc: Benchmarking_ETK_GW150914_KNL.err
Url : http://lists.einsteintoolkit.org/pipermail/users/attachments/20170505/22eb6aa5/attachment-0001.obj