[Users] Benchmarking results for McLachlan rewrite
Erik Schnetter
schnetter at cct.lsu.edu
Fri Jul 3 15:38:31 CDT 2015
I ran the Simfactory benchmark for ML_BSSN on both the current version and
the "rewrite" branch to see whether this branch is ready for production
use. I ran this benchmark on a single node of Shelob at LSU. In both cases,
using 2 OpenMP threads and 8 MPI processes per node was fastest, so I am
reporting these results below. Since I was interested in the performance of
McLachlan, this is a unigrid vacuum benchmark using fourth-order
differencing.
One noteworthy difference is that dissipation as implemented in the
"rewrite" branch is finally approximately as fast as thorn Dissipation, and
I have thus used this option for the "rewrite" branch.
Here are the high-level results:
current: 3.03136e-06 sec per grid point
rewrite: 2.85734e-06 sec per grid point
That is, the rewrite branch is about 6% faster (5.7% less time per grid
point).
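As a quick check of this percentage, the relative difference can be computed
directly from the two numbers above. A minimal Python sketch (the variable
names are only illustrative):

```python
# Per-grid-point costs quoted above, in seconds.
current = 3.03136e-06
rewrite = 2.85734e-06

# Fraction of the run time saved by the rewrite branch.
time_saved = (current - rewrite) / current

# Equivalent throughput ratio (grid points per second, rewrite vs. current).
speedup = current / rewrite

print(f"time saved per grid point: {time_saved:.1%}")
print(f"throughput ratio:          {speedup:.3f}")
```

This gives about 5.7% less time per grid point, or equivalently a throughput
ratio of about 1.06.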
More detailed timing results for CCTK_EVOL are as follows:
Current:
----------------------------------------------------------------------------------------
   Time    Time   Imblnc  Timer                                     gettimeof  getrusage
percent    secs  percent                                                 secs       secs
----------------------------------------------------------------------------------------
  97.8%   373.1     0.0%  |_CallEvol                                    373.1      738.5
  97.8%   373.1     0.0%  | |_CCTK_EVOL                                 373.1      738.4
  97.8%   373.0     0.0%  | | |_CallFunction                              373      738.3
  10.3%    39.3    36.9%  | | | |_syncs                                 62.27      122.6
  10.3%    39.3    36.9%  | | | | |_Sync                                62.25      122.6
   2.1%     8.0    18.6%  | | | | | |_comm_state[3].state_fill_sen      9.839      19.49
   6.1%    23.2    46.3%  | | | | | |_comm_state[6].state_do_some_      43.18      84.73
   1.5%     5.5     1.8%  | | | | | |_comm_state[7].state_empty_re       5.61      10.84
  87.3%   333.2     6.4%  | | | |_thorns                                310.3      614.5
  21.8%    83.0     3.2%  | | | | |_ML_BSSN_Advect                      84.45      168.8
   8.6%    32.9    51.3%  | | | | |_ML_BSSN_NewRad                     0.0354    0.05899
  14.1%    53.6     1.9%  | | | | |_ML_BSSN_RHS1                        54.68      109.1
  16.0%    61.1     2.0%  | | | | |_ML_BSSN_RHS2                        62.35      124.5
   5.1%    19.4     9.0%  | | | | |_ML_BSSN_convertToADMBase            18.63      37.17
   4.2%    16.1     3.1%  | | | | |_ML_BSSN_convertToADMBaseDtLaps      16.11      31.98
   2.4%     9.1    20.0%  | | | | |_ML_BSSN_enforce                     11.34      22.65
   8.3%    31.7    15.3%  | | | | |_MoL_Add                             33.12      62.48
   1.0%     3.8    48.9%  | | | | |_ReflectionSymmetry_Apply            7.461      14.93
   5.2%    19.8     3.1%  | | | | |_dissipation_add                     20.47      40.87
Rewrite:
--------------------------------------------------------------------------------------------------
   Time    Time   Imblnc  Timer                                               gettimeof  getrusage
percent    secs  percent                                                           secs       secs
--------------------------------------------------------------------------------------------------
  99.0%   356.1     0.0%  |_CallEvol                                              356.1      703.8
  98.9%   356.0     0.0%  | |_CCTK_EVOL                                           356.1      703.8
  98.9%   356.0     0.0%  | | |_CallFunction                                        356      703.7
  14.0%    50.4    37.3%  | | | |_syncs                                           80.43      158.2
  14.0%    50.4    37.3%  | | | | |_Sync                                          80.42      158.2
   2.1%     7.7     7.7%  | | | | | |_comm_state[3].state_fill_send_buffers.      8.341      16.51
   9.7%    35.0    45.2%  | | | | | |_comm_state[6].state_do_some_work.step       63.77      125.2
   1.5%     5.6     2.5%  | | | | | |_comm_state[7].state_empty_recv_buffers      5.629      10.87
  84.8%   305.1     9.9%  | | | |_thorns                                          275.2      544.4
   5.5%    19.8     7.1%  | | | | |_ML_BSSN_ADMBaseEverywhere                     19.58      39.15
   4.4%    16.0     5.3%  | | | | |_ML_BSSN_ADMBaseInterior                       15.91      31.63
   2.3%     8.3    16.6%  | | | | |_ML_BSSN_EnforceEverywhere                     10.01      20.02
  10.8%    39.0     5.0%  | | | | |_ML_BSSN_EvolutionInteriorSplitBy1             38.75      77.28
  21.2%    76.4     4.6%  | | | | |_ML_BSSN_EvolutionInteriorSplitBy2             75.82      151.5
  23.3%    83.9     5.8%  | | | | |_ML_BSSN_EvolutionInteriorSplitBy3             83.78      166.8
   9.7%    35.0    48.9%  | | | | |_ML_BSSN_NewRad                               0.1266      0.156
   6.0%    21.5    20.7%  | | | | |_MoL_Add                                       23.12      43.16
   0.9%     3.3    49.4%  | | | | |_ReflectionSymmetry_Apply                      6.508      12.97
As can be seen, the non-McLachlan numbers are comparable, although not
identical. This is to be expected. The main RHS evaluation is split over
several routines in each case; these are (RHS1, RHS2, Advect,
dissipation_add) for the current version and
(EvolutionInteriorSplitBy[1,2,3]) for the rewrite branch. How the RHS
evaluation is split across these routines of course differs between the two
cases.
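To put a number on the RHS comparison, one can add up the "Time secs" column
for the routines listed above in each table. A rough Python sketch, with the
values copied by hand from the two tables (the dictionary names are
illustrative, and summing per-routine times ignores load imbalance):

```python
# Wall-clock seconds ("Time secs" column) for the RHS-related routines,
# copied from the timer tables above.
current_rhs = {
    "ML_BSSN_RHS1": 53.6,
    "ML_BSSN_RHS2": 61.1,
    "ML_BSSN_Advect": 83.0,
    "dissipation_add": 19.8,
}
rewrite_rhs = {
    "ML_BSSN_EvolutionInteriorSplitBy1": 39.0,
    "ML_BSSN_EvolutionInteriorSplitBy2": 76.4,
    "ML_BSSN_EvolutionInteriorSplitBy3": 83.9,
}

current_total = sum(current_rhs.values())
rewrite_total = sum(rewrite_rhs.values())

# Fraction of RHS time saved by the rewrite branch.
reduction = (current_total - rewrite_total) / current_total

print(f"current RHS total: {current_total:.1f} s")
print(f"rewrite RHS total: {rewrite_total:.1f} s")
print(f"reduction:         {reduction:.1%}")
```

With these values the RHS-related routines take about 217.5 s in the current
version and about 199.3 s in the rewrite branch, i.e. roughly 8% less.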
With these numbers in hand, I think we are ready to switch to the rewrite
branch.
-erik
--
Erik Schnetter <schnetter at cct.lsu.edu>
http://www.perimeterinstitute.ca/personal/eschnetter/