[Users] Benchmarking results for McLachlan rewrite

Fri Jul 3 15:38:31 CDT 2015

I ran the Simfactory benchmark for ML_BSSN on both the current version and
the "rewrite" branch to see whether this branch is ready for production
use. I ran this benchmark on a single node of Shelob at LSU. In both cases,
using 2 OpenMP threads and 8 MPI processes per node was fastest, so I am
reporting these results below. Since I was interested in the performance of
McLachlan, this is a unigrid vacuum benchmark using fourth-order
differencing.

One noteworthy difference is that dissipation as implemented in the
"rewrite" branch is finally approximately as fast as thorn Dissipation, and
I have thus used this option for the "rewrite" branch.

Here are the high-level results:

current: 3.03136e-06 sec per grid point
rewrite: 2.85734e-06 sec per grid point

That is, the rewrite branch is about 5% faster.
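
As a quick check of that arithmetic, here is a minimal Python sketch. The variable names are purely illustrative; only the two per-grid-point costs are taken from the numbers above. It works out to roughly 5.7% less time per grid point, in line with the "about 5%" figure.

```python
# Per-grid-point costs quoted above, in seconds.
current = 3.03136e-06
rewrite = 2.85734e-06

# Fraction of run time saved by the rewrite branch relative to the current version.
time_saved = (current - rewrite) / current

# Equivalent throughput ratio (grid points per second, rewrite over current).
throughput_ratio = current / rewrite

print(f"time saved per grid point: {time_saved:.1%}")
print(f"throughput ratio:          {throughput_ratio:.3f}")
```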

More detailed timing results for CCTK_EVOL are as follows:

Current:

----------------------------------------------------------------------------------------
   Time     Time  Imblnc  Timer                                     gettimeof  getrusage
percent     secs percent                                                 secs       secs
----------------------------------------------------------------------------------------
  97.8%    373.1    0.0%  |_CallEvol                                    373.1      738.5
  97.8%    373.1    0.0%  | |_CCTK_EVOL                                 373.1      738.4
  97.8%    373.0    0.0%  | | |_CallFunction                              373      738.3
  10.3%     39.3   36.9%  | | | |_syncs                                 62.27      122.6
  10.3%     39.3   36.9%  | | | | |_Sync                                62.25      122.6
   2.1%      8.0   18.6%  | | | | | |_comm_state[3].state_fill_sen      9.839      19.49
   6.1%     23.2   46.3%  | | | | | |_comm_state[6].state_do_some_      43.18      84.73
   1.5%      5.5    1.8%  | | | | | |_comm_state[7].state_empty_re       5.61      10.84
  87.3%    333.2    6.4%  | | | |_thorns                                310.3      614.5
  21.8%     83.0    3.2%  | | | | |_ML_BSSN_Advect                      84.45      168.8
   8.6%     32.9   51.3%  | | | | |_ML_BSSN_NewRad                     0.0354    0.05899
  14.1%     53.6    1.9%  | | | | |_ML_BSSN_RHS1                        54.68      109.1
  16.0%     61.1    2.0%  | | | | |_ML_BSSN_RHS2                        62.35      124.5
   5.1%     19.4    9.0%  | | | | |_ML_BSSN_convertToADMBase            18.63      37.17
   4.2%     16.1    3.1%  | | | | |_ML_BSSN_convertToADMBaseDtLaps      16.11      31.98
   2.4%      9.1   20.0%  | | | | |_ML_BSSN_enforce                     11.34      22.65
   8.3%     31.7   15.3%  | | | | |_MoL_Add                             33.12      62.48
   1.0%      3.8   48.9%  | | | | |_ReflectionSymmetry_Apply            7.461      14.93
   5.2%     19.8    3.1%  | | | | |_dissipation_add                     20.47      40.87

Rewrite:

--------------------------------------------------------------------------------------------------
   Time     Time  Imblnc  Timer                                               gettimeof  getrusage
percent     secs percent                                                           secs       secs
--------------------------------------------------------------------------------------------------
  99.0%    356.1    0.0%  |_CallEvol                                              356.1      703.8
  98.9%    356.0    0.0%  | |_CCTK_EVOL                                           356.1      703.8
  98.9%    356.0    0.0%  | | |_CallFunction                                        356      703.7
  14.0%     50.4   37.3%  | | | |_syncs                                           80.43      158.2
  14.0%     50.4   37.3%  | | | | |_Sync                                          80.42      158.2
   2.1%      7.7    7.7%  | | | | | |_comm_state[3].state_fill_send_buffers.      8.341      16.51
   9.7%     35.0   45.2%  | | | | | |_comm_state[6].state_do_some_work.step       63.77      125.2
   1.5%      5.6    2.5%  | | | | | |_comm_state[7].state_empty_recv_buffers      5.629      10.87
  84.8%    305.1    9.9%  | | | |_thorns                                          275.2      544.4
   5.5%     19.8    7.1%  | | | | |_ML_BSSN_ADMBaseEverywhere                     19.58      39.15
   4.4%     16.0    5.3%  | | | | |_ML_BSSN_ADMBaseInterior                       15.91      31.63
   2.3%      8.3   16.6%  | | | | |_ML_BSSN_EnforceEverywhere                     10.01      20.02
  10.8%     39.0    5.0%  | | | | |_ML_BSSN_EvolutionInteriorSplitBy1             38.75      77.28
  21.2%     76.4    4.6%  | | | | |_ML_BSSN_EvolutionInteriorSplitBy2             75.82      151.5
  23.3%     83.9    5.8%  | | | | |_ML_BSSN_EvolutionInteriorSplitBy3             83.78      166.8
   9.7%     35.0   48.9%  | | | | |_ML_BSSN_NewRad                               0.1266      0.156
   6.0%     21.5   20.7%  | | | | |_MoL_Add                                       23.12      43.16
   0.9%      3.3   49.4%  | | | | |_ReflectionSymmetry_Apply                      6.508      12.97

As can be seen, the non-McLachlan numbers are comparable, although not
identical, which is to be expected. The main RHS evaluation is split over
several routines in each case: (RHS1, RHS2, Advect, dissipation_add) for
the current version and (EvolutionInteriorSplitBy[1,2,3]) for the rewrite
branch. How the RHS evaluation is split differs, of course, between the two
versions.
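
Because the split differs, the individual routines cannot be compared one to one, but their sums can. The following is a minimal Python sketch that adds up the "Time secs" column for the RHS routines named above. The dictionary layout and variable names are illustrative only; the values are copied from the two timer tables. Under that reading, the RHS evaluation takes about 217.5 s in the current version and about 199.3 s in the rewrite branch, a reduction of roughly 8%.

```python
# "Time secs" column for the RHS routines, copied from the timer tables (seconds).
current_rhs = {
    "ML_BSSN_Advect": 83.0,
    "ML_BSSN_RHS1": 53.6,
    "ML_BSSN_RHS2": 61.1,
    "dissipation_add": 19.8,
}
rewrite_rhs = {
    "ML_BSSN_EvolutionInteriorSplitBy1": 39.0,
    "ML_BSSN_EvolutionInteriorSplitBy2": 76.4,
    "ML_BSSN_EvolutionInteriorSplitBy3": 83.9,
}

# Total RHS time for each version.
current_total = sum(current_rhs.values())
rewrite_total = sum(rewrite_rhs.values())

# Relative reduction in RHS time when going from current to rewrite.
saving = (current_total - rewrite_total) / current_total

print(f"current RHS total: {current_total:.1f} s")
print(f"rewrite RHS total: {rewrite_total:.1f} s")
print(f"RHS time saved:    {saving:.1%}")
```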

With these numbers in hand, I think we are ready to switch to the rewrite
branch.

-erik

-- 
Erik Schnetter <schnetter at cct.lsu.edu>
http://www.perimeterinstitute.ca/personal/eschnetter/