<div dir="ltr">I ran the Simfactory benchmark for ML_BSSN with both the current version and the &quot;rewrite&quot; branch to see whether the branch is ready for production use. The benchmark ran on a single node of Shelob at LSU. For both versions, 2 OpenMP threads and 8 MPI processes per node was the fastest configuration, so those are the results reported below. Since I was interested in the performance of McLachlan itself, this is a unigrid vacuum benchmark using fourth-order finite differencing.<div><br></div><div>One noteworthy difference is that the dissipation implemented in the &quot;rewrite&quot; branch is finally about as fast as thorn Dissipation, so I used the branch's own dissipation rather than thorn Dissipation for the &quot;rewrite&quot; run.<br clear="all"><div><br></div><div>Here are the high-level results:</div><div><br></div><div>current: 3.03136e-06 sec per grid point</div><div>rewrite: 2.85734e-06 sec per grid point</div><div><br></div><div>That is, the rewrite branch needs about 5.7% less time per grid point, i.e. it is about 6% faster.</div><div><br></div><div><br></div><div><br></div><div>More detailed timing results for CCTK_EVOL are as follows:</div><div><br></div><div>Current:</div><div><br></div><div><p style="margin:0px;font-size:10px;font-family:Menlo">----------------------------------------------------------------------------------------</p>
<p style="margin:0px;font-size:10px;font-family:Menlo">Time      Time   Imblnc   Timer                                     gettimeof  getrusage</p>
<p style="margin:0px;font-size:10px;font-family:Menlo">percent   secs   percent                                                 secs       secs</p>
<p style="margin:0px;font-size:10px;font-family:Menlo">----------------------------------------------------------------------------------------</p><p style="margin:0px;font-size:10px;font-family:Menlo">  97.8%    373.1    0.0%  |_CallEvol                                    373.1      738.5</p>
<p style="margin:0px;font-size:10px;font-family:Menlo">  97.8%    373.1    0.0%  | |_CCTK_EVOL                                 373.1      738.4</p>
<p style="margin:0px;font-size:10px;font-family:Menlo">  97.8%    373.0    0.0%  | | |_CallFunction                              373      738.3</p>
<p style="margin:0px;font-size:10px;font-family:Menlo">  10.3%     39.3   36.9%  | | | |_syncs                                 62.27      122.6</p>
<p style="margin:0px;font-size:10px;font-family:Menlo">  10.3%     39.3   36.9%  | | | | |_Sync                                62.25      122.6</p>
<p style="margin:0px;font-size:10px;font-family:Menlo">   2.1%      8.0   18.6%  | | | | | |_comm_state[3].state_fill_sen      9.839      19.49</p>
<p style="margin:0px;font-size:10px;font-family:Menlo">   6.1%     23.2   46.3%  | | | | | |_comm_state[6].state_do_some_      43.18      84.73</p>
<p style="margin:0px;font-size:10px;font-family:Menlo">   1.5%      5.5    1.8%  | | | | | |_comm_state[7].state_empty_re       5.61      10.84</p>
<p style="margin:0px;font-size:10px;font-family:Menlo">  87.3%    333.2    6.4%  | | | |_thorns                                310.3      614.5</p>
<p style="margin:0px;font-size:10px;font-family:Menlo">  21.8%     83.0    3.2%  | | | | |_ML_BSSN_Advect                      84.45      168.8</p>
<p style="margin:0px;font-size:10px;font-family:Menlo">   8.6%     32.9   51.3%  | | | | |_ML_BSSN_NewRad                     0.0354    0.05899</p>
<p style="margin:0px;font-size:10px;font-family:Menlo">  14.1%     53.6    1.9%  | | | | |_ML_BSSN_RHS1                        54.68      109.1</p>
<p style="margin:0px;font-size:10px;font-family:Menlo">  16.0%     61.1    2.0%  | | | | |_ML_BSSN_RHS2                        62.35      124.5</p>
<p style="margin:0px;font-size:10px;font-family:Menlo">   5.1%     19.4    9.0%  | | | | |_ML_BSSN_convertToADMBase            18.63      37.17</p>
<p style="margin:0px;font-size:10px;font-family:Menlo">   4.2%     16.1    3.1%  | | | | |_ML_BSSN_convertToADMBaseDtLaps      16.11      31.98</p>
<p style="margin:0px;font-size:10px;font-family:Menlo">   2.4%      9.1   20.0%  | | | | |_ML_BSSN_enforce                     11.34      22.65</p>
<p style="margin:0px;font-size:10px;font-family:Menlo">   8.3%     31.7   15.3%  | | | | |_MoL_Add                             33.12      62.48</p>
<p style="margin:0px;font-size:10px;font-family:Menlo">   1.0%      3.8   48.9%  | | | | |_ReflectionSymmetry_Apply            7.461      14.93</p>
<p style="margin:0px;font-size:10px;font-family:Menlo">   5.2%     19.8    3.1%  | | | | |_dissipation_add                     20.47      40.87</p></div><div><br></div><div><br></div><div><br></div><div>Rewrite:</div><div><br></div><div><p style="margin:0px;font-size:10px;font-family:Menlo">--------------------------------------------------------------------------------------------------</p>
<p style="margin:0px;font-size:10px;font-family:Menlo">Time      Time   Imblnc   Timer                                               gettimeof  getrusage</p>
<p style="margin:0px;font-size:10px;font-family:Menlo">percent   secs   percent                                                           secs       secs</p>
<p style="margin:0px;font-size:10px;font-family:Menlo">--------------------------------------------------------------------------------------------------</p><p style="margin:0px;font-size:10px;font-family:Menlo">  99.0%    356.1    0.0%  |_CallEvol                                              356.1      703.8</p>
<p style="margin:0px;font-size:10px;font-family:Menlo">  98.9%    356.0    0.0%  | |_CCTK_EVOL                                           356.1      703.8</p>
<p style="margin:0px;font-size:10px;font-family:Menlo">  98.9%    356.0    0.0%  | | |_CallFunction                                        356      703.7</p>
<p style="margin:0px;font-size:10px;font-family:Menlo">  14.0%     50.4   37.3%  | | | |_syncs                                           80.43      158.2</p>
<p style="margin:0px;font-size:10px;font-family:Menlo">  14.0%     50.4   37.3%  | | | | |_Sync                                          80.42      158.2</p>
<p style="margin:0px;font-size:10px;font-family:Menlo">   2.1%      7.7    7.7%  | | | | | |_comm_state[3].state_fill_send_buffers.      8.341      16.51</p>
<p style="margin:0px;font-size:10px;font-family:Menlo">   9.7%     35.0   45.2%  | | | | | |_comm_state[6].state_do_some_work.step       63.77      125.2</p>
<p style="margin:0px;font-size:10px;font-family:Menlo">   1.5%      5.6    2.5%  | | | | | |_comm_state[7].state_empty_recv_buffers      5.629      10.87</p>
<p style="margin:0px;font-size:10px;font-family:Menlo">  84.8%    305.1    9.9%  | | | |_thorns                                          275.2      544.4</p>
<p style="margin:0px;font-size:10px;font-family:Menlo">   5.5%     19.8    7.1%  | | | | |_ML_BSSN_ADMBaseEverywhere                     19.58      39.15</p>
<p style="margin:0px;font-size:10px;font-family:Menlo">   4.4%     16.0    5.3%  | | | | |_ML_BSSN_ADMBaseInterior                       15.91      31.63</p>
<p style="margin:0px;font-size:10px;font-family:Menlo">   2.3%      8.3   16.6%  | | | | |_ML_BSSN_EnforceEverywhere                     10.01      20.02</p>
<p style="margin:0px;font-size:10px;font-family:Menlo">  10.8%     39.0    5.0%  | | | | |_ML_BSSN_EvolutionInteriorSplitBy1             38.75      77.28</p>
<p style="margin:0px;font-size:10px;font-family:Menlo">  21.2%     76.4    4.6%  | | | | |_ML_BSSN_EvolutionInteriorSplitBy2             75.82      151.5</p>
<p style="margin:0px;font-size:10px;font-family:Menlo">  23.3%     83.9    5.8%  | | | | |_ML_BSSN_EvolutionInteriorSplitBy3             83.78      166.8</p>
<p style="margin:0px;font-size:10px;font-family:Menlo">   9.7%     35.0   48.9%  | | | | |_ML_BSSN_NewRad                               0.1266      0.156</p>
<p style="margin:0px;font-size:10px;font-family:Menlo">   6.0%     21.5   20.7%  | | | | |_MoL_Add                                       23.12      43.16</p>
<p style="margin:0px;font-size:10px;font-family:Menlo">   0.9%      3.3   49.4%  | | | | |_ReflectionSymmetry_Apply                      6.508      12.97</p></div><div><br></div><div><br></div><div><br></div><div>As can be seen, the timers for the non-McLachlan routines are comparable, although not identical; this is to be expected. In both versions the main RHS evaluation is split over several routines: RHS1, RHS2, Advect, and dissipation_add in the current version, and EvolutionInteriorSplitBy1, 2, and 3 in the rewrite branch. Since the work is divided differently in the two versions, only the totals can be compared: these routines take 217.5 s in the current version and 199.3 s in the rewrite branch (&quot;Time secs&quot; column), about 8% less.</div><div><br></div><div>With these numbers in hand, I think we are ready to switch to the rewrite branch.</div><div><br></div><div>-erik</div><div><br></div>-- <br><div class="gmail_signature">Erik Schnetter &lt;<a href="mailto:schnetter@cct.lsu.edu" target="_blank">schnetter@cct.lsu.edu</a>&gt;<br><a href="http://www.perimeterinstitute.ca/personal/eschnetter/" target="_blank">http://www.perimeterinstitute.ca/personal/eschnetter/</a></div>
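<div><br></div><div>In case anyone wants to re-check the arithmetic, here is a small Python sketch that recomputes the comparison from the numbers quoted above. The values are copied by hand from the high-level results and from the &quot;Time secs&quot; column of the two timer trees, and the grouping of timers into the RHS evaluation follows the text. The helper names (relative_saving, speedup, summarize) are made up for this illustration and are not part of Simfactory or Cactus.</div>
<pre>
```python
# Sketch only: values are copied by hand from the timer output above.
# The helper names below are hypothetical, not Simfactory/Cactus APIs.

# Seconds per grid point, from the high-level results.
COST_PER_POINT = {
    "current": 3.03136e-06,
    "rewrite": 2.85734e-06,
}

# "Time secs" column of the CCTK_EVOL timer trees, restricted to the
# routines that make up the main RHS evaluation in each version.
RHS_TIMERS = {
    "current": {
        "ML_BSSN_RHS1": 53.6,
        "ML_BSSN_RHS2": 61.1,
        "ML_BSSN_Advect": 83.0,
        "dissipation_add": 19.8,
    },
    "rewrite": {
        "ML_BSSN_EvolutionInteriorSplitBy1": 39.0,
        "ML_BSSN_EvolutionInteriorSplitBy2": 76.4,
        "ML_BSSN_EvolutionInteriorSplitBy3": 83.9,
    },
}


def relative_saving(old, new):
    """Fraction of run time saved when going from old to new."""
    return (old - new) / old


def speedup(old, new):
    """Throughput ratio: how many times faster new is than old."""
    return old / new


def summarize():
    """Return the per-point and summed-RHS comparisons as a dict."""
    cur = COST_PER_POINT["current"]
    rew = COST_PER_POINT["rewrite"]
    # The RHS is split differently in the two versions, so compare sums.
    rhs_cur = sum(RHS_TIMERS["current"].values())
    rhs_rew = sum(RHS_TIMERS["rewrite"].values())
    return {
        "time_saved": relative_saving(cur, rew),
        "speedup": speedup(cur, rew),
        "rhs_current": rhs_cur,
        "rhs_rewrite": rhs_rew,
        "rhs_saved": relative_saving(rhs_cur, rhs_rew),
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(f"{key:12s} {value:.4f}")
```
</pre>
<div>Running it reports a time saving of about 5.7% per grid point (a throughput ratio of about 1.06) and about 8% less time spent in the RHS routines.</div>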
</div></div>