<div dir="ltr">I ran the Simfactory benchmark for ML_BSSN on both the current version and the "rewrite" branch to see whether the branch is ready for production use. The benchmark ran on a single node of Shelob at LSU. In both cases, 2 OpenMP threads and 8 MPI processes per node was the fastest configuration, so those are the results I report below. Since I was interested in the performance of McLachlan, this is a unigrid vacuum benchmark using fourth-order differencing.<div><br></div><div>One noteworthy difference is that dissipation as implemented in the "rewrite" branch is finally approximately as fast as thorn Dissipation, and I have therefore used this option for the "rewrite" branch.<br clear="all"><div><br></div><div>Here are the high-level results:</div><div><br></div><div>current: 3.03136e-06 sec per grid point</div><div>rewrite: 2.85734e-06 sec per grid point</div><div><br></div><div>That is, the rewrite branch is about 6% faster (5.7% less time per grid point).</div><div><br></div><div><br></div><div><br></div><div>More detailed timing results for CCTK_EVOL are as follows:</div><div><br></div><div>Current:</div><div><br></div><div><p style="margin:0px;font-size:10px;font-family:Menlo">----------------------------------------------------------------------------------------</p>
<p style="margin:0px;font-size:10px;font-family:Menlo">Time Time Imblnc Timer gettimeof getrusage</p>
<p style="margin:0px;font-size:10px;font-family:Menlo">percent secs percent secs secs</p>
<p style="margin:0px;font-size:10px;font-family:Menlo">----------------------------------------------------------------------------------------</p><p style="margin:0px;font-size:10px;font-family:Menlo"> 97.8% 373.1 0.0% |_CallEvol 373.1 738.5</p>
<p style="margin:0px;font-size:10px;font-family:Menlo"> 97.8% 373.1 0.0% | |_CCTK_EVOL 373.1 738.4</p>
<p style="margin:0px;font-size:10px;font-family:Menlo"> 97.8% 373.0 0.0% | | |_CallFunction 373 738.3</p>
<p style="margin:0px;font-size:10px;font-family:Menlo"> 10.3% 39.3 36.9% | | | |_syncs 62.27 122.6</p>
<p style="margin:0px;font-size:10px;font-family:Menlo"> 10.3% 39.3 36.9% | | | | |_Sync 62.25 122.6</p>
<p style="margin:0px;font-size:10px;font-family:Menlo"> 2.1% 8.0 18.6% | | | | | |_comm_state[3].state_fill_sen 9.839 19.49</p>
<p style="margin:0px;font-size:10px;font-family:Menlo"> 6.1% 23.2 46.3% | | | | | |_comm_state[6].state_do_some_ 43.18 84.73</p>
<p style="margin:0px;font-size:10px;font-family:Menlo"> 1.5% 5.5 1.8% | | | | | |_comm_state[7].state_empty_re 5.61 10.84</p>
<p style="margin:0px;font-size:10px;font-family:Menlo"> 87.3% 333.2 6.4% | | | |_thorns 310.3 614.5</p>
<p style="margin:0px;font-size:10px;font-family:Menlo"> 21.8% 83.0 3.2% | | | | |_ML_BSSN_Advect 84.45 168.8</p>
<p style="margin:0px;font-size:10px;font-family:Menlo"> 8.6% 32.9 51.3% | | | | |_ML_BSSN_NewRad 0.0354 0.05899</p>
<p style="margin:0px;font-size:10px;font-family:Menlo"> 14.1% 53.6 1.9% | | | | |_ML_BSSN_RHS1 54.68 109.1</p>
<p style="margin:0px;font-size:10px;font-family:Menlo"> 16.0% 61.1 2.0% | | | | |_ML_BSSN_RHS2 62.35 124.5</p>
<p style="margin:0px;font-size:10px;font-family:Menlo"> 5.1% 19.4 9.0% | | | | |_ML_BSSN_convertToADMBase 18.63 37.17</p>
<p style="margin:0px;font-size:10px;font-family:Menlo"> 4.2% 16.1 3.1% | | | | |_ML_BSSN_convertToADMBaseDtLaps 16.11 31.98</p>
<p style="margin:0px;font-size:10px;font-family:Menlo"> 2.4% 9.1 20.0% | | | | |_ML_BSSN_enforce 11.34 22.65</p>
<p style="margin:0px;font-size:10px;font-family:Menlo"> 8.3% 31.7 15.3% | | | | |_MoL_Add 33.12 62.48</p>
<p style="margin:0px;font-size:10px;font-family:Menlo"> 1.0% 3.8 48.9% | | | | |_ReflectionSymmetry_Apply 7.461 14.93</p>
<p style="margin:0px;font-size:10px;font-family:Menlo"> 5.2% 19.8 3.1% | | | | |_dissipation_add 20.47 40.87</p></div><div><br></div><div><br></div><div><br></div><div>Rewrite:</div><div><br></div><div><p style="margin:0px;font-size:10px;font-family:Menlo">--------------------------------------------------------------------------------------------------</p>
<p style="margin:0px;font-size:10px;font-family:Menlo">Time Time Imblnc Timer gettimeof getrusage</p>
<p style="margin:0px;font-size:10px;font-family:Menlo">percent secs percent secs secs</p>
<p style="margin:0px;font-size:10px;font-family:Menlo">--------------------------------------------------------------------------------------------------</p><p style="margin:0px;font-size:10px;font-family:Menlo"> 99.0% 356.1 0.0% |_CallEvol 356.1 703.8</p>
<p style="margin:0px;font-size:10px;font-family:Menlo"> 98.9% 356.0 0.0% | |_CCTK_EVOL 356.1 703.8</p>
<p style="margin:0px;font-size:10px;font-family:Menlo"> 98.9% 356.0 0.0% | | |_CallFunction 356 703.7</p>
<p style="margin:0px;font-size:10px;font-family:Menlo"> 14.0% 50.4 37.3% | | | |_syncs 80.43 158.2</p>
<p style="margin:0px;font-size:10px;font-family:Menlo"> 14.0% 50.4 37.3% | | | | |_Sync 80.42 158.2</p>
<p style="margin:0px;font-size:10px;font-family:Menlo"> 2.1% 7.7 7.7% | | | | | |_comm_state[3].state_fill_send_buffers. 8.341 16.51</p>
<p style="margin:0px;font-size:10px;font-family:Menlo"> 9.7% 35.0 45.2% | | | | | |_comm_state[6].state_do_some_work.step 63.77 125.2</p>
<p style="margin:0px;font-size:10px;font-family:Menlo"> 1.5% 5.6 2.5% | | | | | |_comm_state[7].state_empty_recv_buffers 5.629 10.87</p>
<p style="margin:0px;font-size:10px;font-family:Menlo"> 84.8% 305.1 9.9% | | | |_thorns 275.2 544.4</p>
<p style="margin:0px;font-size:10px;font-family:Menlo"> 5.5% 19.8 7.1% | | | | |_ML_BSSN_ADMBaseEverywhere 19.58 39.15</p>
<p style="margin:0px;font-size:10px;font-family:Menlo"> 4.4% 16.0 5.3% | | | | |_ML_BSSN_ADMBaseInterior 15.91 31.63</p>
<p style="margin:0px;font-size:10px;font-family:Menlo"> 2.3% 8.3 16.6% | | | | |_ML_BSSN_EnforceEverywhere 10.01 20.02</p>
<p style="margin:0px;font-size:10px;font-family:Menlo"> 10.8% 39.0 5.0% | | | | |_ML_BSSN_EvolutionInteriorSplitBy1 38.75 77.28</p>
<p style="margin:0px;font-size:10px;font-family:Menlo"> 21.2% 76.4 4.6% | | | | |_ML_BSSN_EvolutionInteriorSplitBy2 75.82 151.5</p>
<p style="margin:0px;font-size:10px;font-family:Menlo"> 23.3% 83.9 5.8% | | | | |_ML_BSSN_EvolutionInteriorSplitBy3 83.78 166.8</p>
<p style="margin:0px;font-size:10px;font-family:Menlo"> 9.7% 35.0 48.9% | | | | |_ML_BSSN_NewRad 0.1266 0.156</p>
<p style="margin:0px;font-size:10px;font-family:Menlo"> 6.0% 21.5 20.7% | | | | |_MoL_Add 23.12 43.16</p>
<p style="margin:0px;font-size:10px;font-family:Menlo"> 0.9% 3.3 49.4% | | | | |_ReflectionSymmetry_Apply 6.508 12.97</p></div><div><br></div><div><br></div><div><br></div><div>As can be seen, the non-McLachlan numbers are comparable, although not identical, which is to be expected. The main RHS evaluation is split over several routines in each case: RHS1, RHS2, Advect, and dissipation_add for the current version, and EvolutionInteriorSplitBy[1,2,3] for the rewrite branch. How the RHS evaluation is split differs between the two versions, so the individual routines do not correspond one-to-one.</div><div><br></div><div>With these numbers in hand, I think we are ready to switch to the rewrite branch.</div><div><br></div><div>-erik</div><div><br></div>-- <br><div class="gmail_signature">Erik Schnetter &lt;<a href="mailto:schnetter@cct.lsu.edu" target="_blank">schnetter@cct.lsu.edu</a>&gt;<br><a href="http://www.perimeterinstitute.ca/personal/eschnetter/" target="_blank">http://www.perimeterinstitute.ca/personal/eschnetter/</a></div>
</div></div>
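<div>For anyone who wants to re-derive the comparison, here is a minimal Python sketch of the arithmetic. It uses only numbers quoted in this message: the two per-grid-point costs, and the "Time secs" column of the RHS routines in the two timer listings. The variable names are my own, and adding up the RHS routines is just one way to compare the two splits, not something the timer output reports itself.</div>

```python
# Per-grid-point cost from the high-level results (seconds).
current = 3.03136e-06
rewrite = 2.85734e-06

# Fraction of time saved per grid point, and the matching throughput gain.
time_saved = (current - rewrite) / current
throughput_gain = current / rewrite - 1.0

# "Time secs" column of the RHS routines under CCTK_EVOL, copied from the listings.
rhs_current = {
    "ML_BSSN_Advect": 83.0,
    "ML_BSSN_RHS1": 53.6,
    "ML_BSSN_RHS2": 61.1,
    "dissipation_add": 19.8,
}
rhs_rewrite = {
    "ML_BSSN_EvolutionInteriorSplitBy1": 39.0,
    "ML_BSSN_EvolutionInteriorSplitBy2": 76.4,
    "ML_BSSN_EvolutionInteriorSplitBy3": 83.9,
}

# The splits differ, so compare the sums rather than individual routines.
rhs_current_total = sum(rhs_current.values())
rhs_rewrite_total = sum(rhs_rewrite.values())
rhs_saved = (rhs_current_total - rhs_rewrite_total) / rhs_current_total

print(f"time saved per grid point: {100 * time_saved:.1f}%")
print(f"throughput gain:           {100 * throughput_gain:.1f}%")
print(f"RHS total, current:        {rhs_current_total:.1f} s")
print(f"RHS total, rewrite:        {rhs_rewrite_total:.1f} s")
print(f"RHS time saved:            {100 * rhs_saved:.1f}%")
```

<div>Summing the routines sidesteps the fact that the two versions split the RHS evaluation differently. The totals cover only the RHS routines named above, not the ADMBase conversion, enforce, or boundary routines.</div>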