<div dir="ltr">On Fri, Jul 24, 2015 at 10:57 AM, Ian Hinder <span dir="ltr">&lt;<a href="mailto:ian.hinder@aei.mpg.de" target="_blank">ian.hinder@aei.mpg.de</a>&gt;</span> wrote:<br><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><br><div><span class=""><div>On 8 Jul 2015, at 16:53, Ian Hinder &lt;<a href="mailto:ian.hinder@aei.mpg.de" target="_blank">ian.hinder@aei.mpg.de</a>&gt; wrote:</div><br><blockquote type="cite"><div style="word-wrap:break-word"><br><div><div>On 8 Jul 2015, at 15:14, Erik Schnetter &lt;<a href="mailto:schnetter@cct.lsu.edu" target="_blank">schnetter@cct.lsu.edu</a>&gt; wrote:</div><br><blockquote type="cite"><div dir="ltr">I added a second benchmark, using a Thornburg04 patch system, 8th order finite differencing, and 4th order patch interpolation. The results are<div><br></div><div><div style="margin:0px;font-size:10px;font-family:Menlo">original: 8.53935e-06 sec</div><div style="margin:0px;font-size:10px;font-family:Menlo">rewrite:  8.55188e-06 sec</div><div style="margin:0px;font-size:10px;font-family:Menlo"><br></div><div style="margin:0px;font-size:10px;font-family:Menlo"><span style="font-family:arial,sans-serif;font-size:small">this time with 1 thread per MPI process, since that was most efficient in both cases. 
Most of the time is spent in inter-patch interpolation, which is much more expensive than in a &quot;regular&quot; case since this benchmark is run on a single node and hence with very small grids.</span><br></div><div style="margin:0px;font-size:10px;font-family:Menlo"><span style="font-family:arial,sans-serif;font-size:small"><br></span></div><div style="margin:0px;font-size:10px;font-family:Menlo"><span style="font-family:arial,sans-serif;font-size:small">With these numbers under our belt, can we merge the rewrite branch?</span></div></div></div></blockquote><div><br></div><div>The &quot;jacobian&quot; benchmark that I gave you was still a pure kernel benchmark, involving no interpatch interpolation.  It just measured the speed of the RHSs when Jacobians were included.  I would also not use a single-threaded benchmark with very small grid sizes; this might have been fastest in this artificial case, but in practice I don&#39;t think we would use that configuration.  The benchmark you have now run seems to be more of a &quot;complete system&quot; benchmark, which is useful, but different.</div><div><br></div><div>I think it is important that the kernel itself has not gotten slower, even if the kernel is not currently a major contributor to runtime.  We specifically split out the advection derivatives because they made the code with 8th order and Jacobians a fair bit slower.  I would just like to see that this is not still the case with the new version, which has changed the way this is handled.</div></div></div></blockquote><div><br></div></span><div>I have now run my benchmarks on both the original and the rewritten McLachlan.  I seem to find that the ML_BSSN_* functions in</div><div>Evolve/CallEvol/CCTK_EVOL/CallFunction/thorns, excluding the constraint calculations, are between 11% and 15% slower with the rewrite branch, depending on the details of the evolution.  See attached plot.  
This is on Datura with quite old CPUs (Intel Xeon CPU X5650 2.67GHz).</div></div></div></blockquote><div><br></div><div>I just realized that you probably used the wrong rhs_evaluation method for McLachlan. While improving performance, I implemented three different ways to evaluate the RHS: (1) all in one routine, (2) split manually, and (3) split semi-automatically by Kranc. Options (2) and (3) are identical for practical purposes, and thus (2) should not be used. In my benchmarks, I always explicitly specified (3). However, the default in McLachlan is still (1), so a run that does not set this parameter is likely not as efficient as it should be.</div><div><br></div><div>The parameter setting ML_BSSN::rhs_evaluation = &quot;splitBy&quot; chooses (3).</div><div><br></div><div>I will soon push McLachlan changes to make (3) the default and to remove (2).</div><div><br></div><div>-erik</div></div><div><br></div>-- <br><div class="gmail_signature">Erik Schnetter &lt;<a href="mailto:schnetter@cct.lsu.edu" target="_blank">schnetter@cct.lsu.edu</a>&gt;<br><a href="http://www.perimeterinstitute.ca/personal/eschnetter/" target="_blank">http://www.perimeterinstitute.ca/personal/eschnetter/</a></div>

</div></div>