[Users] Benchmarking results for McLachlan rewrite

Wed Jul 29 11:15:25 CDT 2015

On Fri, Jul 24, 2015 at 10:57 AM, Ian Hinder <ian.hinder at aei.mpg.de> wrote:

>
> On 8 Jul 2015, at 16:53, Ian Hinder <ian.hinder at aei.mpg.de> wrote:
>
>
> On 8 Jul 2015, at 15:14, Erik Schnetter <schnetter at cct.lsu.edu> wrote:
>
> I added a second benchmark, using a Thornburg04 patch system, 8th order
> finite differencing, and 4th order patch interpolation. The results are
>
> original: 8.53935e-06 sec
> rewrite:  8.55188e-06 sec
>
> this time with 1 thread per MPI process, since that was most efficient in
> both cases. Most of the time is spent in inter-patch interpolation, which
> is much more expensive than in a "regular" case since this benchmark is run
> on a single node and hence with very small grids.
>
> With these numbers under our belt, can we merge the rewrite branch?
>
>
> The "jacobian" benchmark that I gave you was still a pure kernel
> benchmark, involving no interpatch interpolation.  It just measured the
> speed of the RHSs when Jacobians were included.  I would also not use a
> single-threaded benchmark with very small grid sizes; this might have been
> fastest in this artificial case, but in practice I don't think we would use
> that configuration.  The benchmark you have now run seems to be more of a
> "complete system" benchmark, which is useful, but different.
>
> I think it is important that the kernel itself has not gotten slower, even
> if the kernel is not currently a major contributor to runtime.  We
> specifically split out the advection derivatives because they made the code
> with 8th order and Jacobians a fair bit slower.  I would just like to see
> that this is not still the case with the new version, which has changed the
> way this is handled.
>
>
> I have now run my benchmarks on both the original and the rewritten
> McLachlan.  I seem to find that the ML_BSSN_* functions in
> Evolve/CallEvol/CCTK_EVOL/CallFunction/thorns, excluding the constraint
> calculations, are between 11% and 15% slower with the rewrite branch,
> depending on the details of the evolution.  See attached plot.  This is on
> Datura with quite old CPUs (Intel Xeon CPU X5650 2.67GHz).
>
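
The two Thornburg04 timings can be set next to the 11% to 15% kernel slowdown by computing their relative difference directly. This is a minimal sketch that assumes both timings were measured the same way. `relative_slowdown` is a hypothetical helper and is not part of McLachlan or Kranc.

```python
def relative_slowdown(t_original: float, t_rewrite: float) -> float:
    """Hypothetical helper: fractional change in run time.

    A positive result means the rewrite is slower than the original.
    """
    return (t_rewrite - t_original) / t_original


# Timings (sec) from the Thornburg04 multipatch benchmark quoted above
original = 8.53935e-06
rewrite = 8.55188e-06

print(f"{100 * relative_slowdown(original, rewrite):.2f}%")
```

This prints `0.15%`, so the complete-system timings differ by only about 0.15%. That is much smaller than the kernel-only difference, which fits with inter-patch interpolation dominating that benchmark and possibly masking a slower RHS kernel.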

I just realized that you probably used the wrong rhs_evaluation method for
McLachlan. While improving performance, I implemented 3 different ways to
evaluate the RHS: (1) all in one routine, (2) split manually, and (3) split
semi-automatically by Kranc. (2) and (3) are identical for practical
purposes, so (2) should not be used. In my benchmarks, I always
explicitly specified (3). However, McLachlan still defaults to (1),
which is likely not as efficient as it should be.

The parameter setting ML_BSSN::rhs_evaluation = "splitBy" chooses (3).

I will soon push McLachlan changes to make (3) the default and to remove
(2).

-erik

-- 
Erik Schnetter <schnetter at cct.lsu.edu>
http://www.perimeterinstitute.ca/personal/eschnetter/