[Users] Benchmarking results for McLachlan rewrite

Ian Hinder ian.hinder at aei.mpg.de
Fri Jul 24 10:57:23 CDT 2015


On 8 Jul 2015, at 16:53, Ian Hinder <ian.hinder at aei.mpg.de> wrote:

> 
> On 8 Jul 2015, at 15:14, Erik Schnetter <schnetter at cct.lsu.edu> wrote:
> 
>> I added a second benchmark, using a Thornburg04 patch system, 8th order finite differencing, and 4th order patch interpolation. The results are
>> 
>> original: 8.53935e-06 sec
>> rewrite:  8.55188e-06 sec
>> 
>> this time with 1 thread per MPI process, since that was most efficient in both cases. Most of the time is spent in inter-patch interpolation, which is much more expensive than in a "regular" case since this benchmark is run on a single node and hence with very small grids.
>> 
>> With these numbers under our belt, can we merge the rewrite branch?
> 
> The "jacobian" benchmark that I gave you was still a pure kernel benchmark, involving no interpatch interpolation.  It just measured the speed of the RHSs when Jacobians were included.  I would also not use a single-threaded benchmark with very small grid sizes; this might have been fastest in this artificial case, but in practice I don't think we would use that configuration.  The benchmark you have now run seems to be more of a "complete system" benchmark, which is useful, but different.
> 
> I think it is important that the kernel itself has not gotten slower, even if the kernel is not currently a major contributor to runtime.  We specifically split out the advection derivatives because they made the code with 8th order and Jacobians a fair bit slower.  I would just like to see that this is not still the case with the new version, which has changed the way this is handled.

I have now run my benchmarks on both the original and the rewritten McLachlan.  I seem to find that the ML_BSSN_* functions in
Evolve/CallEvol/CCTK_EVOL/CallFunction/thorns, excluding the constraint calculations, are between 11% and 15% slower with the rewrite branch, depending on the details of the evolution.  See attached plot.  This is on Datura with quite old CPUs (Intel Xeon CPU X5650 2.67GHz).

-- 
Ian Hinder
http://members.aei.mpg.de/ianhin
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.einsteintoolkit.org/pipermail/users/attachments/20150724/8f36d9c1/attachment-0002.html 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PastedGraphic-2.pdf
Type: application/pdf
Size: 40722 bytes
Desc: not available
Url : http://lists.einsteintoolkit.org/pipermail/users/attachments/20150724/8f36d9c1/attachment-0001.pdf 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.einsteintoolkit.org/pipermail/users/attachments/20150724/8f36d9c1/attachment-0003.html 


More information about the Users mailing list