[Users] Benchmarking results for McLachlan rewrite

Erik Schnetter schnetter at cct.lsu.edu
Fri Jul 24 13:39:25 CDT 2015


On Fri, Jul 24, 2015 at 1:58 PM, Ian Hinder <ian.hinder at aei.mpg.de> wrote:

>
> On 24 Jul 2015, at 19:42, Erik Schnetter <schnetter at cct.lsu.edu> wrote:
>
> On Fri, Jul 24, 2015 at 1:39 PM, Ian Hinder <ian.hinder at aei.mpg.de> wrote:
>
>>
>> On 24 Jul 2015, at 19:15, Erik Schnetter <schnetter at cct.lsu.edu> wrote:
>>
>> On Fri, Jul 24, 2015 at 11:57 AM, Ian Hinder <ian.hinder at aei.mpg.de>
>> wrote:
>>
>>>
>>> On 8 Jul 2015, at 16:53, Ian Hinder <ian.hinder at aei.mpg.de> wrote:
>>>
>>>
>>> On 8 Jul 2015, at 15:14, Erik Schnetter <schnetter at cct.lsu.edu> wrote:
>>>
>>> I added a second benchmark, using a Thornburg04 patch system, 8th order
>>> finite differencing, and 4th order patch interpolation. The results are
>>>
>>> original: 8.53935e-06 sec
>>> rewrite:  8.55188e-06 sec
>>>
>>> this time with 1 thread per MPI process, since that was most efficient
>>> in both cases. Most of the time is spent in inter-patch interpolation,
>>> which is much more expensive than in a "regular" case since this benchmark
>>> is run on a single node and hence with very small grids.
>>>
>>> With these numbers under our belt, can we merge the rewrite branch?
>>>
>>>
>>> The "jacobian" benchmark that I gave you was still a pure kernel
>>> benchmark, involving no interpatch interpolation.  It just measured the
>>> speed of the RHSs when Jacobians were included.  I would also not use a
>>> single-threaded benchmark with very small grid sizes; this might have been
>>> fastest in this artificial case, but in practice I don't think we would use
>>> that configuration.  The benchmark you have now run seems to be more of a
>>> "complete system" benchmark, which is useful, but different.
>>>
>>> I think it is important that the kernel itself has not gotten slower,
>>> even if the kernel is not currently a major contributor to runtime.  We
>>> specifically split out the advection derivatives because they made the code
>>> with 8th order and Jacobians a fair bit slower.  I would just like to see
>>> that this is not still the case with the new version, which has changed the
>>> way this is handled.
>>>
>>>
>>> I have now run my benchmarks on both the original and the rewritten
>>> McLachlan.  I seem to find that the ML_BSSN_* functions in
>>> Evolve/CallEvol/CCTK_EVOL/CallFunction/thorns, excluding the constraint
>>> calculations, are between 11% and 15% slower with the rewrite branch,
>>> depending on the details of the evolution.  See attached plot.  This is on
>>> Datura with quite old CPUs (Intel Xeon CPU X5650 2.67GHz).
>>>
>>
>> What exactly do you measure -- which bins or routines? Does this involve
>> communication? Are you using thorn Dissipation?
>>
>>
>> I take all the timers in Evolve/CallEvol/CCTK_EVOL/CallFunction/thorns
>> that start with ML_BSSN_ and eliminate the ones containing "constraints"
>> (case insensitive).  This is running on two processes, one node, 6 threads
>> per node.  Threads are correctly bound to cores.  There is ghostzone
>> exchange between the processes, so yes, there is communication in the
>> ML_BSSN_SelectBCs SYNC calls, but it is node-local.
>>
>
> Can you include thorn Dissipation in the "before" case, and use
> McLachlan's dissipation in the "after" case?
>
>
> There is no dissipation in either case.
>
> The output data is in
>
>
> http://git.barrywardell.net/?p=McLachlanBenchmarks.git;h=refs/runs/orig/20150724-174334
>
> http://git.barrywardell.net/?p=McLachlanBenchmarks.git;h=refs/runs/rewrite/20150724-170542
>
> including the parameter files.
>
> Actually, what I said before was wrong; the timers I am using are under
> "thorns", not "syncs", so even the node-local communication should not be
> counted.
>

McLachlan has not been optimized for runs without dissipation. If you think
this is important, then we can introduce a special case. I expect this to
improve performance. However, running BSSN without dissipation is not what
one would do in production, so I didn't investigate this case.
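
For reference, the timer selection described in the quoted text above (sum
the ML_BSSN_* timers under Evolve/CallEvol/CCTK_EVOL/CallFunction/thorns,
dropping anything containing "constraints", case insensitive) could be
sketched roughly as follows. This is only an illustration, not the script
that was actually used: the function names, the flat dict of
"timer path -> seconds", and the assumption that timer names sit directly
under the "thorns" node are all hypothetical.

```python
# Hypothetical sketch of the timer selection described in the thread.
# Assumption: timers are available as a dict {timer_path: seconds}, with
# the ML_BSSN_* timers located directly under the "thorns" node.

PREFIX = "Evolve/CallEvol/CCTK_EVOL/CallFunction/thorns/"


def kernel_time(timers):
    """Sum ML_BSSN_* timers under PREFIX, skipping constraint timers."""
    total = 0.0
    for path, seconds in timers.items():
        if not path.startswith(PREFIX):
            continue  # e.g. timers under "syncs" are not counted
        name = path[len(PREFIX):]
        if not name.startswith("ML_BSSN_"):
            continue  # some other thorn
        if "constraints" in name.lower():
            continue  # case-insensitive exclusion of constraint timers
        total += seconds
    return total


def relative_slowdown(orig, rewrite):
    """Fractional slowdown of `rewrite` relative to `orig`."""
    return (rewrite - orig) / orig
```

Applied to the two "complete system" numbers quoted earlier in the thread,
relative_slowdown(8.53935e-06, 8.55188e-06) comes out at roughly 0.0015,
i.e. about 0.15%; the 11% to 15% figure refers to the kernel timers only.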

-erik

-- 
Erik Schnetter <schnetter at cct.lsu.edu>
http://www.perimeterinstitute.ca/personal/eschnetter/