[Users] Benchmarking results for McLachlan rewrite

Erik Schnetter schnetter at cct.lsu.edu
Fri Jul 24 16:01:14 CDT 2015


On Fri, Jul 24, 2015 at 3:43 PM, Ian Hinder <ian.hinder at aei.mpg.de> wrote:

>
> On 24 Jul 2015, at 20:39, Erik Schnetter <schnetter at cct.lsu.edu> wrote:
>
> On Fri, Jul 24, 2015 at 1:58 PM, Ian Hinder <ian.hinder at aei.mpg.de> wrote:
>
>>
>> On 24 Jul 2015, at 19:42, Erik Schnetter <schnetter at cct.lsu.edu> wrote:
>>
>> On Fri, Jul 24, 2015 at 1:39 PM, Ian Hinder <ian.hinder at aei.mpg.de>
>> wrote:
>>
>>>
>>> On 24 Jul 2015, at 19:15, Erik Schnetter <schnetter at cct.lsu.edu> wrote:
>>>
>>> On Fri, Jul 24, 2015 at 11:57 AM, Ian Hinder <ian.hinder at aei.mpg.de>
>>> wrote:
>>>
>>>>
>>>> On 8 Jul 2015, at 16:53, Ian Hinder <ian.hinder at aei.mpg.de> wrote:
>>>>
>>>>
>>>> On 8 Jul 2015, at 15:14, Erik Schnetter <schnetter at cct.lsu.edu> wrote:
>>>>
>>>> I added a second benchmark, using a Thornburg04 patch system, 8th order
>>>> finite differencing, and 4th order patch interpolation. The results are
>>>>
>>>> original: 8.53935e-06 sec
>>>> rewrite:  8.55188e-06 sec
>>>>
>>>> this time with 1 thread per MPI process, since that was most efficient
>>>> in both cases. Most of the time is spent in inter-patch interpolation,
>>>> which is much more expensive than in a "regular" case since this benchmark
>>>> is run on a single node and hence with very small grids.
>>>>
>>>> With these numbers under our belt, can we merge the rewrite branch?
>>>>
>>>>
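For reference, the two timings quoted above differ by only about 0.15%, so
the full-system benchmark shows essentially no slowdown.  A multipatch setup
of this kind looks roughly as follows in a Cactus parameter file; this is a
sketch only, assuming the Llama thorns Coordinates and Interpolate alongside
McLachlan (the actual parameter files for these runs are in the repositories
linked further down the thread):

    # Sketch of the second benchmark's setup (assumed thorns: Llama's
    # Coordinates and Interpolate, McLachlan's ML_BSSN)
    Coordinates::coordinate_system  = "Thornburg04"  # central Cartesian patch
                                                     # plus 6 angular patches
    ML_BSSN::fdOrder                = 8   # 8th order finite differencing
    Interpolate::interpolator_order = 4   # 4th order patch interpolation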
>>>> The "jacobian" benchmark that I gave you was still a pure kernel
>>>> benchmark, involving no interpatch interpolation.  It just measured the
>>>> speed of the RHSs when Jacobians were included.  I would also not use a
>>>> single-threaded benchmark with very small grid sizes; this might have been
>>>> fastest in this artificial case, but in practice I don't think we would use
>>>> that configuration.  The benchmark you have now run seems to be more of a
>>>> "complete system" benchmark, which is useful, but different.
>>>>
>>>> I think it is important that the kernel itself has not gotten slower,
>>>> even if the kernel is not currently a major contributor to runtime.  We
>>>> specifically split out the advection derivatives because they made the code
>>>> with 8th order and Jacobians a fair bit slower.  I would just like to see
>>>> that this is not still the case with the new version, which has changed the
>>>> way this is handled.
>>>>
>>>>
>>>> I have now run my benchmarks on both the original and the rewritten
>>>> McLachlan.  I seem to find that the ML_BSSN_* functions in
>>>> Evolve/CallEvol/CCTK_EVOL/CallFunction/thorns, excluding the constraint
>>>> calculations, are between 11% and 15% slower with the rewrite branch,
>>>> depending on the details of the evolution.  See attached plot.  This is on
>>>> Datura with quite old CPUs (Intel Xeon CPU X5650 2.67GHz).
>>>>
>>>
>>> What exactly do you measure -- which bins or routines? Does this involve
>>> communication? Are you using thorn Dissipation?
>>>
>>>
>>> I take all the timers in Evolve/CallEvol/CCTK_EVOL/CallFunction/thorns
>>> that start with ML_BSSN_ and eliminate the ones containing "constraints"
>>> (case insensitive).  This is running with two processes on one node and
>>> 6 threads per process.  Threads are correctly bound to cores.  There is ghostzone
>>> exchange between the processes, so yes, there is communication in the
>>> ML_BSSN_SelectBCs SYNC calls, but it is node-local.
>>>
>>
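For completeness, per-routine timer data of this kind can be produced with
Carpet's timer tree and the TimerReport thorn.  The following lines are an
illustrative sketch; the output frequencies are assumptions, not values
taken from these runs:

    # Print Carpet's timer tree (Evolve/CallEvol/CCTK_EVOL/...) and a full
    # timer report at regular intervals (illustrative frequencies)
    Carpet::output_timer_tree_every         = 1024
    TimerReport::out_every                  = 1024
    TimerReport::output_all_timers_together = yes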
>> Can you include thorn Dissipation in the "before" case, and use
>> McLachlan's dissipation in the "after" case?
>>
>>
>> There is no dissipation in either case.
>>
>> The output data is in
>>
>>
>> http://git.barrywardell.net/?p=McLachlanBenchmarks.git;h=refs/runs/orig/20150724-174334
>>
>> http://git.barrywardell.net/?p=McLachlanBenchmarks.git;h=refs/runs/rewrite/20150724-170542
>>
>> including the parameter files.
>>
>> Actually, what I said before was wrong; the timers I am using are under
>> "thorns", not "syncs", so even the node-local communication should not be
>> counted.
>>
>
> McLachlan has not been optimized for runs without dissipation. If you think
> this is important, then we can introduce a special case; I expect this to
> improve performance. However, running BSSN without dissipation is not what
> one would do in production, so I didn't investigate this case.
>
>
> I agree that runs without dissipation are not relevant, but since I
> usually use the Dissipation thorn, I didn't include it in the benchmark,
> which was a benchmark of McLachlan.  I assume that McLachlan now always
> calculates the dissipation term, even when it is zero, and that is what you
> mean by "not optimised"?  This will introduce a performance regression (if
> this is the reason for the increased benchmark time, then presumably only
> on the level of ~15% for the kernel, hence less for a whole simulation) for
> any simulation which uses dissipation from the Dissipation thorn.  Since
> McLachlan's dissipation was previously very slow, this is presumably what
> most existing parameter files use.
>
> Regarding switching to use McLachlan for dissipation: McLachlan's
> dissipation is a bit more limited than the Dissipation thorn; it looks like
> McLachlan is hard-coded to use dissipation of order 1+fdOrder, rather than
> the dissipation order being chosen separately.  Sometimes lower orders are
> used as an optimisation (the effect on convergence being judged to be
> minimal).  Critically, there is also no way to specify different
> dissipation orders on different refinement levels, which is typically done
> in production binary simulations.
>
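To make the two configurations under discussion concrete, the contrast looks
roughly like this in a parameter file.  This is a sketch: the strengths and
the variable list are illustrative assumptions, not taken from the benchmark
runs:

    # "Before": dissipation provided by the Dissipation thorn
    ActiveThorns = "Dissipation"
    Dissipation::order  = 5        # order chosen independently of fdOrder
    Dissipation::epsdis = 0.2      # default strength
    Dissipation::epsdis_for_level[0] = 0.5   # per-level strength override
    Dissipation::vars   = "
      ML_BSSN::ML_log_confac
      ML_BSSN::ML_metric
      ML_BSSN::ML_trace_curv
      ML_BSSN::ML_curv
      ML_BSSN::ML_Gamma
      ML_BSSN::ML_lapse
      ML_BSSN::ML_shift
    "

    # "After": McLachlan's built-in dissipation; as noted above, its order
    # is tied to 1+fdOrder
    ML_BSSN::epsDiss = 0.2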

In other words, you are asking for a version of ML_BSSN that is efficient
when dissipation is not used. Currently, that means disabling dissipation
entirely. The question is -- should this be the default?

> Do you think it is faster to use dissipation from McLachlan than to use
> that provided by Dissipation?
>

Yes, I think so.

-erik

-- 
Erik Schnetter <schnetter at cct.lsu.edu>
http://www.perimeterinstitute.ca/personal/eschnetter/