[Users] meeting minutes for 2017-03-20

Erik Schnetter schnetter at cct.lsu.edu
Mon Mar 20 15:26:19 CDT 2017


On Mon, Mar 20, 2017 at 11:19 AM, Roland Haas <rhaas at illinois.edu> wrote:

> Present: Frank, Eloisa, Erik, Steve, Roland, Peter, Vassili, Ian,
> Roberto
>
> Failing tests:
> * boostedpuncture failure likely due to code change
> * GRHydro tests show different behaviour depending on options, the
>   test seems to fail only with a new enough compiler and -ffast-math
> * affected hydro tests are very sensitive to data, it may be sufficient
>   to use a mild increase in resolution
> * as a general approach we need to decide if we want -ffast-math or if
>   removing it does not affect results too much. It may be sufficient
>   for gcc to specify options that allow the same subset of fast-math
>   that icc uses by default
> * if fast-math is needed for good performance, then we have to relax
>   test constraints for these tests
> * action items:
> ** run production benchmark with and without -ffast-math on a modern
>    machine (Erik), use GW150914
>

Here are the results from Wheeler, after 20480 iterations (a few hours of
run time):

With -Ofast (current state):

 100.0%  12981.7    0.0%  Evolve
   1.298e+04  1.102e+04  3.245e+13  5.409e+11  5.926e+12  1.079e+05  1.686e+12
   2.3%    298.5   13.6%  | | | | |_ML_BSSN_EvolutionInteriorSplitBy1
       264.3      262.5  6.609e+11  1.614e+07  2.598e+09  8.086e+06  1.585e+08
   3.3%    434.5   15.9%  | | | | |_ML_BSSN_EvolutionInteriorSplitBy2
       370.3      368.9  9.259e+11   2.26e+07  4.156e+09  1.133e+07  2.818e+08
   4.2%    543.7   24.0%  | | | | |_ML_BSSN_EvolutionInteriorSplitBy3
       428.3      426.6  1.071e+12  2.615e+07  4.976e+09  1.303e+07  2.968e+08

With -O3 -fno-math-errno -fno-trapping-math -fno-rounding-math
-fno-signaling-nans -fcx-limited-range (only "harmless" optimizations):

 100.0%  12743.1    0.0%  Evolve
   1.274e+04  1.077e+04  3.186e+13   5.31e+11  5.817e+12  1.111e+05  1.631e+12
   2.5%    316.3   13.4%  | | | | |_ML_BSSN_EvolutionInteriorSplitBy1
       293.9      291.5  7.349e+11  1.794e+07  3.155e+09  8.642e+06  4.517e+08
   3.7%    477.0   15.3%  | | | | |_ML_BSSN_EvolutionInteriorSplitBy2
       413.6      412.3  1.034e+12  2.525e+07  4.774e+09  1.198e+07  3.414e+08
   4.7%    594.7   23.0%  | | | | |_ML_BSSN_EvolutionInteriorSplitBy3
       490.4      489.2  1.226e+12  2.993e+07  5.635e+09  1.478e+07  4.406e+08

This means:
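- the RHS routines are about 8-9% slower without the unsafe fast-math
  optimizations (1388.0 vs. 1276.7 summed over the three SplitBy timers)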
- the overall run time is 2% slower
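
For anyone who wants to re-derive these percentages, here is a minimal
Python sketch. It assumes that the second column of each timer row is the
time in seconds (this is consistent with the percentage column, e.g. 2.3%
of 12981.7 is about 298.5). The dictionary keys and the helper name
percent_change are made up for this illustration.

```python
# Timer values copied from the two tables (second column of each row).
ofast = {"Evolve": 12981.7, "SplitBy1": 298.5, "SplitBy2": 434.5, "SplitBy3": 543.7}
reduced = {"Evolve": 12743.1, "SplitBy1": 316.3, "SplitBy2": 477.0, "SplitBy3": 594.7}


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


rhs_keys = ["SplitBy1", "SplitBy2", "SplitBy3"]
rhs_ofast = sum(ofast[k] for k in rhs_keys)      # RHS total with -Ofast
rhs_reduced = sum(reduced[k] for k in rhs_keys)  # RHS total with reduced flags

rhs_change = percent_change(rhs_ofast, rhs_reduced)
evolve_change = percent_change(ofast["Evolve"], reduced["Evolve"])

print(f"RHS total: {rhs_ofast:.1f} -> {rhs_reduced:.1f} ({rhs_change:+.1f}%)")
print(f"Evolve:    {ofast['Evolve']:.1f} -> {reduced['Evolve']:.1f} ({evolve_change:+.1f}%)")
```

With the numbers exactly as quoted, the RHS total goes from 1276.7 to
1388.0 (about 8.7% higher with the reduced flags), while the Evolve timer
comes out about 1.8% lower rather than higher, so the sign of the overall
figure may be worth double-checking against the raw timer output.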

The overall run times could also be affected by other random factors (e.g.
I/O, network bandwidth, etc.), but the pure RHS numbers should be correct.
However, since the overall run time increase is consistent with the RHS run
time increase (with the usual other costs, e.g. ADM variables, horizon
finding, mesh refinement, ...), I think they are reliable.
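
To make explicit which parts of fast-math the reduced flag set leaves out,
here is a small Python sketch. It assumes the expansion of -ffast-math and
-funsafe-math-optimizations as described in the GCC manual; the exact list
can differ between GCC versions, so treat the two sets below as an
assumption to check against your compiler's documentation. -Ofast is taken
to be -O3 plus -ffast-math, ignoring version-dependent extras.

```python
# Options that the GCC manual lists as being set by -ffast-math
# (assumption: may vary between GCC versions).
FAST_MATH = {
    "-fno-math-errno",
    "-funsafe-math-optimizations",
    "-ffinite-math-only",
    "-fno-rounding-math",
    "-fno-signaling-nans",
    "-fcx-limited-range",
    "-fexcess-precision=fast",
}

# Options that -funsafe-math-optimizations enables in turn.
UNSAFE_MATH = {
    "-fno-signed-zeros",
    "-fno-trapping-math",
    "-fassociative-math",
    "-freciprocal-math",
}

# The "harmless" subset used in the second benchmark run.
REDUCED = {
    "-fno-math-errno",
    "-fno-trapping-math",
    "-fno-rounding-math",
    "-fno-signaling-nans",
    "-fcx-limited-range",
}

# Expand the umbrella option, then see what the reduced set leaves out.
expanded = (FAST_MATH - {"-funsafe-math-optimizations"}) | UNSAFE_MATH
dropped = sorted(expanded - REDUCED)
extra = sorted(REDUCED - expanded)

print("left out of the reduced set:", dropped)
print("in the reduced set but not in fast-math:", extra)
```

Under these assumptions the reduced set is a strict subset of fast-math.
What it leaves out are the options that change floating-point semantics:
reassociation, reciprocal math, ignoring signed zeros, assuming finite
values only, and fast excess precision.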

-erik

-- 
Erik Schnetter <schnetter at cct.lsu.edu>
http://www.perimeterinstitute.ca/personal/eschnetter/