[Users] bug in Dissipation thorn?

Erik Schnetter schnetter at cct.lsu.edu
Tue Mar 5 06:41:03 CST 2013


Kentaro

When applying performance optimisations, one has to be careful to be guided
not by one's experience, but only by measurements. Things often behave in
quite surprising ways. Do you have performance data to support your
statement that array expressions are faster than do loops (without OpenMP)?
Do you also have performance data to support your statement that OpenMP is
slowing things down in this case? By performance data I mean, e.g., Cactus
timer output for dissipation for a reasonable setup, running on a reliable
machine (e.g. an HPC node).
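
As a minimal sketch of what "measure rather than guess" can look like, the
Python below times two equivalent formulations of a toy three-point stencil
and reports the best of several repetitions. Everything in it is an
illustrative assumption (the function names, the stencil, the array size);
it is not the Dissipation thorn's code, and Python timings say nothing about
what a Fortran compiler does with do loops or array expressions.

```python
import timeit

def with_loop(u):
    """Explicit index loop over the interior points (hypothetical stand-in)."""
    out = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        out[i] = u[i - 1] - 2.0 * u[i] + u[i + 1]
    return out

def with_expression(u):
    """Same stencil written as a whole-sequence expression."""
    interior = [a - 2.0 * b + c for a, b, c in zip(u, u[1:], u[2:])]
    return [0.0] + interior + [0.0]

u = [0.001 * k * k for k in range(10000)]

# Check that both variants agree before comparing their speed.
assert with_loop(u) == with_expression(u)

calls = 10
for name, fn in (("loop", with_loop), ("expression", with_expression)):
    # Repeat the measurement and keep the minimum to reduce timing noise.
    best = min(timeit.repeat(lambda: fn(u), number=calls, repeat=3))
    print(f"{name}: {best / calls * 1e3:.3f} ms per call")
```

The point of the sketch is the procedure: verify that the variants compute
the same thing, repeat the timing, and compare numbers from the same machine.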

My personal hypothesis would be that one of the changes you introduce
(removing do loops, removing OpenMP parallelisation) inhibits some compiler
optimisation and thus leads to more consistent results.

As Frank says, reduction operations are not an issue here. I don't see
which compiler optimisations would lead to random differences.
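
A minimal sketch of why reductions are usually the first suspect, and why a
pointwise operator should not be affected: floating-point addition is not
associative, so a sum evaluated in a different order can differ in the last
bits, whereas a pointwise update yields the same values regardless of the
order in which grid points are visited. The function name and the toy
three-point stencil are hypothetical stand-ins, not the thorn's code.

```python
def pointwise_stencil(u, order):
    """Apply a toy 3-point stencil, visiting interior points in the given order."""
    out = [0.0] * len(u)
    for i in order:
        out[i] = u[i - 1] - 2.0 * u[i] + u[i + 1]
    return out

values = [0.1, 0.2, 0.3]

# Reduction: the result depends on the order of the additions.
forward_sum = (values[0] + values[1]) + values[2]
reverse_sum = (values[2] + values[1]) + values[0]

# Pointwise update: each output depends only on its own inputs,
# so the visiting order cannot change any value.
u = [0.1 * k * k for k in range(8)]
interior = list(range(1, len(u) - 1))
forward = pointwise_stencil(u, interior)
backward = pointwise_stencil(u, list(reversed(interior)))

print(forward_sum == reverse_sum)  # False with IEEE double precision
print(forward == backward)         # True
```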

Do you have two versions of the produced executable, one that produces
random changes and one that doesn't? Could you make these available to me?
I would like to compare the produced machine code to see the difference.

-erik



On Tue, Mar 5, 2013 at 3:33 AM, Kentaro Takami <kentaro.takami at aei.mpg.de> wrote:

> Hi, Erik,
>
> > I notice that you made several changes to the code when converting:
> > (1) You introduce a local array var, which is a copy of the input variable
> > (2) You rewrote the do loops with array expressions
> >
> > Both are not good for performance. The former is certainly slowing things
> > down, the second complicates an OpenMP parallelisation and makes the code
> > more difficult to read. I am therefore hesitant to apply these. Did you try
> > making only one of these changes, to see whether this would suffice?
>
> Maybe we need change (1) to avoid random results, although copying the
> array has an additional cost (but this array copy is not so expensive
> compared with the do loop copy).
>
> For change (2), we can choose either the array expression or the do loop
> style. Without OpenMP, the array expression style is faster than the do
> loop style, because there is no do loop overhead and the compiler can
> optimize efficiently.
> However, when we use OpenMP, it is not clear to me which style is more
> efficient. The operations in the dissipation equation are cheap and simple
> (so the calculation efficiency may be limited by data transfer from
> cache memory), while OpenMP incurs large overhead costs for
> parallelization.
> Therefore I chose the array expression style.
>
>
> > The document to which you pointed also contains the suggestion to use the
> > option -fp-model precise. Did you try this?
>
> Yes, I tried this option.
> With it, we avoid random results even with the original f77 code.
>
>
> > As a side remark, in Fortran 90 you can also use a select case statement
> > instead of if statements to choose the dissipation order.
>
> Oh, yes.
> We should use "select case", because "select case" is more efficient than
> "if" with recent compilers.
>
>
> Kentaro
>



-- 
Erik Schnetter <schnetter at cct.lsu.edu>
http://www.perimeterinstitute.ca/personal/eschnetter/