[Users] bug in Dissipation thorn?
diener at cct.lsu.edu
Tue Mar 5 09:55:37 CST 2013
Just to clarify regarding the OpenMP parallelization. Even though
Fortran 90 array constructs are supposed to be OpenMP parallelizable
using the !$OMP WORKSHARE directive, it turns out that some compilers
(most notably the Intel compilers) don't actually parallelize these
constructs. Instead a single thread does all the work. For that
unfortunate reason the loop construct is much more efficient than the
array construct on many machines.
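
For illustration, here is a minimal sketch of the two forms. It assumes a hypothetical 1D second-difference update; the names rhs_a, rhs_b, u and eps are made up for this example and it is not the actual Dissipation thorn code.

```fortran
program workshare_vs_loop
  implicit none
  integer, parameter :: n = 1000000
  double precision, parameter :: eps = 0.1d0   ! hypothetical strength
  double precision, allocatable :: rhs_a(:), rhs_b(:), u(:)
  integer :: i

  allocate(rhs_a(n), rhs_b(n), u(n))
  do i = 1, n
    u(i) = dble(i)**2          ! arbitrary test data
  end do
  rhs_a = 0.0d0
  rhs_b = 0.0d0

  ! Array construct: valid OpenMP, but a compiler is free to execute
  ! the WORKSHARE region on a single thread.
  !$OMP PARALLEL WORKSHARE
  rhs_a(2:n-1) = rhs_a(2:n-1) + eps * (u(1:n-2) - 2.0d0*u(2:n-1) + u(3:n))
  !$OMP END PARALLEL WORKSHARE

  ! Loop construct: iterations are divided among the threads.
  !$OMP PARALLEL DO
  do i = 2, n-1
    rhs_b(i) = rhs_b(i) + eps * (u(i-1) - 2.0d0*u(i) + u(i+1))
  end do
  !$OMP END PARALLEL DO

  ! Compare the two results.
  print *, 'max difference:', maxval(abs(rhs_a - rhs_b))
end program workshare_vs_loop
```

Build it with OpenMP enabled (e.g. gfortran -fopenmp) and time the two regions separately to see how a given compiler treats each form.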
On Tue, 5 Mar 2013, Erik Schnetter wrote:
> When applying performance optimisations, one has to be careful not to be
> guided by one's experience, but rather only by measurements. Things often
> behave in quite surprising ways. Do you have performance data to support
> your statements that array expressions are faster than do loops (without
> OpenMP)? Do you also have performance data to support your statement that
> OpenMP is slowing things down in this case? With performance data I refer
> e.g. to Cactus timer output for dissipation for a reasonable setup, running
> on a reliable machine (e.g. an HPC node).
> My personal hypothesis would be that one of the changes you introduce
> (remove do loops, remove OpenMP parallelization) inhibits some compiler
> optimisation and thus leads to more consistent results.
> As Frank says, reduction operations are not an issue here. I don't see which
> compiler optimisations would lead to random differences.
> Do you have two versions of the produced executable, one that produces
> random changes and one that doesn't? Could you make these available to me? I
> would like to compare the produced machine code to see the difference.
> On Tue, Mar 5, 2013 at 3:33 AM, Kentaro Takami <kentaro.takami at aei.mpg.de> wrote:
> Hi, Erik,
> > I notice that you made several changes to the code when
> > (1) You introduce a local array var, which is a copy of the input
> > variable
> > (2) You rewrote the do loops with array expressions
> > Both are not good for performance. The former is certainly slowing
> > things down, the second complicates an OpenMP parallelisation and
> > makes the code more difficult to read. I am therefore hesitant to
> > apply these. Did you try making only one of these changes, to see
> > whether this would
> Maybe we need change (1) to avoid the random results, although
> copying the array has an additional cost (but this array copy is not
> so expensive compared with a do loop copy).
> For change (2), we can choose either the array expression style or
> the do loop style.
> Without OpenMP, the array expression style is faster than the do loop
> because there is no do loop overhead and the compiler can optimize
> efficiently.
> However, when we use OpenMP, I'm not clear which style is more
> The operations in the dissipation equation are cheap and simple
> (so maybe the calculation efficiency is limited by data transfer from
> cache memory),
> while OpenMP incurs large overhead costs for the parallelization.
> Therefore I chose the array expression style.
> > The document to which you pointed also contains the suggestion to
> > use the option -fp-model precise. Did you try this?
> Yes, I tried this option.
> With it we can avoid the random results even if we use the original f77 code.
> > As a side remark, in Fortran 90 you can also use a select case
> > instead of if statements to choose the dissipation order.
> Oh, yes.
> We should use "select case", because "select case" is more efficient
> than "if" with recent compilers.
> Erik Schnetter <schnetter at cct.lsu.edu>