<div dir="ltr">Kentaro<div><br></div><div>When applying performance optimisations, one has to be careful not to be guided by one's experience, but only by measurements. Things often behave in quite surprising ways. Do you have performance data to support your statement that array expressions are faster than do loops (without OpenMP)? Do you also have performance data to support your statement that OpenMP slows things down in this case? By performance data I mean, e.g., Cactus timer output for dissipation for a reasonable setup, running on a reliable machine (e.g. an HPC node).</div>
<div><br></div><div style>My personal hypothesis would be that one of the changes you introduced (removing the do loops, removing the OpenMP parallelisation) inhibits some compiler optimisation and thus leads to more consistent results.</div>
<div style><br></div><div style>As Frank says, reduction operations are not an issue here. I don't see which compiler optimisations would lead to random differences.</div><div style><br></div><div style>Do you have two versions of the produced executable, one that produces random changes and one that doesn't? Could you make these available to me? I would like to compare the produced machine code to see the difference.</div>
<div style><br></div><div style>-erik</div><div style><br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Tue, Mar 5, 2013 at 3:33 AM, Kentaro Takami <span dir="ltr"><<a href="mailto:kentaro.takami@aei.mpg.de" target="_blank">kentaro.takami@aei.mpg.de</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi, Erik,<br>
<div class="im"><br>
> I notice that you made several changes to the code when converting:<br>
> (1) You introduce a local array var, which is a copy of the input variable<br>
> (2) You rewrote the do loops with array expressions<br>
><br>
> Both are not good for performance. The former is certainly slowing things<br>
> down, the second complicates an OpenMP parallelisation and makes the code<br>
> more difficult to read. I am therefore hesitant to apply these. Did you try<br>
> making only one of these changes, to see whether this would suffice?<br>
<br>
</div>Maybe we need change (1) to avoid the random results,<br>
although copying the array incurs additional cost (but this array copy is not<br>
so expensive compared with the do loop copy).<br>
<br>
For change (2), we can choose either the array expression style or the do loop style.<br>
Without OpenMP, the array expression style is faster than the do loop style,<br>
because there is no do loop overhead and the compiler can<br>
optimize efficiently.<br>
However, when we use OpenMP, it is not clear to me which style is more efficient.<br>
The operations in the dissipation equation are cheap and simple<br>
(so maybe the calculation efficiency is limited by data transfer from<br>
cache memory),<br>
while OpenMP incurs a large parallelization overhead.<br>
Therefore I chose the array expression style.<br>
<div class="im"><br>
<br>
> The document to which you pointed contains also the suggestion to use the<br>
> option -fp-model-precise. Did you try this?<br>
<br>
</div>Yes, I tried to use this option.<br>
With it, we can avoid the random results even if we use the original f77 code.<br>
<div class="im"><br>
<br>
> As a side remark, in Fortran 90 you can also use a select case statement<br>
> instead of if statements to choose the dissipation order.<br>
<br>
</div>Oh, yes.<br>
We should use "select case", because "select case" is more efficient than "if"<br>
with recent compilers.<br>
<span class="HOEnZb"><font color="#888888"><br>
<br>
Kentaro<br>
</font></span></blockquote></div><br><br clear="all"><div><br></div>-- <br>Erik Schnetter <<a href="mailto:schnetter@cct.lsu.edu" target="_blank">schnetter@cct.lsu.edu</a>><br><a href="http://www.perimeterinstitute.ca/personal/eschnetter/" target="_blank">http://www.perimeterinstitute.ca/personal/eschnetter/</a>
</div>