[Users] bug in Dissipation thorn?

Kentaro Takami kentaro.takami at aei.mpg.de
Wed Mar 6 11:45:17 CST 2013


Hi, all,


To Frank,

>> However, using "-fp-model precise" in the option list (i.e., the option is
>> then applied to all Fortran 77 code), the run becomes much slower.
>
> Out of interest: how much is "much"?

I didn't test it myself.
However, the document (p. 11 of
http://software.intel.com/sites/default/files/article/164389/fp-consistency-102511.pdf)
says that the performance is reduced by 12-15%.
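
For reference, the flag would go into the Fortran entries of the Cactus
option list, roughly like this (just a sketch using the standard
option-list variable name; the base flags actually used on datura may
differ):

  F77FLAGS = -O2 -fp-model precise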


To Peter,

> Just to clarify regarding the OpenMP parallelization. Even though it's
> supposed to be true that Fortran90 array constructs can be OpenMP
> parallelized using the !$OMP WORKSHARE directive, it turns out that some
> compilers (most notably the Intel compilers) don't actually parallelize
> these constructs. Instead one processor does all the work. For that
> unfortunate reason the loop construct is much more efficient than the
> array construct on many machines.

Thank you for telling me about this.
I had never used the "!$OMP WORKSHARE" directive,
so I tested it in the performance test below.
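
To make explicit what was compared, here is a minimal sketch of the two
OpenMP variants (hypothetical routine and variable names, not the actual
thorn code): WORKSHARE around an array expression, and an explicit
parallel do-loop.

  ! Array-expression update; WORKSHARE asks OpenMP to split the work
  ! among threads, but some compilers run it on a single thread.
  subroutine update_workshare(nx, ny, nz, eps, var, rhs)
    implicit none
    integer, intent(in) :: nx, ny, nz
    double precision, intent(in) :: eps, var(nx,ny,nz)
    double precision, intent(inout) :: rhs(nx,ny,nz)
    !$OMP PARALLEL WORKSHARE
    rhs = rhs + eps * var
    !$OMP END PARALLEL WORKSHARE
  end subroutine update_workshare

  ! Equivalent explicit loop; !$OMP DO reliably distributes the k loop.
  subroutine update_loop(nx, ny, nz, eps, var, rhs)
    implicit none
    integer, intent(in) :: nx, ny, nz
    double precision, intent(in) :: eps, var(nx,ny,nz)
    double precision, intent(inout) :: rhs(nx,ny,nz)
    integer :: i, j, k
    !$OMP PARALLEL DO PRIVATE(i,j,k)
    do k = 1, nz
      do j = 1, ny
        do i = 1, nx
          rhs(i,j,k) = rhs(i,j,k) + eps * var(i,j,k)
        end do
      end do
    end do
    !$OMP END PARALLEL DO
  end subroutine update_loop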


To Erik,

> Do you have performance data to support your statements
> that array expressions are faster than do loops (without OpenMP)?

The Intel compiler documentation, e.g.
http://www-h.eng.cam.ac.uk/help/languages/fortran/intelfortran/for_ug2.pdf ,
says "Rather than use explicit loops for array access, use elemental
array operations..." (p. 20).
So I thought that array expressions are faster than do-loops,
although I had never compared them.
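
For concreteness, in one dimension the two styles compared below look
roughly like this (illustrative coefficients of a Kreiss-Oliger-type
stencil, not the exact expressions from the thorn):

  ! Assumed declarations: integer :: nx, i ;
  ! double precision :: eps, var(nx), rhs(nx)

  ! Array-expression style, working on whole array sections:
  rhs(3:nx-2) = rhs(3:nx-2) + eps * &
       (var(1:nx-4) - 4*var(2:nx-3) + 6*var(3:nx-2) - 4*var(4:nx-1) + var(5:nx))

  ! Equivalent explicit do-loop style:
  do i = 3, nx-2
    rhs(i) = rhs(i) + eps * &
         (var(i-2) - 4*var(i-1) + 6*var(i) - 4*var(i+1) + var(i+2))
  end do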

I am attaching the performance data for the attached parameter file.
According to these data, both expressions are almost the same.


> Do you also have performance data to support your statement that
> OpenMP is slowing things down in this case?

Please see the attached file.
Apparently case (4) is the fastest; that is, OpenMP parallelization
should not be used for such small sections.
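
If one wanted to keep the directives but avoid the overhead on small
boxes, one possibility (not what the thorn currently does) is OpenMP's
IF clause, which makes the region run serially below a size threshold;
the threshold value here is purely hypothetical:

  !$OMP PARALLEL DO PRIVATE(i,j,k) IF(nx*ny*nz > 32768)
  do k = 1, nz
    do j = 1, ny
      do i = 1, nx
        rhs(i,j,k) = rhs(i,j,k) + eps * var(i,j,k)
      end do
    end do
  end do
  !$OMP END PARALLEL DO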


> My personal hypothesis would be that one of the changes you introduce
> (remove do loops, remove OpenMP parallelization) inhibits some compiler
> optimisation and thus leads to more consistent results.

As I already mentioned,
introducing a local array variable, which is a copy of the input variable,
prevents the random results.
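
Sketched in one dimension, the change I mean is simply the following
(hypothetical names and illustrative coefficients; the real routine
works on 3-D grid functions):

  ! var_in is a local array with the same shape as the input variable var.
  var_in = var     ! copy the input once, before applying the stencil
  do i = 3, nx-2
    rhs(i) = rhs(i) + eps * &
         (var_in(i-2) - 4*var_in(i-1) + 6*var_in(i) - 4*var_in(i+1) + var_in(i+2))
  end do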


> Do you have two versions of the produced executable, one that produces
> random changes and one that doesn't? Could you make these available to me? I
> would like to compare the produced machine code to see the difference.

I guess you can access damiana at AEI.
You can use the following executable files in
"/lustre/datura/takami/Share_Dir/EXE__Test_Dissipation/".

For random results,
"cactus_GRHydro.F77--datura.OMP-NO_PRECISE-NO".

For consistent results,
"cactus_GRHydro.F77--datura.OMP-NO_PRECISE-YES".

Both were built from exactly the same source code with the same compile
options, except for the "-fp-model precise" option.

To compare the results, I usually use the following command:
$ h5diff \
OUTPUT_DATA_1/CHECKPOINT/checkpoint.chkpt.it_5.file_0.h5 \
OUTPUT_DATA_2/CHECKPOINT/checkpoint.chkpt.it_5.file_0.h5


Kentaro TAKAMI
-------------- next part --------------
############################################################
###   < datura @ AEI >
###     * Using 1 node which has 12 cores.
###     * No. of MPI procs = 3
###     * OMP_NUM_THREADS  = 4
###     * Compiler/intel/11.1.072/11.1.072
###     * mpi/openmpi/1.5.4-intel11.1.072
############################################################

### Thorn  | Scheduled routine in time bin  | gettimeofday [secs] | getrusage [secs] | cycle [secs] | cycle[avg] [secs] | cycle[stddev] [secs] | cycle[min] [secs] | cycle[max] [secs] 

### Using original "apply_dissipation.F77".
Dissipation     | Add Kreiss-Oliger dissipation to the rig|    11.96313300 |    47.47678600 | 31903408532.00000000 | 212689390.21333334 |   604164732.26355398 | 176702924.00000000 | 229004460.00000000

### Using "apply_dissipation.F90" with array style.
Dissipation     | Add Kreiss-Oliger dissipation to the rig|    54.57291800 |    54.53770000 | 145535541364.00000000 | 970236942.42666662 |  1292351841.91893029 | 896251884.00000000 | 896422760.00000000

### Using "apply_dissipation.F90" with do-loop style.
Dissipation     | Add Kreiss-Oliger dissipation to the rig|    20.46832500 |    78.85801400 | 54585152824.00000000 | 363901018.82666665 |   884053289.24521708 | 294528912.00000000 | 319096392.00000000

### Using "apply_dissipation.F90" with array style, but removing OMP directives in "apply_dissipation.F90".
Dissipation     | Add Kreiss-Oliger dissipation to the rig|    34.43200800 |    34.42277000 | 91824635792.00000000 | 612164238.61333334 |   929892716.40058255 | 529260276.00000000 | 823454040.00000000

############################################################
############################################################
-------------- next part --------------
############################################################
###   < datura @ AEI >
###     * Using 1 node which has 12 cores.
###     * No. of MPI procs = 12
###     * OPENMP = no
###     * Compiler/intel/11.1.072/11.1.072
###     * mpi/openmpi/1.5.4-intel11.1.072
############################################################

### Thorn  | Scheduled routine in time bin  | gettimeofday [secs] | getrusage [secs] | cycle [secs] | cycle[avg] [secs] | cycle[stddev] [secs] | cycle[min] [secs] | cycle[max] [secs] 

### Using original "apply_dissipation.F77".
Dissipation  | Add Kreiss-Oliger dissipation to the rig|     9.00494700 |     8.99663300 | 24014389324.00000000 | 160095928.82666665 |   504044061.39257681 | 106947484.00000000 | 136139860.00000000

### Using "apply_dissipation.F90" with array style.
Dissipation  | Add Kreiss-Oliger dissipation to the rig|    12.45886200 |    12.45310200 | 33225261280.00000000 | 221501741.86666667 |   698063683.03449178 | 128566928.00000000 | 277494908.00000000

### Using "apply_dissipation.F90" with do-loop style.
Dissipation  | Add Kreiss-Oliger dissipation to the rig|    11.99245300 |    11.98918200 | 31981491072.00000000 | 213209940.47999999 |   846669851.43720722 | 128413964.00000000 | 199963072.00000000

############################################################
############################################################
-------------- next part --------------
A non-text attachment was scrubbed...
Name: TOV__GRHydro.1Lev.par
Type: application/octet-stream
Size: 7596 bytes
Desc: not available
Url : http://lists.einsteintoolkit.org/pipermail/users/attachments/20130306/a66795b1/attachment.obj 

