[Users] compiler warning for McLachlan

Tue Sep 27 07:20:49 CDT 2011

On 27 Sep 2011, at 06:04, Frank Loeffler wrote:

> Hi
> 
> On Mon, Sep 26, 2011 at 09:08:40PM -0500, Frank Loeffler wrote:
>> In addition, McLachlan seems to compile for a very long time and uses a
>> lot of memory (several GB per compiler process), for the Intel compiler.
> 
> Actually, it makes it unusable: even one compiler process now eats above
> 10GB of Ram, on a 8GB machine. Am I the only one with this problem? This
> is on my workstation (numrel07) where this used to work for quite a
> while.

Yes, McLachlan has changed.  Kranc can now select the finite difference operator based on a run-time parameter, which eliminates the need for multiple versions of the McLachlan thorn.  You can just use ML_BSSN and set fdOrder to 2, 4, 6 or 8.  For compatibility, we are keeping the existing thorns for one release, and in each of them the default of fdOrder is the order corresponding to that thorn.

All finite difference methods are compiled in, and selected with a switch statement in the inner loop.  This makes the inner loop much larger than it was before, which is presumably what is causing problems for the compiler (even though it should be able to tell that the separate branches of the switch statement are independent and never need to be optimised together).  This should not have a run-time performance impact, because the additional instructions are not executed, and Erik checked that the generated code didn't look any worse than before.  I also saw negligible changes in speed.
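
To illustrate the pattern (this is a hand-written sketch, not the code Kranc generates), a loop that selects the stencil with a switch on a run-time fdOrder could look roughly like the following.  The function name first_derivative is made up for the example, and only orders 2 and 4 are included to keep it short:

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Illustrative only: NOT Kranc output. The function name and the restriction
// to orders 2 and 4 are assumptions made to keep the sketch short.
std::vector<double> first_derivative(const std::vector<double>& u,
                                     double dx, int fdOrder) {
  if (fdOrder != 2 && fdOrder != 4) {
    throw std::invalid_argument("fdOrder must be 2 or 4 in this sketch");
  }
  const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(u.size());
  const std::ptrdiff_t ghost = fdOrder / 2;  // stencil half-width
  std::vector<double> du(u.size(), 0.0);     // points near the ends stay 0

  for (std::ptrdiff_t i = ghost; i < n - ghost; ++i) {
    // Every stencil is compiled into the loop body, but only the branch
    // matching fdOrder is executed at run time.
    switch (fdOrder) {
      case 2:
        du[i] = (u[i + 1] - u[i - 1]) / (2.0 * dx);
        break;
      case 4:
        du[i] = (-u[i + 2] + 8.0 * u[i + 1] - 8.0 * u[i - 1] + u[i - 2]) /
                (12.0 * dx);
        break;
      default:
        break;  // not reachable: fdOrder was validated above
    }
  }
  return du;
}
```

In the real thorn the loop body contains many derivatives of many variables, so each extra case multiplies a much larger amount of code than in this toy version.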

The reason that we didn't see any problems and you did is that we were using 

	VECTORISE_INLINE = no

as in datura.cfg, whereas the default is "yes", and the numrel.cfg file leaves it set to the default.  With "yes", the Vectors thorn will attempt to inline vectorised operations, and it also uses this setting to tell Kranc to inline the finite differencing operators rather than call them as functions.  With "no", the derivatives are functions, which allows the code to fit into the instruction cache in many cases where it otherwise would not, for example when using 8th order with multipatch.  The default is "yes" because one would expect function calls to be slower than inlining when instruction-cache misses are not relevant.
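
As a rough picture of the difference between the two settings, here is a small sketch in which a compile-time macro decides whether a derivative operator is forced inline or kept as a real function.  The macro names SKETCH_INLINE_DERIVS and SKETCH_DERIV and both function names are invented for this example and are not what Vectors or Kranc actually use; the attributes are GCC-style extensions, which I am assuming your compiler accepts:

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical compile-time switch, invented for this sketch. Define
// SKETCH_INLINE_DERIVS to force inlining; leave it undefined to keep the
// operator as an out-of-line function.
#ifdef SKETCH_INLINE_DERIVS
#  define SKETCH_DERIV static inline __attribute__((always_inline))
#else
#  define SKETCH_DERIV static __attribute__((noinline))
#endif

// 4th-order centred first derivative at the point u points to.
SKETCH_DERIV double d1_order4(const double* u, double dx) {
  return (-u[2] + 8.0 * u[1] - 8.0 * u[-1] + u[-2]) / (12.0 * dx);
}

// Stand-in for a right-hand-side loop that needs several derivatives per
// grid point. When inlined, each call site gets its own copy of the stencil;
// as a function, the stencil code exists once and is called repeatedly.
std::vector<double> sum_of_derivatives(const std::vector<double>& a,
                                       const std::vector<double>& b,
                                       double dx) {
  if (a.size() != b.size()) {
    throw std::invalid_argument("a and b must have the same size");
  }
  const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(a.size());
  std::vector<double> out(a.size(), 0.0);  // points near the ends stay 0
  for (std::ptrdiff_t i = 2; i < n - 2; ++i) {
    out[i] = d1_order4(&a[i], dx) + d1_order4(&b[i], dx);
  }
  return out;
}
```

Both variants give the same numbers; what changes is how much code the compiler has to optimise in one piece and how large the loop body becomes.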

I can reproduce your problem.  I created a little wrapper for icpc:

icpc-time:
/usr/bin/time -f 'WALLTIME=%E s, MAXRSS=%M kB' /cluster/Compiler/Intel/11.1.072/bin/intel64/icpc "${@}"

and changed the optionlist to use this instead of icpc.  I then compiled ML_BSSN_O2 using this new optionlist.  With VECTORISE_INLINE = no (datura's default), I get

COMPILING /home/ianhin/Cactus/EinsteinToolkit/arrangements/McLachlan/ML_BSSN_O2/src/ML_BSSN_O2_RHS2.cc
WALLTIME=0:10.61 s, MAXRSS=811584 kB

i.e. it uses about 800 MB and takes about 10 seconds to compile ML_BSSN_O2_RHS2.cc.

I then set VECTORISE_INLINE = yes (the default).  Compiling ML_BSSN_O2_RHS2 then printed the threshold warning that you saw, started using many GB of RAM, and had not finished after about 10 minutes.

So a fix to your problem is to set VECTORISE_INLINE = no in your optionlist and reconfigure.  Alternatively, you could temporarily remove the 4th, 6th and 8th order code from ML_BSSN_O2 by editing McLachlan/m/McLachlan_BSSN.m and modifying intParameters/fdOrder/AllowedValues to be {derivOrder} instead of {2,4,6,8}.  Then type "make McLachlan_BSSN.out" in the m directory to regenerate the thorn.  Then rebuild.

Erik: should we make VECTORISE_INLINE = no the default in Vectors?

-- 
Ian Hinder
ian.hinder at aei.mpg.de
