[Users] ET on KNL.

Eloisa Bentivegna eloisa.bentivegna at ct.infn.it
Wed Mar 1 06:04:43 CST 2017


On 28/02/17 23:17, David Radice wrote:
> Hello Eloisa,
> 
> sorry for the delay in the reply. For the record, I did manage to
> compile and run ET on KNL (stampede), but I did not manage to run any
> benchmark with it yet. The current status is:
> 
> * intel-17: the compiler fails to compile Carpet and either gives an
>   internal error or segfaults.
> * gcc-6.3: used to compile and run with Erik's spack installation (it
>   is currently broken). I did not really manage to benchmark it since
>   even a low-resolution TOV test did not run to completion (meaning
>   less than 4 coarse grid steps) within 30 minutes on 4 nodes.
> 
> This was using the current stable release of the ET (2016-11) and
> WhiskyTHC. You might have more luck with GRHydro / pure-vacuum runs.

Hi David and all,

thanks for all the help. It turned out that consolidating my
configuration made things significantly better: I was using Intel 16 (to
avoid the Carpet problem with Intel 17) along with a strange mix of
libraries (mostly compiled with Intel 17, the only ones available on
Marconi), and that seemed to impact the performance quite strongly. With
everything built with Intel 17 (and using -no-vec on bbox.cc), I now
obtain a run speed on a Marconi KNL node which is around 80% of that of
a Xeon E5 v4.

There are still some puzzling features, though. One is that using
-no-vec, along with the settings:

VECTORISE			= no
VECTORISE_ALIGNED_ARRAYS	= no
VECTORISE_INLINE		= no
VECTORISE_ALIGN_FOR_CACHE	= no
VECTORISE_ALIGN_INTERIOR	= no

in my optionlist, I obtain essentially the same throughput. This is a
vacuum McLachlan run with very little else turned on (but I can run a
QC0 benchmark for definiteness, if people are interested). I too am
using the November release.

Second, hyperthreading decreases the run speed significantly. I am using
272 threads on the 68-core KNL, and from what I can gather from the
Carpet output, all of the cores are engaged. More cores are reported,
however, than are available on the node:

INFO (Carpet): MPI is enabled
INFO (Carpet): Carpet is running on 1 processes
INFO (Carpet): This is process 0
INFO (Carpet): OpenMP is enabled
INFO (Carpet): This process contains 272 threads, this is thread 0
INFO (Carpet): There are 272 threads in total
INFO (Carpet): There are 272 threads per process
INFO (Carpet): This process runs on host r098c04s01, pid=2465840
INFO (Carpet): This process runs on 272 cores: 0-271
INFO (Carpet): Thread 0 runs on 1 core: 0
INFO (Carpet): Thread 1 runs on 1 core: 68
INFO (Carpet): Thread 2 runs on 1 core: 136
INFO (Carpet): Thread 3 runs on 1 core: 204
…

Notice that I am requesting hyperthreading by using num-smt=4 and
num-threads=272. Is this correct?
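
One possible reading of the log above is that the "cores" Carpet reports
are logical CPUs (hardware threads), in which case 68 cores x 4 hardware
threads gives the 272 entries 0-271. The sketch below decodes the
reported ids under that reading. It assumes a core-first numbering, where
logical ids 0-67 are the first hardware thread of each physical core,
68-135 the second, and so on. That numbering is an assumption, not
something the log confirms, and should be checked on the node itself
(for instance with lscpu). The helper names decode and parse are
hypothetical.

```python
import re

CORES = 68  # physical cores on the KNL node (from the message)

def decode(cpu, cores=CORES):
    """Split a logical CPU id into (physical core, hardware-thread slot).

    ASSUMPTION: logical ids are numbered core-first, i.e. ids 0..67 are
    slot 0 of each core, 68..135 are slot 1, and so on.
    """
    return cpu % cores, cpu // cores

# Thread-placement lines copied from the Carpet output above.
LOG = """\
INFO (Carpet): Thread 0 runs on 1 core: 0
INFO (Carpet): Thread 1 runs on 1 core: 68
INFO (Carpet): Thread 2 runs on 1 core: 136
INFO (Carpet): Thread 3 runs on 1 core: 204
"""

PATTERN = re.compile(r"Thread (\d+) runs on 1 core: (\d+)")

def parse(log):
    """Map each OpenMP thread number to its decoded (core, slot) pair."""
    return {int(t): decode(int(c)) for t, c in PATTERN.findall(log)}

placement = parse(LOG)
for thread, (core, slot) in sorted(placement.items()):
    print(f"thread {thread}: physical core {core}, hardware thread {slot}")
```

If the assumed numbering holds, threads 0-3 all land on physical core 0,
one per hardware-thread slot, which would be consistent with num-smt=4
packing four consecutive OpenMP threads onto each core.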

Thanks again,
Eloisa

