[Users] ET on KNL.

Erik Schnetter schnetter at cct.lsu.edu
Wed Mar 1 08:07:18 CST 2017


On Wed, Mar 1, 2017 at 7:04 AM, Eloisa Bentivegna <
eloisa.bentivegna at ct.infn.it> wrote:

> On 28/02/17 23:17, David Radice wrote:
> > Hello Eloisa,
> >
> > sorry for the delay in the reply. For the record, I did manage to
> > compile and run ET on KNL (Stampede), but I did not manage to run any
> > benchmarks with it yet. The current status is:
> >
> > * intel-17: the compiler fails to compile Carpet and either gives an
> >   internal error or segfaults.
> > * gcc-6.3: used to compile and run with Erik's Spack installation (it
> >   is currently broken). I did not really manage to benchmark it, since
> >   even a low-resolution TOV test did not run to completion (meaning
> >   fewer than 4 coarse grid steps) within 30 minutes on 4 nodes.
> >
> > This was using the current stable release of the ET (2016-11) and
> > WhiskyTHC. You might have more luck with GRHydro / pure-vacuum runs.
>
> Hi David and all,
>
> thanks for all the help. It turned out that consolidating my
> configuration made things significantly better: I was using Intel 16 (to
> avoid the Carpet problem with Intel 17) along with a strange mix of
> libraries (mostly compiled with Intel 17, and the only ones available on
> Marconi), and that seemed to hurt performance quite strongly. With
> everything built with Intel 17 (and using -no-vec on bbox.cc), I now
> obtain a run speed on a Marconi KNL node that is around 80% of that on a
> Xeon E5 v4.
>
> There are still some puzzling features, though. One is that using
> -no-vec, along with the settings:
>
> VECTORISE                       = no
> VECTORISE_ALIGNED_ARRAYS        = no
> VECTORISE_INLINE                = no
> VECTORISE_ALIGN_FOR_CACHE       = no
> VECTORISE_ALIGN_INTERIOR        = no
>

I would expect that VECTORISE=yes (keeping the others at "no") might improve
performance, in particular if you do not use hyperthreading, so that each
thread has more L1 cache space available.
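
For concreteness, that would correspond to the following fragment in the
optionlist (only the first switch differs from what you quoted; whether it
actually helps on the Marconi KNL nodes is of course something to measure):

  VECTORISE                       = yes
  VECTORISE_ALIGNED_ARRAYS        = no
  VECTORISE_INLINE                = no
  VECTORISE_ALIGN_FOR_CACHE       = no
  VECTORISE_ALIGN_INTERIOR        = no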

> in my optionlist, I obtain essentially the same throughput. This is a
> vacuum McLachlan run with very little else turned on (but I can run a
> QC0 benchmark for definiteness, if people are interested). I too am
> using the November release.
>
> Second, hyperthreading decreases the run speed significantly. I am using
> 272 threads on the 68-core KNL, and as far as I can gather from the
> Carpet output, all of the cores are engaged. More cores are reported,
> however, than are physically available on the node:
>
> INFO (Carpet): MPI is enabled
> INFO (Carpet): Carpet is running on 1 processes
> INFO (Carpet): This is process 0
> INFO (Carpet): OpenMP is enabled
> INFO (Carpet): This process contains 272 threads, this is thread 0
> INFO (Carpet): There are 272 threads in total
> INFO (Carpet): There are 272 threads per process
> INFO (Carpet): This process runs on host r098c04s01, pid=2465840
> INFO (Carpet): This process runs on 272 cores: 0-271
> INFO (Carpet): Thread 0 runs on 1 core: 0
> INFO (Carpet): Thread 1 runs on 1 core: 68
> INFO (Carpet): Thread 2 runs on 1 core: 136
> INFO (Carpet): Thread 3 runs on 1 core: 204

The nomenclature is inconsistent since it changes so often: Carpet cannot
easily distinguish between hyperthreads and cores, so the 272 "cores" it
reports here are really the node's 272 hardware threads. This output looks
correct; as long as there is only one thread per core, this is fine.
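
If you want a check of the pinning that is independent of Carpet's report,
the Intel OpenMP runtime can print the binding it actually uses. As a sketch
(these particular settings are my suggestion for a run without
hyperthreading, not something taken from your job script):

  # Intel 17 runtime: report the thread-to-core binding at startup
  export KMP_AFFINITY=verbose,compact
  # Restrict the runtime to 1 hardware thread on each of the 68 cores
  export KMP_HW_SUBSET=68c,1t

Comparing that report with Carpet's own output should tell you quickly
whether anything is oversubscribed.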

> Notice that I am requesting hyperthreading by using num-smt=4 and
> num-threads=272. Is this correct?
>

This looks correct. You might also need to play with "ppn=" and "ppn-used=".
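
As a rough sketch of how these fit together on the simfactory command line
(the simulation name, parameter file, and walltime below are placeholders,
and the right ppn value depends on how the Marconi machine entry counts
hardware threads):

  ./simfactory/bin/sim create-submit qc0_knl \
      --parfile=par/qc0.par \
      --procs=272 --num-threads=272 --num-smt=4 \
      --ppn=272 --ppn-used=272 \
      --walltime=1:00:00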

-erik

-- 
Erik Schnetter <schnetter at cct.lsu.edu>
http://www.perimeterinstitute.ca/personal/eschnetter/