[ET Trac] [Einstein Toolkit] #1850: Severe performance problem on Stampede
Einstein Toolkit
trac-noreply at einsteintoolkit.org
Tue Dec 15 13:25:07 CST 2015
#1850: Severe performance problem on Stampede
-------------------------+--------------------------------------------------
Reporter: hinder | Owner:
Type: defect | Status: confirmed
Priority: major | Milestone:
Component: SimFactory | Version: development version
Resolution: | Keywords:
-------------------------+--------------------------------------------------
Comment (by eschnett):
I am surprised that the KMP_* option is necessary or beneficial in any
case. This option sets up thread affinity via the Intel compiler's OpenMP
runtime. The compiler knows nothing about MPI, hence it cannot reasonably
distribute threads when there are multiple MPI processes per node.
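To make the point about MPI-unaware pinning concrete, here is a toy Python
model. It is not the Intel runtime's actual algorithm. It assumes a node
with 2 sockets of 8 cores (cores 0-15), and the helper names
`mpi_unaware_pinning` and `oversubscribed_cores` are hypothetical. Each
process pins its thread t to core t without knowing about the other
processes on the node:

```python
from collections import Counter

# Assumed node layout (hypothetical): 2 sockets x 8 cores, numbered 0..15.
CORES_PER_NODE = 16


def mpi_unaware_pinning(ranks_per_node, threads_per_rank):
    """Model a runtime that pins threads without knowing about MPI.

    Each process sees the whole node and independently pins its
    thread t to core t, so every rank chooses the same cores.
    Returns {(rank, thread): core}.
    """
    return {
        (rank, t): t % CORES_PER_NODE
        for rank in range(ranks_per_node)
        for t in range(threads_per_rank)
    }


def oversubscribed_cores(pinning):
    """Return the sorted list of cores that received more than one thread."""
    load = Counter(pinning.values())
    return sorted(core for core, n in load.items() if n > 1)


if __name__ == "__main__":
    # 2 MPI processes x 8 threads: both ranks pile onto cores 0-7.
    plan = mpi_unaware_pinning(ranks_per_node=2, threads_per_rank=8)
    print(oversubscribed_cores(plan))  # [0, 1, 2, 3, 4, 5, 6, 7]
    # 1 MPI process x 16 threads: every core gets exactly one thread.
    plan = mpi_unaware_pinning(ranks_per_node=1, threads_per_rank=16)
    print(oversubscribed_cores(plan))  # []
```

With 1 process per node the naive scheme happens to work; with 2 processes,
half the cores run two threads each while the other half sit idle.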
SystemTopology can undo all thread affinities. However, since MPI is
initialized before SystemTopology runs, each MPI process must already have
the correct socket (but not core) affinity when it starts. The queuing
system can set this up, but the compiler cannot. This is why it is
currently important to have the queuing system set up at least socket
affinities.
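A minimal sketch of why the socket affinity has to be in place at startup,
under the same assumed 2 x 8-core layout. `pin_threads` stands in for a
core-level step that only picks cores inside the mask the process already
has when it starts; it and `node_plan` are hypothetical helpers, not
SystemTopology's actual code.

```python
from collections import Counter

# Assumed node layout (hypothetical): 2 sockets x 8 cores.
SOCKET_CORES = [list(range(0, 8)), list(range(8, 16))]
ALL_CORES = list(range(16))


def pin_threads(mask, threads):
    """Core-level pinning inside one process: thread t goes to the
    (t mod len(mask))-th core of the mask the process started with."""
    return [mask[t % len(mask)] for t in range(threads)]


def node_plan(masks, threads_per_rank):
    """Return the per-core thread count for all ranks on the node."""
    load = Counter()
    for mask in masks:
        load.update(pin_threads(mask, threads_per_rank))
    return load


if __name__ == "__main__":
    # Queuing system set socket affinity: rank 0 -> socket 0, rank 1 -> socket 1.
    good = node_plan(SOCKET_CORES, threads_per_rank=8)
    print(max(good.values()), len(good))  # 1 16
    # No socket affinity at startup: both ranks see the whole node.
    bad = node_plan([ALL_CORES, ALL_CORES], threads_per_rank=8)
    print(max(bad.values()), len(bad))  # 2 8
```

The core-level step is identical in both cases; only the startup mask
differs, and that alone decides whether the threads collide.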
As the original report speaks of "16 threads", this may be the case of
1 MPI process running 16 threads. If so, I am very surprised that the
Intel compiler does not set up good affinities, since in this case it has
sufficient knowledge to do so. Perhaps this option was chosen assuming a
1:1 correspondence between sockets and MPI processes?
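If the option does assume one MPI process per socket, the following toy
model shows how that assumption would break for a single 16-thread process.
The pinning policy, the name `socket_per_rank_pinning`, and the 2 x 8-core
node are all hypothetical:

```python
from collections import Counter

# Assumed node layout (hypothetical): 2 sockets x 8 cores.
CORES_PER_SOCKET = 8
NUM_SOCKETS = 2


def socket_per_rank_pinning(ranks, threads_per_rank):
    """Hypothetical policy that assumes one MPI process per socket:
    rank r is confined to socket r, thread t to core (t mod 8) of it.
    Returns a Counter mapping core -> number of threads pinned there."""
    plan = []
    for r in range(ranks):
        base = (r % NUM_SOCKETS) * CORES_PER_SOCKET
        plan += [base + t % CORES_PER_SOCKET for t in range(threads_per_rank)]
    return Counter(plan)


if __name__ == "__main__":
    # Assumption holds: 2 ranks x 8 threads -> all 16 cores, one thread each.
    print(sorted(socket_per_rank_pinning(2, 8).values()) == [1] * 16)  # True
    # Assumption broken: 1 rank x 16 threads -> cores 0-7 doubled, socket 1 idle.
    load = socket_per_rank_pinning(1, 16)
    print(sorted(load), max(load.values()))  # [0, 1, 2, 3, 4, 5, 6, 7] 2
```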
--
Ticket URL: <https://trac.einsteintoolkit.org/ticket/1850#comment:11>
Einstein Toolkit <http://einsteintoolkit.org>