[ET Trac] [Einstein Toolkit] #1850: Severe performance problem on Stampede

Einstein Toolkit trac-noreply at einsteintoolkit.org
Tue Dec 15 13:25:07 CST 2015


#1850: Severe performance problem on Stampede
-------------------------+--------------------------------------------------
  Reporter:  hinder      |       Owner:                     
      Type:  defect      |      Status:  confirmed          
  Priority:  major       |   Milestone:                     
 Component:  SimFactory  |     Version:  development version
Resolution:              |    Keywords:                     
-------------------------+--------------------------------------------------

Comment (by eschnett):

 I am surprised that the KMP_* option is necessary or beneficial in any
 case. This option sets up thread affinity via the Intel compiler. The
 compiler knows nothing about MPI, so it cannot reasonably distribute
 threads when there are multiple MPI processes per node.
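
 To illustrate the argument, here is a minimal Python sketch. It is not how
 the Intel runtime actually chooses cores; it only assumes that each process
 can see every core on the node and pins its threads without knowing about
 the other processes. The node size (16 cores), the 2 x 8 layout, and both
 function names are hypothetical.

```python
def naive_pinning(num_threads):
    """Pin thread i to core i, ignoring any other process on the node."""
    return list(range(num_threads))


def rank_aware_pinning(local_rank, num_threads):
    """Shift each process's block of cores by its node-local rank."""
    first = local_rank * num_threads
    return list(range(first, first + num_threads))


# Hypothetical node: 16 cores, 2 MPI processes with 8 threads each.
naive = [naive_pinning(8) for _ in range(2)]
aware = [rank_aware_pinning(local_rank, 8) for local_rank in range(2)]

# Without rank information both processes choose the same cores.
print("naive overlap:", sorted(set(naive[0]) & set(naive[1])))
# With the node-local rank the two blocks of cores are disjoint.
print("rank-aware overlap:", sorted(set(aware[0]) & set(aware[1])))
```

 The second mapping needs the node-local rank, which is information that
 comes from the MPI launcher or the queuing system rather than from the
 compiler.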

 SystemTopology can undo all thread affinities. However, since MPI is
 initialized before SystemTopology runs, the correct socket (but not core)
 affinities must already be in place at startup. The queuing system can set
 these up, but the compiler cannot. This is why it is currently important
 to have the queuing system set up at least socket affinities.
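
 One way to check what a process was given at startup is to look at its
 affinity mask. The sketch below is only an illustration: it assumes Linux
 (os.sched_getaffinity is not available elsewhere) and assumes that core ids
 are numbered contiguously per socket, which is not true on every machine.
 The helper names and the 8-cores-per-socket figure are hypothetical.

```python
import os


def current_affinity():
    """Return the set of CPU ids this process may run on, or None if unknown."""
    # os.sched_getaffinity exists on Linux only.
    if hasattr(os, "sched_getaffinity"):
        return set(os.sched_getaffinity(0))
    return None


def sockets_spanned(cpus, cores_per_socket):
    """Map CPU ids to socket indices, ASSUMING contiguous numbering per socket."""
    return {cpu // cores_per_socket for cpu in cpus}


cpus = current_affinity()
if cpus is None:
    print("affinity mask not available on this platform")
else:
    # 8 cores per socket is an assumed value for illustration.
    print("allowed cpus:", sorted(cpus))
    print("sockets spanned:", sorted(sockets_spanned(cpus, 8)))
```

 If a process that is meant to stay on one socket reports more than one
 socket here, the socket affinity was not set up before startup.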

 As the original report speaks of "16 threads", this may be the case where
 there is 1 MPI process running 16 threads. If so, I am very surprised
 that the Intel compiler does not set up good affinities, since in this
 case it has sufficient knowledge to do so. Was this option perhaps chosen
 on the assumption that there is a 1:1 correspondence between sockets and
 MPI processes?

-- 
Ticket URL: <https://trac.einsteintoolkit.org/ticket/1850#comment:11>
Einstein Toolkit <http://einsteintoolkit.org>
The Einstein Toolkit

