[ET Trac] [Einstein Toolkit] #1850: Severe performance problem on Stampede

Einstein Toolkit trac-noreply at einsteintoolkit.org
Wed Dec 16 05:21:50 CST 2015


#1850: Severe performance problem on Stampede
-------------------------+--------------------------------------------------
  Reporter:  hinder      |       Owner:                     
      Type:  defect      |      Status:  confirmed          
  Priority:  major       |   Milestone:                     
 Component:  SimFactory  |     Version:  development version
Resolution:              |    Keywords:                     
-------------------------+--------------------------------------------------

Comment (by hinder):

 Replying to [comment:8 knarf]:
 > To summarize the current status: setting KMP_AFFINITY seems to be
 necessary for performance when using SystemTopology, but is harmful when
 not using it: either you have to use both, or none. Do I understand this
 correctly?

 That is not what I observed.  In the results I saw, the only combination
 that runs slowly (by a factor of 8) is setting KMP_AFFINITY the way
 simfactory sets it while not using the thorns.  This suggests that the
 thorns are doing the right thing and overriding whatever the environment
 variable has set; hence anyone who uses those thorns won't see a problem.
 It also suggests that the environment variable setting is wrong (not just
 suboptimal).  To debug the problem, we could run hwloc (or is it
 SystemTopology?) with parameters set to only report the affinity, rather
 than set it, and see what the environment variable is doing.  The
 documentation for that variable is at
 https://software.intel.com/en-us/node/522691#AFFINITY_TYPES, but I find it
 hard to understand:

 > type = compact
 > Specifying compact assigns the OpenMP* thread <n>+1 to a free thread
 context as close as possible to the thread context where the <n> OpenMP*
 thread was placed. For example, in a topology map, the nearer a node is to
 the root, the more significance the node has when sorting the threads.

 > modifier = norespect
 > Do not respect original affinity mask for the process. Binds OpenMP*
 threads to all operating system processors.
 > In early versions of the OpenMP* run-time library that supported only
 the physical and logical affinity types, norespect was the default and was
 not recognized as a modifier.
 > The default was changed to respect when types compact and scatter were
 added; therefore, thread bindings for the logical and physical affinity
 types may have changed with the newer compilers in situations where the
 application specified a partial initial thread affinity mask.

 My initial reading of this is that "norespect" means that the threads
 within a process may run on any OS processor, which I think translates
 into any physical core, and therefore also any physical processor.  But I
 am not an expert on this variable.  Erik, do you know what this setting
 is supposed to do?
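
 As a rough illustration of the "report the affinity, don't set it" idea
 above, here is a minimal sketch in Python.  It is not what hwloc or
 SystemTopology do; it only prints the affinity mask of the calling
 process and compares it with the number of CPUs the OS reports.  The
 function names affinity_report and format_report are made up for this
 example, and os.sched_getaffinity is Linux-only, so the sketch falls back
 to "all CPUs" elsewhere.  It says nothing about where the Intel OpenMP
 runtime places individual threads.

```python
import os


def affinity_report():
    """Return (allowed, total): CPU ids this process may run on, and the CPU count."""
    total = os.cpu_count() or 1
    if hasattr(os, "sched_getaffinity"):
        # Linux: the set of CPU ids in the affinity mask of this process (pid 0 = self).
        allowed = sorted(os.sched_getaffinity(0))
    else:
        # Assumption for other platforms: no mask is available, treat all CPUs as allowed.
        allowed = list(range(total))
    return allowed, total


def format_report(allowed, total):
    """Format the mask and say whether it covers fewer CPUs than the OS reports."""
    state = "restricted" if len(allowed) < total else "unrestricted"
    return (f"pid {os.getpid()}: {len(allowed)} of {total} CPUs allowed "
            f"({state}): {allowed}")


if __name__ == "__main__":
    print(format_report(*affinity_report()))
```

 Running this once per MPI process, with and without KMP_AFFINITY set in
 the environment, would at least show the initial mask that "respect" or
 "norespect" then acts on.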

 Note that Michael Clark reported different results, but he says that they
 were probably not accurate, as he cannot reproduce them now.

 Michael: is it possible that the run script you were using was not the
 one you had modified?  Editing the run script in
 simfactory/mdb/runscripts is not sufficient; the modified script also
 needs to be added to the Cactus configuration before rerunning.  This
 requires a "sim build <config> --runscript <runscriptname>".

-- 
Ticket URL: <https://trac.einsteintoolkit.org/ticket/1850#comment:16>
Einstein Toolkit <http://einsteintoolkit.org>
The Einstein Toolkit

