[ET Trac] [Einstein Toolkit] #1850: Severe performance problem on Stampede
Einstein Toolkit
trac-noreply at einsteintoolkit.org
Wed Dec 16 05:21:50 CST 2015
#1850: Severe performance problem on Stampede
-------------------------+--------------------------------------------------
Reporter: hinder | Owner:
Type: defect | Status: confirmed
Priority: major | Milestone:
Component: SimFactory | Version: development version
Resolution: | Keywords:
-------------------------+--------------------------------------------------
Comment (by hinder):
Replying to [comment:8 knarf]:
> To summarize the current status: setting KMP_AFFINITY seems to be
> necessary for performance when using SystemTopology, but is harmful when
> not using it: either you have to use both, or none. Do I understand this
> correctly?
That is not what I observed. In the results that I saw, the only
combination that is slow (by a factor of 8) is setting KMP_AFFINITY as
simfactory sets it while not using the thorns. This suggests that the
thorns are doing the right thing and overriding whatever the environment
variable has set; hence anyone who uses those thorns won't see a problem.
It also suggests that the environment variable setting is wrong (not just
suboptimal). To debug the problem, we could run hwloc (or is it
SystemTopology?) with parameters set to just report the affinity, rather
than set it, and see what the environment variable is doing. The
documentation for that variable is at
https://software.intel.com/en-us/node/522691#AFFINITY_TYPES, but I find it
hard to understand:
> type = compact
> Specifying compact assigns the OpenMP* thread <n>+1 to a free thread
> context as close as possible to the thread context where the <n> OpenMP*
> thread was placed. For example, in a topology map, the nearer a node is to
> the root, the more significance the node has when sorting the threads.
>
> modifier = norespect
> Do not respect original affinity mask for the process. Binds OpenMP*
> threads to all operating system processors.
> In early versions of the OpenMP* run-time library that supported only
> the physical and logical affinity types, norespect was the default and was
> not recognized as a modifier.
> The default was changed to respect when types compact and scatter were
> added; therefore, thread bindings for the logical and physical affinity
> types may have changed with the newer compilers in situations where the
> application specified a partial initial thread affinity mask.
My initial reading of this is that "norespect" means that threads within a
process may run on any OS processor, which I think translates into any
physical core, and therefore also a core on any physical processor. But I
am not an expert on this variable. Erik, do you know what this setting is
supposed to do?
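As a rough way to see what the environment variable actually does, independent of the thorns, one could inspect the per-thread affinity masks of a running process directly. The following is only a sketch, assuming a Linux compute node where /proc/<pid>/task/<tid>/status exposes a "Cpus_allowed_list" field; the helper names parse_cpu_list and thread_affinities are made up for this illustration and are not part of simfactory, hwloc or SystemTopology.

```python
import os
import sys


def parse_cpu_list(text):
    """Expand a Linux CPU list such as '0-3,8' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def thread_affinities(pid):
    """Return {tid: [cpus]} for every thread of process `pid` (Linux only).

    Assumption: the kernel reports each thread's allowed CPUs in the
    'Cpus_allowed_list' line of /proc/<pid>/task/<tid>/status.
    """
    result = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/status") as f:
                for line in f:
                    if line.startswith("Cpus_allowed_list:"):
                        result[int(tid)] = parse_cpu_list(line.split(":", 1)[1])
                        break
        except (FileNotFoundError, ProcessLookupError):
            continue  # the thread exited while we were scanning
    return result


if __name__ == "__main__":
    # Usage: python affinity_report.py <pid of a running Cactus process>
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    for tid, cpus in sorted(thread_affinities(pid).items()):
        print(tid, cpus)
```

Running this against one MPI rank with and without KMP_AFFINITY set (and with the thorns disabled) should show whether the OpenMP threads of that rank end up pinned to the same few cores or are free to run anywhere.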
Note that Michael Clark reported different results, but he says that they
were probably not accurate, as he cannot reproduce them now.
Michael: is it possible that the run script you were using was not the
updated one you had modified? Editing the run script in
simfactory/mdb/runscripts is not sufficient. It then needs to be added to
the Cactus configuration before rerunning. This requires a "sim build
<config> --runscript <runscriptname>".
--
Ticket URL: <https://trac.einsteintoolkit.org/ticket/1850#comment:16>
Einstein Toolkit <http://einsteintoolkit.org>
The Einstein Toolkit