[ET Trac] [Einstein Toolkit] #774: hack to unfix OpenMP threads from cores on Cray machines
Einstein Toolkit
trac-noreply at einsteintoolkit.org
Sat Mar 17 16:45:56 CDT 2012
#774: hack to unfix OpenMP threads from cores on Cray machines
-----------------------+----------------------------------------------------
Reporter: rhaas | Owner: eschnett
Type: task | Status: closed
Priority: optional | Milestone:
Component: Carpet | Version:
Resolution: fixed | Keywords: hack
-----------------------+----------------------------------------------------
Comment (by rhaas):
The issue occurs (e.g. on Kraken, where each node has two CPU sockets and
each socket houses 6 cores) if one runs with 12 threads. Then I get:
{{{
INFO (Carpet): MPI is enabled
INFO (Carpet): Carpet is running on 1 processes
WARNING level 2 in thorn Carpet processor 0 host nid11406
(line 212 of
/nics/c/home/rhaas/Zelmani/arrangements/Carpet/Carpet/src/SetupGH.cc):
-> Although MPI is enabled, the environment variable
CACTUS_NUM_PROCS is not set.
INFO (Carpet): This is process 0
INFO (Carpet): OpenMP is enabled
INFO (Carpet): This process contains 12 threads
WARNING level 2 in thorn Carpet processor 0 host nid11406
(line 246 of
/nics/c/home/rhaas/Zelmani/arrangements/Carpet/Carpet/src/SetupGH.cc):
-> Although OpenMP is enabled, the environment variable
CACTUS_NUM_THREADS is not set.
INFO (Carpet): There are 12 threads in total
INFO (Carpet): There are 12 threads per process
INFO (Carpet): Host listing:
host 0: "nid11406"
INFO (Carpet): Host/process mapping:
process 0: host 0 "nid11406"
INFO (Carpet): Host mapping: This is process 0, host 0 "nid11406"
INFO (Carpet): This process runs on host nid11406, pid=8985
INFO (Carpet): This process runs on 6 cores: 0-5
WARNING level 1 in thorn Carpet processor 0 host nid11406
(line 383 of
/nics/c/home/rhaas/Zelmani/arrangements/Carpet/Carpet/src/SetupGH.cc):
-> The number of threads for this process is larger than its number of
cores. This may indicate a performance problem.
}}}
This was run with:
{{{
aprun -cc numa_node -n @NUM_PROCS@ -d @NUM_THREADS@ ${NODE_PROCS}
${SOCKET_PROCS} @EXECUTABLE@ -L 3 @PARFILE@
}}}
and
{{{
create-submit affinitytest --procs 12 --num-threads 12 --walltime 0:05:00
}}}
For this situation the Cray affinity display utility outputs:
{{{
Hello from rank 0, thread 1, on nid01593. (core affinity = 0-5)
Hello from rank 0, thread 2, on nid01593. (core affinity = 0-5)
Hello from rank 0, thread 3, on nid01593. (core affinity = 0-5)
Hello from rank 0, thread 4, on nid01593. (core affinity = 0-5)
Hello from rank 0, thread 5, on nid01593. (core affinity = 6-11)
Hello from rank 0, thread 6, on nid01593. (core affinity = 6-11)
Hello from rank 0, thread 7, on nid01593. (core affinity = 6-11)
Hello from rank 0, thread 8, on nid01593. (core affinity = 6-11)
Hello from rank 0, thread 11, on nid01593. (core affinity = 0-5)
Hello from rank 0, thread 0, on nid01593. (core affinity = 0-5)
Hello from rank 0, thread 9, on nid01593. (core affinity = 6-11)
Hello from rank 0, thread 10, on nid01593. (core affinity = 6-11)
}}}
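For comparison, a minimal MPI+OpenMP reporter that prints this kind of
per-thread line can be written with Linux's sched_getaffinity. The sketch
below is an assumed equivalent, not the actual Cray utility:
{{{
// Minimal MPI+OpenMP affinity reporter (Linux-specific sketch; not the
// actual Cray utility).  Prints one line per thread, similar to the
// output quoted above.
#include <mpi.h>
#include <omp.h>
#include <sched.h>
#include <unistd.h>
#include <cstdio>
#include <string>

// Render a cpu_set_t as compact core ranges, e.g. "0-5" or "0-3,8".
static std::string affinity_string(const cpu_set_t &mask) {
  std::string s;
  int start = -1;
  for (int cpu = 0; cpu <= CPU_SETSIZE; ++cpu) {
    const bool set = cpu < CPU_SETSIZE && CPU_ISSET(cpu, &mask);
    if (set && start < 0) start = cpu;
    if (!set && start >= 0) {
      if (!s.empty()) s += ",";
      s += std::to_string(start);
      if (cpu - 1 > start) s += "-" + std::to_string(cpu - 1);
      start = -1;
    }
  }
  return s;
}

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  char host[256];
  gethostname(host, sizeof host);

#pragma omp parallel
  {
    // sched_getaffinity(0, ...) returns the mask of the calling thread.
    cpu_set_t mask;
    CPU_ZERO(&mask);
    sched_getaffinity(0, sizeof mask, &mask);
    std::printf("Hello from rank %d, thread %d, on %s. (core affinity = %s)\n",
                rank, omp_get_thread_num(), host,
                affinity_string(mask).c_str());
  }

  MPI_Finalize();
  return 0;
}
}}}
(It would be built with something like mpicxx -fopenmp and launched through
aprun exactly as above.)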
The reason for the warning is that each individual thread is given 6 cores
to run on, not 12. This is actually a sensible setting, since it prevents
threads from migrating from one socket to the other, which can impact
memory bandwidth. It would even be conceivable to bind each thread to
exactly one core, which is fine as long as no two threads '''have''' to run
on the same core.
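As an illustration of such one-core-per-thread binding done from inside the
process (a sketch assuming Linux's pthread_setaffinity_np; the round-robin
core choice is only sensible when the process has at least as many allowed
cores as threads):
{{{
// Sketch: pin each OpenMP thread to one core out of the cores this process
// is allowed to use.
#include <omp.h>
#include <pthread.h>
#include <sched.h>
#include <vector>

int main() {
  // Cores the process may currently run on (e.g. 0-5 in the Kraken case).
  cpu_set_t procmask;
  CPU_ZERO(&procmask);
  sched_getaffinity(0, sizeof procmask, &procmask);
  std::vector<int> cores;
  for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu)
    if (CPU_ISSET(cpu, &procmask)) cores.push_back(cpu);

#pragma omp parallel
  {
    // Give thread i the i-th allowed core; with more threads than cores
    // this doubles up, which is exactly the situation to avoid.
    const int core = cores[omp_get_thread_num() % (int)cores.size()];
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(core, &mask);
    pthread_setaffinity_np(pthread_self(), sizeof mask, &mask);
    // ... real work would follow here ...
  }
  return 0;
}
}}}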
I think a sharp version of the criterion would be: given the allowed
affinity sets of all threads of all processes on this cluster node, is
there a way of spreading the threads over the cores so that no core is
occupied by more than one thread? If not, warn. This seems like a hard
thing to test for (though it sounds suspiciously like one of those
travelling-salesman-like problems, even if it is certainly not the
travelling salesman problem itself) :-(
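In fact the feasibility question above looks like a bipartite matching
problem (threads on one side, cores on the other, with an edge whenever a
core appears in a thread's affinity set), so it should be solvable in
polynomial time rather than being TSP-hard. A minimal sketch of such a
check, assuming the per-thread affinity sets have already been collected
(the function names are hypothetical, not Carpet code):
{{{
// Sketch of the proposed test, phrased as bipartite matching between
// threads and cores.  affinity[t] lists the cores thread t is allowed to
// run on; the answer is "true" if every thread can be given its own core,
// i.e. no warning is needed.
#include <vector>
using std::vector;

// Try to place thread t, moving earlier threads to alternative cores if
// necessary (Kuhn's augmenting-path algorithm).
static bool place_thread(int t, const vector<vector<int>> &affinity,
                         vector<int> &core_owner, vector<char> &visited) {
  for (int c : affinity[t]) {
    if (visited[c]) continue;
    visited[c] = 1;
    if (core_owner[c] < 0 ||
        place_thread(core_owner[c], affinity, core_owner, visited)) {
      core_owner[c] = t;
      return true;
    }
  }
  return false;
}

bool threads_fit_on_cores(const vector<vector<int>> &affinity, int ncores) {
  vector<int> core_owner(ncores, -1);  // which thread currently holds a core
  for (int t = 0; t < (int)affinity.size(); ++t) {
    vector<char> visited(ncores, 0);
    if (!place_thread(t, affinity, core_owner, visited))
      return false;  // thread t cannot get a core of its own: warn
  }
  return true;
}
}}}
With the per-thread masks shown above (six threads confined to cores 0-5
and six to 6-11) a perfect matching exists, so this sharper criterion would
not warn, whereas the current per-process test does.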
--
Ticket URL: <https://trac.einsteintoolkit.org/ticket/774#comment:5>
Einstein Toolkit <http://einsteintoolkit.org>