[ET Trac] [Einstein Toolkit] #774: hack to unfix OpenMP threads from cores on Cray machines

Einstein Toolkit trac-noreply at einsteintoolkit.org
Sat Mar 17 16:45:56 CDT 2012


#774: hack to unfix OpenMP threads from cores on Cray machines
-----------------------+----------------------------------------------------
  Reporter:  rhaas     |       Owner:  eschnett
      Type:  task      |      Status:  closed  
  Priority:  optional  |   Milestone:          
 Component:  Carpet    |     Version:          
Resolution:  fixed     |    Keywords:  hack    
-----------------------+----------------------------------------------------

Comment (by rhaas):

 The issue occurs (e.g. on Kraken, where each node has two CPU sockets and
 each socket houses 6 cores) if one runs with 12 threads. Then I get
 {{{
 INFO (Carpet): MPI is enabled
 INFO (Carpet): Carpet is running on 1 processes
 ESC[1mWARNING level 2 in thorn Carpet processor 0 host nid11406
   (line 212 of
 /nics/c/home/rhaas/Zelmani/arrangements/Carpet/Carpet/src/SetupGH.cc):
   ->ESC[0m Although MPI is enabled, the environment variable
 CACTUS_NUM_PROCS is not set.
 INFO (Carpet): This is process 0
 INFO (Carpet): OpenMP is enabled
 INFO (Carpet): This process contains 12 threads
 ESC[1mWARNING level 2 in thorn Carpet processor 0 host nid11406
   (line 246 of
 /nics/c/home/rhaas/Zelmani/arrangements/Carpet/Carpet/src/SetupGH.cc):
   ->ESC[0m Although OpenMP is enabled, the environment variable
 CACTUS_NUM_THREADS is not set.
 INFO (Carpet): There are 12 threads in total
 INFO (Carpet): There are 12 threads per process
 INFO (Carpet): Host listing:
    host 0: "nid11406"
 INFO (Carpet): Host/process mapping:
    process 0: host 0 "nid11406"
 INFO (Carpet): Host mapping: This is process 0, host 0 "nid11406"
 INFO (Carpet): This process runs on host nid11406, pid=8985
 INFO (Carpet): This process runs on 6 cores: 0-5
 ESC[1mWARNING level 1 in thorn Carpet processor 0 host nid11406
   (line 383 of
 /nics/c/home/rhaas/Zelmani/arrangements/Carpet/Carpet/src/SetupGH.cc):
   ->ESC[0m The number of threads for this process is larger its number of
 cores. This may indicate a performance problem.
 }}}

 This was run with:
 {{{
  aprun -cc numa_node -n @NUM_PROCS@ -d @NUM_THREADS@ ${NODE_PROCS} ${SOCKET_PROCS} @EXECUTABLE@ -L 3 @PARFILE@
 }}}
 and
 {{{
 create-submit affinitytest --procs 12 --num-threads 12 --walltime 0:05:00
 }}}

 For this situation the Cray affinity display utility outputs:
 {{{
 Hello from rank 0, thread 1, on nid01593. (core affinity = 0-5)
 Hello from rank 0, thread 2, on nid01593. (core affinity = 0-5)
 Hello from rank 0, thread 3, on nid01593. (core affinity = 0-5)
 Hello from rank 0, thread 4, on nid01593. (core affinity = 0-5)
 Hello from rank 0, thread 5, on nid01593. (core affinity = 6-11)
 Hello from rank 0, thread 6, on nid01593. (core affinity = 6-11)
 Hello from rank 0, thread 7, on nid01593. (core affinity = 6-11)
 Hello from rank 0, thread 8, on nid01593. (core affinity = 6-11)
 Hello from rank 0, thread 11, on nid01593. (core affinity = 0-5)
 Hello from rank 0, thread 0, on nid01593. (core affinity = 0-5)
 Hello from rank 0, thread 9, on nid01593. (core affinity = 6-11)
 Hello from rank 0, thread 10, on nid01593. (core affinity = 6-11)
 }}}

 The reason for the warning is that each individual thread is given 6 cores
 to run on, not 12. This is actually a sensible setting, since it prevents
 threads from migrating from one socket to the other, which can hurt memory
 bandwidth. It would even be conceivable to bind each thread to exactly one
 core, which is fine as long as no two threads '''have''' to run on the
 same core.
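 A process can inspect this from the inside; the following is a minimal
 Python sketch (not part of Carpet, and `os.sched_getaffinity` is
 Linux-specific) comparing the process's affinity mask against its thread
 count, which is essentially the per-process view behind Carpet's warning:

```python
import os

# Set of distinct core ids this process may run on (its affinity mask).
# On a Kraken node launched as above this would have 6 entries (one NUMA
# node), even though the process runs 12 OpenMP threads.
allowed_cores = os.sched_getaffinity(0)

# Thread count as OpenMP would see it (defaults to 1 if unset).
num_threads = int(os.environ.get("OMP_NUM_THREADS", "1"))

if num_threads > len(allowed_cores):
    print("warning: %d threads but only %d cores in the affinity mask"
          % (num_threads, len(allowed_cores)))
```

 This per-process check is what fires here even though the placement is
 fine node-wide: each half of the threads sees only its own 6-core socket.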

 I think a sharp version of the criterion would be: given the allowed
 affinity sets of all threads in all the processes on this cluster node, is
 there a way of spreading the threads over the cores so that no core is
 occupied by more than one thread? If not, warn. This seems like a hard
 thing to test for (though it sounds suspiciously like one of those
 travelling-salesman-like problems [it's certainly not the travelling
 salesman problem itself]) :-( .
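 For what it's worth, the criterion above can be modeled as bipartite
 matching (threads on one side, cores on the other, an edge when a core is
 in a thread's affinity set), which is decidable in polynomial time by
 standard augmenting-path matching rather than anything TSP-like. A minimal
 sketch, assuming each thread's affinity set is given as a set of core ids:

```python
def threads_fit_on_cores(affinity_sets):
    """Return True if every thread can be pinned to a distinct core,
    where affinity_sets[i] is the set of core ids thread i may use.
    Simple augmenting-path bipartite matching (Kuhn's algorithm)."""
    core_owner = {}  # core id -> index of the thread currently assigned to it

    def try_assign(thread, visited):
        for core in affinity_sets[thread]:
            if core in visited:
                continue
            visited.add(core)
            # Take a free core, or evict the current owner if it can move.
            if core not in core_owner or try_assign(core_owner[core], visited):
                core_owner[core] = thread
                return True
        return False

    return all(try_assign(t, set()) for t in range(len(affinity_sets)))

# The Kraken case above: 12 threads, 6 bound to each 6-core socket.
kraken = [set(range(0, 6))] * 6 + [set(range(6, 12))] * 6
print(threads_fit_on_cores(kraken))                 # True: no warning needed

# Genuinely oversubscribed: 7 threads all restricted to cores 0-5.
print(threads_fit_on_cores([set(range(6))] * 7))    # False: warn
```

 On the Kraken layout this returns True, so the sharp criterion would
 correctly suppress the warning there while still catching real
 oversubscription.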

-- 
Ticket URL: <https://trac.einsteintoolkit.org/ticket/774#comment:5>
Einstein Toolkit <http://einsteintoolkit.org>

