[ET Trac] [Einstein Toolkit] #1527: simfactory run script for bluewaters does not use -d or -cc numa_node

Einstein Toolkit trac-noreply at einsteintoolkit.org
Thu Jan 23 13:38:28 CST 2014


#1527: simfactory run script for bluewaters does not use -d or -cc numa_node
-------------------------+--------------------------------------------------
  Reporter:  rhaas       |       Owner:  eschnett           
      Type:  defect      |      Status:  review             
  Priority:  minor       |   Milestone:                     
 Component:  SimFactory  |     Version:  development version
Resolution:              |    Keywords:                     
-------------------------+--------------------------------------------------

Comment (by rhaas):

 Both runs used hwloc. Hwloc's output for the unmodified run is:
 {{{
 INFO (hwloc): MPI process-to-host mapping:
 This is MPI process 0 of 8
 MPI hosts:
   0: nid07793
   1: nid07854
 This MPI process runs on host 0 of 2
 On this host, this is MPI process 0 of 4
 INFO (hwloc): Topology support:
 Discovery support:
   discovery->pu                            : yes
 CPU binding support:
   cpubind->set_thisproc_cpubind            : yes
   cpubind->get_thisproc_cpubind            : yes
   cpubind->set_proc_cpubind                : yes
   cpubind->get_proc_cpubind                : yes
   cpubind->set_thisthread_cpubind          : yes
   cpubind->get_thisthread_cpubind          : yes
   cpubind->set_thread_cpubind              : yes
   cpubind->get_thread_cpubind              : yes
   cpubind->get_thisproc_last_cpu_location  : yes
   cpubind->get_proc_last_cpu_location      : yes
   cpubind->get_thisthread_last_cpu_location: yes
 Memory binding support:
   membind->set_thisproc_membind            : no
   membind->get_thisproc_membind            : no
   membind->set_proc_membind                : no
   membind->get_proc_membind                : no
   membind->set_thisthread_membind          : yes
   membind->get_thisthread_membind          : yes
   membind->set_area_membind                : yes
   membind->get_area_membind                : yes
   membind->alloc_membind                   : yes
   membind->firsttouch_membind              : yes
   membind->bind_membind                    : yes
   membind->interleave_membind              : yes
   membind->replicate_membind               : no
   membind->nexttouch_membind               : no
   membind->migrate_membind                 : yes
 INFO (hwloc): Hardware objects in this node:
 Machine L#0: (P#0, total=67108480KB, Backend=Linux, LinuxCgroup=/3012719, OSName=Linux, OSRelease=2.6.32.59-0.7.1_1.0402.7496-cray_gem_c, OSVersion="#1 SMP Wed Aug 7 03:55:25 UTC 2013", HostName=nid07793, Architecture=x86_64)
   Socket L#0: (P#0, total=33554048KB, CPUModel="AMD Opteron(TM) Processor 6276                 ")
     NUMANode L#0: (P#0, local=16776832KB, total=16776832KB)
       L3Cache L#0: (P#-1, size=6144KB, linesize=64, ways=64)
         L2Cache L#0: (P#-1, size=2048KB, linesize=64, ways=16)
           L1dCache L#0: (P#-1, size=16KB, linesize=64, ways=4)
             Core L#0: (P#0)
               PU L#0: (P#0)
         L2Cache L#1: (P#-1, size=2048KB, linesize=64, ways=16)
           L1dCache L#1: (P#-1, size=16KB, linesize=64, ways=4)
             Core L#1: (P#1)
               PU L#1: (P#1)
         L2Cache L#2: (P#-1, size=2048KB, linesize=64, ways=16)
           L1dCache L#2: (P#-1, size=16KB, linesize=64, ways=4)
             Core L#2: (P#2)
               PU L#2: (P#2)
         L2Cache L#3: (P#-1, size=2048KB, linesize=64, ways=16)
           L1dCache L#3: (P#-1, size=16KB, linesize=64, ways=4)
             Core L#3: (P#3)
               PU L#3: (P#3)
         L2Cache L#4: (P#-1, size=2048KB, linesize=64, ways=16)
           L1dCache L#4: (P#-1, size=16KB, linesize=64, ways=4)
             Core L#4: (P#4)
               PU L#4: (P#4)
         L2Cache L#5: (P#-1, size=2048KB, linesize=64, ways=16)
           L1dCache L#5: (P#-1, size=16KB, linesize=64, ways=4)
             Core L#5: (P#5)
               PU L#5: (P#5)
         L2Cache L#6: (P#-1, size=2048KB, linesize=64, ways=16)
           L1dCache L#6: (P#-1, size=16KB, linesize=64, ways=4)
             Core L#6: (P#6)
               PU L#6: (P#6)
         L2Cache L#7: (P#-1, size=2048KB, linesize=64, ways=16)
           L1dCache L#7: (P#-1, size=16KB, linesize=64, ways=4)
             Core L#7: (P#7)
               PU L#7: (P#7)
     NUMANode L#1: (P#1, local=16777216KB, total=16777216KB)
       L3Cache L#1: (P#-1, size=6144KB, linesize=64, ways=64)
         L2Cache L#8: (P#-1, size=2048KB, linesize=64, ways=16)
           L1dCache L#8: (P#-1, size=16KB, linesize=64, ways=4)
             Core L#8: (P#0)
               PU L#8: (P#8)
         L2Cache L#9: (P#-1, size=2048KB, linesize=64, ways=16)
           L1dCache L#9: (P#-1, size=16KB, linesize=64, ways=4)
             Core L#9: (P#1)
               PU L#9: (P#9)
         L2Cache L#10: (P#-1, size=2048KB, linesize=64, ways=16)
           L1dCache L#10: (P#-1, size=16KB, linesize=64, ways=4)
             Core L#10: (P#2)
               PU L#10: (P#10)
         L2Cache L#11: (P#-1, size=2048KB, linesize=64, ways=16)
           L1dCache L#11: (P#-1, size=16KB, linesize=64, ways=4)
             Core L#11: (P#3)
               PU L#11: (P#11)
         L2Cache L#12: (P#-1, size=2048KB, linesize=64, ways=16)
           L1dCache L#12: (P#-1, size=16KB, linesize=64, ways=4)
             Core L#12: (P#4)
               PU L#12: (P#12)
         L2Cache L#13: (P#-1, size=2048KB, linesize=64, ways=16)
           L1dCache L#13: (P#-1, size=16KB, linesize=64, ways=4)
             Core L#13: (P#5)
               PU L#13: (P#13)
         L2Cache L#14: (P#-1, size=2048KB, linesize=64, ways=16)
           L1dCache L#14: (P#-1, size=16KB, linesize=64, ways=4)
             Core L#14: (P#6)
               PU L#14: (P#14)
         L2Cache L#15: (P#-1, size=2048KB, linesize=64, ways=16)
           L1dCache L#15: (P#-1, size=16KB, linesize=64, ways=4)
             Core L#15: (P#7)
               PU L#15: (P#15)
   Socket L#1: (P#1, total=33554432KB, CPUModel="AMD Opteron(TM) Processor 6276                 ")
     NUMANode L#2: (P#2, local=16777216KB, total=16777216KB)
       L3Cache L#2: (P#-1, size=6144KB, linesize=64, ways=64)
         L2Cache L#16: (P#-1, size=2048KB, linesize=64, ways=16)
           L1dCache L#16: (P#-1, size=16KB, linesize=64, ways=4)
             Core L#16: (P#0)
               PU L#16: (P#16)
         L2Cache L#17: (P#-1, size=2048KB, linesize=64, ways=16)
           L1dCache L#17: (P#-1, size=16KB, linesize=64, ways=4)
             Core L#17: (P#1)
               PU L#17: (P#17)
         L2Cache L#18: (P#-1, size=2048KB, linesize=64, ways=16)
           L1dCache L#18: (P#-1, size=16KB, linesize=64, ways=4)
             Core L#18: (P#2)
               PU L#18: (P#18)
         L2Cache L#19: (P#-1, size=2048KB, linesize=64, ways=16)
           L1dCache L#19: (P#-1, size=16KB, linesize=64, ways=4)
             Core L#19: (P#3)
               PU L#19: (P#19)
         L2Cache L#20: (P#-1, size=2048KB, linesize=64, ways=16)
           L1dCache L#20: (P#-1, size=16KB, linesize=64, ways=4)
             Core L#20: (P#4)
               PU L#20: (P#20)
         L2Cache L#21: (P#-1, size=2048KB, linesize=64, ways=16)
           L1dCache L#21: (P#-1, size=16KB, linesize=64, ways=4)
             Core L#21: (P#5)
               PU L#21: (P#21)
         L2Cache L#22: (P#-1, size=2048KB, linesize=64, ways=16)
           L1dCache L#22: (P#-1, size=16KB, linesize=64, ways=4)
             Core L#22: (P#6)
               PU L#22: (P#22)
         L2Cache L#23: (P#-1, size=2048KB, linesize=64, ways=16)
           L1dCache L#23: (P#-1, size=16KB, linesize=64, ways=4)
             Core L#23: (P#7)
               PU L#23: (P#23)
     NUMANode L#3: (P#3, local=16777216KB, total=16777216KB)
       L3Cache L#3: (P#-1, size=6144KB, linesize=64, ways=64)
         L2Cache L#24: (P#-1, size=2048KB, linesize=64, ways=16)
           L1dCache L#24: (P#-1, size=16KB, linesize=64, ways=4)
             Core L#24: (P#0)
               PU L#24: (P#24)
         L2Cache L#25: (P#-1, size=2048KB, linesize=64, ways=16)
           L1dCache L#25: (P#-1, size=16KB, linesize=64, ways=4)
             Core L#25: (P#1)
               PU L#25: (P#25)
         L2Cache L#26: (P#-1, size=2048KB, linesize=64, ways=16)
           L1dCache L#26: (P#-1, size=16KB, linesize=64, ways=4)
             Core L#26: (P#2)
               PU L#26: (P#26)
         L2Cache L#27: (P#-1, size=2048KB, linesize=64, ways=16)
           L1dCache L#27: (P#-1, size=16KB, linesize=64, ways=4)
             Core L#27: (P#3)
               PU L#27: (P#27)
         L2Cache L#28: (P#-1, size=2048KB, linesize=64, ways=16)
           L1dCache L#28: (P#-1, size=16KB, linesize=64, ways=4)
             Core L#28: (P#4)
               PU L#28: (P#28)
         L2Cache L#29: (P#-1, size=2048KB, linesize=64, ways=16)
           L1dCache L#29: (P#-1, size=16KB, linesize=64, ways=4)
             Core L#29: (P#5)
               PU L#29: (P#29)
         L2Cache L#30: (P#-1, size=2048KB, linesize=64, ways=16)
           L1dCache L#30: (P#-1, size=16KB, linesize=64, ways=4)
             Core L#30: (P#6)
               PU L#30: (P#30)
         L2Cache L#31: (P#-1, size=2048KB, linesize=64, ways=16)
           L1dCache L#31: (P#-1, size=16KB, linesize=64, ways=4)
             Core L#31: (P#7)
               PU L#31: (P#31)
 INFO (hwloc): Thread CPU bindings:
   MPI process 0 on host 0 (process 0 of 4 on this host)
     OpenMP thread 0: PU set L#{0} P#{0}
     OpenMP thread 1: PU set L#{0} P#{0}
     OpenMP thread 2: PU set L#{0} P#{0}
     OpenMP thread 3: PU set L#{0} P#{0}
     OpenMP thread 4: PU set L#{0} P#{0}
     OpenMP thread 5: PU set L#{0} P#{0}
     OpenMP thread 6: PU set L#{0} P#{0}
     OpenMP thread 7: PU set L#{0} P#{0}
   MPI process 1 on host 0 (process 1 of 4 on this host)
     OpenMP thread 0: PU set L#{1} P#{1}
     OpenMP thread 1: PU set L#{1} P#{1}
     OpenMP thread 2: PU set L#{1} P#{1}
     OpenMP thread 3: PU set L#{1} P#{1}
     OpenMP thread 4: PU set L#{1} P#{1}
     OpenMP thread 5: PU set L#{1} P#{1}
     OpenMP thread 6: PU set L#{1} P#{1}
     OpenMP thread 7: PU set L#{1} P#{1}
   MPI process 2 on host 0 (process 2 of 4 on this host)
     OpenMP thread 0: PU set L#{2} P#{2}
     OpenMP thread 1: PU set L#{2} P#{2}
     OpenMP thread 2: PU set L#{2} P#{2}
     OpenMP thread 3: PU set L#{2} P#{2}
     OpenMP thread 4: PU set L#{2} P#{2}
     OpenMP thread 5: PU set L#{2} P#{2}
     OpenMP thread 6: PU set L#{2} P#{2}
     OpenMP thread 7: PU set L#{2} P#{2}
   MPI process 3 on host 0 (process 3 of 4 on this host)
     OpenMP thread 0: PU set L#{3} P#{3}
     OpenMP thread 1: PU set L#{3} P#{3}
     OpenMP thread 2: PU set L#{3} P#{3}
     OpenMP thread 3: PU set L#{3} P#{3}
     OpenMP thread 4: PU set L#{3} P#{3}
     OpenMP thread 5: PU set L#{3} P#{3}
     OpenMP thread 6: PU set L#{3} P#{3}
     OpenMP thread 7: PU set L#{3} P#{3}
 INFO (hwloc): Setting thread CPU bindings:
 INFO (hwloc): Thread CPU bindings:
   MPI process 0 on host 0 (process 0 of 4 on this host)
     OpenMP thread 0: PU set L#{0} P#{0}
     OpenMP thread 1: PU set L#{1} P#{1}
     OpenMP thread 2: PU set L#{2} P#{2}
     OpenMP thread 3: PU set L#{3} P#{3}
     OpenMP thread 4: PU set L#{4} P#{4}
     OpenMP thread 5: PU set L#{5} P#{5}
     OpenMP thread 6: PU set L#{6} P#{6}
     OpenMP thread 7: PU set L#{7} P#{7}
   MPI process 1 on host 0 (process 1 of 4 on this host)
     OpenMP thread 0: PU set L#{8} P#{8}
     OpenMP thread 1: PU set L#{9} P#{9}
     OpenMP thread 2: PU set L#{10} P#{10}
     OpenMP thread 3: PU set L#{11} P#{11}
     OpenMP thread 4: PU set L#{12} P#{12}
     OpenMP thread 5: PU set L#{13} P#{13}
     OpenMP thread 6: PU set L#{14} P#{14}
     OpenMP thread 7: PU set L#{15} P#{15}
   MPI process 2 on host 0 (process 2 of 4 on this host)
     OpenMP thread 0: PU set L#{16} P#{16}
     OpenMP thread 1: PU set L#{17} P#{17}
     OpenMP thread 2: PU set L#{18} P#{18}
     OpenMP thread 3: PU set L#{19} P#{19}
     OpenMP thread 4: PU set L#{20} P#{20}
     OpenMP thread 5: PU set L#{21} P#{21}
     OpenMP thread 6: PU set L#{22} P#{22}
     OpenMP thread 7: PU set L#{23} P#{23}
   MPI process 3 on host 0 (process 3 of 4 on this host)
     OpenMP thread 0: PU set L#{24} P#{24}
     OpenMP thread 1: PU set L#{25} P#{25}
     OpenMP thread 2: PU set L#{26} P#{26}
     OpenMP thread 3: PU set L#{27} P#{27}
     OpenMP thread 4: PU set L#{28} P#{28}
     OpenMP thread 5: PU set L#{29} P#{29}
     OpenMP thread 6: PU set L#{30} P#{30}
     OpenMP thread 7: PU set L#{31} P#{31}
 INFO (hwloc): Extracting CPU/cache/memory properties:
   There are 1 PUs per core (aka hardware SMT threads)
   There are 1 threads per core (aka SMT threads used)
   Cache (unknown name) has type "data" depth 1
     size 16384 linesize 64 associativity 4 stride 4096, for 1 PUs
   Cache (unknown name) has type "unified" depth 2
     size 2097152 linesize 64 associativity 16 stride 131072, for 1 PUs
   Cache (unknown name) has type "unified" depth 3
     size 6291456 linesize 64 associativity 64 stride 98304, for 8 PUs
   Memory has type "local" depth 2
     size 17179475968 pagesize 4096, for 8 PUs
   Memory has type "global" depth 2
     size 68717903872 pagesize 4096, for 32 PUs
 }}}
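 The two "Thread CPU bindings" listings in the output above can be compared
 mechanically. What follows is a minimal Python sketch, not part of hwloc or
 Cactus, that parses such a listing and reports every PU that is shared by
 more than one OpenMP thread of an MPI process. The function names
 (parse_bindings, shared_pus) are hypothetical, and the regular expressions
 assume the line format shown above.

```python
import re
from collections import defaultdict

# Patterns assume the hwloc thorn's log format as quoted in this comment.
PROC_RE = re.compile(r"MPI process (\d+) on host (\d+)")
THREAD_RE = re.compile(
    r"OpenMP thread (\d+): PU set L#\{([\d,\-]+)\} P#\{([\d,\-]+)\}")


def expand_set(spec):
    """Turn a set spec such as '3' or '0-3,8' into a frozenset of ints."""
    pus = set()
    for part in spec.split(","):
        lo, _, hi = part.partition("-")
        pus.update(range(int(lo), int(hi or lo) + 1))
    return frozenset(pus)


def parse_bindings(text):
    """Return {mpi_process: {omp_thread: frozenset of physical PU ids}}."""
    bindings = defaultdict(dict)
    proc = None
    for line in text.splitlines():
        m = PROC_RE.search(line)
        if m:
            proc = int(m.group(1))
            continue
        m = THREAD_RE.search(line)
        if m and proc is not None:
            bindings[proc][int(m.group(1))] = expand_set(m.group(3))
    return dict(bindings)


def shared_pus(bindings):
    """Return {mpi_process: {pu: [threads]}} for PUs bound by >1 thread."""
    result = {}
    for proc, threads in bindings.items():
        by_pu = defaultdict(list)
        for thread, pus in sorted(threads.items()):
            for pu in pus:
                by_pu[pu].append(thread)
        shared = {pu: t for pu, t in by_pu.items() if len(t) > 1}
        if shared:
            result[proc] = shared
    return result


# Shortened excerpts in the style of the listings above (two threads only).
before = """\
  MPI process 0 on host 0 (process 0 of 4 on this host)
    OpenMP thread 0: PU set L#{0} P#{0}
    OpenMP thread 1: PU set L#{0} P#{0}
"""
after = """\
  MPI process 0 on host 0 (process 0 of 4 on this host)
    OpenMP thread 0: PU set L#{0} P#{0}
    OpenMP thread 1: PU set L#{1} P#{1}
"""

print(shared_pus(parse_bindings(before)))  # {0: {0: [0, 1]}}
print(shared_pus(parse_bindings(after)))   # {}
```

 A shared PU is only a problem when each thread is pinned to a single PU, as
 in the first listing; threads that are all bound to the same multi-PU set
 (for example a whole NUMA node) would also be reported, so the result needs
 to be read with the binding mode in mind.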
 I have attached stdout for both runs: qc0-new.out is from the new run
 script and qc0-vanilla.out is from the unmodified one.
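 For context, the options named in the ticket title would appear on an aprun
 command line roughly as sketched below. This assumes the layout reported by
 hwloc above (8 MPI processes, 4 per host, 8 OpenMP threads each). The
 helper name, the executable name and the parameter file name are
 placeholders, and the actual simfactory run script may assemble its command
 differently.

```python
def build_aprun_command(num_procs, procs_per_node, threads_per_proc,
                        executable, parfile):
    """Assemble an aprun argument list (hypothetical helper, for illustration).

    -n  total number of MPI processes (PEs)
    -N  MPI processes per node
    -d  CPUs reserved per MPI process, here one per OpenMP thread
    -cc numa_node  confine each process and its threads to one NUMA node
    """
    return [
        "aprun",
        "-n", str(num_procs),
        "-N", str(procs_per_node),
        "-d", str(threads_per_proc),
        "-cc", "numa_node",
        executable,
        parfile,
    ]


# Placeholder executable and parameter file names; not taken from the ticket.
cmd = build_aprun_command(8, 4, 8, "./cactus_sim", "qc0.par")
print(" ".join(cmd))
# aprun -n 8 -N 4 -d 8 -cc numa_node ./cactus_sim qc0.par
```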

-- 
Ticket URL: <https://trac.einsteintoolkit.org/ticket/1527#comment:3>
Einstein Toolkit <http://einsteintoolkit.org>
The Einstein Toolkit

