[ET Trac] [Einstein Toolkit] #1527: simfactory run script for bluewaters does not use -d or -cc numa_node
Einstein Toolkit
trac-noreply at einsteintoolkit.org
Thu Jan 23 13:38:28 CST 2014
#1527: simfactory run script for bluewaters does not use -d or -cc numa_node
-------------------------+--------------------------------------------------
Reporter: rhaas | Owner: eschnett
Type: defect | Status: review
Priority: minor | Milestone:
Component: SimFactory | Version: development version
Resolution: | Keywords:
-------------------------+--------------------------------------------------
Comment (by rhaas):
Both runs used hwloc. Hwloc's output for the unmodified run is:
{{{
INFO (hwloc): MPI process-to-host mapping:
This is MPI process 0 of 8
MPI hosts:
0: nid07793
1: nid07854
This MPI process runs on host 0 of 2
On this host, this is MPI process 0 of 4
INFO (hwloc): Topology support:
Discovery support:
discovery->pu : yes
CPU binding support:
cpubind->set_thisproc_cpubind : yes
cpubind->get_thisproc_cpubind : yes
cpubind->set_proc_cpubind : yes
cpubind->get_proc_cpubind : yes
cpubind->set_thisthread_cpubind : yes
cpubind->get_thisthread_cpubind : yes
cpubind->set_thread_cpubind : yes
cpubind->get_thread_cpubind : yes
cpubind->get_thisproc_last_cpu_location : yes
cpubind->get_proc_last_cpu_location : yes
cpubind->get_thisthread_last_cpu_location: yes
Memory binding support:
membind->set_thisproc_membind : no
membind->get_thisproc_membind : no
membind->set_proc_membind : no
membind->get_proc_membind : no
membind->set_thisthread_membind : yes
membind->get_thisthread_membind : yes
membind->set_area_membind : yes
membind->get_area_membind : yes
membind->alloc_membind : yes
membind->firsttouch_membind : yes
membind->bind_membind : yes
membind->interleave_membind : yes
membind->replicate_membind : no
membind->nexttouch_membind : no
membind->migrate_membind : yes
INFO (hwloc): Hardware objects in this node:
Machine L#0: (P#0, total=67108480KB, Backend=Linux, LinuxCgroup=/3012719,
OSName=Linux, OSRelease=2.6.32.59-0.7.1_1.0402.7496-cray_gem_c,
OSVersion="#1 SMP Wed Aug 7 03:55:25 UTC 2013", HostName=nid07793,
Architecture=x86_64)
Socket L#0: (P#0, total=33554048KB, CPUModel="AMD Opteron(TM) Processor
6276 ")
NUMANode L#0: (P#0, local=16776832KB, total=16776832KB)
L3Cache L#0: (P#-1, size=6144KB, linesize=64, ways=64)
L2Cache L#0: (P#-1, size=2048KB, linesize=64, ways=16)
L1dCache L#0: (P#-1, size=16KB, linesize=64, ways=4)
Core L#0: (P#0)
PU L#0: (P#0)
L2Cache L#1: (P#-1, size=2048KB, linesize=64, ways=16)
L1dCache L#1: (P#-1, size=16KB, linesize=64, ways=4)
Core L#1: (P#1)
PU L#1: (P#1)
L2Cache L#2: (P#-1, size=2048KB, linesize=64, ways=16)
L1dCache L#2: (P#-1, size=16KB, linesize=64, ways=4)
Core L#2: (P#2)
PU L#2: (P#2)
L2Cache L#3: (P#-1, size=2048KB, linesize=64, ways=16)
L1dCache L#3: (P#-1, size=16KB, linesize=64, ways=4)
Core L#3: (P#3)
PU L#3: (P#3)
L2Cache L#4: (P#-1, size=2048KB, linesize=64, ways=16)
L1dCache L#4: (P#-1, size=16KB, linesize=64, ways=4)
Core L#4: (P#4)
PU L#4: (P#4)
L2Cache L#5: (P#-1, size=2048KB, linesize=64, ways=16)
L1dCache L#5: (P#-1, size=16KB, linesize=64, ways=4)
Core L#5: (P#5)
PU L#5: (P#5)
L2Cache L#6: (P#-1, size=2048KB, linesize=64, ways=16)
L1dCache L#6: (P#-1, size=16KB, linesize=64, ways=4)
Core L#6: (P#6)
PU L#6: (P#6)
L2Cache L#7: (P#-1, size=2048KB, linesize=64, ways=16)
L1dCache L#7: (P#-1, size=16KB, linesize=64, ways=4)
Core L#7: (P#7)
PU L#7: (P#7)
NUMANode L#1: (P#1, local=16777216KB, total=16777216KB)
L3Cache L#1: (P#-1, size=6144KB, linesize=64, ways=64)
L2Cache L#8: (P#-1, size=2048KB, linesize=64, ways=16)
L1dCache L#8: (P#-1, size=16KB, linesize=64, ways=4)
Core L#8: (P#0)
PU L#8: (P#8)
L2Cache L#9: (P#-1, size=2048KB, linesize=64, ways=16)
L1dCache L#9: (P#-1, size=16KB, linesize=64, ways=4)
Core L#9: (P#1)
PU L#9: (P#9)
L2Cache L#10: (P#-1, size=2048KB, linesize=64, ways=16)
L1dCache L#10: (P#-1, size=16KB, linesize=64, ways=4)
Core L#10: (P#2)
PU L#10: (P#10)
L2Cache L#11: (P#-1, size=2048KB, linesize=64, ways=16)
L1dCache L#11: (P#-1, size=16KB, linesize=64, ways=4)
Core L#11: (P#3)
PU L#11: (P#11)
L2Cache L#12: (P#-1, size=2048KB, linesize=64, ways=16)
L1dCache L#12: (P#-1, size=16KB, linesize=64, ways=4)
Core L#12: (P#4)
PU L#12: (P#12)
L2Cache L#13: (P#-1, size=2048KB, linesize=64, ways=16)
L1dCache L#13: (P#-1, size=16KB, linesize=64, ways=4)
Core L#13: (P#5)
PU L#13: (P#13)
L2Cache L#14: (P#-1, size=2048KB, linesize=64, ways=16)
L1dCache L#14: (P#-1, size=16KB, linesize=64, ways=4)
Core L#14: (P#6)
PU L#14: (P#14)
L2Cache L#15: (P#-1, size=2048KB, linesize=64, ways=16)
L1dCache L#15: (P#-1, size=16KB, linesize=64, ways=4)
Core L#15: (P#7)
PU L#15: (P#15)
Socket L#1: (P#1, total=33554432KB, CPUModel="AMD Opteron(TM) Processor
6276 ")
NUMANode L#2: (P#2, local=16777216KB, total=16777216KB)
L3Cache L#2: (P#-1, size=6144KB, linesize=64, ways=64)
L2Cache L#16: (P#-1, size=2048KB, linesize=64, ways=16)
L1dCache L#16: (P#-1, size=16KB, linesize=64, ways=4)
Core L#16: (P#0)
PU L#16: (P#16)
L2Cache L#17: (P#-1, size=2048KB, linesize=64, ways=16)
L1dCache L#17: (P#-1, size=16KB, linesize=64, ways=4)
Core L#17: (P#1)
PU L#17: (P#17)
L2Cache L#18: (P#-1, size=2048KB, linesize=64, ways=16)
L1dCache L#18: (P#-1, size=16KB, linesize=64, ways=4)
Core L#18: (P#2)
PU L#18: (P#18)
L2Cache L#19: (P#-1, size=2048KB, linesize=64, ways=16)
L1dCache L#19: (P#-1, size=16KB, linesize=64, ways=4)
Core L#19: (P#3)
PU L#19: (P#19)
L2Cache L#20: (P#-1, size=2048KB, linesize=64, ways=16)
L1dCache L#20: (P#-1, size=16KB, linesize=64, ways=4)
Core L#20: (P#4)
PU L#20: (P#20)
L2Cache L#21: (P#-1, size=2048KB, linesize=64, ways=16)
L1dCache L#21: (P#-1, size=16KB, linesize=64, ways=4)
Core L#21: (P#5)
PU L#21: (P#21)
L2Cache L#22: (P#-1, size=2048KB, linesize=64, ways=16)
L1dCache L#22: (P#-1, size=16KB, linesize=64, ways=4)
Core L#22: (P#6)
PU L#22: (P#22)
L2Cache L#23: (P#-1, size=2048KB, linesize=64, ways=16)
L1dCache L#23: (P#-1, size=16KB, linesize=64, ways=4)
Core L#23: (P#7)
PU L#23: (P#23)
NUMANode L#3: (P#3, local=16777216KB, total=16777216KB)
L3Cache L#3: (P#-1, size=6144KB, linesize=64, ways=64)
L2Cache L#24: (P#-1, size=2048KB, linesize=64, ways=16)
L1dCache L#24: (P#-1, size=16KB, linesize=64, ways=4)
Core L#24: (P#0)
PU L#24: (P#24)
L2Cache L#25: (P#-1, size=2048KB, linesize=64, ways=16)
L1dCache L#25: (P#-1, size=16KB, linesize=64, ways=4)
Core L#25: (P#1)
PU L#25: (P#25)
L2Cache L#26: (P#-1, size=2048KB, linesize=64, ways=16)
L1dCache L#26: (P#-1, size=16KB, linesize=64, ways=4)
Core L#26: (P#2)
PU L#26: (P#26)
L2Cache L#27: (P#-1, size=2048KB, linesize=64, ways=16)
L1dCache L#27: (P#-1, size=16KB, linesize=64, ways=4)
Core L#27: (P#3)
PU L#27: (P#27)
L2Cache L#28: (P#-1, size=2048KB, linesize=64, ways=16)
L1dCache L#28: (P#-1, size=16KB, linesize=64, ways=4)
Core L#28: (P#4)
PU L#28: (P#28)
L2Cache L#29: (P#-1, size=2048KB, linesize=64, ways=16)
L1dCache L#29: (P#-1, size=16KB, linesize=64, ways=4)
Core L#29: (P#5)
PU L#29: (P#29)
L2Cache L#30: (P#-1, size=2048KB, linesize=64, ways=16)
L1dCache L#30: (P#-1, size=16KB, linesize=64, ways=4)
Core L#30: (P#6)
PU L#30: (P#30)
L2Cache L#31: (P#-1, size=2048KB, linesize=64, ways=16)
L1dCache L#31: (P#-1, size=16KB, linesize=64, ways=4)
Core L#31: (P#7)
PU L#31: (P#31)
INFO (hwloc): Thread CPU bindings:
MPI process 0 on host 0 (process 0 of 4 on this host)
OpenMP thread 0: PU set L#{0} P#{0}
OpenMP thread 1: PU set L#{0} P#{0}
OpenMP thread 2: PU set L#{0} P#{0}
OpenMP thread 3: PU set L#{0} P#{0}
OpenMP thread 4: PU set L#{0} P#{0}
OpenMP thread 5: PU set L#{0} P#{0}
OpenMP thread 6: PU set L#{0} P#{0}
OpenMP thread 7: PU set L#{0} P#{0}
MPI process 1 on host 0 (process 1 of 4 on this host)
OpenMP thread 0: PU set L#{1} P#{1}
OpenMP thread 1: PU set L#{1} P#{1}
OpenMP thread 2: PU set L#{1} P#{1}
OpenMP thread 3: PU set L#{1} P#{1}
OpenMP thread 4: PU set L#{1} P#{1}
OpenMP thread 5: PU set L#{1} P#{1}
OpenMP thread 6: PU set L#{1} P#{1}
OpenMP thread 7: PU set L#{1} P#{1}
MPI process 2 on host 0 (process 2 of 4 on this host)
OpenMP thread 0: PU set L#{2} P#{2}
OpenMP thread 1: PU set L#{2} P#{2}
OpenMP thread 2: PU set L#{2} P#{2}
OpenMP thread 3: PU set L#{2} P#{2}
OpenMP thread 4: PU set L#{2} P#{2}
OpenMP thread 5: PU set L#{2} P#{2}
OpenMP thread 6: PU set L#{2} P#{2}
OpenMP thread 7: PU set L#{2} P#{2}
MPI process 3 on host 0 (process 3 of 4 on this host)
OpenMP thread 0: PU set L#{3} P#{3}
OpenMP thread 1: PU set L#{3} P#{3}
OpenMP thread 2: PU set L#{3} P#{3}
OpenMP thread 3: PU set L#{3} P#{3}
OpenMP thread 4: PU set L#{3} P#{3}
OpenMP thread 5: PU set L#{3} P#{3}
OpenMP thread 6: PU set L#{3} P#{3}
OpenMP thread 7: PU set L#{3} P#{3}
INFO (hwloc): Setting thread CPU bindings:
INFO (hwloc): Thread CPU bindings:
MPI process 0 on host 0 (process 0 of 4 on this host)
OpenMP thread 0: PU set L#{0} P#{0}
OpenMP thread 1: PU set L#{1} P#{1}
OpenMP thread 2: PU set L#{2} P#{2}
OpenMP thread 3: PU set L#{3} P#{3}
OpenMP thread 4: PU set L#{4} P#{4}
OpenMP thread 5: PU set L#{5} P#{5}
OpenMP thread 6: PU set L#{6} P#{6}
OpenMP thread 7: PU set L#{7} P#{7}
MPI process 1 on host 0 (process 1 of 4 on this host)
OpenMP thread 0: PU set L#{8} P#{8}
OpenMP thread 1: PU set L#{9} P#{9}
OpenMP thread 2: PU set L#{10} P#{10}
OpenMP thread 3: PU set L#{11} P#{11}
OpenMP thread 4: PU set L#{12} P#{12}
OpenMP thread 5: PU set L#{13} P#{13}
OpenMP thread 6: PU set L#{14} P#{14}
OpenMP thread 7: PU set L#{15} P#{15}
MPI process 2 on host 0 (process 2 of 4 on this host)
OpenMP thread 0: PU set L#{16} P#{16}
OpenMP thread 1: PU set L#{17} P#{17}
OpenMP thread 2: PU set L#{18} P#{18}
OpenMP thread 3: PU set L#{19} P#{19}
OpenMP thread 4: PU set L#{20} P#{20}
OpenMP thread 5: PU set L#{21} P#{21}
OpenMP thread 6: PU set L#{22} P#{22}
OpenMP thread 7: PU set L#{23} P#{23}
MPI process 3 on host 0 (process 3 of 4 on this host)
OpenMP thread 0: PU set L#{24} P#{24}
OpenMP thread 1: PU set L#{25} P#{25}
OpenMP thread 2: PU set L#{26} P#{26}
OpenMP thread 3: PU set L#{27} P#{27}
OpenMP thread 4: PU set L#{28} P#{28}
OpenMP thread 5: PU set L#{29} P#{29}
OpenMP thread 6: PU set L#{30} P#{30}
OpenMP thread 7: PU set L#{31} P#{31}
INFO (hwloc): Extracting CPU/cache/memory properties:
There are 1 PUs per core (aka hardware SMT threads)
There are 1 threads per core (aka SMT threads used)
Cache (unknown name) has type "data" depth 1
size 16384 linesize 64 associativity 4 stride 4096, for 1 PUs
Cache (unknown name) has type "unified" depth 2
size 2097152 linesize 64 associativity 16 stride 131072, for 1 PUs
Cache (unknown name) has type "unified" depth 3
size 6291456 linesize 64 associativity 64 stride 98304, for 8 PUs
Memory has type "local" depth 2
size 17179475968 pagesize 4096, for 8 PUs
Memory has type "global" depth 2
size 68717903872 pagesize 4096, for 32 PUs
}}}
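For reference, the bindings reported after "Setting thread CPU bindings" follow a simple pattern: OpenMP thread t of local MPI process p ends up on PU 8*p + t, so each process stays inside one NUMA node. Below is a minimal Python sketch of that arithmetic. It assumes the 4 processes per host, 8 threads per process and 8 PUs per NUMA node read off the log above; the helper name is only illustrative and is not part of hwloc or Cactus.

```python
# Values read off the hwloc output above; expected_pu is a hypothetical helper.
PROCS_PER_HOST = 4
THREADS_PER_PROC = 8
PUS_PER_NUMA_NODE = 8


def expected_pu(local_proc, thread):
    """PU index for OpenMP thread `thread` of local MPI process `local_proc`."""
    return local_proc * THREADS_PER_PROC + thread


# (local process, thread) -> PU index, for every thread on one host
layout = {
    (p, t): expected_pu(p, t)
    for p in range(PROCS_PER_HOST)
    for t in range(THREADS_PER_PROC)
}

# Set of NUMA nodes touched by the threads of each local process
numa_nodes = {
    p: {layout[(p, t)] // PUS_PER_NUMA_NODE for t in range(THREADS_PER_PROC)}
    for p in range(PROCS_PER_HOST)
}
```

With these numbers every PU is used by exactly one thread and each process maps to a single NUMA node, which matches the second "Thread CPU bindings" listing. The first listing instead has all eight threads of a process on one PU.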
I attach stdout for both runs. The output of the new run script is in
qc0-new.out and that of the unmodified one is in qc0-vanilla.out.
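To compare the two outputs, one could pull out the "Thread CPU bindings" sections and check whether several threads report the same PU set. What follows is only a rough Python sketch. It assumes the line format shown in the log above, and the function names are made up for illustration. PU sets are kept as strings, since a set could also be a range or list rather than a single index (that format is an assumption here).

```python
import re
from collections import Counter

PROC_RE = re.compile(r"MPI process (\d+) on host (\d+)")
THREAD_RE = re.compile(r"OpenMP thread (\d+): PU set L#\{([\d,\-]+)\}")


def parse_binding_sections(text):
    """Return one dict per 'Thread CPU bindings' section.

    Each dict maps (host, process, thread) to the logical PU set string.
    """
    sections = []
    current = None
    proc = None
    for line in text.splitlines():
        # Matches the section header but not "Setting thread CPU bindings:"
        if "Thread CPU bindings:" in line:
            current = {}
            sections.append(current)
            proc = None
            continue
        m = PROC_RE.search(line)
        if m:
            proc = (int(m.group(2)), int(m.group(1)))  # (host, process)
            continue
        m = THREAD_RE.search(line)
        if m and current is not None and proc is not None:
            current[proc + (int(m.group(1)),)] = m.group(2)
    return sections


def shared_pu_sets(section):
    """Return {pu_set: thread_count} for PU sets used by more than one thread."""
    counts = Counter(section.values())
    return {pu: n for pu, n in counts.items() if n > 1}


# Shortened excerpt in the format of the log above
sample = """INFO (hwloc): Thread CPU bindings:
MPI process 0 on host 0 (process 0 of 4 on this host)
OpenMP thread 0: PU set L#{0} P#{0}
OpenMP thread 1: PU set L#{0} P#{0}
INFO (hwloc): Setting thread CPU bindings:
INFO (hwloc): Thread CPU bindings:
MPI process 0 on host 0 (process 0 of 4 on this host)
OpenMP thread 0: PU set L#{0} P#{0}
OpenMP thread 1: PU set L#{1} P#{1}
"""

sections = parse_binding_sections(sample)
before, after = sections[0], sections[-1]
```

Here `shared_pu_sets(before)` flags PU set "0" as used by two threads, while `shared_pu_sets(after)` is empty. The same functions could be run on the text of qc0-new.out and qc0-vanilla.out. Note that a shared set covering a whole NUMA node would also be flagged, although that is not necessarily a problem.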
--
Ticket URL: <https://trac.einsteintoolkit.org/ticket/1527#comment:3>
Einstein Toolkit <http://einsteintoolkit.org>
The Einstein Toolkit