[Users] Getting too many threads per process started on "remote" nodes
Erik Schnetter
schnetter@cct.lsu.edu
Fri Jan 31 08:01:10 CST 2020
Anthony,

This sounds as if the environment variable OMP_NUM_THREADS was not sent to
the second node. That would be the fault of the mpirun command; you might
need to use a particular option.
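With Open MPI, the option in question is likely "-x", which exports an
environment variable from the launching shell to all remote ranks. A minimal
sketch, reusing the placeholders from the runscript quoted below (untested):

mpirun --hostfile /home/mpiuser/mpi-hosts -np @NUM_PROCS@ \
    -x OMP_NUM_THREADS -x CACTUS_NUM_PROCS -x CACTUS_NUM_THREADS \
    @EXECUTABLE@ -L 3 @PARFILE@

Without something like this, OMP_NUM_THREADS is unset on RZNode2, and most
OpenMP runtimes then default to one thread per hardware thread, which would
explain the 16 threads you are seeing.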
-erik
On Thu, Jan 30, 2020 at 22:17 Shoup, Anthony <shoup.31@osu.edu> wrote:
> Hi all,
>
> I am running ETK (2019_10) on a home-built cluster consisting of two
> nodes (8 cores / 16 threads, 64 GB, 4.3 GHz each). I just finished my second
> node and am trying to run a simulation (BBHMedRes) over both nodes. For
> starters I am just running one process (one thread per process) on each
> node. When I execute my simfactory submit command, I get one process with
> one thread on the node I submitted the simulation on. However, I get one
> process with 16 threads on the second node, which I don't want. When I run
> on just the first node, the numbers of processes and threads per process are
> exactly what I specify in the simfactory submit command. If I submit the
> simulation on the second node and run only on the second node, I likewise
> get exactly the processes/threads I specify. It's only when I run on
> multiple nodes that I don't get the number of processes/threads that I
> specify. Is there something I am doing wrong? I am using Open MPI.
>
> Thanks for any help, Tony...
>
> Relevant data is:
>
>
> 1. RunScript:
>
> #!/bin/sh
>
> # This runscript is used internally by simfactory as a template during the
> # sim setup and sim setup-silent commands
> # Edit at your own risk
>
> echo "Preparing:"
> set -x # Output commands
> set -e # Abort on errors
>
> cd @RUNDIR@-active
>
> echo "Checking:"
> pwd
> hostname
> date
>
> echo "Environment:"
> export CACTUS_NUM_PROCS=@NUM_PROCS@
> export CACTUS_NUM_THREADS=@NUM_THREADS@
> export GMON_OUT_PREFIX=gmon.out
> export OMP_NUM_THREADS=@NUM_THREADS@
> env | sort > SIMFACTORY/ENVIRONMENT
>
> echo "Starting:"
> export CACTUS_STARTTIME=$(date +%s)
>
> if [ ${CACTUS_NUM_PROCS} = 1 ]; then
>     if [ @RUNDEBUG@ -eq 0 ]; then
>         @EXECUTABLE@ -L 3 @PARFILE@
>     else
>         gdb --args @EXECUTABLE@ -L 3 @PARFILE@
>     fi
> else
>     mpirun --hostfile /home/mpiuser/mpi-hosts -np @NUM_PROCS@ @EXECUTABLE@ -L 3 @PARFILE@
> fi
>
> echo "Stopping:"
> date
> echo "Done."
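A quick way to confirm that the environment is the problem is to push a
trivial command through the same mpirun invocation and print what each rank
actually sees, for example:

mpirun --hostfile /home/mpiuser/mpi-hosts -np 2 \
    sh -c 'echo "$(hostname): OMP_NUM_THREADS=${OMP_NUM_THREADS:-unset}"'

If the RZNode2 line reports "unset" while the local line reports 1, the
variable is not being forwarded, and adding "-x OMP_NUM_THREADS" to the
mpirun line should fix it.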
>
> 2. mpi-hosts file:
>
> localhost slots=1
> RZNode2 slots=1
>
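For reference, "slots" is the maximum number of MPI ranks Open MPI will place
on each host, so this file matches --procs=2 with one rank per node. If you
later want to load the nodes fully, the counts grow accordingly; a
hypothetical variant for 8 single-threaded ranks per node:

localhost slots=8
RZNode2 slots=8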
> 3. simfactory submit command:
>
> ./simfactory/bin/sim submit BBHMedRes --parfile=par/BBHMedRes.par \
>     --procs=2 --num-smt=1 --num-threads=1 --ppn-used=1 --ppn=1 \
>     --walltime=99:0:0 | cat
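After resubmitting, the actual thread count per process can be checked on
each node directly; a sketch, assuming the executable is named cactus_sim
(substitute your configuration's name):

ssh RZNode2 'ps -o pid,nlwp,comm -C cactus_sim'

The nlwp column is the number of threads per process; it should read 1 per
rank rather than 16.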
>
> 4. Machine file on first node (RZNode1):
>
>
> [RZNode1]
>
> # This machine description file is used internally by simfactory as a template
> # during the sim setup and sim setup-silent commands
> # Edit at your own risk
> # Machine description
> nickname = RZNode1
> name = RZNode1
> location = somewhere
> description = Whatever
> status = personal
>
> # Access to this machine
> hostname = RZNode1
> aliaspattern = ^generic\.some\.where$
>
> # Source tree management
> sourcebasedir = /home/Cactus
> optionlist = generic.cfg
> submitscript = generic.sub
> runscript = generic.run
> make = make -j @MAKEJOBS@
> basedir = /home/mpiuser/simulations
> ppn = 1 # was 16
> max-num-threads = 1 # was 16
> num-threads = 1 # was 16
> nodes = 2
> submit = exec nohup @SCRIPTFILE@ < /dev/null > @RUNDIR@/@SIMULATION_NAME@.out 2> @RUNDIR@/@SIMULATION_NAME@.err & echo $!
> getstatus = ps @JOB_ID@
> stop = kill @JOB_ID@
> submitpattern = (.*)
> statuspattern = "^ *@JOB_ID@ "
> queuedpattern = $^
> runningpattern = ^
> holdingpattern = $^
> exechost = echo localhost
> exechostpattern = (.*)
> stdout = cat @SIMULATION_NAME@.out
> stderr = cat @SIMULATION_NAME@.err
> stdout-follow = tail -n 100 -f @SIMULATION_NAME@.out @SIMULATION_NAME@.err
>
> 5. Machine file on second node (RZNode2):
>
> [RZNode2]
>
> # This machine description file is used internally by simfactory as a template
> # during the sim setup and sim setup-silent commands
> # Edit at your own risk
> # Machine description
> nickname = RZNode2
> name = RZNode2
> location = somewhere
> description = Whatever
> status = personal
>
> # Access to this machine
> hostname = RZNode2
> aliaspattern = ^generic\.some\.where$
>
> # Source tree management
> sourcebasedir = /home/ET_2019_10
> optionlist = generic.cfg
> submitscript = generic.sub
> runscript = generic.run
> make = make -j @MAKEJOBS@
> basedir = /home/mpiuser/simulations
> ppn = 1
> max-num-threads = 1
> num-threads = 1
> nodes = 1
> submit = exec nohup @SCRIPTFILE@ < /dev/null > @RUNDIR@/@SIMULATION_NAME@.out 2> @RUNDIR@/@SIMULATION_NAME@.err & echo $!
> getstatus = ps @JOB_ID@
> stop = kill @JOB_ID@
> submitpattern = (.*)
> statuspattern = "^ *@JOB_ID@ "
> queuedpattern = $^
> runningpattern = ^
> holdingpattern = $^
> exechost = echo localhost
> exechostpattern = (.*)
> stdout = cat @SIMULATION_NAME@.out
> stderr = cat @SIMULATION_NAME@.err
> stdout-follow = tail -n 100 -f @SIMULATION_NAME@.out @SIMULATION_NAME@.err
>
>
>
>
> *Anthony Shoup* PhD, Senior Lecturer
> College of Arts & Sciences, College of Engineering
> Departments of Physics, Astronomy, EEIC
> 315 Science Bldg. | 4250 Campus Dr. Lima, OH 45807
> 419-995-8018 Office | 419-516-2257 Mobile
> shoup.31@osu.edu | osu.edu
> _______________________________________________
> Users mailing list
> Users@einsteintoolkit.org
> http://lists.einsteintoolkit.org/mailman/listinfo/users
>
--
Erik Schnetter <schnetter@cct.lsu.edu>
http://www.perimeterinstitute.ca/personal/eschnetter/