[Users] optimal nodes, mpi tasks, and cpus per mpi task?
Roland Haas
rhaas at illinois.edu
Fri Apr 3 13:10:22 CDT 2020
Hello Bill,
No real answer, but here are at least some pointers. There is an older
email thread in which Jim Healy looked into the best setup for
Stampede2:
http://lists.einsteintoolkit.org/pipermail/users/2018-January/006007.html
where Jim included a nice plot (linked on that page as a "binary
attachment") showing speed vs. number of MPI ranks (and thus inverse
number of CPUs per task):
http://lists.einsteintoolkit.org/pipermail/users/attachments/20180120/e87cdc37/attachment-0001.png
He then also tried some of the task-based parallelism changes that were
being explored at that time (they may be in the toolkit by now, but not
enabled since they were quite memory hungry), e.g.:
http://lists.einsteintoolkit.org/pipermail/users/2018-January/006032.html
with a plot:
http://lists.einsteintoolkit.org/pipermail/users/attachments/20180126/681dcd3a/attachment-0001.png
showing a significant speedup from the task-based code when using
24 nodes, but none when using only 4 nodes.
So basically: if you can get away with it, 1 CPU per MPI task is
usually fastest, but once you use many nodes you tend to end up in a
situation where a handful of MPI ranks per node is best (~8 or so, I'd
say, for the unmodified ET code on Stampede2 SKX). Though in order to
really see that effect you need to use hundreds of nodes.
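For concreteness, a hybrid MPI+OpenMP batch script along those lines
might look roughly like the sketch below. This is untested: the 48
cores per node (8 ranks x 6 threads) are the Stampede2 SKX value, the
partition/queue settings are omitted, and the executable and parameter
file are just the ones from your script, so adjust everything for your
cluster:

#SBATCH --nodes=4            # number of nodes
#SBATCH --ntasks-per-node=8  # MPI ranks per node
#SBATCH --cpus-per-task=6    # cores per rank; 8*6 = 48 cores/node on SKX
#SBATCH -t 1:00:00           # run time (hh:mm:ss)

# one OpenMP thread per core assigned to each MPI rank
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

srun /labs/einstein/20191028/Cactus/exe/cactus_sim qc0-mclachlan.par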
Yours,
Roland
> Okay, just looking for a little advice from the community: I am running
> a qc0 BBH merger example on our HPC at Vanderbilt and trying to tune my
> parameters. We use Slurm / SBATCH, and the ones I am working with are:
>
> o amount of memory per node
>
> o number of nodes
>
> o number of MPI tasks
>
> o if more than one cpu per task helps
>
> One example was 4 GB per node, 4 nodes, 4 MPI tasks, and 2 CPUs per
> task. It was a little slower than 1 CPU per task, perhaps due to memory.
> I have learned from you all that ETK is a memory hog more than a CPU
> hog, that is to say, lots of memory helps more than lots of cores.
>
> Any examples you have of simple, vanilla BBH would be good. My SBATCH
> script includes
>
> #SBATCH --mem 4000    # amount of memory per node (in MB)
> #SBATCH --nodes=4     # like -N 4, number of nodes on which to run
> #SBATCH -n 4          # total number of MPI tasks requested, default 1 task / 1 CPU
> #SBATCH -c 2          # number of CPUs per MPI task
> #SBATCH -t 1:00:00    # run time (d-hh:mm:ss)
>
> and I was told to use srun and not mpirun -np XXXX:
>
>
> myparFile="qc0-mclachlan.par"
> myCactusExe="/labs/einstein/20191028/Cactus/exe/cactus_sim"
>
> ##echo "mpirun -np 4 $myCactusExe $myparFile"
> ##mpirun -np 4 $myCactusExe $myparFile
> echo "srun $myCactusExe $myparFile"
> srun $myCactusExe $myparFile
>
> Eventually I want to get Simfactory running, and I see several HPCs
> that use Slurm in the machine database directory. Still, I think I need
> to understand these parameters to tune the scripts Simfactory uses.
>
> thanks, bill e.g.
>
--
My email is as private as my paper mail. I therefore support encrypting
and signing email messages. Get my PGP key from http://pgp.mit.edu .