[Users] optimal nodes, mpi tasks, and cpus per mpi task?

Roland Haas rhaas at illinois.edu
Fri Apr 3 13:10:22 CDT 2020


Hello Bill,

no real answer, but at least some pointers maybe. There is an older
email thread in which Jim Healy looked into the best setup for
Stampede2:

http://lists.einsteintoolkit.org/pipermail/users/2018-January/006007.html

where Jim included a nice plot (linked on that page as a "binary
attachment") showing speed vs. number of MPI ranks (and thus,
inversely, the number of CPUs per task):

http://lists.einsteintoolkit.org/pipermail/users/attachments/20180120/e87cdc37/attachment-0001.png

He then also tried some of the task-based parallelism code that was
being explored at the time (it may be in the toolkit by now, but not
enabled, since it was quite memory hungry), e.g.:

http://lists.einsteintoolkit.org/pipermail/users/2018-January/006032.html

with a plot:

http://lists.einsteintoolkit.org/pipermail/users/attachments/20180126/681dcd3a/attachment-0001.png

showing a significant speedup from the task-based code when using 24
nodes, but none when using only 4 nodes.

So basically: if you can get away with it, then 1 CPU per MPI task is
usually fastest, but once you use many nodes you tend to end up in a
situation where a handful of MPI ranks per node is best (~8 or so, I'd
say, for the unmodified ET code on Stampede2 SKX). Though to really see
that effect you need to use hundreds of nodes.
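
As a rough sketch (an illustration, not a tuned setup): assuming
Stampede2-SKX-like nodes with 48 cores each, a hybrid layout along
those lines might look like this in a Slurm batch header, using 8 MPI
ranks per node with 6 OpenMP threads per rank. The node count, the
rank/thread split and the walltime are assumptions to adjust for your
machine; the executable and parameter file names are taken from your
script below:

#SBATCH --nodes=24            # many-node regime where a few ranks per node pays off
#SBATCH --ntasks-per-node=8   # MPI ranks per node (~8 worked well on SKX)
#SBATCH --cpus-per-task=6     # OpenMP threads per rank (8 * 6 = 48 cores/node)
#SBATCH -t 4:00:00            # walltime, adjust as needed

# let OpenMP use the cores reserved for each MPI rank
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun /labs/einstein/20191028/Cactus/exe/cactus_sim qc0-mclachlan.par

For a small run on a few nodes, the opposite limit (--ntasks-per-node
equal to the core count, --cpus-per-task=1 and OMP_NUM_THREADS=1)
corresponds to the "1 CPU per MPI task" case above.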

Yours,
Roland

> Okay, just looking for a little advice from the community: I am running 
> a qc0 BBH merger example on our HPC at Vanderbilt and trying to tune my 
> parameters.  We use Slurm / SBATCH, and the parameters I am working with 
> are:
> 
> o amount of memory per node
> 
> o number of nodes
> 
> o number of MPI tasks
> 
> o whether more than one CPU per task helps
> 
> One example was 4 GB per node, 4 nodes, 4 MPI tasks, and 2 CPUs per 
> task.  That was a little slower than 1 CPU per task---perhaps due to 
> memory.  I have learned from you all that the ETK is more of a memory 
> hog than a CPU hog, which is to say lots of memory helps more than lots 
> of cores.
> 
> Any examples you have of a simple, vanilla BBH run would be good.  My 
> SBATCH script includes:
> 
> #SBATCH --mem 4000        # amount of memory per node, in MB
> #SBATCH --nodes=4         # Like -N 4, number of nodes on which to run
> #SBATCH -n 4              # Total number of MPI tasks requested, default 1 task / 1 CPU
> #SBATCH -c 2              # Number of CPUs per MPI task
> #SBATCH -t 1:00:00        # Run time (d-hh:mm:ss)
> 
> and I was told to use srun and not mpirun -np XXXX:
> 
> 
> myparFile="qc0-mclachlan.par"
> myCactusExe="/labs/einstein/20191028/Cactus/exe/cactus_sim"
> 
> ##echo "mpirun -np 4 $myCactusExe $myparFile"
> ##mpirun -np 4 $myCactusExe $myparFile
> echo "srun $myCactusExe $myparFile"
> srun $myCactusExe $myparFile
> 
> Eventually I want to get Simfactory running, and I see several HPCs 
> that use Slurm in the machine database directory.  Still, I think I 
> need to understand these parameters in order to tune the scripts 
> Simfactory uses.
> 
> thanks, bill e.g.
> 


-- 
My email is as private as my paper mail. I therefore support encrypting
and signing email messages. Get my PGP key from http://pgp.mit.edu .