[Users] Issue with Multiple Node Simulation on cluster

Erik Schnetter schnetter at gmail.com
Thu Dec 8 09:03:27 CST 2022


Spandan

The problem is likely that MPI does not automatically forward your
OpenMP setting to the other nodes. You are setting the environment
variable OMP_NUM_THREADS in the run script, and you probably need to
forward it explicitly to the processes on the other nodes as well.
Your MPI documentation will tell you how to do this; it is usually an
additional option that you pass when calling "mpirun".
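
As a rough sketch of what this could look like, assuming Open MPI
(other MPI implementations use different options, so check your
"mpirun" man page): the "-x" option exports a variable from the
launching shell to all remote processes. The hostfile name and the
executable path below are placeholders and are not taken from your
attached run script. For what it is worth, the 144 threads in your
log would be consistent with, for instance, 16 processes using 1
thread and 16 processes using 8 threads (16 + 128 = 144), although
the log excerpt alone does not confirm that split.

```shell
#!/bin/sh
# Sketch only: assumes Open MPI's mpirun; other launchers use different options.
# HOSTFILE and EXE are placeholders, not values from the attached run script.

export OMP_NUM_THREADS=1      # one OpenMP thread per MPI process (--num-threads=1)

HOSTFILE="hostfile"           # placeholder: your machine/host file
EXE="./exe/cactus_sim"        # placeholder: path to your Cactus executable
PARFILE="nsnstohmns1.par"     # parameter file name from the submit command

# -x VAR asks Open MPI to export VAR to every process it starts,
# including those on the remote nodes.
CMD="mpirun -np 32 --hostfile $HOSTFILE -x OMP_NUM_THREADS $EXE $PARFILE"

# Print the command by default; set DRY_RUN=0 to actually launch it.
if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$CMD"
else
    $CMD
fi
```

If the forwarding works, Carpet should report 32 threads in total and
1 thread per process at startup.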

-erik

On Thu, Dec 8, 2022 at 2:50 AM Spandan Sarma 19306
<spandan19 at iiserb.ac.in> wrote:
>
> Hello,
>
>
> This mail continues the ticket “Issue with compiling ET on cluster”, by Shamim.
>
>
> Following Roland’s suggestion, we found that using the --prefix <openmpi-directory> option together with a hostfile let us run a multi-node simulation on our HPC cluster.
>
>
> Now we find that the BNSM gallery simulation evolves for only 240 iterations on 2 nodes (16+16 procs, 24 hr walltime), which is very slow compared with the simulation on 1 node (16 procs, 24 hr walltime), which evolved for 120988 iterations. Parallelization works well within 1 node: we obtained 120988, 67756, and 40008 iterations for 16, 8, and 4 procs respectively (24 hr walltime each). We are unable to understand what causes this issue when openmpi is given 2 nodes (16+16 procs).
>
>
> In the output files we found the following, which may point to the issue:
>
> INFO (Carpet): MPI is enabled
>
> INFO (Carpet): Carpet is running on 32 processes
>
> INFO (Carpet): This is process 0
>
> INFO (Carpet): OpenMP is enabled
>
> INFO (Carpet): This process contains 1 threads, this is thread 0
>
> INFO (Carpet): There are 144 threads in total
>
> INFO (Carpet): There are 4.5 threads per process
>
> INFO (Carpet): This process runs on host n129, pid=20823
>
> INFO (Carpet): This process runs on 1 core: 0
>
> INFO (Carpet): Thread 0 runs on 1 core: 0
>
> INFO (Carpet): This simulation is running in 3 dimensions
>
> INFO (Carpet): Boundary specification for map 0:
>
>    nboundaryzones: [[3,3,3],[3,3,3]]
>
>    is_internal   : [[0,0,0],[0,0,0]]
>
>    is_staggered  : [[0,0,0],[0,0,0]]
>
>    shiftout      : [[1,0,1],[0,0,0]]
>
> WARNING level 1 from host n131 process 21
>
>   in thorn Carpet, file /home2/mallick/ET9/Cactus/arrangements/Carpet/Carpet/src/SetupGH.cc:426:
>
>   -> The number of threads for this process is larger its number of cores. This may indicate a performance problem.
>
>
> We could not understand this, as we asked for only 32 procs with num-threads set to 1. The command that we used to submit our job was:
>
>  ./simfactory/bin/sim create-submit p32_mpin_npn --procs=32 --ppn=16 --num-threads=1 --ppn-used=16 --num-smt=1 --parfile=par/nsnstohmns1.par --walltime=24:10:00
>
>
> I have attached the out file, runscript, submitscript, optionlist, and machine file for reference. Thanks in advance for your help.
>
>
> Sincerely,
>
> --
> Spandan Sarma
> BS-MS' 19
> Department of Physics (4th Year),
> IISER Bhopal



-- 
Erik Schnetter <schnetter at gmail.com>
http://www.perimeterinstitute.ca/personal/eschnetter/

