[Users] Issue with Multiple Node Simulation on cluster

Steven R. Brandt sbrandt at cct.lsu.edu
Fri Dec 9 08:42:08 CST 2022


It's not too late to do a check, though, to see if all other nodes have 
the same OMP_NUM_THREADS value. Maybe that's the warning? It sounds like 
it should be an error.
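
A minimal sketch of what such a check could compare, written in Python rather than as Cactus startup code. It assumes the per-process thread counts have already been collected on one rank (for example with an MPI gather, which is not shown). The function name `summarize_thread_counts` and the 1-versus-8 thread layout are hypothetical; that layout is only a guess, chosen because it reproduces the "144 threads in total" and "4.5 threads per process" lines in the quoted Carpet output.

```python
def summarize_thread_counts(threads_per_rank):
    """Summarize per-rank OpenMP thread counts and list ranks that disagree with rank 0."""
    if not threads_per_rank:
        raise ValueError("need at least one rank")
    nprocs = len(threads_per_rank)
    total = sum(threads_per_rank)
    expected = threads_per_rank[0]  # rank 0's value serves as the reference
    mismatched = [rank for rank, n in enumerate(threads_per_rank) if n != expected]
    return {
        "processes": nprocs,
        "total_threads": total,
        "threads_per_process": total / nprocs,
        "mismatched_ranks": mismatched,
    }


# Assumed layout (not confirmed by the thread): ranks 0-15 on the first node
# see OMP_NUM_THREADS=1, ranks 16-31 on the second node fall back to 8 threads.
report = summarize_thread_counts([1] * 16 + [8] * 16)
print(report["total_threads"], report["threads_per_process"])
print("ranks that differ from rank 0:", report["mismatched_ranks"])
```

A non-empty `mismatched_ranks` list is the condition that would be reported as an error instead of a warning.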

--Steve

On 12/8/2022 5:23 PM, Erik Schnetter wrote:
> Steve
>
> Code that runs as part of the Cactus executable is running too late
> for this. At that time, OpenMP has already been initialized.
>
> There is the environment variable "CACTUS_NUM_THREADS" which is
> checked at run time, but only if it is set (for backward
> compatibility). Most people do not bother setting it, leaving this
> error undetected. A warning is printed, but warnings are generally
> ignored.
>
> -erik
>
> On Thu, Dec 8, 2022 at 3:48 PM Steven R. Brandt <sbrandt at cct.lsu.edu> wrote:
>> We could probably add some startup code in which MPI broadcasts the
>> OMP_NUM_THREADS setting to all the other processes and either checks the
>> value of the environment variable or calls omp_set_num_threads() or some
>> such.
>>
>> --Steve
>>
>> On 12/8/2022 9:03 AM, Erik Schnetter wrote:
>>> Spandan
>>>
>>> The problem is likely that MPI does not automatically forward your
>>> OpenMP setting to the other nodes. You are setting the environment
>>> variable OMP_NUM_THREADS in the run script, and it is likely necessary
>>> to forward this environment variable to the other processes as well.
>>> Your MPI documentation will tell you how to do this. This is likely an
>>> additional option you need to pass when calling "mpirun".
>>>
>>> -erik
>>>
>>> On Thu, Dec 8, 2022 at 2:50 AM Spandan Sarma 19306
>>> <spandan19 at iiserb.ac.in> wrote:
>>>> Hello,
>>>>
>>>>
>>>> This mail continues the ticket “Issue with compiling ET on cluster” by Shamim.
>>>>
>>>>
>>>> Following Roland’s suggestion, we found that passing the --prefix <openmpi-directory> option together with a hostfile let us run a multi-node simulation on our HPC cluster.
>>>>
>>>>
>>>> Now we find that the BNSM gallery simulation evolves for only 240 iterations on 2 nodes (16+16 procs, 24 hr walltime). This is very slow compared with the simulation on 1 node (16 procs, 24 hr walltime), which evolved for 120988 iterations. Parallelization works well within 1 node: we reached 120988, 67756, and 40008 iterations with 16, 8, and 4 procs respectively (24 hr walltime each). We are unable to understand what causes this slowdown when openmpi is given 2 nodes (16+16 procs).
>>>>
>>>>
>>>> In the output files we found the following, which may point to the issue:
>>>>
>>>> INFO (Carpet): MPI is enabled
>>>> INFO (Carpet): Carpet is running on 32 processes
>>>> INFO (Carpet): This is process 0
>>>> INFO (Carpet): OpenMP is enabled
>>>> INFO (Carpet): This process contains 1 threads, this is thread 0
>>>> INFO (Carpet): There are 144 threads in total
>>>> INFO (Carpet): There are 4.5 threads per process
>>>> INFO (Carpet): This process runs on host n129, pid=20823
>>>> INFO (Carpet): This process runs on 1 core: 0
>>>> INFO (Carpet): Thread 0 runs on 1 core: 0
>>>> INFO (Carpet): This simulation is running in 3 dimensions
>>>> INFO (Carpet): Boundary specification for map 0:
>>>>      nboundaryzones: [[3,3,3],[3,3,3]]
>>>>      is_internal   : [[0,0,0],[0,0,0]]
>>>>      is_staggered  : [[0,0,0],[0,0,0]]
>>>>      shiftout      : [[1,0,1],[0,0,0]]
>>>>
>>>> WARNING level 1 from host n131 process 21
>>>>     in thorn Carpet, file /home2/mallick/ET9/Cactus/arrangements/Carpet/Carpet/src/SetupGH.cc:426:
>>>>     -> The number of threads for this process is larger its number of cores. This may indicate a performance problem.
>>>>
>>>>
>>>> We could not understand this, as we asked for only 32 procs with num-threads set to 1. The command that we used to submit our job was:
>>>>
>>>>    ./simfactory/bin/sim create-submit p32_mpin_npn --procs=32 --ppn=16 --num-threads=1 --ppn-used=16 --num-smt=1 --parfile=par/nsnstohmns1.par --walltime=24:10:00
>>>>
>>>>
>>>> I have attached the out file, runscript, submitscript, optionlist, and machine file for reference. Thanks in advance for your help.
>>>>
>>>>
>>>> Sincerely,
>>>>
>>>> --
>>>> Spandan Sarma
>>>> BS-MS' 19
>>>> Department of Physics (4th Year),
>>>> IISER Bhopal
>>>> _______________________________________________
>>>> Users mailing list
>>>> Users at einsteintoolkit.org
>>>> http://lists.einsteintoolkit.org/mailman/listinfo/users
>>>
>
>

