[Users] Issue with Multiple Node Simulation on cluster

Spandan Sarma 19306 spandan19 at iiserb.ac.in
Tue Dec 27 10:01:35 CST 2022


Thank you, Peter, Samuel, Steve, and Erik for the suggestions and comments.
We will look into more options for optimizing the performance.

Regards,
Spandan Sarma

On Wed, 21 Dec 2022, 15:25 Peter Diener, <diener at cct.lsu.edu> wrote:

> Dear Spandan,
>
> You say your simulations performed 2840 timesteps in half an hour on 32
> procs, which is 5680 timesteps per hour. Running for a full day you get
> 132105 timesteps, i.e. 5504 timesteps per hour. So you're right, there is a
> small difference in speed. However, remember that the grid structure will
> be changing as the black holes move across the grid, so some variation in
> speed is to be expected. I think the small difference you observed is
> within the natural range of variation.
>
> Cheers,
>
>    Peter
>
>
> On Thu, 15 Dec 2022, Spandan Sarma 19306 wrote:
>
> > Dear Erik and Steven,
> >
> > Thank you so much for the suggestions. We changed the runscript to add
> > -x OMP_NUM_THREADS to the command line, and it solved the issue of the
> > total number of threads being 144. Now it is set to 32 (equal to the
> > number of procs).
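For anyone following along, with Open MPI the forwarding fix amounts to one extra flag on the mpirun line. A minimal runscript sketch; the executable path, parameter file, process count, and hostfile variable are placeholders for this cluster's actual values:

```shell
# Runscript fragment: forward the OpenMP thread count to every node.
# /path/to/cactus_sim, the parameter file, and $HOSTFILE are placeholders.
export OMP_NUM_THREADS=1

mpirun -np 32 \
       --hostfile "$HOSTFILE" \
       -x OMP_NUM_THREADS \
       /path/to/cactus_sim /path/to/nsnstohmns1.par
```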
> >
> > Also, the iterations have increased to 132105 for 32 procs (24 hr
> > walltime) compared to just 240 before. Although this is a huge increase,
> > we expected it to be a bit more. For a shorter walltime (30 mins) we
> > received 2840, 2140, and 1216 iterations for 32, 16, and 8 procs
> > respectively. Are there any more changes that we can make to improve on
> > this?
> >
> > The new runscript and the output file (as a drive link) are attached
> > below (no changes were made to the machine file, option list, and submit
> > script from before).
> >
> > p32_omp.out
> >
> >
> > On Fri, Dec 9, 2022 at 8:13 PM Steven R. Brandt <sbrandt at cct.lsu.edu>
> > wrote:
> >       It's not too late to do a check, though, to see if all other
> >       nodes have the same OMP_NUM_THREADS value. Maybe that's the
> >       warning? It sounds like it should be an error.
> >
> >       --Steve
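Steve's check can also be run outside Cactus entirely; a quick sketch, assuming Open MPI (which exports OMPI_COMM_WORLD_RANK to each process) and bash available on every node; the hostfile variable is a placeholder:

```shell
# Print the OMP_NUM_THREADS value each MPI rank actually sees.
# A rank reporting "unset" or a different value identifies the bad node.
mpirun -np 32 --hostfile "$HOSTFILE" \
       bash -c 'echo "$(hostname) rank ${OMPI_COMM_WORLD_RANK:-?}: OMP_NUM_THREADS=${OMP_NUM_THREADS:-unset}"'
```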
> >
> >       On 12/8/2022 5:23 PM, Erik Schnetter wrote:
> >       > Steve
> >       >
> >       > Code that runs as part of the Cactus executable is running too
> >       > late for this. At that time, OpenMP has already been initialized.
> >       >
> >       > There is the environment variable "CACTUS_NUM_THREADS" which is
> >       > checked at run time, but only if it is set (for backward
> >       > compatibility). Most people do not bother setting it, leaving
> >       > this error undetected. There is a warning output, but these are
> >       > generally ignored.
> >       >
> >       > -erik
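The CACTUS_NUM_THREADS safeguard Erik describes can be enabled explicitly in the runscript so a thread-count mismatch is caught at startup; a sketch, assuming Open MPI's -x forwarding (paths are placeholders):

```shell
# Runscript fragment: tell Cactus how many OpenMP threads to expect,
# and forward both variables to the remote nodes.
export OMP_NUM_THREADS=1
export CACTUS_NUM_THREADS=$OMP_NUM_THREADS

mpirun -np 32 -x OMP_NUM_THREADS -x CACTUS_NUM_THREADS \
       /path/to/cactus_sim /path/to/parfile.par
```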
> >       >
> >       > On Thu, Dec 8, 2022 at 3:48 PM Steven R. Brandt
> >       <sbrandt at cct.lsu.edu> wrote:
> >       >> We could probably add some startup code in which MPI broadcasts
> >       >> the OMP_NUM_THREADS setting to all the other processes and
> >       >> either checks the value of the environment variable or calls
> >       >> omp_set_num_threads() or some such.
> >       >>
> >       >> --Steve
> >       >>
> >       >> On 12/8/2022 9:03 AM, Erik Schnetter wrote:
> >       >>> Spandan
> >       >>>
> >       >>> The problem is likely that MPI does not automatically forward
> >       >>> your OpenMP setting to the other nodes. You are setting the
> >       >>> environment variable OMP_NUM_THREADS in the run script, and it
> >       >>> is likely necessary to forward this environment variable to the
> >       >>> other processes as well. Your MPI documentation will tell you
> >       >>> how to do this. This is likely an additional option you need to
> >       >>> pass when calling "mpirun".
> >       >>>
> >       >>> -erik
> >       >>>
> >       >>> On Thu, Dec 8, 2022 at 2:50 AM Spandan Sarma 19306
> >       >>> <spandan19 at iiserb.ac.in> wrote:
> >       >>>> Hello,
> >       >>>>
> >       >>>>
> >       >>>> This mail is in continuation of the ticket, “Issue with
> >       >>>> compiling ET on cluster”, by Shamim.
> >       >>>>
> >       >>>>
> >       >>>> So after Roland’s suggestion, we found that passing the
> >       >>>> --prefix <openmpi-directory> option along with a hostfile
> >       >>>> successfully ran a multiple-node simulation on our HPC.
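For readers with the same setup, the working multi-node invocation would look roughly like the following; the install root, hostfile name, and executable path are placeholders (--prefix tells Open MPI where it is installed on the remote nodes):

```shell
# Launch across nodes when Open MPI is not on the remote nodes' default
# PATH; --prefix points at the Open MPI installation root.
mpirun --prefix /path/to/openmpi \
       --hostfile hosts.txt \
       -np 32 /path/to/cactus_sim /path/to/parfile.par
```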
> >       >>>>
> >       >>>>
> >       >>>> Now we find that the BNSM gallery simulation evolves for only
> >       >>>> 240 iterations on 2 nodes (16+16 procs, 24 hr walltime), which
> >       >>>> is very slow compared to the simulation on 1 node (16 procs,
> >       >>>> 24 hr walltime), which evolved for 120988 iterations. The
> >       >>>> parallelization works well within 1 node: we received 120988,
> >       >>>> 67756, and 40008 iterations for 16, 8, and 4 procs (24 hr
> >       >>>> walltime) respectively. We are unable to understand what is
> >       >>>> causing this issue when openmpi is given 2 nodes (16+16 procs).
> >       >>>>
> >       >>>>
> >       >>>> In the output files we found the following, which may point
> >       >>>> to the issue:
> >       >>>>
> >       >>>> INFO (Carpet): MPI is enabled
> >       >>>>
> >       >>>> INFO (Carpet): Carpet is running on 32 processes
> >       >>>>
> >       >>>> INFO (Carpet): This is process 0
> >       >>>>
> >       >>>> INFO (Carpet): OpenMP is enabled
> >       >>>>
> >       >>>> INFO (Carpet): This process contains 1 threads, this
> >       is thread 0
> >       >>>>
> >       >>>> INFO (Carpet): There are 144 threads in total
> >       >>>>
> >       >>>> INFO (Carpet): There are 4.5 threads per process
> >       >>>>
> >       >>>> INFO (Carpet): This process runs on host n129,
> >       pid=20823
> >       >>>>
> >       >>>> INFO (Carpet): This process runs on 1 core: 0
> >       >>>>
> >       >>>> INFO (Carpet): Thread 0 runs on 1 core: 0
> >       >>>>
> >       >>>> INFO (Carpet): This simulation is running in 3
> >       dimensions
> >       >>>>
> >       >>>> INFO (Carpet): Boundary specification for map 0:
> >       >>>>
> >       >>>>      nboundaryzones: [[3,3,3],[3,3,3]]
> >       >>>>
> >       >>>>      is_internal   : [[0,0,0],[0,0,0]]
> >       >>>>
> >       >>>>      is_staggered  : [[0,0,0],[0,0,0]]
> >       >>>>
> >       >>>>      shiftout      : [[1,0,1],[0,0,0]]
> >       >>>>
> >       >>>> WARNING level 1 from host n131 process 21
> >       >>>>
> >       >>>>     in thorn Carpet, file
> >       >>>> /home2/mallick/ET9/Cactus/arrangements/Carpet/Carpet/src/SetupGH.cc:426:
> >       >>>>
> >       >>>>     -> The number of threads for this process is larger than
> >       >>>> its number of cores. This may indicate a performance problem.
> >       >>>>
> >       >>>>
> >       >>>> This is something that we couldn’t understand, as we asked
> >       >>>> for only 32 procs with num-threads set to 1. The command that
> >       >>>> we used to submit our job was:
> >       >>>>
> >       >>>>    ./simfactory/bin/sim create-submit p32_mpin_npn --procs=32 --ppn=16 --num-threads=1 --ppn-used=16 --num-smt=1 --parfile=par/nsnstohmns1.par --walltime=24:10:00
> >       >>>>
> >       >>>>
> >       >>>> I have attached the out file, runscript, submit script,
> >       >>>> option list, and machine file for reference. Thanks in
> >       >>>> advance for your help.
> >       >>>>
> >       >>>>
> >       >>>> Sincerely,
> >       >>>>
> >       >>>> --
> >       >>>> Spandan Sarma
> >       >>>> BS-MS' 19
> >       >>>> Department of Physics (4th Year),
> >       >>>> IISER Bhopal
> >       >>>> _______________________________________________
> >       >>>> Users mailing list
> >       >>>> Users at einsteintoolkit.org
> >       >>>>
> >       http://lists.einsteintoolkit.org/mailman/listinfo/users
> >       >>>
> >       >
> >       >
> >
> >
> >
> > --
> > Spandan Sarma
> > BS-MS' 19
> > Department of Physics (4th Year),
> > IISER Bhopal
> >
> >

