[Users] Issue with Multiple Node Simulation on cluster
Peter Diener
diener at cct.lsu.edu
Wed Dec 21 03:55:04 CST 2022
Dear Spandan,
You say your simulations performed 2840 timesteps in half an hour on 32
procs, which is 5680 timesteps per hour. Running for a full day you got
132105 timesteps, i.e. 5504 timesteps per hour. So you're right, there is
a small difference in speed. However, remember that the grid structure
changes as the black holes move across the grid, so some variation in
speed is to be expected. I think the small difference you observed is
within the natural range of variation.
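(Explicitly: 2840 steps / 0.5 h = 5680 steps/h, while 132105 steps / 24 h
gives about 5504 steps/h, i.e. only roughly 3% slower.)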
Cheers,
Peter
On Thu, 15 Dec 2022, Spandan Sarma 19306 wrote:
> Dear Erik and Steven,
>
> Thank you so much for the suggestions. We changed the runscript to add -x
> OMP_NUM_THREADS to the mpirun command line, and that solved the issue of
> the total number of threads being 144. The total is now 32 (equal to the
> number of procs).
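>
> For reference, the relevant line in our runscript now looks roughly like
> this (the hostfile and executable names are placeholders):
>
>   mpirun -np 32 -hostfile <hostfile> -x OMP_NUM_THREADS <cactus-exe> <parfile>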
>
> Also, the iterations have increased to 132105 for 32 procs (24 hr walltime),
> compared to just 240 before. Although this is a huge increase, we expected
> a bit more. For a shorter walltime (30 mins) we got 2840, 2140, and 1216
> iterations on 32, 16, and 8 procs, respectively. Are there any further
> changes we can make to improve on this?
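>
> (For scale: going from 8 to 32 procs, a 4x increase, gives 2840/1216, or
> about 2.3x the iterations, i.e. roughly 60% parallel efficiency.)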
>
> The new runscript and the output file (as a drive link) are attached below
> (no changes were made to the machine file, option list, and submit script
> from before).
>
> p32_omp.out
>
> On Fri, Dec 9, 2022 at 8:13 PM Steven R. Brandt <sbrandt at cct.lsu.edu> wrote:
> It's not too late to do a check, though, to see if all other nodes have
> the same OMP_NUM_THREADS value. Maybe that's the warning? It sounds like
> it should be an error.
>
> --Steve
>
> On 12/8/2022 5:23 PM, Erik Schnetter wrote:
> > Steve
> >
> > Code that runs as part of the Cactus executable is running too late
> > for this. At that time, OpenMP has already been initialized.
> >
> > There is the environment variable "CACTUS_NUM_THREADS", which is
> > checked at run time, but only if it is set (for backward
> > compatibility). Most people do not bother setting it, leaving this
> > error undetected. There is a warning output, but these are generally
> > ignored.
> >
> > -erik
> >
> > On Thu, Dec 8, 2022 at 3:48 PM Steven R. Brandt <sbrandt at cct.lsu.edu> wrote:
> >> We could probably add some startup code in which MPI broadcasts the
> >> OMP_NUM_THREADS setting to all the other processes and either checks
> >> the value of the environment variable or calls omp_set_num_threads()
> >> or some such.
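> >>
> >> A minimal sketch of what such startup code could look like, assuming it
> >> runs after MPI_Init and before any OpenMP parallel region (the function
> >> name is just illustrative):
> >>
> >>   #include <stdio.h>
> >>   #include <stdlib.h>
> >>   #include <mpi.h>
> >>   #include <omp.h>
> >>
> >>   /* Broadcast rank 0's OMP_NUM_THREADS setting and apply it everywhere. */
> >>   static void sync_omp_num_threads(void)
> >>   {
> >>     int rank, nthreads = 0;
> >>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> >>     if (rank == 0) {
> >>       const char *env = getenv("OMP_NUM_THREADS");
> >>       nthreads = env ? atoi(env) : omp_get_max_threads();
> >>     }
> >>     MPI_Bcast(&nthreads, 1, MPI_INT, 0, MPI_COMM_WORLD);
> >>     if (nthreads > 0 && nthreads != omp_get_max_threads()) {
> >>       fprintf(stderr, "rank %d: setting number of threads to %d\n",
> >>               rank, nthreads);
> >>       omp_set_num_threads(nthreads);
> >>     }
> >>   }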
> >>
> >> --Steve
> >>
> >> On 12/8/2022 9:03 AM, Erik Schnetter wrote:
> >>> Spandan
> >>>
> >>> The problem is likely that MPI does not automatically forward your
> >>> OpenMP setting to the other nodes. You are setting the environment
> >>> variable OMP_NUM_THREADS in the run script, and it is likely
> >>> necessary to forward this environment variable to the other
> >>> processes as well. Your MPI documentation will tell you how to do
> >>> this. This is likely an additional option you need to pass when
> >>> calling "mpirun".
> >>>
> >>> -erik
> >>>
> >>> On Thu, Dec 8, 2022 at 2:50 AM Spandan Sarma 19306 <spandan19 at iiserb.ac.in> wrote:
> >>>> Hello,
> >>>>
> >>>>
> >>>> This mail is in continuation of the ticket “Issue with compiling ET
> >>>> on cluster” by Shamim.
> >>>>
> >>>>
> >>>> So after Roland’s suggestion, we found that using the --prefix
> >>>> <openmpi-directory> option along with a hostfile let us successfully
> >>>> run a multiple-node simulation on our HPC.
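> >>>>
> >>>> (Schematically: mpirun --prefix <openmpi-directory> -hostfile
> >>>> <hostfile> -np 32 <cactus-exe> <parfile>, with the names in angle
> >>>> brackets as placeholders.)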
> >>>>
> >>>>
> >>>> Now we find that the BNSM gallery simulation evolves for only 240
> >>>> iterations on 2 nodes (16+16 procs, 24 hr walltime), which is very
> >>>> slow compared with the simulation on 1 node (16 procs, 24 hr
> >>>> walltime), which evolved for 120988 iterations. Parallelization
> >>>> works well within 1 node: we got 120988, 67756, and 40008 iterations
> >>>> for 16, 8, and 4 procs (24 hr walltime), respectively. We are unable
> >>>> to understand what is causing this issue when openmpi is given 2
> >>>> nodes (16+16 procs).
> >>>>
> >>>>
> >>>> In the output files we found the following, which may be an
> >>>> indication of the issue:
> >>>>
> >>>> INFO (Carpet): MPI is enabled
> >>>> INFO (Carpet): Carpet is running on 32 processes
> >>>> INFO (Carpet): This is process 0
> >>>> INFO (Carpet): OpenMP is enabled
> >>>> INFO (Carpet): This process contains 1 threads, this is thread 0
> >>>> INFO (Carpet): There are 144 threads in total
> >>>> INFO (Carpet): There are 4.5 threads per process
> >>>> INFO (Carpet): This process runs on host n129, pid=20823
> >>>> INFO (Carpet): This process runs on 1 core: 0
> >>>> INFO (Carpet): Thread 0 runs on 1 core: 0
> >>>> INFO (Carpet): This simulation is running in 3 dimensions
> >>>> INFO (Carpet): Boundary specification for map 0:
> >>>>    nboundaryzones: [[3,3,3],[3,3,3]]
> >>>>    is_internal : [[0,0,0],[0,0,0]]
> >>>>    is_staggered : [[0,0,0],[0,0,0]]
> >>>>    shiftout : [[1,0,1],[0,0,0]]
> >>>>
> >>>> WARNING level 1 from host n131 process 21
> >>>>   in thorn Carpet, file
> >>>>   /home2/mallick/ET9/Cactus/arrangements/Carpet/Carpet/src/SetupGH.cc:426:
> >>>>   -> The number of threads for this process is larger than its
> >>>>      number of cores. This may indicate a performance problem.
> >>>>
> >>>>
> >>>> This is something that we couldn’t understand, as we asked for only
> >>>> 32 procs with num-threads set to 1. The command that we used to
> >>>> submit our job was:
> >>>>
> >>>> ./simfactory/bin/sim create-submit p32_mpin_npn \
> >>>>   --procs=32 --ppn=16 --num-threads=1 --ppn-used=16 --num-smt=1 \
> >>>>   --parfile=par/nsnstohmns1.par --walltime=24:10:00
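> >>>>
> >>>> (With --procs=32, --ppn=16, and --num-threads=1 this should mean 2
> >>>> nodes running 16 single-threaded processes each, i.e. 32 threads in
> >>>> total, so the 144 threads reported above means some processes must
> >>>> have started with more than one thread.)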
> >>>>
> >>>>
> >>>> I have attached the out file, runscript, submit script, optionlist,
> >>>> and machine file for reference. Thanks in advance for your help.
> >>>>
> >>>>
> >>>> Sincerely,
> >>>>
> >>>> --
> >>>> Spandan Sarma
> >>>> BS-MS' 19
> >>>> Department of Physics (4th Year),
> >>>> IISER Bhopal
>
> --
> Spandan Sarma
> BS-MS' 19
> Department of Physics (4th Year),
> IISER Bhopal
More information about the Users mailing list