[Users] Changing the number of MPI processes on recovery using simfactory

Konrad Topolski k.topolski2 at student.uw.edu.pl
Wed Jul 14 05:04:52 CDT 2021


Thank you Erik - it seems that when I keep the num-threads variable in the
machine's .ini file, it overrides any --num-threads argument given to the
simfactory executable. Removing it appears to fix the problem.
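For reference, the relevant part of my okeanos.ini now looks roughly like
this (a minimal sketch; apart from num-threads, the key names and values
shown are only placeholders for my actual entry, not copied verbatim):

[okeanos]
nickname        = okeanos
hostname        = okeanos
ppn             = 48      # cores per node; placeholder value
max-num-threads = 48      # placeholder value
# num-threads   = 4       # commented out, so --num-threads on the command
                          # line is no longer overridden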

If I don't provide --configuration=okeanos and --machine=okeanos though,
simfactory defaults to the machine entry that ./simfactory/bin/sim setup
created for whichever local (login) node I ran it on.
So I suppose I need to specify these two options in every invocation?

After commenting out num-threads in okeanos.ini, here's what works and what
doesn't:

topolski at okeanos-login1:~/Cactus>
./simfactory/bin/sim submit GW150914_MPI --parfile GW150914_MPI.rpar
--define N 28 --walltime=1:00:00 --procs 192 --num-threads=4
Warning: Unknown machine name nid00069
Error: Unknown local machine nid00069. Please use 'sim setup' to create a
local machine entry from the generic template.
Aborting Simfactory.
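I suspect this first failure happens because the node hostname (nid00069)
does not match the aliaspattern in my okeanos.ini, so simfactory cannot tell
that it is running on okeanos. If I adjusted the pattern to cover both the
login and compute node hostnames, e.g. something like the line below (only a
guess for my setup, not tested), simfactory might detect the machine without
--machine:

aliaspattern = ^(okeanos.*|nid\d+)(\..*)?$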

topolski at okeanos-login1:~/Cactus> ./simfactory/bin/sim submit GW150914_MPI
--parfile GW150914_MPI.rpar --define N 28 --walltime=1:00:00 --procs 192
--num-threads=4 --machine=okeanos
Warning: Current Working directory does not match Cactus sourcetree,
changing to /home/topolski/Cactus
Warning: simulation "GW150914_MPI" does not exist or is not readable
Parameter file: /lustre/tetyda/home/topolski/Cactus/GW150914_MPI.rpar
Error: Executable /home/topolski/Cactus/exe/cactus_sim for configuration
sim does not exist or is not readable
Aborting Simfactory.

topolski at okeanos-login1:~/Cactus> ./simfactory/bin/sim create-submit
GW150914_MPI --parfile GW150914_MPI.rpar --define N 28 --walltime=1:00:00
--procs 192 --num-threads=4 --machine=okeanos --configuration=okeanos
Warning: Current Working directory does not match Cactus sourcetree,
changing to /home/topolski/Cactus
Parameter file: /lustre/tetyda/home/topolski/Cactus/GW150914_MPI.rpar
Skeleton Created
Job directory: "/home/topolski/simulations/GW150914_MPI"
Executable: "/home/topolski/Cactus/exe/cactus_okeanos"
Option list:
"/home/topolski/simulations/GW150914_MPI/SIMFACTORY/cfg/OptionList"
Submit script:
"/home/topolski/simulations/GW150914_MPI/SIMFACTORY/run/SubmitScript"
Run script:
"/home/topolski/simulations/GW150914_MPI/SIMFACTORY/run/RunScript"
Parameter file:
"/home/topolski/simulations/GW150914_MPI/SIMFACTORY/par/GW150914_MPI.rpar"
Assigned restart id: 0
Executing submit command: sbatch
/home/topolski/simulations/GW150914_MPI/output-0000/SIMFACTORY/SubmitScript
Submit finished, job id is 739787

And the last attempt yields the desired result.

So I suppose this is the correct way?
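If I understand the options correctly, with --procs 192 and --num-threads=4
simfactory should start 192 / 4 = 48 MPI processes with 4 OpenMP threads
each. To confirm once the job is running, I plan to grep for Carpet's startup
message in the new restart's output (the directory is taken from the submit
output above; the stdout filename may differ on other setups):

grep "Carpet is running on" /home/topolski/simulations/GW150914_MPI/output-0000/*.out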


Wed, 14 Jul 2021 at 01:25 Erik Schnetter <schnetter at cct.lsu.edu> wrote:

> Konrad
>
> Changing the number of MPI processes and OpenMP threads with
> Simfactory when restarting works the same way as setting them in the
> first place. For example, your first run might be submitted with
>
> ./simfactory/bin/sim submit poisson --parfile=poisson.par --procs=120
> --num-threads=4
>
> You can restart this simulation with
>
> ./simfactory/bin/sim submit poisson
>
> which will re-use the original settings. You can also restart with
>
> ./simfactory/bin/sim submit poisson --procs=160 --num-threads=8
>
> to change these settings.
>
> If this does not work, then your machine might be configured wrong.
> For example, you say that you specify the number of MPI processes by
> setting "--num-threads", which sounds suspicious.
>
> The default-generated machine configuration only works for
> workstations or laptops. If you run this script on an HPC system, it
> will generate a nonsense configuration, and might even hide a "real"
> configuration if one is present.
>
> -erik
>
>
> On Tue, Jul 13, 2021 at 5:23 PM Konrad Topolski
> <k.topolski2 at student.uw.edu.pl> wrote:
> >
> > Hi,
> >
> > I am currently trying to find what the optimal number of MPI processes
> is for my purposes.
> > I have managed to change the number of MPI processes when restarting a
> simulation from a checkpoint - but using the bare executable, not
> simfactory.
> >
> > Now, I would like to learn how to do it in simfactory.
> >
> > I have learned that to steer the number of threads per MPI process
> > (which, combined with the total number of threads requested, determines
> > the total number of MPI processes), I can change the num-threads variable
> > in the machine's .ini file.
> > This is probably (certainly?) suboptimal, so if there's a proper way,
> > I'd like to learn it.
> >
> > I submit/recover simulations via:
> > ./simfactory/bin/sim submit <sim_name> --parfile <parfile_name>
> > --recover --procs NUM_PROCS --machine=okeanos --configuration=okeanos
> >
> > If I don't use the --machine option specifying my cluster, it will
> > default to some config with max nodes = 1 (the generic one?), which is
> > why I steer the number of MPI processes via num-threads.
> >
> > Trying to recover a simulation via simfactory with a new machine file
> > (with num-threads changed) yields this error message:
> >
> > INFO (Carpet): MPI is enabled
> > INFO (Carpet): Carpet is running on 4 processes
> > WARNING level 0 from host nid00392 process 0
> >   in thorn Carpet, file
> /lustre/tetyda/home/topolski/Cactus/arrangements/Carpet/Carpet/src/SetupGH.cc:148:
> >   -> The environment variable CACTUS_NUM_PROCS is set to 96, but there
> are 4 MPI processes. This may indicate a severe problem with the MPI
> startup mechanism.
> >
> > What can I do to recover a simulation via simfactory and use a different
> number of MPI processes?
> >
> > While I'm at it, can I also change parameters such as the number of
> refinement levels or make new guesses for AHFinderDirect, in case the
> previously-used parameters did not provide high enough resolution for a
> successful find?
> >
> > Best regards
> > Konrad Topolski
> >
>
>
>
> --
> Erik Schnetter <schnetter at cct.lsu.edu>
> http://www.perimeterinstitute.ca/personal/eschnetter/
>