[Users] Changing the number of MPI processes on recovery using simfactory

Erik Schnetter schnetter at cct.lsu.edu
Tue Jul 13 18:24:50 CDT 2021


Changing the number of MPI processes and OpenMP threads with
Simfactory when restarting works the same way as setting them in the
first place. For example, your first run might be submitted with

./simfactory/bin/sim submit poisson --parfile=poisson.par --procs=120

You can restart this simulation with

./simfactory/bin/sim submit poisson

which will re-use the original settings. You can also restart with

./simfactory/bin/sim submit poisson --procs=160 --num-threads=8

to change these settings.
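
As a rough sketch of how the two options relate: this assumes, as the quoted message below describes, that "--procs" is the total number of threads (cores) requested and "--num-threads" is the number of OpenMP threads per MPI process, so that the number of MPI processes is their quotient. The helper name mpi_processes is made up for illustration and is not part of Simfactory.

```sh
#!/bin/sh
# Hypothetical helper, not part of Simfactory.
# Assumption: --procs is the total thread (core) count and
# --num-threads is the OpenMP thread count per MPI process.
mpi_processes() {
    procs=$1
    threads=$2
    # Reject thread counts that are not positive or do not divide procs evenly.
    if [ "$threads" -le 0 ] || [ $((procs % threads)) -ne 0 ]; then
        echo "procs must be a positive multiple of num-threads" >&2
        return 1
    fi
    echo $((procs / threads))
}

# The restart example above: --procs=160 --num-threads=8
mpi_processes 160 8   # prints 20
```

Under that assumption, the restart above would run 20 MPI processes with 8 threads each.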

If this does not work, then your machine might be configured incorrectly.
For example, you say that you specify the number of MPI processes by
setting "--num-threads", which sounds suspicious.

The default-generated machine configuration only works for
workstations or laptops. If you run this script on an HPC system, it
will generate a nonsense configuration, and might even hide a "real"
configuration if one is present.


On Tue, Jul 13, 2021 at 5:23 PM Konrad Topolski
<k.topolski2 at student.uw.edu.pl> wrote:
> Hi,
> I am currently trying to find what the optimal number of MPI processes is for my purposes.
> I have managed to change the number of MPI processes when restarting a simulation from a checkpoint - but using the bare executable, not simfactory.
> Now, I would like to learn how to do it in simfactory.
> I have learned that to successfully steer the number of threads per 1 MPI process (which, combined with a total number of threads requested, yields the total number of MPI processes), I change the num-thread variable in the machine.ini file.
> This is probably (certainly?) suboptimal, so if there's a proper way, I'd like to learn it.
> I submit/recover simulations via
> ./simfactory/bin/sim submit <sim_name> --parfile <parfile_name> --recover --procs NUM_PROCS --machine=okeanos --configuration=okeanos
> If I don't use the --machine option specifying my cluster, it will default to some config with max nodes = 1 (generic?). Which is why I steer MPI processes via num-thread.
> Trying to recover a simulation via simfactory with a new machine file (with num-thread changed) yields an error message:
> INFO (Carpet): MPI is enabled
> INFO (Carpet): Carpet is running on 4 processes
> WARNING level 0 from host nid00392 process 0
>   in thorn Carpet, file /lustre/tetyda/home/topolski/Cactus/arrangements/Carpet/Carpet/src/SetupGH.cc:148:
>   -> The environment variable CACTUS_NUM_PROCS is set to 96, but there are 4 MPI processes. This may indicate a severe problem with the MPI startup mechanism.
> What can I do to recover a simulation via simfactory and use a different number of MPI processes?
> While I'm at it, can I also change parameters such as the number of refinement levels or make new guesses for AHFinderDirect, in case the previously-used parameters did not provide high enough resolution for a successful find?
> Best regards
> Konrad Topolski

Erik Schnetter <schnetter at cct.lsu.edu>
