[Users] Failing tests on Expanse

Gabriele Bozzola bozzola.gabriele at gmail.com
Thu Jun 2 11:12:29 CDT 2022


Hi Roland,

That sounds reasonable. I think I was originally using srun, but was
advised to move to ibrun. I will try with srun to see if it works, in
which case I will update the simfactory entry and the testsuite results.
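
To confirm the mismatch when comparing launchers, a small check along these lines might help. It is only a sketch: the function names are hypothetical, the launch line format is assumed to match the "IBRUN:  launch command" line quoted below, and the expected rank count is assumed to be whatever CACTUS_NUM_PROCS holds in the RunScript environment.

```python
import re


def ranks_from_launch_line(line):
    """Return the integer passed to `-n` in an srun launch line, or None.

    Assumes the rank count is given as `-n <N>` right after options such
    as those in the quoted IBRUN log line; other spellings are not handled.
    """
    match = re.search(r"\bsrun\b.*?\s-n\s+(\d+)", line)
    return int(match.group(1)) if match else None


def rank_mismatch(env, launch_line):
    """Return True if the launcher starts a different number of MPI ranks
    than CACTUS_NUM_PROCS asks for; False if they agree or are unknown."""
    expected = env.get("CACTUS_NUM_PROCS")
    launched = ranks_from_launch_line(launch_line)
    if expected is None or launched is None:
        return False
    return int(expected) != launched


if __name__ == "__main__":
    # Values taken from the error message and IBRUN line in this thread.
    line = "IBRUN:  launch command: srun -n 2 --ntasks-per-node 2 cactus_sim"
    print(rank_mismatch({"CACTUS_NUM_PROCS": "1"}, line))
```

Running the same check on the srun-based launch line should show whether the launcher then follows the rank count that simfactory requests.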

Gabriele

On Thu, Jun 2, 2022 at 8:03 AM Roland Haas <rhaas at illinois.edu> wrote:

> Hello Gabriele,
>
> ok, I can at least partially answer this. Indeed RNS's A2 test is coded
> to use only 1 MPI rank:
>
> TEST rnsA2
> {
>   PROCS 1
> }
>
> and thus the most likely reason is that ibrun just pulls the number of
> MPI ranks from SLURM rather than from whatever simfactory tries to use.
>
> Since ibrun is no longer documented on the SDSC page (at least I do not
> see it on https://www.sdsc.edu/support/user_guides/expanse.html), maybe
> the easiest fix is to remove it and use the srun command they document
> now?
>
> Yours,
> Roland
>
> > Hello Gabriele,
> >
> > hmm.
> >
> > > /home/sbozzolo/Cactus/arrangements/Carpet/Carpet/src/SetupGH.cc:148:
> > >   -> The environment variable CACTUS_NUM_PROCS is set to 1, but there
> > > are 2 MPI processes. This may indicate a severe problem with the MPI
> > > startup mechanism.
> >
> > > IBRUN:  launch command: srun -n 2 --ntasks-per-node 2
> > > /expanse/lustre/projects/uic383/sbozzolo/ettests_2proc/SIMFACTORY/exe/cactus_sim
> >
> > Looking at these, I would have expected that CACTUS_NUM_PROCS is set to
> > 2 given that -n is 2 (being the number of MPI ranks).
> >
> > The current submitscript uses ibrun, though the current documentation
> > uses srun. Maybe changing to srun helps? Though the srun command that
> > ibrun launches does seem to use 2 MPI procs, as you would expect.
> >
> > Can you check (in the RunScript in
> > simulations/foo/output-0000/SIMFACTORY) what CACTUS_NUM_PROCS is set to?
> >
> > If this works with "regular" runs but fails with the testsuite using
> > --testsuite then the issue is most likely related to the complicated
> > method simfactory has to use to set the number of MPI ranks.
> >
> > I would check if the failing test is actually runnable only on 1 MPI
> > rank (set in test.ccl). In that case, Cactus will try to run it in a
> > 2 MPI rank test suite but use only 1 MPI rank. Possibly ibrun ignores
> > Cactus' request and uses only the information provided by SLURM.
> >
> > Yours,
> > Roland
> >
>
> --
> My email is as private as my paper mail. I therefore support encrypting
> and signing email messages. Get my PGP key from http://pgp.mit.edu .
>