[Users] Failing tests on Expanse

Roland Haas rhaas at illinois.edu
Thu Jun 9 08:34:45 CDT 2022


Hello Gabriele,

Great. Thank you. I will backport to the release branch.

Yours,
Roland

> Hello,
> 
> Reverting to srun fixes the problem. I updated the master branches for the
> testsuite results and simfactory.
> 
> Gabriele
> 
> On Thu, Jun 2, 2022 at 9:12 AM Gabriele Bozzola <bozzola.gabriele at gmail.com>
> wrote:
> 
> > Hi Roland,
> >
> > That sounds reasonable. I think I was originally using srun, but was
> > recommended to move to ibrun. I will try with srun to see if it works, in
> > which case I will update the simfactory entry and the testsuite results.
> >
> > Gabriele
> >
> > On Thu, Jun 2, 2022 at 8:03 AM Roland Haas <rhaas at illinois.edu> wrote:
> >  
> >> Hello Gabriele,
> >>
> >> ok, I can at least partially answer this. Indeed, RNS's A2 test is coded
> >> to use only 1 MPI rank:
> >>
> >> TEST rnsA2
> >> {
> >>   PROCS 1
> >> }
> >>
> >> and thus the most likely reason is that ibrun just pulls the number of
> >> MPI ranks from SLURM rather than from whatever simfactory tries to use.
> >>
> >> Since ibrun is no longer documented on the SDSC page (at least I do not
> >> see it on https://www.sdsc.edu/support/user_guides/expanse.html ), maybe
> >> the easiest fix is to remove it and use the srun command they document
> >> now?
> >>
> >> Yours,
> >> Roland
> >>  
> >> > Hello Gabriele,
> >> >
> >> > hmm.
> >> >  
> >> > > /home/sbozzolo/Cactus/arrangements/Carpet/Carpet/src/SetupGH.cc:148:
> >> > >   -> The environment variable CACTUS_NUM_PROCS is set to 1, but there are 2
> >> > > MPI processes. This may indicate a severe problem with the MPI startup
> >> > > mechanism.
> >> >
> >> > > IBRUN:  launch command: srun -n 2 --ntasks-per-node 2
> >> > > /expanse/lustre/projects/uic383/sbozzolo/ettests_2proc/SIMFACTORY/exe/cactus_sim
> >> >
> >> > Looking at these, I would have expected that CACTUS_NUM_PROCS is set to
> >> > 2 given that -n is 2 (being the number of MPI ranks).
> >> >
> >> > The current submitscript uses ibrun though current documentation uses
> >> > srun. Maybe changing to srun helps? Though the srun command that ibrun
> >> > launches does seem to use 2 MPI ranks, as you expect.
> >> >
> >> > Can you check (in the RunScript in
> >> > simulations/foo/output-0000/SIMFACTORY) what CACTUS_NUM_PROCS is set to?
> >> >
> >> > If this works with "regular" runs but fails with the testsuite using
> >> > --testsuite then the issue is most likely related to the complicated
> >> > method simfactory has to use to set the number of MPI ranks.
> >> >
> >> > I would check if the failing test is actually runnable only on 1 MPI
> >> > rank (set in test.ccl). In that case, Cactus will try to run it in a
> >> > 2 MPI rank test suite but use only 1 MPI rank. Possibly ibrun ignores
> >> > Cactus' request and uses only information provided by SLURM.
> >> >
> >> > Yours,
> >> > Roland
> >> >  
> >>
> >> --
> >> My email is as private as my paper mail. I therefore support encrypting
> >> and signing email messages. Get my PGP key from http://pgp.mit.edu .
> >>  
> >  
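
The diagnosis quoted above (a test restricted to 1 MPI rank in test.ccl,
while the launcher starts however many ranks SLURM allocated) can be
sketched roughly as follows. This is only an illustration, not how Cactus
or simfactory actually implement it: the function names are hypothetical,
the test.ccl parsing is a simplified assumption that only handles blocks
shaped like the rnsA2 snippet, and the only launcher option relied on is
srun's -n (number of tasks).

```python
import re

# Hypothetical helpers for illustration only; not part of Cactus or simfactory.


def procs_for_test(test_ccl_text, test_name, default_procs):
    """Return the rank count a TEST block asks for, else default_procs."""
    blocks = re.finditer(r"TEST\s+(\S+)\s*\{(.*?)\}", test_ccl_text, re.DOTALL)
    for block in blocks:
        if block.group(1) != test_name:
            continue
        # Keyword spelled as in the snippet quoted in this thread; "NPROCS"
        # is also accepted here, as an assumption about other spellings.
        found = re.search(r"\bN?PROCS\s+(\d+)", block.group(2))
        if found:
            return int(found.group(1))
    return default_procs


def srun_command(executable, nprocs):
    """Build an srun call that states the rank count explicitly via -n."""
    return ["srun", "-n", str(nprocs), executable]


def rank_mismatch(environ, started_ranks):
    """True if CACTUS_NUM_PROCS disagrees with the ranks actually started."""
    expected = environ.get("CACTUS_NUM_PROCS")
    return expected is not None and int(expected) != started_ranks


if __name__ == "__main__":
    test_ccl = "TEST rnsA2\n{\n  PROCS 1\n}\n"
    nprocs = procs_for_test(test_ccl, "rnsA2", default_procs=2)
    print(srun_command("cactus_sim", nprocs))
    # The situation from the warning: 1 rank requested, 2 ranks started.
    print(rank_mismatch({"CACTUS_NUM_PROCS": "1"}, started_ranks=2))
```

The point of the sketch is that the rank count is passed to the launcher
explicitly rather than being left for the launcher to infer from the job
allocation, which is presumably why switching from ibrun to srun helped.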


-- 
My email is as private as my paper mail. I therefore support encrypting
and signing email messages. Get my PGP key from http://pgp.mit.edu .