[Users] Failing tests on Expanse
Gabriele Bozzola
bozzola.gabriele at gmail.com
Thu Jun 2 12:31:15 CDT 2022
Hello,
Reverting to srun fixes the problem. I updated the master branches for the
testsuite results and simfactory.
Gabriele
On Thu, Jun 2, 2022 at 9:12 AM Gabriele Bozzola <bozzola.gabriele at gmail.com>
wrote:
> Hi Roland,
>
> That sounds reasonable. I think I was originally using srun, but was
> recommended to move to ibrun. I will try with srun to see if it works, in
> which case I will update the simfactory entry and the testsuite results.
>
> Gabriele
>
> On Thu, Jun 2, 2022 at 8:03 AM Roland Haas <rhaas at illinois.edu> wrote:
>
>> Hello Gabriele,
>>
>> ok, I can at least partially answer this. Indeed RNS's A2 test is coded
>> to use only 1 MPI rank:
>>
>> TEST rnsA2
>> {
>> PROCS 1
>> }
>>
>> and thus the most likely reason is that ibrun just pulls the number of
>> MPI ranks from SLURM rather than from whatever simfactory tries to use.
>>
>> Since ibrun is no longer documented on the SDSC page (at least I do not
>> see it on https://www.sdsc.edu/support/user_guides/expanse.html), maybe
>> the easiest fix is to remove it and use the srun command they document
>> now?
>>
>> Yours,
>> Roland
>>
>> > Hello Gabriele,
>> >
>> > hmm.
>> >
>> > > /home/sbozzolo/Cactus/arrangements/Carpet/Carpet/src/SetupGH.cc:148:
>> > > -> The environment variable CACTUS_NUM_PROCS is set to 1, but there are 2
>> > > MPI processes. This may indicate a severe problem with the MPI startup
>> > > mechanism.
>> >
>> > > IBRUN: launch command: srun -n 2 --ntasks-per-node 2
>> > > /expanse/lustre/projects/uic383/sbozzolo/ettests_2proc/SIMFACTORY/exe/cactus_sim
>> >
>> > Looking at these, I would have expected that CACTUS_NUM_PROCS is set to
>> > 2 given that -n is 2 (being the number of MPI ranks).
>> >
>> > The current submitscript uses ibrun, though the current documentation
>> > uses srun. Maybe changing to srun helps? Though the srun command does
>> > seem to have 2 MPI procs in the way you expect.
>> >
>> > Can you check (in the RunScript in
>> > simulations/foo/output-0000/SIMFACTORY) what CACTUS_NUM_PROCS is set to?
>> >
>> > If this works with "regular" runs but fails with the testsuite using
>> > --testsuite then the issue is most likely related to the complicated
>> > method simfactory has to use to set the number of MPI ranks.
>> >
>> > I would check if the failing test is actually runnable only on 1 MPI
>> > rank (set in test.ccl). In that case, Cactus will try to run it in a
>> > 2 MPI rank test suite but use only 1 MPI rank. Possibly ibrun ignores
>> > Cactus' request and uses only information provided by SLURM.
>> >
>> > Yours,
>> > Roland
>> >
>>
>> --
>> My email is as private as my paper mail. I therefore support encrypting
>> and signing email messages. Get my PGP key from http://pgp.mit.edu .
>>
>
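
The mismatch discussed in this thread (CACTUS_NUM_PROCS set to 1 while the
launcher starts 2 MPI ranks) can be spotted before a run by comparing the
environment variable against the rank count in the launch command. The
following is only a minimal sketch under stated assumptions: the helper names
(ranks_from_launch_command, check_num_procs) are hypothetical and not part of
simfactory or Cactus, the executable path is a placeholder, and only the
"-n N", "--ntasks N" and "--ntasks=N" spellings of the srun option are
handled.

```python
import shlex


def ranks_from_launch_command(command):
    """Return the rank count given via -n/--ntasks in an srun-style command.

    Hypothetical helper for illustration; returns None if no count is found.
    The compact form "-n2" is deliberately not handled in this sketch.
    """
    tokens = shlex.split(command)
    for i, tok in enumerate(tokens):
        if tok in ("-n", "--ntasks") and i + 1 < len(tokens):
            return int(tokens[i + 1])
        if tok.startswith("--ntasks="):
            return int(tok.split("=", 1)[1])
    return None


def check_num_procs(env, command):
    """Compare CACTUS_NUM_PROCS with the rank count in the launch command.

    Returns (consistent, expected, launched).
    """
    expected = int(env["CACTUS_NUM_PROCS"])
    launched = ranks_from_launch_command(command)
    return expected == launched, expected, launched


# Values mirror the situation reported above; "./cactus_sim" is a placeholder.
launch = "srun -n 2 --ntasks-per-node 2 ./cactus_sim"
consistent, expected, launched = check_num_procs({"CACTUS_NUM_PROCS": "1"}, launch)
print(consistent, expected, launched)
```

For a test restricted to 1 MPI rank, a launcher that takes its rank count from
SLURM instead of from the value requested for the test would produce exactly
this kind of disagreement.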