[Users] Failing tests on Expanse

Roland Haas rhaas at illinois.edu
Thu Jun 2 09:57:00 CDT 2022


Hello Gabriele,

hmm.

> /home/sbozzolo/Cactus/arrangements/Carpet/Carpet/src/SetupGH.cc:148:
>   -> The environment variable CACTUS_NUM_PROCS is set to 1, but there are 2  
> MPI processes. This may indicate a severe problem with the MPI startup
> mechanism.

> IBRUN:  launch command: srun -n 2 --ntasks-per-node 2
> /expanse/lustre/projects/uic383/sbozzolo/ettests_2proc/SIMFACTORY/exe/cactus_sim

Looking at these, I would have expected CACTUS_NUM_PROCS to be set to
2, given that -n is 2 (the number of MPI ranks).

The current submit script uses ibrun, though the current documentation
uses srun. Maybe changing to srun helps? Then again, the srun command
that ibrun launches does seem to start 2 MPI processes, as you would
expect.

Can you check (in the RunScript in
simulations/foo/output-0000/SIMFACTORY) what CACTUS_NUM_PROCS is set to?
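
A minimal sketch of that check, in case it is handier than opening the
file by hand. It only assumes that the RunScript is a plain text file
mentioning CACTUS_NUM_PROCS; the function name find_variable_lines is
made up for this example and is not part of simfactory.

```python
import re


def find_variable_lines(script_text, name="CACTUS_NUM_PROCS"):
    """Return (line_number, line) pairs for every line mentioning `name`.

    `script_text` is the full content of the script to inspect, e.g. the
    RunScript that simfactory copied into output-0000/SIMFACTORY.
    """
    # Match the variable as a whole word so that, for example,
    # CACTUS_NUM_PROCS_OLD would not be reported.
    pattern = re.compile(r"\b" + re.escape(name) + r"\b")
    return [
        (number, line.rstrip())
        for number, line in enumerate(script_text.splitlines(), start=1)
        if pattern.search(line)
    ]
```

To use it, read the RunScript with open(path).read() and print the
returned pairs; the interesting lines are the ones that export
CACTUS_NUM_PROCS and whatever expression they take the value from.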

If this works with "regular" runs but fails when running the test suite
with --testsuite, then the issue is most likely related to the
complicated method simfactory has to use to set the number of MPI ranks.

I would check whether the failing test can actually run only on 1 MPI
rank (set in test.ccl). In that case, Cactus will try to run it within
a 2 MPI rank test suite but use only 1 MPI rank. Possibly ibrun ignores
Cactus' request and uses only the information provided by SLURM.
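
A rough way to look for such a restriction is sketched below. It
assumes the rank count is declared in test.ccl on lines of the form
"NPROCS <n>" and that "#" starts a comment; the helper names
nprocs_entries and restricted_below are hypothetical and only serve
this example.

```python
import re

# Assumption: test.ccl declares the rank count as "NPROCS <n>", and the
# keyword is matched case-insensitively. Lines starting with "#" never
# match because the keyword must be the first non-blank text on the line.
_NPROCS_LINE = re.compile(r"\s*NPROCS\s+(\d+)", re.IGNORECASE)


def nprocs_entries(ccl_text):
    """Return (line_number, nprocs) for every NPROCS line in test.ccl text."""
    entries = []
    for number, line in enumerate(ccl_text.splitlines(), start=1):
        match = _NPROCS_LINE.match(line)
        if match:
            entries.append((number, int(match.group(1))))
    return entries


def restricted_below(ccl_text, suite_ranks):
    """Return the NPROCS entries that ask for fewer ranks than the suite uses."""
    return [entry for entry in nprocs_entries(ccl_text) if entry[1] < suite_ranks]
```

If restricted_below(text, 2) is non-empty for the thorn that owns the
failing test, then that test asks for fewer ranks than the 2 rank test
suite run provides, which is the situation described above.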

Yours,
Roland

-- 
My email is as private as my paper mail. I therefore support encrypting
and signing email messages. Get my PGP key from http://pgp.mit.edu .