[Users] Failing tests on Expanse

Gabriele Bozzola bozzola.gabriele at gmail.com
Thu May 26 10:25:37 CDT 2022


Hello,

Some network configurations were recently changed on SDSC's
Expanse and I wanted to update the Simfactory entry to add an env
variable (as recommended by XSEDE's help desk).

I did so, ran the tests, and found numerous failures. According to an
expanse_2_64.log file I have on my computer, these tests did not fail in
the past. The tests fail only with 2 MPI processes.

An example of a failing test is rnsA2. This is the tail of its log file:

INFO (Carpet): MPI is enabled
INFO (Carpet): Carpet is running on 2 processes
WARNING level 0 from host exp-4-26.expanse.sdsc.edu process 0
  in thorn Carpet, file /home/sbozzolo/Cactus/arrangements/Carpet/Carpet/src/SetupGH.cc:148:
  -> The environment variable CACTUS_NUM_PROCS is set to 1, but there are 2 MPI processes. This may indicate a severe problem with the MPI startup mechanism.
Rank 0 with PID 1194507 received signal 6
cactus_sim: /home/sbozzolo/Cactus/arrangements/Carpet/Carpet/src/helpers.cc:275: int Carpet::Abort(const cGH*, int): Assertion `0' failed.
Writing backtrace to rnsA2/backtrace.0.txt
Rank 1 with PID 1194508 received signal 6
Writing backtrace to rnsA2/backtrace.1.txt
srun: error: exp-4-26: tasks 0-1: Aborted (core dumped)
IBRUN:  launch command: srun -n 2 --ntasks-per-node 2 /expanse/lustre/projects/uic383/sbozzolo/ettests_2proc/SIMFACTORY/exe/cactus_sim -L 3 /expanse/lustre/projects/uic383/sbozzolo/ettests_2proc/output-0000/arrangements/EinsteinInitialData/Hydro_RNSID/test/rnsA2.par

IBRUN:  MPI job exited with code: 134
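
To make the warning concrete, here is a minimal Python sketch of the kind of consistency check it describes: compare the value of CACTUS_NUM_PROCS with the number of MPI processes. This is not Carpet's actual code (that lives in SetupGH.cc and is C++). The function name check_num_procs and the choice to ignore an unset variable are assumptions made for illustration.

```python
import os


def check_num_procs(mpi_size, environ=os.environ):
    """Return a message if CACTUS_NUM_PROCS disagrees with mpi_size, else None.

    Illustrative only: this mirrors the wording of the warning in the log,
    not the real Carpet implementation.
    """
    value = environ.get("CACTUS_NUM_PROCS")
    if value is None:
        # Assumption: an unset variable is not treated as an error here.
        return None
    if int(value) != mpi_size:
        return (
            f"The environment variable CACTUS_NUM_PROCS is set to {value}, "
            f"but there are {mpi_size} MPI processes."
        )
    return None


# The situation from the log: the variable says 1, but srun started 2 ranks.
print(check_num_procs(2, {"CACTUS_NUM_PROCS": "1"}))
```

With the values from the log (variable set to 1, two ranks started by srun) the sketch reports a mismatch, which is the condition that makes the run abort.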

Other tests behave similarly, e.g. Vaidya2:

INFO (Carpet): MPI is enabled
INFO (Carpet): Carpet is running on 2 processes
WARNING level 0 from host exp-4-26.expanse.sdsc.edu process 0
  in thorn Carpet, file /home/sbozzolo/Cactus/arrangements/Carpet/Carpet/src/SetupGH.cc:148:
  -> The environment variable CACTUS_NUM_PROCS is set to 1, but there are 2 MPI processes. This may indicate a severe problem with the MPI startup mechanism.
Rank 0 with PID 1183519 received signal 6
Writing backtrace to Vaidya2/backtrace.0.txt
cactus_sim: /home/sbozzolo/Cactus/arrangements/Carpet/Carpet/src/helpers.cc:275: int Carpet::Abort(const cGH*, int): Assertion `0' failed.
Rank 1 with PID 1183520 received signal 6
Writing backtrace to Vaidya2/backtrace.1.txt
srun: error: exp-4-26: tasks 0-1: Aborted (core dumped)
IBRUN:  launch command: srun -n 2 --ntasks-per-node 2 /expanse/lustre/projects/uic383/sbozzolo/ettests_2proc/SIMFACTORY/exe/cactus_sim -L 3 /expanse/lustre/projects/uic383/sbozzolo/ettests_2proc/output-0000/arrangements/EinsteinExact/EinsteinExact_Test/test/Vaidya2.par

IBRUN:  MPI job exited with code: 134
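
Since both logs abort on the same warning, a quick way to check whether the other failing tests share this cause is to search each log for it. Below is a minimal Python sketch. The function name find_mismatch is hypothetical, and the pattern is based only on the warning text quoted above. It tolerates line wraps inside the message.

```python
import re

# Match the warning text even if it is wrapped across lines.
WARNING_RE = re.compile(
    r"CACTUS_NUM_PROCS\s+is\s+set\s+to\s+(\d+),"
    r"\s+but\s+there\s+are\s+(\d+)\s+MPI\s+processes"
)


def find_mismatch(log_text):
    """Return (env_value, mpi_processes) if the warning is present, else None."""
    match = WARNING_RE.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Excerpt of the warning as it appears in the logs above (wrapped by email).
sample = """WARNING level 0 from host exp-4-26.expanse.sdsc.edu process 0
  -> The environment variable CACTUS_NUM_PROCS is set to 1, but there are 2
MPI processes. This may indicate a severe problem with the MPI startup
mechanism.
"""
print(find_mismatch(sample))
```

Running find_mismatch over the text of every failing test's log would show whether they all hit the same CACTUS_NUM_PROCS mismatch or whether some fail for a different reason.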

Since the testsuite_results repo shows the same tests failing (as run
by Roland), I can rule out the new env variable I added as the cause of
the failures.

Any idea of what is going on?

Thanks,
Gabriele