[Users] XSEDE's Expanse and failing tests

Gabriele Bozzola bozzola.gabriele at gmail.com
Wed Aug 18 10:02:08 CDT 2021


Two days ago, I opened a PR to the simfactory repo to add Expanse,
the newest machine at the San Diego Supercomputer Center, based on
AMD Epyc "Rome" CPUs and part of XSEDE. Since then, I have realized
that some tests fail badly, and I have not been able to figure out why.

Before I describe what I found, let me start with a side note on the AMD
compilers.

<side note>

There are four compiler suites available on Expanse: GNU, Intel, AMD, and PGI.
I did not touch the PGI compilers. I briefly tried (and failed) to compile
with the AMD compilers (aocc and flang). I did not try hard, and it seems that
most of the libraries on Expanse are compiled with gcc anyway.

A first step to support these compilers is adding the lines:

   elif test "`$F90 --version 2>&1 | grep AMD`" ; then

   elif test "`$CC --version 2>&1 | grep AMD`" ; then

   elif test "`$CXX --version 2>&1 | grep AMD`" ; then

in the obvious places in flesh/lib/make/known-architectures/linux.

</side note>
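
To make the detection idea in the side note concrete, here is a minimal standalone sketch of the same check: run the compiler with --version and look for a vendor string in the banner. The function detect_vendor and the two stub "compilers" are hypothetical names made up for this illustration, and the banner strings are only rough imitations of real output. None of this is the actual Cactus configure code.

```shell
# Hypothetical helper (not part of Cactus): classify a compiler by
# searching its --version banner for a vendor string.
detect_vendor() {
    banner=$("$1" --version 2>&1)
    case "$banner" in
        *AMD*)                        echo amd ;;
        *"Free Software Foundation"*) echo gnu ;;
        *)                            echo unknown ;;
    esac
}

# Stub compilers so the sketch runs without any real toolchain installed.
# The banners are illustrative, not copied from a real installation.
fake_amd_cc() { echo "AMD clang version (illustrative banner)"; }
fake_gnu_cc() { echo "gcc (illustrative banner) Free Software Foundation, Inc."; }

for cc in fake_amd_cc fake_gnu_cc; do
    echo "$cc -> $(detect_vendor "$cc")"
done
```

On a real system one would pass "$CC", "$CXX", or "$F90" instead of the stubs.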

I successfully compiled the Einstein Toolkit with
- gcc 10.2.0 and OpenMPI 4.0.4
- gcc 9.2.0 and OpenMPI 4.0.4
- intel 2019 and Intel MPI 2019

I noticed that some tests, like ADMMass/tov_carpet.par, give
completely incorrect results. For example, the expected value is 1.3,
but I get 1.6.

I disabled all optimizations, but the test kept failing. In the
end, I noticed that if I ran with 8/16/32 MPI processes per node, and
the corresponding number of OpenMP threads (128/N_MPI), the test
would fail, but if I ran with 4/2/1 MPI processes, the test would pass.
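
For reference, the process/thread combinations above can be tabulated with a few lines of shell. This is only a sketch: it assumes 128 cores per node (as implied by 128/N_MPI) and that the passing runs with 4/2/1 processes used the same rule for the thread count. The helper threads_per_rank is a name made up for this illustration.

```shell
# Assumption: 128 cores per node, as implied by the 128/N_MPI rule.
CORES_PER_NODE=128

# Hypothetical helper: OpenMP threads per MPI process for a given
# number of MPI processes per node.
threads_per_rank() {
    echo $(( CORES_PER_NODE / $1 ))
}

for n_mpi in 1 2 4 8 16 32; do
    # Outcomes as reported above: 1/2/4 pass, 8/16/32 fail.
    case "$n_mpi" in
        1|2|4) outcome=passes ;;
        *)     outcome=fails ;;
    esac
    printf '%2d MPI processes x %3d OpenMP threads: test %s\n' \
        "$n_mpi" "$(threads_per_rank "$n_mpi")" "$outcome"
done
```

So the failures start once there are 32 or fewer threads per process.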

Most of my experiments were with gcc 10, but the test also fails with
the Intel suite.

I tried setting OMP_STACKSIZE to a very large value, but
it didn't help.

Any idea of what the problem might be?
