[Users] XSEDE's Expanse and failing tests

Roland Haas rhaas at illinois.edu
Wed Aug 18 13:00:27 CDT 2021


Hello Gabriele,

> Frontera and Stampede2 use 24/28 MPI processes, but the tests still pass.
> I am particularly looking at the test ADMMass/tov_carpet.par, where the
> numbers are off, but no error is thrown. Another example is
> Exact/de_Sitter.par.
Hmm. It would be interesting to see whether the same error happens e.g.
on a workstation where one compiles with gcc but then runs with, say, 32
MPI ranks (even if that oversubscribes the workstation). It is possible,
of course, that there is a long-surviving race condition or bug in these
thorns.

We also have some tests failing on Blue Waters, but I have no idea why:
the failure is not reproducible on any other system and sits deep in
some F77 code.

> Other tests do fail because of Carpet errors, which might be what you are
> describing.
ok. 


> > Can you create a pull request for the "linux" architecture file with
> > the changes for the AMD compiler you found, please? So far it seems you
> > mostly only changed the detection part, does it then not also require
> > some changes in the "set values" part of the file? E.g. default values
> > for optimization, preprocessor or so?
> 
> 
> Where is the repo?
The repository is the Cactus "flesh" repo. You can find out which one
it is using git, e.g.:

cd lib/make/known-architectures
git remote -v

which in this case reports:

https://bitbucket.org/cactuscode/cactus.git

The directory of the checkout can be obtained by either looking at the
symbolic links, or via:

cd lib/make/known-architectures
pwd -P

which shows the full path to the current working directory, with all
symbolic links resolved.
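
To illustrate the difference between the logical and the physical
working directory, here is a small self-contained sketch. The directory
names (repos/flesh and the "lib" link) are made up for the demonstration
and only mimic the symbolic-link layout of a checkout; they are not the
actual paths.

```shell
# Demo only: build a throwaway tree where "lib" is a symbolic link into a
# made-up checkout directory (hypothetical names, not the real layout).
orig_dir=$(pwd)
demo_root=$(cd "$(mktemp -d)" && pwd -P)   # resolve links in the temp path itself
mkdir -p "$demo_root/repos/flesh/lib/make"
ln -s "$demo_root/repos/flesh/lib" "$demo_root/lib"

cd "$demo_root/lib/make"
logical=$(pwd -L)    # path as entered, symbolic links kept
physical=$(pwd -P)   # real location, all symbolic links resolved

echo "logical:  $logical"
echo "physical: $physical"

# Clean up the throwaway tree.
cd "$orig_dir"
rm -rf "$demo_root"
```

The physical path is the one in which git commands such as
"git remote -v" operate on the underlying repository.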

> I am not too familiar with what that file is supposed to set. But, I only
> changed what was needed to at least start the compilation.
If you are using an option list then I would have hoped that nothing
in the file requires changing. You might expect to get a warning about
this being an untested architecture, but that should have been it. So in
your case it actually failed b/c it could not identify the compiler?
That is somewhat annoying.... and indeed the case looking at the
fragment:

if ! test "x$LINUX_C_COMP" = "xunknown" ; then
  echo "Internal error: did not expect Linux C compiler to be $LINUX_C_COMP"
  exit 2
fi

What would need adjusting would be the case statements and you should
add options similar to e.g. what is being provided for GNU. Something
like:

: ${CFLAGS='-std=gnu99'}
: ${C_OPTIMISE_FLAGS='-O3'}
CC_VERSION="`$CC -v 2>&1 | grep -i "AOCC version" | head -n1`"
: ${C_OPENMP_FLAGS='-fopenmp'}

where I am not adding any support for an AOCC compiler that does not
even support OpenMP. I do not think we will find any such compiler
nowadays (since we require compilers to support e.g. C++11, I find it
very unlikely that we would find a compiler suite that does support
C++11 but not OpenMP).
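
As a rough sketch of how such defaults could sit in a case statement
keyed on the detected compiler: LINUX_C_COMP is the variable from the
fragment above, but the value "aocc" and the layout of the branches are
assumptions for illustration, not the actual contents of the "linux"
file. The preset C_OPTIMISE_FLAGS stands in for a value coming from an
option list.

```shell
# Hypothetical: pretend the detection part already identified AOCC.
LINUX_C_COMP=aocc
# Simulate an option list that already chose an optimisation level.
C_OPTIMISE_FLAGS='-O2'
unset CFLAGS C_OPENMP_FLAGS

case "$LINUX_C_COMP" in
  aocc)
    # Defaults only take effect for variables that are still unset.
    : ${CFLAGS='-std=gnu99'}
    : ${C_OPTIMISE_FLAGS='-O3'}
    : ${C_OPENMP_FLAGS='-fopenmp'}
    ;;
  unknown)
    echo "Warning: untested compiler, no defaults set" >&2
    ;;
  *)
    echo "Internal error: did not expect Linux C compiler to be $LINUX_C_COMP" >&2
    exit 2
    ;;
esac

echo "CFLAGS=$CFLAGS"
echo "C_OPTIMISE_FLAGS=$C_OPTIMISE_FLAGS"
echo "C_OPENMP_FLAGS=$C_OPENMP_FLAGS"
```

With this arrangement the option list keeps its own optimisation level,
while the unset variables pick up the AOCC defaults.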

The colon ":" is the POSIX-compliant name for "true" and we do not
really care about it, only about the ${FOO=bar} default value variable
assignment, which the shell performs (only if FOO is unset) while
expanding the arguments, before executing "true".
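
A minimal sketch of that behaviour, using made-up DEMO_* variable names
so that no real build variable is touched. It also shows the related
${FOO:=bar} form, which is not used above but differs in how it treats
an empty value:

```shell
# Unset variable: the default is assigned.
unset DEMO_UNSET
: ${DEMO_UNSET='-O3'}

# Variable already set (e.g. by an option list): the default is ignored.
DEMO_PRESET='-O2 -g'
: ${DEMO_PRESET='-O3'}

# Set but empty: "=" leaves the empty value alone ...
DEMO_EMPTY=''
: ${DEMO_EMPTY='-O3'}

# ... whereas ":=" also replaces an empty value.
DEMO_EMPTY_COLON=''
: ${DEMO_EMPTY_COLON:='-O3'}

printf 'unset  -> [%s]\n' "$DEMO_UNSET"
printf 'preset -> [%s]\n' "$DEMO_PRESET"
printf 'empty  -> [%s]\n' "$DEMO_EMPTY"
printf 'empty, := form -> [%s]\n' "$DEMO_EMPTY_COLON"
```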

Yours,
Roland

-- 
My email is as private as my paper mail. I therefore support encrypting
and signing email messages. Get my PGP key from http://pgp.mit.edu .