[Users] XSEDE's Expanse and failing tests

Gabriele Bozzola bozzola.gabriele at gmail.com
Wed Aug 18 14:04:21 CDT 2021


Hi Roland,

> Hmm. Would be interesting to see if the same error happens e.g. on a
> workstation where one compiles with gcc but then runs with say 32 MPI
> ranks (even if that oversubscribes the workstation). It is possible, of
> course, that there is a long-surviving race condition or bug in these
> thorns.


I realized I was not comparing the same things. In fact, on Frontera I ran
the tests with up to 2 MPI processes. When I restrict to 1/2 MPI processes,
almost all tests pass on Expanse, so I guess that mine was a false alarm
and everything is all right. I can upload the test results on the repo.

> So in your case it actually failed b/c it could not identify the compiler?


Yes, correct.

I set all the other variables in the option list, but at the end I didn't
end up compiling with aocc because of some issues with external libraries
(if I remember correctly).

I can add the code for detecting aocc, but I would leave everything else to
someone that knows exactly what variables should be defined and how.

Gabriele



On Wed, Aug 18, 2021 at 11:00 AM Roland Haas <rhaas at illinois.edu> wrote:

> Hello Gabriele,
>
> > Frontera and Stampede2 use 24/28 MPI processes, but the tests still pass.
> > I am particularly looking at the test ADMMass/tov_carpet.par, where the
> > numbers are off, but no error is thrown. Another example is
> > Exact/de_Sitter.par.
> Hmm. Would be interesting to see if the same error happens e.g. on a
> workstation where one compiles with gcc but then runs with say 32 MPI
> ranks (even if that oversubscribes the workstation). It is possible, of
> course, that there is a long-surviving race condition or bug in these
> thorns.
>
> We also have some issues with some tests failing on Blue Waters but I
> have no idea why and it is not reproducible on any other system and is
> deep in some F77 code.
>
> > Other tests do fail because of Carpet errors, which might be what you are
> > describing.
> ok.
>
>
> > > Can you create a pull request for the "linux" architecture file with
> > > the changes for the AMD compiler you found, please? So far it seems you
> > > mostly only changed the detection part, does it then not also require
> > > some changes in the "set values" part of the file? E.g. default values
> > > for optimization, preprocessor or so?
> >
> >
> > Where is the repo?
> The repository is the Cactus "flesh" repo. You can find out which one
> it is using git, e.g.:
>
> cd lib/make/known-architectures
> git remote -v
>
> which in this case reports:
>
> https://bitbucket.org/cactuscode/cactus.git
>
> The directory of the checkout can be obtained by either looking at the
> symbolic links, or via:
>
> cd lib/make/known-architectures
> pwd -P
>
> which shows the full, all links resolved path to the current working
> directory.
>
> > I am not too familiar with what that file is supposed to set. But, I only
> > changed what was needed to at least start the compilation.
> If you are using an option list then I would have hoped that nothing
> in the file requires changing. You might expect to get a warning about
> this being an untested architecture, but that should have been it. So in
> your case it actually failed b/c it could not identify the compiler?
> That is somewhat annoying.... and indeed the case looking at the
> fragment:
>
> if ! test "x$LINUX_C_COMP" = "xunknown" ; then
>   echo "Internal error: did not expect Linux C compiler to be $LINUX_C_COMP"
>   exit 2
> fi
>
> What would need adjusting would be the case statements and you should
> add options similar to e.g. what is being provided for GNU. Something
> like:
>
> : ${CFLAGS='-std=gnu99'}
> : ${C_OPTIMISE_FLAGS='-O3'}
> CC_VERSION="`$CC -v 2>&1 | grep -i "AOCC version" | head -n1`"
> : ${C_OPENMP_FLAGS='-fopenmp'}
>
> where I am not adding any support for an AOCC compiler that does not
> even support OpenMP. I do not think we will find any such compiler
> nowadays (since we require compilers to support e.g. C++11 I find it very
> unlikely that we would find a compiler suite that does support C++11
> but not OpenMP).
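
As a rough illustration of the detection step discussed above, here is a
minimal sketch of classifying a compiler from its version banner. The helper
name classify_c_comp, the labels it prints, and the sample banner strings are
all made up for illustration; the actual logic in the "linux" architecture
file may look different. It also assumes that the banner printed by AOCC
contains the string "AOCC".

```shell
# Hypothetical helper (not part of Cactus): map a version banner to a label.
# AOCC is tested before clang because AOCC is clang-based, so its banner
# would otherwise be caught by the more general clang pattern.
classify_c_comp() {
  case "$1" in
    *AOCC*|*aocc*) echo "aocc" ;;
    *clang*)       echo "clang" ;;
    *gcc*|*GCC*)   echo "gnu" ;;
    *)             echo "unknown" ;;
  esac
}

# Illustrative banners only; a real check would instead use the output of:
#   "$CC" -v 2>&1
classify_c_comp "AMD clang version X.Y (AOCC build)"
classify_c_comp "clang version X.Y"
classify_c_comp "gcc (GCC) X.Y"
classify_c_comp "some other compiler"
```

Keeping the matching in one case statement means a new compiler only needs
one extra pattern line, placed above any more general pattern it overlaps with.
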
>
> The colon ":" is the POSIX compliant name for "true" and we do not
> really care about it, only about the ${FOO=bar} default value variable
> assignment that the shell performs before executing "true".
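
To make the ": ${FOO=bar}" idiom concrete, here is a small sketch that can be
pasted into any POSIX shell. The variable names are the ones from the fragment
above; the values '-O2' and the empty string are just made-up stand-ins for
something an option list might have set.

```shell
# Case 1: variable is unset, so the default gets assigned.
unset CFLAGS
: ${CFLAGS='-std=gnu99'}
echo "CFLAGS=$CFLAGS"

# Case 2: variable was already set (e.g. by an option list), so it is kept.
C_OPTIMISE_FLAGS='-O2'
: ${C_OPTIMISE_FLAGS='-O3'}
echo "C_OPTIMISE_FLAGS=$C_OPTIMISE_FLAGS"

# Case 3: variable is set but empty. The ${VAR=default} form only assigns
# when VAR is unset, so the empty value survives; ${VAR:=default} would
# replace it instead.
C_OPENMP_FLAGS=''
: ${C_OPENMP_FLAGS='-fopenmp'}
echo "C_OPENMP_FLAGS=[$C_OPENMP_FLAGS]"
```

The third case is the reason to prefer the form without the extra colon here:
an option list can deliberately set a flag variable to the empty string and
the default will not override it.
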
>
> Yours,
> Roland
>
> --
> My email is as private as my paper mail. I therefore support encrypting
> and signing email messages. Get my PGP key from http://pgp.mit.edu .
>