<div dir="ltr">Hi Roland,<div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hmm. Would be interesting to see if the same error happens e.g. on a<br>workstation where one compiles with gcc but then runs with say 32 MPI<br>ranks (even if that oversubscribes the workstation). It is possible, of<br>course, that there is a long-surviving race condition or bug in these<br>thorns.</blockquote><div><div><br></div><div>I realized I was not comparing the same things. In fact, on Frontera I ran the</div></div><div>tests with up to 2 MPI processes. When I restrict to 1 or 2 MPI processes, </div><div>almost all tests pass on Expanse, so I guess that mine was a false alarm</div><div>and everything is all right. I can upload the test results to the repo.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> So in your case it actually failed b/c it could not identify the compiler?</blockquote><div><br></div><div>Yes, correct. </div><div><br></div><div>I set all the other variables in the option list, but in the end I did not end</div><div>up compiling with aocc because of some issues with external libraries</div><div>(if I remember correctly). </div><div><br></div><div>I can add the code for detecting aocc, but I would leave everything else to</div><div>someone who knows exactly which variables should be defined and how.</div><div><br></div><div>Gabriele</div><div><br></div><div> </div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Aug 18, 2021 at 11:00 AM Roland Haas &lt;<a href="mailto:rhaas@illinois.edu">rhaas@illinois.edu</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hello Gabriele,<br>
<br>
> Frontera and Stampede2 use 24/28 MPI processes, but the tests still pass.<br>
> I am particularly looking at the test ADMMass/tov_carpet.par, where the<br>
> numbers are off, but no error is thrown. Another example is<br>
> Exact/de_Sitter.par.<br>
Hmm. Would be interesting to see if the same error happens e.g. on a<br>
workstation where one compiles with gcc but then runs with say 32 MPI<br>
ranks (even if that oversubscribes the workstation). It is possible, of<br>
course, that there is a long-surviving race condition or bug in these<br>
thorns.<br>
<br>
We also have some issues with some tests failing on Blue Waters but I<br>
have no idea why and it is not reproducible on any other system and is<br>
deep in some F77 code.<br>
<br>
> Other tests do fail because of Carpet errors, which might be what you are<br>
> describing.<br>
ok. <br>
<br>
<br>
> > Can you create a pull request for the "linux" architecture file with<br>
> > the changes for the AMD compiler you found, please? So far it seems you<br>
> > mostly only changed the detection part, does it then not also require<br>
> > some changes in the "set values" part of the file? E.g. default values<br>
> > for optimization, preprocessor or so? <br>
> <br>
> <br>
> Where is the repo?<br>
The repository is the Cactus "flesh" repo. You can find out which one<br>
it is using git, e.g.:<br>
<br>
cd lib/make/known-architectures<br>
git remote -v<br>
<br>
which in this case reports:<br>
<br>
<a href="https://bitbucket.org/cactuscode/cactus.git" rel="noreferrer" target="_blank">https://bitbucket.org/cactuscode/cactus.git</a><br>
<br>
The directory of the checkout can be obtained by either looking at the<br>
symbolic links, or via:<br>
<br>
cd lib/make/known-architectures<br>
pwd -P<br>
<br>
which shows the full path to the current working directory with all<br>
symbolic links resolved.<br>
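<br>
As a minimal sketch of the difference (the temporary directory and the<br>
names "real" and "link" are made up for illustration, and mktemp -d is<br>
assumed to be available), the following compares the logical and the<br>
physical working directory after entering a directory through a<br>
symbolic link:<br>
<br>
<pre>
```shell
# Build a throwaway directory with a symbolic link pointing into it.
tmp=$(mktemp -d)
mkdir "$tmp/real"
ln -s "$tmp/real" "$tmp/link"

# Enter the directory through the link.
cd "$tmp/link"

# -L reports the path as it was typed, -P resolves all symbolic links.
logical=$(pwd -L)
physical=$(pwd -P)
echo "logical:  $logical"
echo "physical: $physical"

# Clean up the throwaway directory.
cd /
rm -rf "$tmp"
```
</pre>
The logical path should end in "link" and the physical one in "real",<br>
which is the same effect that reveals the actual checkout directory<br>
behind lib/make/known-architectures.<br>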
<br>
> I am not too familiar with what that file is supposed to set. But, I only<br>
> changed<br>
> what was needed to at least start the compilation.<br>
If you are using an option list then I would have hoped that nothing<br>
in the file requires changing. You might expect to get a warning about<br>
this being an untested architecture, but that should have been it. So in<br>
your case it actually failed because it could not identify the compiler?<br>
That is somewhat annoying... and indeed the case, judging by this<br>
fragment:<br>
<br>
if ! test "x$LINUX_C_COMP" = "xunknown" ; then<br>
  echo "Internal error: did not expect Linux C compiler to be $LINUX_C_COMP"<br>
  exit 2<br>
fi<br>
<br>
What would need adjusting are the case statements, and you should<br>
add options similar to e.g. what is being provided for GNU. Something<br>
like:<br>
<br>
: ${CFLAGS='-std=gnu99'}<br>
: ${C_OPTIMISE_FLAGS='-O3'}<br>
CC_VERSION="`$CC -v 2>&1 | grep -i "AOCC version" | head -n1`"<br>
: ${C_OPENMP_FLAGS='-fopenmp'}<br>
<br>
where I am not adding any support for an AOCC compiler that does not<br>
even support OpenMP. I do not think we will find any such compiler<br>
nowadays (since we require compilers to support e.g. C++11, I find it<br>
very unlikely that we would find a compiler suite that does support<br>
C++11 but not OpenMP).<br>
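<br>
The kind of case statement meant here could be sketched as follows.<br>
This is only an illustration: the function name classify_compiler, the<br>
returned labels and the sample banner strings are invented here and are<br>
not taken from the "linux" architecture file, and the real AOCC banner<br>
should be checked with $CC -v before relying on any pattern.<br>
<br>
<pre>
```shell
# Hypothetical helper: map a compiler version banner to a short label.
classify_compiler() {
  case "$1" in
    *AOCC*)       echo aocc ;;     # checked first, since AOCC is clang-based
    *clang*)      echo clang ;;
    *gcc*|*GCC*)  echo gnu ;;
    *)            echo unknown ;;  # caller decides how to handle this
  esac
}

# Invented sample banners, for illustration only.
classify_compiler 'clang version N (AOCC build)'
classify_compiler 'gcc version N'
classify_compiler 'some other compiler'
```
</pre>
The AOCC pattern is listed before the clang one because a clang-based<br>
banner would otherwise match the more general pattern first.<br>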
<br>
The colon ":" is the POSIX null command, which behaves like "true", and<br>
we do not really care about it, only about the ${FOO=bar} default-value<br>
variable assignment that the shell performs before executing ":".<br>
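<br>
A small sketch of how this default-value idiom behaves (FOO and EMPTY<br>
are placeholder names, and the CFLAGS value set first merely stands in<br>
for something an option list might provide):<br>
<br>
<pre>
```shell
# FOO is unset, so the default is assigned.
unset FOO
: ${FOO=bar}
echo "FOO=$FOO"

# CFLAGS is already set (as an option list might do), so it is kept.
CFLAGS='-std=c11'
: ${CFLAGS='-std=gnu99'}
echo "CFLAGS=$CFLAGS"

# ${VAR=default} only applies to unset variables; an empty value is kept.
EMPTY=''
: ${EMPTY=fallback}
echo "EMPTY=[$EMPTY]"
```
</pre>
This is why values from an option list take precedence over the<br>
defaults in the architecture file. The ${VAR:=default} form, with a<br>
colon, would additionally replace empty values.<br>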
<br>
Yours,<br>
Roland<br>
<br>
-- <br>
My email is as private as my paper mail. I therefore support encrypting<br>
and signing email messages. Get my PGP key from <a href="http://pgp.mit.edu" rel="noreferrer" target="_blank">http://pgp.mit.edu</a> .<br>
</blockquote></div>