<div dir="ltr"><div dir="ltr">Hi Roland,<div><br></div><div>thanks for your answer.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">The test suites are quick running parfiles with small grids, so running<br>them on large numbers of MPI ranks (they are designed for 1 or 2 MPI<br>ranks) can lead to unexpected situations (such as an MPI rank having no<br>grid points at all).</blockquote><div><br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Generally, if the tests work for 1,2,4 ranks (4 being the largest<br>number of procs requested by any test.ccl file) then this is sufficient.</blockquote><div><br></div><div>Frontera and Stampede2 use 24/28 MPI processes, but the tests still pass.</div></div><div>I am particularly looking at the test ADMMass/tov_carpet.par, where the</div><div>numbers are off, but no error is thrown. Another example is Exact/de_Sitter.par.</div><div>Other tests do fail because of Carpet errors, which might be what you are</div><div>describing.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Can you create a pull request for the "linux" architecture file with<br>the changes for the AMD compiler you found, please? So far it sees you<br>mostly only changed the detection part, does it then not also require<br>some changes in the "set values" part of the file? Eg default values<br>for optimization, preprocessor or so?</blockquote><div><br></div><div>Where is the repo?</div><div><br></div></div><div>I am not too familiar with what that file is supposed to set. But, I only changed</div><div>what was needed to at least start the compilation. </div><div><br></div><div>Gabriele</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Aug 18, 2021 at 8:20 AM Roland Haas <<a href="mailto:rhaas@illinois.edu">rhaas@illinois.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hello Gabriele,<br>

Thank you for contributing these.

The test suites are quick-running parfiles with small grids, so running
them on large numbers of MPI ranks (they are designed for 1 or 2 MPI
ranks) can lead to unexpected situations (such as an MPI rank having no
grid points at all).

Generally, if the tests work for 1, 2, and 4 ranks (4 being the largest
number of procs requested by any test.ccl file), then this is sufficient.

In principle even running on more MPI ranks should work, so if you know
which tests fail with the larger number of MPI ranks and list them in a
ticket, maybe someone could look into this.
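
For example, the whole test suite at a chosen rank count can be run via
simfactory roughly like this (a sketch only; the run name is a placeholder
and the option spellings are worth checking against
"./simfactory/bin/sim --help"):

  # run all test suites on 4 MPI ranks x 1 OpenMP thread each
  # (--procs is the total number of cores, --num-threads the threads per rank)
  ./simfactory/bin/sim create-submit tests-4ranks \
      --procs 4 --num-threads 1 --testsuite

  # repeat with --procs 1 and --procs 2 and compare which tests change status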

Note that you can undersubscribe compute nodes, in particular for
tests, if you do not need or want to use all cores.
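
With simfactory, undersubscribing would look something like this (again
only a sketch; check the exact option names for your version):

  # use only 16 of the 128 cores on an Expanse node: 2 ranks x 8 threads
  ./simfactory/bin/sim create-submit tests-undersubscribed \
      --procs 16 --ppn-used 16 --num-threads 8 --testsuite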

Can you create a pull request for the "linux" architecture file with
the changes for the AMD compiler you found, please? So far it seems you
mostly only changed the detection part; does it not also require some
changes in the "set values" part of the file? E.g. default values
for optimization, preprocessor settings, and so on?
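
Presumably something along these lines would be needed, using the usual
Cactus flag variables (this is only a guess at the required entries, not
the file's actual structure; AOCC and flang are Clang/LLVM based, so
GNU-style flags seem a plausible starting point):

  # hypothetical "set values" defaults for the AMD compilers
  if test "$LINUX_C_COMP" = "AMD" ; then
    : ${C_OPTIMISE_FLAGS="-O2"}
    : ${C_OPENMP_FLAGS="-fopenmp"}
  fi
  if test "$LINUX_F90_COMP" = "AMD" ; then
    : ${F90_OPTIMISE_FLAGS="-O2"}
    : ${F90_OPENMP_FLAGS="-fopenmp"}
  fi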

Yours,
Roland

> Hello,
>
> Two days ago, I opened a PR to the simfactory repo to add Expanse,
> the newest machine at the San Diego Supercomputer Center, based on
> AMD Epyc "Rome" CPUs and part of XSEDE. In the meantime, I realized
> that some tests are failing miserably, but I couldn't figure out why.
>
> Before I describe what I found, let me start with a side note on AMD
> compilers.
>
> <side note>
>
> There are four compilers available on Expanse: GNU, Intel, AMD, and PGI.
> I did not touch the PGI compilers. I briefly tried (and failed) to compile
> with the AMD compilers (aocc and flang). I did not try hard, and it seems
> that most of the libraries on Expanse are compiled with gcc anyway.
>
> A first step to support these compilers is adding the lines:
>
> elif test "`$F90 --version 2>&1 | grep AMD`" ; then
> LINUX_F90_COMP=AMD
> else
>
> elif test "`$CC --version 2>&1 | grep AMD`" ; then
> LINUX_C_COMP=AMD
> fi
>
> elif test "`$CC --version 2>&1 | grep AMD`" ; then
> LINUX_CXX_COMP=AMD
> fi
>
> in the obvious places in flesh/lib/make/known-architectures/linux.
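>
> To show where the first fragment fits: the F90 detection is an if/elif
> chain, and the new test goes right before its final else. The surrounding
> branches below are only placeholders for whatever the file already checks,
> not copied from it:
>
>   if test "`$F90 --version 2>&1 | grep -i gnu`" ; then
>     LINUX_F90_COMP=GNU               # placeholder for an existing branch
>   elif test "`$F90 --version 2>&1 | grep AMD`" ; then
>     LINUX_F90_COMP=AMD               # the new AMD (flang) branch
>   else
>     LINUX_F90_COMP=""                # placeholder for the existing fallback
>   fi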
>
> </side note>
>
> I successfully compiled the Einstein Toolkit with
> - gcc 10.2.0 and OpenMPI 4.0.4
> - gcc 9.2.0 and OpenMPI 4.0.4
> - Intel 2019 and Intel MPI 2019
>
> I noticed that some tests, like ADMMass/tov_carpet.par, gave
> completely incorrect results. For example, the expected value is 1.3,
> but I would find 1.6.
>
> I disabled all the optimizations, but the test kept failing. In the
> end, I noticed that if I ran with 8/16/32 MPI processes per node, and
> the corresponding number of OpenMP threads (128/N_MPI), the test
> would fail, but if I ran with 4/2/1 MPI processes, the test would pass.
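>
> In simfactory terms the two cases correspond to something like this
> (only a sketch of the flags involved, not the exact commands):
>
>   # fails: 8 ranks x 16 threads on a 128-core Expanse node
>   ./simfactory/bin/sim create-submit tov-8ranks --procs 128 --num-threads 16 \
>       --testsuite --select-tests tov_carpet.par
>
>   # passes: 4 ranks x 32 threads on the same node
>   ./simfactory/bin/sim create-submit tov-4ranks --procs 128 --num-threads 32 \
>       --testsuite --select-tests tov_carpet.par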
>
> Most of my experiments were with gcc 10, but the test also fails with
> the Intel suite.
>
> I tried increasing the OMP_STACK_SIZE to a very large value, but
> it didn't help.
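>
> The OpenMP-standard name of that variable is OMP_STACKSIZE (the Intel
> runtime also reads KMP_STACKSIZE); setting it to an arbitrary, deliberately
> oversized value looks like this:
>
>   export OMP_STACKSIZE=512M    # per-thread stack for OpenMP worker threads
>   export KMP_STACKSIZE=512M    # Intel-runtime equivalent
>   ulimit -s unlimited          # no limit on the main thread's stack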
>
> Any idea of what the problem might be?
>
> Gabriele


-- 
My email is as private as my paper mail. I therefore support encrypting
and signing email messages. Get my PGP key from http://pgp.mit.edu .