[Users] failure of CCTK_WARN on Frontera cluster
Roland Haas
rhaas at illinois.edu
Wed Jan 6 12:36:31 CST 2021
Hello all,
I have not seen this before. It looks like memory corruption (a write
beyond the end of an allocated memory block), which tends to go unnoticed
until the next free().
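To illustrate why such an overrun only shows up later, here is a minimal sketch in C. It is not how glibc's allocator works internally and it has nothing to do with Spritz; guarded_malloc, guarded_free and the demo functions are hypothetical names made up for this example. The guard bytes stand in for the allocator bookkeeping that sits next to a heap block: the bad write itself succeeds silently, and the damage is only found when the block is freed.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helpers for illustration only; not part of glibc or Cactus. */
#define GUARD_BYTE 0xA5
#define GUARD_LEN  8

/* Allocate n usable bytes followed by GUARD_LEN guard bytes.
 * The requested size is kept in a small header in front of the user block.
 * Intended for byte buffers only (no special alignment guarantees). */
static void *guarded_malloc(size_t n)
{
    unsigned char *raw = malloc(sizeof(size_t) + n + GUARD_LEN);
    if (raw == NULL)
        return NULL;
    memcpy(raw, &n, sizeof n);                               /* header */
    memset(raw + sizeof(size_t) + n, GUARD_BYTE, GUARD_LEN); /* guard  */
    return raw + sizeof(size_t);
}

/* Free a guarded block. Returns 1 if a guard byte was overwritten
 * (something wrote past the end of the block), 0 otherwise. */
static int guarded_free(void *p)
{
    assert(p != NULL);
    unsigned char *user = p;
    unsigned char *raw = user - sizeof(size_t);
    size_t n;
    int corrupted = 0;

    memcpy(&n, raw, sizeof n);
    for (size_t i = 0; i < GUARD_LEN; i++)
        if (user[n + i] != GUARD_BYTE)
            corrupted = 1;
    if (corrupted)
        fprintf(stderr, "guard overwritten after %zu-byte block\n", n);
    free(raw);
    return corrupted;
}

/* Off-by-one: "12345678" needs 9 bytes including the terminating NUL,
 * but only 8 were requested. The copy itself does not fail; the problem
 * is only detected inside guarded_free(). */
int demo_overrun(void)
{
    char *msg = guarded_malloc(8);
    if (msg == NULL)
        return -1;
    memcpy(msg, "12345678", 9);
    return guarded_free(msg);
}

/* Same buffer size, but the string fits (7 characters + NUL). */
int demo_in_bounds(void)
{
    char *msg = guarded_malloc(8);
    if (msg == NULL)
        return -1;
    memcpy(msg, "1234567", 8);
    return guarded_free(msg);
}
```

Calling demo_overrun() reports the overwritten guard only at free time, while demo_in_bounds() frees cleanly. With the real allocator the equivalent damage hits heap metadata, so the abort can be reported from whatever code happens to call malloc() or free() next.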
My suggestion would be to look for malloc debug options:
https://www.gnu.org/software/libc/manual/html_node/Heap-Consistency-Checking.html
in particular the section on the MALLOC_CHECK_ env variable:
--8<--
When MALLOC_CHECK_ is set to a non-zero value, a special (less efficient) implementation is used which is designed to be tolerant against simple errors, such as double calls of free with the same argument, or overruns of a single byte (off-by-one bugs).
--8<--
You may also want to add:
-roe -b no
to the Cactus command line in the run script to disable output
buffering and get one stdout and one stderr file per MPI rank.
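The reason unbuffered output helps when debugging a crash can be sketched in plain C. This is only an illustration of stdio buffering on a POSIX system, not the Cactus implementation of the buffering option; bytes_on_disk_before_flush is a hypothetical helper name. It writes a message to a temporary file and checks how much has actually reached the file before any flush happens, which is what would survive if the process aborted at that point.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper for illustration only.
 * Writes msg to a fresh temporary file using the given stdio buffering
 * mode (_IONBF, _IOLBF or _IOFBF) and returns the size of the file as
 * seen by the operating system *before* any fflush() or fclose().
 * Returns -1 on error. */
long bytes_on_disk_before_flush(int mode, const char *msg)
{
    assert(msg != NULL);
    FILE *f = tmpfile();
    struct stat st;
    long size = -1;

    if (f == NULL)
        return -1;
    /* setvbuf() must be called before any other operation on the stream. */
    if (setvbuf(f, NULL, mode, BUFSIZ) == 0 &&
        fputs(msg, f) != EOF &&
        fstat(fileno(f), &st) == 0)
        size = (long)st.st_size;
    fclose(f);
    return size;
}
```

With _IONBF the whole message is already in the file when the size is checked; with _IOFBF a short message is still sitting in the process's buffer and would be lost if the program aborted right there.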
Yours
Roland
> We are having some issues running the Einstein Toolkit with our Spritz code
> on the Frontera cluster at TACC and I'm wondering if any of you ever saw
> the error you can find in the stderr attached to this email.
>
> The error is
> *corrupted size vs. prev_size: 0x00002b3728001060 ****
> and it looks like the code crashes within a call to CCTK_WARN.
>
> We used both the May and November 2020 versions of the Einstein Toolkit
> and they both produce the same error. We are currently a bit lost, also
> because the same code and par file run fine on another cluster (MarconiA3
> at CINECA in Italy). We also tried a different parfile (which still calls
> the same routine), and the same problem occurs on Frontera (but not
> on MarconiA3).
>
> It may well be a bug within Spritz when calling the CCTK_WARN function
> (that for some reason does not show up on other machines) or a compilation
> problem on Frontera. Have you seen a similar error? Do you have some
> suggestions on how to debug it?
>
> Thanks,
> Bruno
>
--
My email is as private as my paper mail. I therefore support encrypting
and signing email messages. Get my PGP key from http://pgp.mit.edu .