[Users] failure of CCTK_WARN on Frontera cluster

Roland Haas rhaas at illinois.edu
Wed Jan 6 12:36:31 CST 2021


Hello all,

I have never seen that before. It looks like memory corruption (a write
beyond the end of an allocated memory block), which tends to go unnoticed
until the next free().

My suggestion would be to try glibc's malloc debugging options:

https://www.gnu.org/software/libc/manual/html_node/Heap-Consistency-Checking.html

in particular the section on the MALLOC_CHECK_ env variable:

--8<--
When MALLOC_CHECK_ is set to a non-zero value, a special (less efficient) implementation is used which is designed to be tolerant against simple errors, such as double calls of free with the same argument, or overruns of a single byte (off-by-one bugs). 
--8<--

You may also want to add:

-roe -b no

to the Cactus command line in the run script to disable output
buffering and get one stdout and one stderr file per MPI rank.

Yours
Roland

> We are having some issues running the Einstein Toolkit with our Spritz code
> on the Frontera cluster at TACC and I'm wondering if any of you ever saw
> the error you can find in the stderr attached to this email.
> 
>    The error is
> corrupted size vs. prev_size: 0x00002b3728001060 ***
> and it looks like the code crashes within a call to CCTK_WARN.
> 
>    We used both the May and November 2020 versions of the Einstein Toolkit
> and they both produce the same error. We are currently a bit lost, also
> because the same code and par file run fine on another cluster (MarconiA3
> at CINECA in Italy). We also tried a different parfile (which still calls
> the same routine), and the same problem occurs on Frontera (but not on
> MarconiA3).
> 
>   It may well be a bug within Spritz when calling the CCTK_WARN function
> (which for some reason does not show up on other machines) or a compilation
> problem on Frontera. Have you seen a similar error? Do you have some
> suggestions on how to debug it?
> 
> Thanks,
> Bruno
> 


-- 
My email is as private as my paper mail. I therefore support encrypting
and signing email messages. Get my PGP key from http://pgp.mit.edu .