[ET Trac] [Einstein Toolkit] #1473: Carpet segfaults in Shutdown
Einstein Toolkit
trac-noreply at einsteintoolkit.org
Thu May 1 04:10:47 CDT 2014
#1473: Carpet segfaults in Shutdown
---------------------+------------------------------------------------------
Reporter: knarf | Owner: eschnett
Type: defect | Status: new
Priority: minor | Milestone:
Component: Carpet | Version: development version
Resolution: | Keywords:
---------------------+------------------------------------------------------
Comment (by rhaas):
OK. This is non-trivial.
The segfault is caused by the way Cactus and Carpet interact when
initially creating the group Carpet::timing_levels, which is
defined as
{{{
CCTK_REAL timing_levels TYPE=array DIM=1 SIZE=max_refinement_levels
DISTRIB=constant TAGS='checkpoint="no"'
{
time_level time_level_count
} "Per-level timing information"
}}}
Notice that its SIZE depends on the Carpet parameter
max_refinement_levels. This is valid and often used.
The problem occurs when we recover from a checkpoint using a very minimal
recovery parfile (as the test does) that does not set
Carpet::max_refinement_levels. In that case the
group is created so early (basically when Carpet is activated) inside
configs/bns_all/bindings/Variables/Carpet.c that max_refinement_levels has
not yet been recovered from the checkpoint (since the thorns are just
being activated), so it still has its default value of 1.
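To make the ordering concrete, here is a minimal standalone sketch of the sequence described above. It does not use the Cactus API: Params, GridArray, create_group and recover_parameters are hypothetical stand-ins that only mimic "size is fixed at activation, parameter is overridden at recovery".

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the thorn parameter (not the Cactus API).
struct Params {
  int max_refinement_levels = 1;  // default value mentioned in the ticket
};

// Hypothetical stand-in for a grid array whose SIZE is evaluated once,
// at the moment the group is created.
struct GridArray {
  std::vector<double> data;
};

// Step 1: thorn activation creates the group using whatever parameter
// value is visible at that moment.
GridArray create_group(const Params& p) {
  return GridArray{std::vector<double>(
      static_cast<std::size_t>(p.max_refinement_levels), 0.0)};
}

// Step 2: checkpoint recovery overrides the parameter afterwards; the
// already created array is not resized.
void recover_parameters(Params& p, int checkpointed_levels) {
  assert(checkpointed_levels >= 1);
  p.max_refinement_levels = checkpointed_levels;
}

// Number of entries a loop bounded by the recovered parameter would
// write beyond the allocation made at activation time.
std::size_t overrun_after_recovery(int checkpointed_levels) {
  Params p;                                 // parfile does not set the value
  GridArray timing = create_group(p);       // allocated with the default
  recover_parameters(p, checkpointed_levels);
  const std::size_t wanted =
      static_cast<std::size_t>(p.max_refinement_levels);
  return wanted > timing.data.size() ? wanted - timing.data.size() : 0;
}
```

With the values from this ticket, overrun_after_recovery(4) reports the three entries that fall outside the single allocated element.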
Carpet then contains a loop
{{{
for (int rl=0; rl<max_refinement_levels; ++rl) {
time_level [rl] = 0.0;
time_level_count[rl] = 0.0;
}
}}}
in Carpet/src/Timing.cc line 202. This loop writes past the end of the
data allocated for time_level (only 1 entry was allocated because
max_refinement_levels was 1 when the group was created, but it was then
recovered to its original value of 4 from the checkpoint). Since this is
just a short array, we don't get a segfault right away but instead
overwrite malloc's control data, which leads to a double free() error
message and eventually a segfault when we free the array data during
SHUTDOWN.
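As a rough illustration of how such a loop could avoid the overrun, the sketch below bounds the loop by the number of entries actually allocated rather than by the parameter alone. This is not the code in Timing.cc: reset_timing_levels is a hypothetical helper, and std::vector stands in for the grid array storage. In a real thorn the allocated extent would have to be queried from the flesh (for example via something like CCTK_GroupgshGN) instead of from a vector.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: reset the per-level timers without touching more
// entries than were allocated. Returns the number of levels reset, so a
// caller can detect a mismatch with max_refinement_levels.
std::size_t reset_timing_levels(std::vector<double>& time_level,
                                std::vector<double>& time_level_count,
                                int max_refinement_levels) {
  assert(max_refinement_levels >= 0);
  // Entries that really exist in both variables of the group.
  const std::size_t allocated =
      std::min(time_level.size(), time_level_count.size());
  // Never iterate beyond the allocation, whatever the parameter says.
  const std::size_t n = std::min(
      allocated, static_cast<std::size_t>(max_refinement_levels));
  for (std::size_t rl = 0; rl < n; ++rl) {
    time_level[rl] = 0.0;
    time_level_count[rl] = 0.0;
  }
  return n;
}
```

Note that clamping only avoids the memory corruption; it does not give the array the right size. A return value smaller than max_refinement_levels would probably be better treated as a fatal error than silently ignored.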
So this is actually dangerous and also quite hard to fix, since it
basically requires that Carpet restore the grid structure not just of
grid functions but also of grid arrays.
A simple fix for the test is to add max_refinement_levels = 4 to the
recovery parfile.
Thankfully I doubt that this bug is commonly triggered in the wild since
we usually use the same parfile for checkpointing and recovery.
I was actually hoping that this would give me some insight into a nasty
checkpoint recovery bug on bluewaters, but from the looks of it, this
seems unlikely ;-).
--
Ticket URL: <https://trac.einsteintoolkit.org/ticket/1473#comment:4>
Einstein Toolkit <http://einsteintoolkit.org>
The Einstein Toolkit