[ET Trac] [Einstein Toolkit] #1473: Carpet segfaults in Shutdown

Einstein Toolkit trac-noreply at einsteintoolkit.org
Thu May 1 04:10:47 CDT 2014


#1473: Carpet segfaults in Shutdown
---------------------+------------------------------------------------------
  Reporter:  knarf   |       Owner:  eschnett           
      Type:  defect  |      Status:  new                
  Priority:  minor   |   Milestone:                     
 Component:  Carpet  |     Version:  development version
Resolution:          |    Keywords:                     
---------------------+------------------------------------------------------

Comment (by rhaas):

 OK. This is non-trivial.

 The segfault is caused by the way Cactus and Carpet interact when
 initially creating the group Carpet::timing_levels, which is defined as
 {{{
 CCTK_REAL timing_levels TYPE=array DIM=1 SIZE=max_refinement_levels
 DISTRIB=constant TAGS='checkpoint="no"'
 {
   time_level time_level_count
 } "Per-level timing information"
 }}}
 Notice that its SIZE depends on the Carpet parameter
 max_refinement_levels. This is valid and often used.

 The problem occurs when we recover from a checkpoint using a minimal
 recovery parfile (as the test does) that does not set
 Carpet::max_refinement_levels. In that case the group is created so early
 (basically when Carpet is activated, inside of
 configs/bns_all/bindings/Variables/Carpet.c) that max_refinement_levels
 has not yet been recovered from the checkpoint (the thorns are only just
 being activated) and so still has its default value of 1.

 Carpet then contains a loop
 {{{
     for (int rl=0; rl<max_refinement_levels; ++rl) {
       time_level      [rl] = 0.0;
       time_level_count[rl] = 0.0;
     }
 }}}
 in Carpet/src/Timing.cc line 202. This loop writes past the end of the
 data allocated for time_level: only 1 entry was allocated, because
 max_refinement_levels was 1 when the group was created, but the parameter
 has since been recovered to its original value of 4 from the checkpoint.
 Since this is just a short array, we don't get an immediate segfault but
 instead overwrite malloc's control data, which leads to a double free()
 error message and eventually a segfault when we free the data in the
 array during SHUTDOWN.
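
 To make the mismatch concrete, here is a minimal, self-contained sketch of
 a defensive version of the loop above. It is not Carpet code: the struct
 TimingLevels and the function reset_timing_levels are hypothetical
 stand-ins, and std::vector replaces the storage that Cactus allocates for
 the group. The idea is only to bound the loop by the number of entries that
 were actually allocated, rather than by the current parameter value.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the timing_levels group: the size is fixed
// when the object is created, as it is for the real grid array.
struct TimingLevels {
  std::vector<double> time_level;
  std::vector<double> time_level_count;

  explicit TimingLevels(std::size_t size_at_creation)
      : time_level(size_at_creation, -1.0),
        time_level_count(size_at_creation, -1.0) {}
};

// Reset the per-level timers, writing only to entries that exist.
// Returns the number of levels that could not be initialised; a non-zero
// value means the parameter changed after the storage was allocated.
inline std::size_t reset_timing_levels(TimingLevels& t,
                                       int max_refinement_levels) {
  assert(max_refinement_levels >= 0);
  const std::size_t wanted = static_cast<std::size_t>(max_refinement_levels);
  const std::size_t usable = std::min(wanted, t.time_level.size());
  for (std::size_t rl = 0; rl < usable; ++rl) {
    t.time_level[rl] = 0.0;
    t.time_level_count[rl] = 0.0;
  }
  return wanted - usable;
}
```

 With storage created while the parameter was still 1 and a recovered value
 of 4, this sketch reports 3 missing levels instead of corrupting the heap.
 Such a check would only detect the problem; it would not give the array
 its correct size.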

 So: this is actually dangerous and also quite hard to fix, since it
 basically requires that Carpet restore the grid structure not just of
 grid functions but also of grid arrays.

 A simple fix for the test is to add Carpet::max_refinement_levels = 4 to
 the recovery parfile.

 Thankfully, I doubt that this bug is commonly triggered in the wild, since
 we usually use the same parfile for checkpointing and recovery.

 I was actually hoping that this would give me some insight into a nasty
 checkpoint recovery bug on bluewaters, but from the looks of it, this
 seems unlikely ;-).

-- 
Ticket URL: <https://trac.einsteintoolkit.org/ticket/1473#comment:4>
Einstein Toolkit <http://einsteintoolkit.org>
The Einstein Toolkit