[ET Trac] [Einstein Toolkit] #626: Recovery fails in AHFinderDirect RecoverML with out-of-bounds assertion in CarpetLib

Einstein Toolkit trac-noreply at einsteintoolkit.org
Tue May 22 11:25:19 CDT 2012


#626: Recovery fails in AHFinderDirect RecoverML with out-of-bounds assertion in
CarpetLib
-----------------------+----------------------------------------------------
  Reporter:  rhaas     |       Owner:  eschnett
      Type:  defect    |      Status:  new     
  Priority:  critical  |   Milestone:          
 Component:  Carpet    |     Version:          
Resolution:            |    Keywords:          
-----------------------+----------------------------------------------------

Comment (by eschnett):

 I have tested this with the parameter files checkpointML.par and
 recoverML.par from AHFinderDirect's test suite.
 - Of course, as they are, the parameter files pass.
 - When I comment out the "ML_BSSN::timelevels = 2" in both parameter
 files, I see an error upon recovery (more below).
 - When I then re-introduce this setting for recovering (using the "bad"
 checkpoint file), everything seems fine again.

 Carpet reports the error because it cannot determine a "current time" for
 the oldest time level. Because of sub-cycling, these times generally
 differ between refinement levels. How many "current times" Carpet stores
 depends on the parameter "prolongation_order_time". Unfortunately, this
 parameter is disconnected from the number of time levels that the flesh
 allocates for variables. A work-around seems possible, but I don't want
 to introduce it before this release unless necessary. It would likely
 consist of a flag, passed down into CarpetLib, indicating that the current
 time is not known; this would still allow synchronising, but disallow e.g.
 prolongation or restriction. That would likely avoid this problem.
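
 The mismatch described above can be sketched as a small pre-flight check
 on a parameter file. This is only an illustration under stated
 assumptions: the relation "stored times = prolongation_order_time + 1",
 the default values passed in by the caller, and both helper names are
 hypothetical and are not taken from Carpet's sources.

```python
import re

# Matches integer settings such as:  ML_BSSN::timelevels = 2
# Lines starting with '#' are comments and do not match.
_INT_PARAM = re.compile(r"^\s*(\w+::\w+)\s*=\s*(-?\d+)\s*(?:#.*)?$")


def read_int_params(par_text):
    """Collect integer 'Thorn::name = value' settings from parameter-file text."""
    params = {}
    for line in par_text.splitlines():
        match = _INT_PARAM.match(line)
        if match:
            # Names are lower-cased so lookups do not depend on spelling case.
            params[match.group(1).lower()] = int(match.group(2))
    return params


def untimed_levels(par_text, timelevels_param, defaults):
    """Return how many allocated time levels would have no stored time.

    ASSUMPTION: the driver stores prolongation_order_time + 1 times.
    'defaults' supplies values for parameters the file does not set; its
    keys must be lower-case, and its values are the caller's responsibility.
    """
    params = {**defaults, **read_int_params(par_text)}
    stored_times = params["carpet::prolongation_order_time"] + 1
    allocated = params[timelevels_param.lower()]
    return max(0, allocated - stored_times)
```

 A non-zero result for "ML_BSSN::timelevels" would correspond to the
 situation above, where the oldest time level has no time associated with
 it; setting "ML_BSSN::timelevels = 2" in the parameter file brings the
 count back to zero under these assumptions.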

 However, since I cannot reproduce the problem, and since allocating 3
 timelevels in a unigrid run is most likely an oversight anyway, the
 cleaner solution seems to be to set "ML_BSSN::timelevels = 2" in the
 parameter file, either all the time, or before recovering. If I change
 this parameter only before recovering, CarpetIOHDF5 warns about unused
 datasets in the checkpoint file.

 If you believe that there is a problem that I'm just not seeing, then
 please:
 - Try to reproduce my steps above to see whether you obtain different
 results
 - Report exactly which version of Carpet and AHFinderDirect you are using
 (revision numbers)
 - Please post your parameter files (both), as well as the output you
 obtain (both stdout and stderr for both)
 - Describe other relevant details, e.g. the number of MPI processes, the
 machine you are using, etc.
 - If you give a backtrace, please either use gdb, or use "addr2line" to
 convert hex addresses to line numbers
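
 For the last item, a minimal sketch of how the hex addresses in a raw
 backtrace could be handed to "addr2line". The helper names and the
 executable path are hypothetical; the executable needs to have been built
 with debug information, and addresses that belong to shared libraries or
 position-independent executables may need to be converted to offsets
 first, which this sketch does not attempt.

```python
import re
import subprocess

HEX_ADDR = re.compile(r"0x[0-9a-fA-F]+")


def extract_addresses(backtrace_text):
    """Return the hex addresses in a backtrace, in order, without repeats."""
    addresses = []
    for addr in HEX_ADDR.findall(backtrace_text):
        if addr not in addresses:
            addresses.append(addr)
    return addresses


def addr2line_command(executable, addresses):
    """Build the addr2line command line for the given addresses.

    -e selects the executable, -f also prints function names,
    -C demangles C++ symbol names.
    """
    return ["addr2line", "-f", "-C", "-e", executable, *addresses]


def resolve(executable, backtrace_text):
    """Run addr2line and return its output (function and file:line pairs)."""
    cmd = addr2line_command(executable, extract_addresses(backtrace_text))
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

 Calling resolve() with the path of the Cactus executable and the text of
 the backtrace would then give output that can be pasted into the ticket
 in place of the raw addresses.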

-- 
Ticket URL: <https://trac.einsteintoolkit.org/ticket/626#comment:15>
Einstein Toolkit <http://einsteintoolkit.org>
The Einstein Toolkit

