[ET Trac] [Einstein Toolkit] #626: Recovery fails in AHFinderDirect RecoverML with out-of-bounds assertion in CarpetLib
Einstein Toolkit
trac-noreply at einsteintoolkit.org
Tue May 22 11:25:19 CDT 2012
#626: Recovery fails in AHFinderDirect RecoverML with out-of-bounds assertion in
CarpetLib
-----------------------+----------------------------------------------------
Reporter: rhaas | Owner: eschnett
Type: defect | Status: new
Priority: critical | Milestone:
Component: Carpet | Version:
Resolution: | Keywords:
-----------------------+----------------------------------------------------
Comment (by eschnett):
I have tested this with the parameter files checkpointML.par and
recoverML.par from AHFinderDirect's test suite.
- Of course, as they are, the parameter files pass.
- When I comment out the "ML_BSSN::timelevels = 2" in both parameter
files, I see an error upon recovery (more below).
- When I then re-introduce this setting for recovering (using the "bad"
checkpoint file), everything seems fine again.
Carpet reports this error because it cannot determine a "current time"
associated with the oldest time level. Because of sub-cycling, these
times generally differ between refinement levels. How many "current
times" Carpet stores depends on the parameter "prolongation_order_time".
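To illustrate the mismatch described here, the following is a minimal sketch and not Carpet's actual bookkeeping. It assumes a temporal refinement factor of 2 and that the number of stored times is simply passed in as "num_times"; the function names are hypothetical.

```python
from fractions import Fraction


def level_times(current_time, coarse_dt, reflevel, num_times, refinement_factor=2):
    """Times attached to the stored time levels of one refinement level, newest first."""
    # Assumption: each finer level takes time steps smaller by `refinement_factor`.
    dt = Fraction(coarse_dt) / refinement_factor ** reflevel
    return [Fraction(current_time) - k * dt for k in range(num_times)]


def time_of_timelevel(times, timelevel):
    """Look up the time of a time level; fail if no time was stored for it."""
    if timelevel >= len(times):
        raise IndexError(f"no time stored for time level {timelevel}")
    return times[timelevel]
```

With this sketch, past times on a finer level lie closer to the current time than on the coarse level, and asking for the time of a third time level when only two times are stored raises an error, which is the kind of failure seen upon recovery.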
Unfortunately, this parameter is disconnected from the number of time
levels that the flesh allocates for variables. A work-around seems
possible, but I don't want to introduce it before this release unless
necessary. It would likely consist of a flag, passed down into CarpetLib,
indicating that the current time is not known; this would still allow
synchronising, but disallow e.g. prolongation or restriction. That would
likely avoid the problem.
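The idea of such a flag can be sketched as follows. This is only an illustration under stated assumptions, not CarpetLib's API: the class and function names are hypothetical, a plain copy stands in for synchronisation, and linear interpolation in time stands in for the operations that need to know the time.

```python
class TimeLevel:
    """Data of one time level; time=None means the current time is not known."""

    def __init__(self, data, time=None):
        self.data = list(data)
        self.time = time

    @property
    def time_known(self):
        return self.time is not None


def synchronise(dst, src):
    """Stand-in for synchronisation: copies data and needs no time information."""
    dst.data = list(src.data)


def interpolate_in_time(old, new, target_time):
    """Stand-in for a time-dependent operation; refuses unknown times."""
    if not (old.time_known and new.time_known):
        raise ValueError("time-dependent operation on a time level with unknown time")
    w = (target_time - old.time) / (new.time - old.time)
    return [(1 - w) * a + w * b for a, b in zip(old.data, new.data)]
```

A time level created without a time can still take part in synchronise(), while interpolate_in_time() raises an error for it instead of failing later with an out-of-bounds lookup.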
However, since I cannot reproduce the problem, and since allocating 3
timelevels in a unigrid run is most likely an oversight anyway, the
cleaner solution seems to be to set "ML_BSSN::timelevels = 2" in the
parameter file, either all the time, or before recovering. If I change
this parameter only before recovering, CarpetIOHDF5 warns about unused
datasets in the checkpoint file.
If you believe that there is a problem that I'm just not seeing, then
please:
- Try to reproduce my steps above to see whether you obtain different
results
- Report exactly which version of Carpet and AHFinderDirect you are using
(revision numbers)
- Post both of your parameter files, as well as the output you obtain
(stdout and stderr for each run)
- Describe other relevant details, e.g. the number of MPI processes, the
machine you are using, etc.
- If you give a backtrace, please either use gdb, or use "addr2line" to
convert hex addresses to line numbers
--
Ticket URL: <https://trac.einsteintoolkit.org/ticket/626#comment:15>
Einstein Toolkit <http://einsteintoolkit.org>
The Einstein Toolkit