[ET Trac] [Einstein Toolkit] #626: Recovery fails in AHFinderDirect RecoverML with out-of-bounds assertion in CarpetLib
Einstein Toolkit
trac-noreply at einsteintoolkit.org
Tue Oct 18 12:32:53 CDT 2011
#626: Recovery fails in AHFinderDirect RecoverML with out-of-bounds assertion in
CarpetLib
---------------------+------------------------------------------------------
Reporter: rhaas | Owner: eschnett
Type: defect | Status: new
Priority: major | Milestone: ET_2011_10
Component: Carpet | Version:
Resolution: | Keywords:
---------------------+------------------------------------------------------
Comment (by eschnett):
Thank you.
This is an unfortunate coincidence of events.
Cactus itself does not have a notion of a maximum number of timelevels;
this number can be different for each grid group. However, for proper time
interpolation etc., Carpet needs to assume such a global maximum, and it
uses "prolongation_order_time+1" for this. (This parameter should probably
be renamed.) That is, this is the maximum number of timelevels that Carpet
can handle for interpolation; individual grid groups may have more timelevels,
but Carpet will not store/provide any meta-data for them, and the
application can only access these as they are (i.e. in the same way as
with PUGH).
In this particular parameter file, prolongation_order_time is 2 (the
default), but the metric seems to have 3 time levels. That means that the
metric has a pre-previous time level that is most likely never used, but
which nevertheless has storage, and which is checkpointed and recovered. I
assume that nothing in the code actually uses the 3rd timelevel, except
that timelevel rotation will move old data into it.
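The mismatch described above could be reproduced with a parameter-file excerpt along these lines (a sketch assuming ADMBase-style parameter names for the metric's timelevels; values as described in the ticket):

```
Carpet::prolongation_order_time = 2   # the default
ADMBase::metric_timelevels      = 3   # the 3rd timelevel has no Carpet meta-data
```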
During recovery, all timelevels need to be synchronised. This is special,
since during evolution only the current timelevel would be synchronised.
All timelevels need to be synchronised because the number of processes may
change, and the ghost zones need to be filled. It would instead be
possible to read in the ghost zone data from file, but synchronising is
probably much more efficient.
The major difference between Carpet-git and Carpet-hg is that Carpet now
explicitly stores which time is associated with each timelevel, instead of
assuming a constant delta time. This can save time interpolation during
regridding (leading to slightly different, but still valid results), and
also allows e.g. changing the time step size during evolution. However,
this also means that the times for the past timelevels need to be
initialised correctly, so Carpet needs to know how many old timelevels
there can be.
And this is where things go wrong: Carpet has no meta-data for the 3rd
(unused) timelevel. During synchronisation, Carpet accesses (although this
is not actually needed) the time associated with this timelevel, and this
operation fails.
There are several ways to correct this:
* Allocate only 2 timelevels for the metric (since only 2 are used anyway)
* Set prolongation_order_time to 3 instead of 2 (since you want to use 3
timelevels, you may as well tell Carpet about this)
* Introduce a special case for synchronising after recovering that somehow
doesn't access Carpet's timelevel metadata (which is not required for
synchronising)
* Wrap accessing timelevels during communication in an if statement, so
that the error is not triggered
I would obviously prefer one of the first two options. In addition, the
error message should be improved.
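The first two fixes amount to one-line parameter changes (sketched here with ADMBase-style parameter names; adjust for the thorn that actually owns the metric's timelevels):

```
# Option 1: allocate only the 2 timelevels that are actually used
ADMBase::metric_timelevels = 2

# Option 2: tell Carpet that up to 3 timelevels are in use
Carpet::prolongation_order_time = 3
```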
--
Ticket URL: <https://trac.einsteintoolkit.org/ticket/626#comment:1>
Einstein Toolkit <http://einsteintoolkit.org>