[Users] CarpetIOHDF5 recover failure with manual topology

Yosef Zlochower yosef at astro.rit.edu
Fri Sep 13 09:27:54 CDT 2019


Thanks. The issue seems to be that with manual topology a region_t 
structure has its map entry incorrectly set.

What happens is that in

bool gh::recompose there is the check
   bool const do_recompose = level_did_change(rl);

In level_did_change, the level is considered to have changed because

the new region_t is
 
region_t(extent=([41,0,0]:[80,10,10]:[1,1,1]/[41,0,0]:[80,10,10]/[40,11,11]/4840),outer_boundaries=[[0,1,1],[1,1,1]],map=51,processor=1)

while the old is
region_t(extent=([41,0,0]:[80,10,10]:[1,1,1]/[41,0,0]:[80,10,10]/[40,11,11]/4840),outer_boundaries=[[0,1,1],[1,1,1]],map=0,processor=1)

The only difference is that the new map is 51, while the old map is 0.

If I add a line in Carpet/src/Recompose.cc:SplitRegions_AsSpecified
to force the map entry to be zero, then all seems to work.
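
To illustrate why a stray map value alone is enough to flag the level as
changed, here is a minimal sketch. It is not Carpet code: Region,
same_region and with_map_reset are hypothetical names, and it assumes
that the change detection amounts to a field-by-field comparison of the
old and new regions. Only the fields that appear in the printed region_t
values above are modelled.

```cpp
#include <array>
#include <cassert>

// Hypothetical stand-in for Carpet's region_t (assumption: only the
// fields visible in the printed output are modelled).
struct Region {
  std::array<int, 3> lower;  // lower corner of the extent
  std::array<int, 3> upper;  // upper corner of the extent
  int map;                   // map index (0 in the old region, 51 in the new)
  int processor;             // owning process
};

// Field-by-field equality: a difference in any field, including map,
// makes the two regions compare unequal.
bool same_region(const Region &a, const Region &b) {
  return a.lower == b.lower && a.upper == b.upper && a.map == b.map &&
         a.processor == b.processor;
}

// Assumed shape of the workaround: force the map entry to zero before
// the region is compared with the old one.
Region with_map_reset(Region r) {
  r.map = 0;
  return r;
}
```

With the values above, same_region reports a difference between the old
region (map=0) and the new one (map=51), and reports equality once the new
region has been passed through with_map_reset.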


Without the change, Carpet recomposes the grid but never calls the 
postregrid functions. Hence the NaNs in grid::x.



On 9/12/19 2:37 PM, Steven R. Brandt wrote:
> I said on the call there was an easy way to trace what function call you
> are in...
> 
> Add this to your thornlist...
> 
> !TARGET  = $ARR
> !TYPE = git
> !URL = https://github.com/stevenrbrandt/ReadWriteDiagnostics.git
> !REPO_PATH=$2
> !CHECKOUT =
> ReadWriteDiagnostics/FCall
> 
> Then add FCall to your ActiveThorns and you'll see a message printed
> before and after each scheduled function.
> 
> --Steve
> 
> On 9/10/2019 3:03 PM, Yosef Zlochower wrote:
>> It seems that there may be multiple issues. The parfile I sent before
>> tests for NaNs in grid::x. grid::x is not a checkpointed variable. It
>> seems that with manual topology, the grid::x is filled with NaNs during
>> the recover step (the pointer is actually pointing to a new area of
>> memory). With standard topology, the array pointer and contents do not
>> change on recover. I have also seen NaNs in the recovered variables, but
>> this parfile doesn't show that.
>>
>>
>>
>> On 9/9/19 4:24 PM, Yosef Zlochower wrote:
>>> Hi,
>>>
>>>      I have been trying to debug why some runs I was performing could not
>>> recover from a checkpoint file, but would otherwise proceed as normal.
>>>
>>> I attached a minimalist parfile showing the problem. A small grid is
>>> manually distributed over 8 processors and terminates at iteration 2. An
>>> attempt to recover fails with NaNs on grid::x. If the manual topology
>>> section is commented out, no problems are seen.
>>>
>>>
>>> _______________________________________________
>>> Users mailing list
>>> Users at einsteintoolkit.org
>>> http://lists.einsteintoolkit.org/mailman/listinfo/users
>>>
> 