[Users] CarpetIOHDF5 recover failure with manual topology

Steven R. Brandt sbrandt at cct.lsu.edu
Fri Sep 13 14:10:28 CDT 2019


Do you want to submit a PR?

--Steve

On 9/13/2019 9:27 AM, Yosef Zlochower wrote:
> Thanks. The issue seems to be that with manual topology a region_t
> structure has its map entry incorrectly set.
>
> What happens is that in bool gh::recompose there is the check
>   bool const do_recompose = level_did_change(rl);
>
> In level_did_change, the level is considered to have changed because
> the new region_t is
>
> region_t(extent=([41,0,0]:[80,10,10]:[1,1,1]/[41,0,0]:[80,10,10]/[40,11,11]/4840),outer_boundaries=[[0,1,1],[1,1,1]],map=51,processor=1) 
>
>
> while the old is
>
> region_t(extent=([41,0,0]:[80,10,10]:[1,1,1]/[41,0,0]:[80,10,10]/[40,11,11]/4840),outer_boundaries=[[0,1,1],[1,1,1]],map=0,processor=1)
>
> The only difference is the new map is 51.
>
> If I add a line to SplitRegions_AsSpecified in Carpet/src/Recompose.cc
> to force the map entry to be zero, then all seems to work.
>
>
> Without the change, Carpet recomposes the grid but never calls the
> postregrid functions. Hence the NaNs in grid::x.
>
>
>
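
As a rough illustration of the comparison described above (not Carpet's
actual code), the sketch below uses a hypothetical, simplified Region
struct and a hypothetical level_changed function standing in for region_t
and level_did_change. It assumes the level comparison is a field-by-field
equality that includes the map index, so two regions with identical
extents and processors still compare unequal when only map differs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for Carpet's region_t. The real
// structure carries a full bounding-box extent and more fields.
struct Region {
  std::array<int, 3> lower;  // lower corner of the extent
  std::array<int, 3> upper;  // upper corner of the extent
  int map;                   // map index (0 vs. 51 in the report)
  int processor;             // owning process
};

// Field-by-field equality, including the map index (an assumption about
// how the comparison behaves, based on the symptoms in the thread).
bool same_region(const Region &a, const Region &b) {
  return a.lower == b.lower && a.upper == b.upper && a.map == b.map &&
         a.processor == b.processor;
}

// Assumed analogue of level_did_change: the level counts as changed if
// the region count differs or any pair of regions differs.
bool level_changed(const std::vector<Region> &old_regs,
                   const std::vector<Region> &new_regs) {
  if (old_regs.size() != new_regs.size())
    return true;
  for (std::size_t i = 0; i < old_regs.size(); ++i)
    if (!same_region(old_regs[i], new_regs[i]))
      return true;
  return false;
}
```

With the two regions quoted above (extent [41,0,0]:[80,10,10], processor
1), level_changed reports a change while the new map is 51 and the old
map is 0, and reports no change once the new region's map is reset to 0.
That matches the behaviour seen with the workaround.
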
> On 9/12/19 2:37 PM, Steven R. Brandt wrote:
>> I said on the call there was an easy way to trace what function call you
>> are in...
>>
>> Add this to your thornlist...
>>
>> !TARGET  = $ARR
>> !TYPE = git
>> !URL = https://github.com/stevenrbrandt/ReadWriteDiagnostics.git
>> !REPO_PATH=$2
>> !CHECKOUT =
>> ReadWriteDiagnostics/FCall
>>
>> Then add FCall to your ActiveThorns and you'll see a message printed
>> before and after each scheduled function.
>>
>> --Steve
>>
>> On 9/10/2019 3:03 PM, Yosef Zlochower wrote:
>>> It seems that there may be multiple issues. The parfile I sent before
>>> tests for NaNs in grid::x. grid::x is not a checkpointed variable. It
>>> seems that with manual topology, grid::x is filled with NaNs during
>>> the recover step (the pointer is actually pointing to a new area of
>>> memory). With standard topology, the array pointer and contents do not
>>> change on recover. I have also seen NaNs in the recovered variables,
>>> but this parfile doesn't show that.
>>>
>>>
>>>
>>> On 9/9/19 4:24 PM, Yosef Zlochower wrote:
>>>> Hi,
>>>>
>>>>      I have been trying to debug why some runs I was performing
>>>> could not recover from a checkpoint file, but would otherwise
>>>> proceed as normal.
>>>>
>>>> I attached a minimal parfile showing the problem. A small grid is
>>>> manually distributed over 8 processors and terminates at iteration
>>>> 2. An attempt at recover fails with NaNs in grid::x. If the manual
>>>> topology section is commented out, no problems are seen.
>>>>
>>>>
>>>> _______________________________________________
>>>> Users mailing list
>>>> Users at einsteintoolkit.org
>>>> http://lists.einsteintoolkit.org/mailman/listinfo/users
>>>>

