[Users] Problem with CarpetRegrid2/AMR
Erik Schnetter
schnetter at cct.lsu.edu
Tue Sep 6 19:03:55 CDT 2011
Hal
This is where numbers are assigned to components. The communication
schedule decides which component needs to send data to which other
component (which may or may not be located on another process); this
schedule is created for each refinement level independently, and may
(if there is an error) refer to component numbers that don't exist.
This schedule is set up in dh.cc.
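
To make the failure mode concrete, here is a minimal sketch (simplified,
hypothetical types, not Carpet's actual data structures) of what such a
per-level schedule looks like and how a bad component number would show up:

#include <cassert>
#include <cstddef>
#include <vector>

struct sendrecv {
  int send_component;  // component that owns the data
  int recv_component;  // component whose ghost zones are filled
};

// One schedule per refinement level; ncomps is the number of
// components that exist on that level.
void check_schedule (std::vector<sendrecv> const& schedule, int ncomps)
{
  for (std::size_t i = 0; i < schedule.size(); ++i) {
    // A bug in the schedule setup can produce component numbers that
    // don't exist on this level, e.g. send_component == ncomps.
    assert (schedule[i].send_component >= 0);
    assert (schedule[i].send_component < ncomps);
    assert (schedule[i].recv_component >= 0);
    assert (schedule[i].recv_component < ncomps);
  }
}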
Can you send me the example you are currently running (your source
code and parameter file)? I will give it a try.
-erik
On Tue, Sep 6, 2011 at 7:44 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> On Thu, 2011-09-01 at 15:16 -0400, Erik Schnetter wrote:
>> On Thu, Sep 1, 2011 at 3:05 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>> > On Thu, 2011-09-01 at 14:25 -0400, Erik Schnetter wrote:
>> >> On Thu, Sep 1, 2011 at 11:51 AM, Hal Finkel <hfinkel at anl.gov> wrote:
>> >> > On Thu, 2011-09-01 at 11:37 -0400, Erik Schnetter wrote:
>> >> >> On Thu, Sep 1, 2011 at 10:53 AM, Hal Finkel <hfinkel at anl.gov> wrote:
>> >> >> > On Tue, 2011-08-30 at 21:06 -0400, Erik Schnetter wrote:
>> >> >> >> On Tue, Aug 30, 2011 at 5:28 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>> >> >> >> > Could I also decrease the block size? I currently have
>> >> >> >> > CarpetRegrid2::adaptive_block_size = 4, could it be smaller than that?
>> >> >> >> > Is there a restriction based on the number of ghost points?
>> >> >> >>
>> >> >> >> Yes, you can reduce the block size. I assume that both the regridding
>> >> >> >> operation and the time evolution will become slower if you do that,
>> >> >> >> because more blocks will have to be handled.
>> >> >> >
>> >> >> > Regardless of what I do, once we get past the first coarse time step,
>> >> >> > the program seems to "hang" at "INFO (Carpet): [ml=0][rl=0][m=0][tl=0]
>> >> >> > Regridding map 0...".
>> >> >> >
>> >> >> > Overall, it is in dh::regrid(do_init=true). It spends most of its
>> >> >> > time in bboxset<int, 3>::normalize(), specifically in the loop
>> >> >> > for (typename bset::iterator nsi = nbs.begin(); nsi != nbs.end(); ++nsi).
>> >> >> > The normalize() function does exit, however, so it is not hanging
>> >> >> > in that function.
>> >> >> >
>> >> >> > The core problem seems to be that it takes a long time to execute:
>> >> >> > boxes = boxes .shift(-dir) - boxes;
>> >> >> > in dh::regrid(do_init=true), probably because boxes has 129064
>> >> >> > elements. The coarse grid is now only 30^3 and I've left the regrid
>> >> >> > box size at 4. I'd think, then, that the coarse grid should have a
>> >> >> > maximum of 30^3/4^3 ~ 420 refinement regions.
>> >> >> >
>> >> >> > What is the best way to figure out what is going on?
>> >> >>
>> >> >> Hal
>> >> >>
>> >> >> Yes, this function is very slow. I did not expect it to be
>> >> >> prohibitively slow. Are you compiling with optimisation enabled?
>> >> >
>> >> > I've tried with optimizations enabled (and without for debugging).
>> >> >
>> >> >>
>> >> >> A bboxset represents the set of refined regions; internally it is
>> >> >> stored as a list of bboxes (regions). Carpet performs set operations
>> >> >> on these (intersection, union, complement, etc.) to determine the
>> >> >> communication schedule, i.e. which ghost zones of which bbox need to
>> >> >> be filled from which other bbox. Unfortunately, the algorithm used
>> >> >> for this is O(n^2) in the number of refined regions, and the set
>> >> >> operations themselves, implemented via lists, are O(n^2) in the set
>> >> >> size, leading to a rather unfortunate overall complexity. The only
>> >> >> cure is to reduce the number of bboxes (make them larger) and to
>> >> >> regrid fewer times.
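>> >> >>
>> >> >> To see the scaling problem concretely, here is a minimal sketch
>> >> >> (assumed box type, not the actual CarpetLib bboxset code):
>> >> >>
>> >> >> #include <list>
>> >> >>
>> >> >> struct bbox { int lo[3], hi[3]; };  // placeholder for CarpetLib's bbox
>> >> >>
>> >> >> bool intersects (bbox const& a, bbox const& b)
>> >> >> {
>> >> >>   for (int d = 0; d < 3; ++d)
>> >> >>     if (a.hi[d] < b.lo[d] || b.hi[d] < a.lo[d]) return false;
>> >> >>   return true;
>> >> >> }
>> >> >>
>> >> >> // A set operation such as "as - bs" has to test every box in as
>> >> >> // against every box in bs: O(n*m) pairs. With n = m = 129064 that
>> >> >> // is about 1.7e10 box-box tests for a single operation.
>> >> >> long count_overlaps (std::list<bbox> const& as, std::list<bbox> const& bs)
>> >> >> {
>> >> >>   long count = 0;
>> >> >>   typedef std::list<bbox>::const_iterator it;
>> >> >>   for (it a = as.begin(); a != as.end(); ++a)
>> >> >>     for (it b = bs.begin(); b != bs.end(); ++b)
>> >> >>       if (intersects (*a, *b)) ++count;
>> >> >>   return count;
>> >> >> }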
>> >> >
>> >> > This is what I suspected, but nevertheless, is there something wrong?
>> >> > How many boxes would you expect me to have? The reason that it does
>> >> > not finish, even with optimizations, is that there are 129K boxes in the
>> >> > loop (that's at least 16 billion box normalizations?).
>> >> >
>> >> > The coarse grid is only 30^3, and the regrid box size is 4, so at
>> >> > maximum, there should be ~400 level one boxes. Even if some of those
>> >> > have level 2 boxes, I don't understand how there could be 129K boxes.
>> >>
>> >> The refinement structure itself should have one bbox per refined 4^3
>> >> box, and both CarpetRegrid2 and CarpetLib would try to combine these
>> >> into fewer boxes where possible, i.e. where one can form rectangles or
>> >> larger cubes. Since the 4^3 blocks can always be merged along at least
>> >> one direction, I would expect no more than (30/4)^2 ~ 8^2 = 64 bboxes
>> >> on level one.
>> >
>> > That makes sense. I think that there is a bug somewhere which is causing
>> > the box set to be much too big. Furthermore, it does not happen on every
>> > run, only sometimes. When it does not happen, I hit another bug after a
>> > few coarse timesteps:
>> >
>> > I get a range-check exception from std::vector in a call to:
>> > gh::get_local_component (rl=1, c=8)
>> > The problem is that this returns:
>> > local_components_.AT(rl).AT(c);
>> > and local_components_[1].size() is 8, so the valid indices are 0..7.
>> > The call to get_local_component is coming from ggf::transfer_from_all
>> > at:
>> > int const lc2 = h.get_local_component(rl2,c2);
>> > where c2 comes from psend.component.
>> > So it looks like there is an off-by-one error somewhere.
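>> >
>> > For reference, a minimal repro of the failure mode (assumed shapes,
>> > not Carpet's actual classes):
>> >
>> > #include <vector>
>> >
>> > int main ()
>> > {
>> >   // local_components_[rl] holds one entry per local component on
>> >   // refinement level rl; level 1 has 8 entries, so c = 0..7 are valid.
>> >   std::vector<std::vector<int> > local_components_ (2);
>> >   local_components_.at(1).resize (8);
>> >   // A component number built from the wrong per-level count can be
>> >   // one past the end, c = 8 here: .at() then throws std::out_of_range.
>> >   return local_components_.at(1).at(8);
>> > }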
>>
>> Very strange. This code should be quite solid by now. psend is set in
>> the file dh.cc in thorn Carpet/CarpetLib; there is one (large) routine
>> that calculates the communication schedule. Past bugs there included
>> confusing the number of components on different refinement levels,
>> which led to indexing errors such as the one you describe.
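>>
>> Schematically, the kind of mistake I mean (hypothetical code, not the
>> actual dh.cc routine):
>>
>> #include <vector>
>>
>> // ncomps[rl] is the number of components on refinement level rl,
>> // e.g. { 1, 8, 27 }. Building level 1's schedule with a finer
>> // level's count yields component numbers 8..26 that don't exist
>> // on level 1.
>> void build_schedule (std::vector<int> const& ncomps, int rl,
>>                      std::vector<int>& components)
>> {
>>   for (int c = 0; c < ncomps.at(rl); ++c)  // bug if rl is the wrong level
>>     components.push_back (c);
>> }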
>
> The bad component numbers are not coming from:
> preg.component = tmpncomps.AT(m)++;
> in Carpet/src/Recompose.cc
>
> Where else are the component numbers assigned?
>
> Thanks again,
> Hal
>
>>
>> >> Internally, one of the expensive operations is to determine which
>> >> points are buffer points. But before I explain this algorithm, one
>> >> important question: are you using buffer points? They are expensive,
>> >> and you should be able to avoid them. First, try simply switching off
>> >> this parameter; if that leads to a loss of convergence, you can
>> >> rewrite your equations as a first-order system (only first spatial
>> >> derivatives, by adding further evolved variables), which will most
>> >> likely be stable. Not using buffer zones should make things much
>> >> cheaper.
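>> >>
>> >> As a sketch of the standard reduction (written here for the 1D wave
>> >> equation): \partial_t^2 u = \partial_x^2 u becomes, after introducing
>> >> v := \partial_t u and w := \partial_x u,
>> >>
>> >>   \partial_t u = v
>> >>   \partial_t v = \partial_x w
>> >>   \partial_t w = \partial_x v
>> >>
>> >> so only first spatial derivatives appear on the right-hand sides, at
>> >> the cost of the extra evolved variables v and w.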
>> >
>> > I am computing second spatial derivatives, and while I am curious
>> > about the algorithm, I am not sure buffer zones have anything to do
>> > with the problem. I've changed the threshold so that I'm not
>> > generating level_mask values >= 2, and I still sometimes end up in
>> > that loop with 170K boxes. That is clearly just wrong. If you can
>> > tell me where the boxes are generated (in which functions), I can add
>> > some printf statements and breakpoints and try to see where they are
>> > all coming from.
>>
>> Buffer zones increase the refined region. Try
>> Carpet::use_buffer_zones=no, which is also the default.
>>
>> The level mask is interpreted in the file amr.cc, function
>> evaluate_level_mask in thorn Carpet/CarpetRegrid2. These boxes are
>> then post-processed in file regrid.cc, lines 315 to 415. The
>> individual post-processing steps are defined in file property.cc. The
>> post-processing enforces certain grid structure properties, such as
>> proper nesting, certain symmetries (this is also where periodicity
>> would be enforced), or the addition of buffer zones.
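>>
>> To see where the boxes multiply, you could print the set size after
>> each post-processing step, along these lines (a hypothetical helper;
>> adapt the names to what regrid.cc actually uses):
>>
>> #include <cstdio>
>>
>> // Call between the post-processing steps in regrid.cc to track how
>> // the number of boxes on each refinement level evolves.
>> void report_boxes (char const* step, int rl, long nboxes)
>> {
>>   std::printf ("after %s: level %d has %ld boxes\n", step, rl, nboxes);
>> }
>>
>> A sudden jump after one particular step (proper nesting, symmetries,
>> buffer zones) would point at the culprit.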
>>
>> -erik
>>
>> > Thanks again,
>> > Hal
>> >
>> >>
>> >> -erik
>> >>
>> >
>> > --
>> > Hal Finkel
>> > Postdoctoral Appointee
>> > Leadership Computing Facility
>> > Argonne National Laboratory
>> > 1-630-252-0023
>> > hfinkel at anl.gov
>> >
>> >
>>
>>
>>
>
> --
> Hal Finkel
> Postdoctoral Appointee
> Leadership Computing Facility
> Argonne National Laboratory
> 1-630-252-0023
> hfinkel at anl.gov
>
>
--
Erik Schnetter <schnetter at cct.lsu.edu> http://www.cct.lsu.edu/~eschnett/