[Users] Problem with CarpetRegrid2/AMR
Hal Finkel
hfinkel at anl.gov
Thu Sep 1 14:56:44 CDT 2011
On Thu, 2011-09-01 at 15:16 -0400, Erik Schnetter wrote:
> On Thu, Sep 1, 2011 at 3:05 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> > On Thu, 2011-09-01 at 14:25 -0400, Erik Schnetter wrote:
> >> On Thu, Sep 1, 2011 at 11:51 AM, Hal Finkel <hfinkel at anl.gov> wrote:
> >> > On Thu, 2011-09-01 at 11:37 -0400, Erik Schnetter wrote:
> >> >> On Thu, Sep 1, 2011 at 10:53 AM, Hal Finkel <hfinkel at anl.gov> wrote:
> >> >> > On Tue, 2011-08-30 at 21:06 -0400, Erik Schnetter wrote:
> >> >> >> On Tue, Aug 30, 2011 at 5:28 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> >> >> >> > Could I also decrease the block size? I currently have
> >> >> >> > CarpetRegrid2::adaptive_block_size = 4, could it be smaller than that?
> >> >> >> > Is there a restriction based on the number of ghost points?
> >> >> >>
> >> >> >> Yes, you can reduce the block size. I assume that both the regridding
> >> >> >> operation and the time evolution will become slower if you do that,
> >> >> >> because more blocks will have to be handled.
> >> >> >
> >> >> > Regardless of what I do, once we get past the first coarse time step,
> >> >> > the program seems to "hang" at "INFO (Carpet): [ml=0][rl=0][m=0][tl=0]
> >> >> > Regridding map 0...".
> >> >> >
> >> >> > Overall, it is in dh::regrid(do_init=true). It spends most of its time
> >> >> > in bboxset<int, 3>::normalize() and, specifically, mostly in the loop:
> >> >> > for (typename bset::iterator nsi = nbs.begin(); nsi != nbs.end(); ++
> >> >> > nsi). The normalize() function does exit, however, so it is not hanging
> >> >> > in that function.
> >> >> >
> >> >> > The core problem seems to be that it takes a long time to execute:
> >> >> > boxes = boxes .shift(-dir) - boxes;
> >> >> > in dh::regrid(do_init=true). Probably because boxes has 129064 elements.
> >> >> > The coarse grid is now only 30^3 and I've left the regrid box size at 4.
> >> >> > I'd think, then, that the coarse grid should have a maximum of 30^3/4^3
> >> >> > ~ 420 refinement regions.
> >> >> >
> >> >> > What is the best way to figure out what is going on?
> >> >>
> >> >> Hal
> >> >>
> >> >> Yes, this function is very slow. I did not expect it to be
> >> >> prohibitively slow. Are you compiling with optimisation enabled?
> >> >
> >> > I've tried with optimizations enabled (and without for debugging).
> >> >
> >> >>
> >> >> The bboxset represents the set of refined regions, and it is
> >> >> internally represented as a list of bboxes (regions). Carpet performs
> >> >> set operations on these (intersection, union, complement, etc.) to
> >> >> determine the communication schedule, i.e. which ghost zones of which
> >> >> bbox need to be filled from which other bbox. Unfortunately, the
> >> >> algorithm used for this is O(n^2) in the number of refined regions,
> >> >> and set operations when implemented via lists themselves are O(n^2) in
> >> >> the set size, leading to a rather unfortunate overall complexity. The
> >> >> only cure is to reduce the number of bboxes (make them larger) and to
> >> >> regrid fewer times.
> >> >
> >> > This is what I suspected, but nevertheless, is there something wrong?
> >> > How many boxes do you expect that I should have? The reason that it does
> >> > not finish, even with optimizations, is that there are 129K boxes in the
> >> > loop (that's at least 16 billion box normalizations?).
> >> >
> >> > The coarse grid is only 30^3, and the regrid box size is 4, so at
> >> > maximum, there should be ~400 level one boxes. Even if some of those
> >> > have level 2 boxes, I don't understand how there could be 129K boxes.
> >>
> >> The refinement structure itself should have one bbox per refined 4^3
> >> box, and both CarpetRegrid2 and CarpetLib would try to combine these
> >> into fewer boxes where possible, i.e. where one can form rectangles or
> >> larger cubes. I would thus expect no more than (30/4)^2 = 64 bboxes on
> >> level one.
> >
> > That makes sense. I think that there is a bug somewhere which is causing
> > the box set to be much too big. Furthermore, it does not happen on every
> > run, only sometimes. When it does not happen, I hit another bug after a
> > few coarse timesteps:
> >
> > I get a range-check exception from std::vector in a call to:
> > gh::get_local_component (rl=1, c=8)
> > the problem is that this returns:
> > local_components_.AT(rl).AT(c);
> > and local_components_[1].size() is 8
> > The call to get_local_component is coming from ggf::transfer_from_all
> > at:
> > int const lc2 = h.get_local_component(rl2,c2);
> > where c2 is from psend.component.
> > So it looks like there is an off-by-one error somewhere.
>
> Very strange. This code should be quite solid by now. psend is set in
> the file dh.cc in thorn Carpet/CarpetLib; there is one (large) routine
> that calculates the communication schedule. Some of the indexing
> errors there in the past included confusing the number of components
> on different refinement levels, which led to indexing errors such as
> the one you describe.
>
> >> Internally, one of the expensive operations is to determine which
> >> points are buffer points. But before I explain this algorithm one
> >> important question: Are you using buffer points? Those become
> >> expensive, and you should be able to avoid them. First, you should be
> >> able to just switch off this parameter; if this leads to loss of
> >> convergence, you should be able to rewrite your equations into a
> >> first-order system (only first spatial derivatives by adding another
> >> evolved variable), which will most likely be stable. Not using buffer
> >> zones should make things much cheaper.
> >
> > I am computing second spatial derivatives, and while I am curious about
> > the algorithm, etc. I am not sure this has anything to do with the
> > problem. I've changed the threshold so that I'm not generating
> > level_mask values >= 2 and I still sometimes end up in that loop with
> > 170K boxes. That is clearly just wrong. If you can tell me where the
> > boxes are generated (in which functions), I can add some printf
> > statements and breakpoints and I'll try and see where they are all
> > coming from.
>
> Buffer zones increase the refined region. Try
> Carpet::use_buffer_zones=no, which is also the default.
I did not have that turned on.
What is the different between boxes.size() and boxes.setsize()? Perhaps
I was misinterpreting these:
(gdb) p boxes.setsize()
$1 = 24
(gdb) p boxes.size()
$2 = 175175
(gdb)
So maybe I don't have too many boxes, maybe it is just hanging sometimes
(and then otherwise crashing with that indexing error).
In terms of non-default Carpet parameters, I did have:
regrid_in_level_mode = no (to suppress some warning early on). Changing
that back to its default value of "yes" does not seem to make a
difference.
I now have:
Carpet::verbose = yes
Carpet::veryverbose = yes
Carpet::schedule_barriers = no
Carpet::storage_verbose = no
CarpetLib::output_bboxes = no
Carpet::output_timers_every = 512
CarpetLib::print_timestats_every = 512
CarpetLib::print_memstats_every = 512
Carpet::domain_from_coordbase = yes
Carpet::prolongation_order_time = 2
Carpet::prolongation_order_space = 3
driver::ghost_size = 3
Carpet::poison_new_timelevels = yes
CarpetLib::poison_new_memory = yes
Carpet::init_3_timelevels = yes
Carpet::max_refinement_levels = 10
Carpet::refine_timestep = no
Carpet::use_buffer_zones = no
CarpetRegrid2::verbose = yes
CarpetRegrid2::veryverbose = yes
CarpetRegrid2::regrid_every = 512
CarpetRegrid2::adaptive_refinement = yes
CarpetRegrid2::adaptive_block_size = 4
CarpetRegrid2::add_levels_automatically = yes
Thanks again,
Hal
>
> The level mask is interpreted in the file amr.cc, function
> evaluate_level_mask in thorn Carpet/CarpetRegrid2. These boxes are
> then post-processed in file regrid.cc, lines 315 to 415. The
> individual post-processing steps are defined in file property.cc. The
> post-processing enforces certain grid structure properties, such as
> e.g. proper nesting, certain symmetries (this is also where
> periodicity would be enforced), or adding buffer zones.
>
> -erik
>
> > Thanks again,
> > Hal
> >
> >>
> >> -erik
> >>
> >
> > --
> > Hal Finkel
> > Postdoctoral Appointee
> > Leadership Computing Facility
> > Argonne National Laboratory
> > 1-630-252-0023
> > hfinkel at anl.gov
> >
> >
>
>
>
--
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory
1-630-252-0023
hfinkel at anl.gov
More information about the Users
mailing list