[Users] Problem with CarpetRegrid2/AMR

Hal Finkel hfinkel at anl.gov
Thu Sep 1 10:51:44 CDT 2011


On Thu, 2011-09-01 at 11:37 -0400, Erik Schnetter wrote:
> On Thu, Sep 1, 2011 at 10:53 AM, Hal Finkel <hfinkel at anl.gov> wrote:
> > On Tue, 2011-08-30 at 21:06 -0400, Erik Schnetter wrote:
> >> On Tue, Aug 30, 2011 at 5:28 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> >> > Could I also decrease the block size? I currently have
> >> > CarpetRegrid2::adaptive_block_size = 4, could it be smaller than that?
> >> > Is there a restriction based on the number of ghost points?
> >>
> >> Yes, you can reduce the block size. I assume that both the regridding
> >> operation and the time evolution will become slower if you do that,
> >> because more blocks will have to be handled.
> >
> > Regardless of what I do, once we get past the first coarse time step,
> > the program seems to "hang" at "INFO (Carpet): [ml=0][rl=0][m=0][tl=0]
> > Regridding map 0...".
> >
> > Overall, it is in dh::regrid(do_init=true). It spends most of its time
> > in bboxset<int, 3>::normalize() and, specifically, mostly in the loop:
> > for (typename bset::iterator nsi = nbs.begin(); nsi != nbs.end(); ++
> > nsi). The normalize() function does exit, however, so it is not hanging
> > in that function.
> >
> > The core problem seems to be that it takes a long time to execute:
> > boxes  = boxes .shift(-dir) - boxes;
> > in dh::regrid(do_init=true). Probably because boxes has 129064 elements.
> > The coarse grid is now only 30^3 and I've left the regrid box size at 4.
> > I'd think, then, that the coarse grid should have a maximum of 30^3/4^3
> > ~ 420 refinement regions.
> >
> > What is the best way to figure out what is going on?
> 
> Hal
> 
> Yes, this function is very slow. I did not expect it to be
> prohibitively slow. Are you compiling with optimisation enabled?

I've tried with optimizations enabled (and without for debugging).

> 
> The bboxset represents the set of refined regions, and it is
> internally represented as a list of bboxes (regions). Carpet performs
> set operations on these (intersection, union, complement, etc.) to
> determine the communication schedule, i.e. which ghost zones of which
> bbox need to be filled from which other bbox. Unfortunately, the
> algorithm used for this is O(n^2) in the number of refined regions,
> and set operations when implemented via lists themselves are O(n^2) in
> the set size, leading to a rather unfortunate overall complexity. The
> only cure is to reduce the number of bboxes (make them larger) and to
> regrid fewer times.
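The quadratic cost you describe can be illustrated with a toy sketch (hypothetical code, not Carpet's actual bboxset): when a set is stored as a flat list, computing a difference such as `boxes.shift(-dir) - boxes` must test every element of one list against every element of the other.

```cpp
#include <algorithm>
#include <vector>

// Toy stand-in (NOT Carpet's bboxset): a "set" stored as a flat list.
// The difference A - B scans all of B for every element of A, so the
// cost is O(|A| * |B|) -- quadratic when the two sets are similar in size.
std::vector<int> list_set_difference(const std::vector<int>& a,
                                     const std::vector<int>& b) {
  std::vector<int> out;
  for (int x : a) {                                    // |A| iterations
    if (std::find(b.begin(), b.end(), x) == b.end())   // O(|B|) scan each
      out.push_back(x);
  }
  return out;
}
```

With sets of ~129,064 boxes that is on the order of 1.7 * 10^10 element tests for a single difference, which seems consistent with the stall I see in dh::regrid.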

This is what I suspected, but even so, is something wrong? How many
boxes would you expect me to have? The reason it does not finish, even
with optimizations, is that there are 129K boxes in the loop (on the
order of 129,064^2 ~ 17 billion pairwise box operations?).

The coarse grid is only 30^3, and the regrid box size is 4, so at
maximum, there should be ~400 level one boxes. Even if some of those
have level 2 boxes, I don't understand how there could be 129K boxes.

> 
> We are running on several hundred MPI processes, and are therefore
> dealing with set sizes of several hundred in production mode.
> (Carpet's domain decomposition splits boxes, so that each process
> receives a set of boxes.) It may be that the irregularity of the AMR
> box distribution produces a worse case than our regular domain
> decomposition does.
> 
> We have implemented / are implementing set operations based on a tree
> datastructure, which should be much more efficient. This is the work
> of Ashley Zebrowski, a graduate student at LSU who just tried
> (unsuccessfully -- a hurricane was in the way) to attend the ParCo
> 2011 conference in Belgium to present this work.

Sounds great!

Thanks again,
Hal

> 
> Frank, could you give a brief update of the status of Ashley's work?
> 
> -erik
> 

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory
1-630-252-0023
hfinkel at anl.gov


