[Users] reported vs real memory usage + CarpetRegrid2?

Kelly, Bernard J. (GSFC-660.0)[UNIVERSITY OF MARYLAND BALTIMORE COUNTY] bernard.j.kelly at nasa.gov
Wed Sep 11 05:09:41 CDT 2013


Hi Ian. Thanks for your response. To answer some of your questions:

* the behaviour is global --- it's affecting *every* core on *every* node, so it's not just a matter of uneven allocation of resources.

* I'm trying to get OpenMP to work on this machine, but my initial tests with it showed only modest improvements in memory efficiency (and didn't prevent this global memory increase).

* These aren't really very small memory effects --- the 1.5% and 3.0% are percentages of the *total* node memory. I was originally doing a vacuum + matter run, involving ~ twice as much memory to begin with. The doubling after first regridding then brought me to something close to 100% of the nominal available memory on the node, and it was enough to kill the run. I removed the matter components to see if they were misbehaving, but they're not (at least not any more than the vacuum). Obviously, I can just use more nodes, but I'm trying to understand the problem.
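As a back-of-the-envelope check of why the regridding jump was fatal for the vacuum + matter run, here is a small sketch of the node-memory accounting implied by the figures quoted in this thread (the 16 cores per node and the per-core percentages are taken from the messages below; treating top's %MEM as a fraction of total node memory is an assumption):

```python
# Rough node-memory accounting from per-core "top" %MEM figures.
# Assumption: 16 MPI ranks per Sandy Bridge node, each reported by
# top as a percentage of the node's *total* memory.

CORES_PER_NODE = 16

def node_usage(percent_per_core):
    """Fraction of total node memory used, from top's per-core %MEM."""
    return CORES_PER_NODE * percent_per_core / 100.0

# Vacuum run: ~1.5% per core before regridding, ~3.0% after.
print(node_usage(1.5))   # 0.24 -> ~24% of node memory
print(node_usage(3.0))   # 0.48 -> ~48% of node memory

# Vacuum + matter run: roughly twice as much memory to begin with
# (~3% per core); doubling after the first regridding gives ~6% per
# core, i.e. ~96% of the node -- enough to kill the run.
print(node_usage(6.0))   # 0.96 -> ~96% of node memory
```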

I'll try out SystemStatistics to see what it tells me.

Thanks,

Bernard

From: Ian Hinder <ian.hinder at aei.mpg.de>
Date: Wednesday, September 11, 2013 4:23 AM
To: Bernard Kelly <bernard.j.kelly at nasa.gov>
Cc: "users at einsteintoolkit.org" <users at einsteintoolkit.org>
Subject: Re: [Users] reported vs real memory usage + CarpetRegrid2?


On 10 Sep 2013, at 18:34, "Kelly, Bernard J. (GSFC-660.0)[UNIVERSITY OF MARYLAND BALTIMORE COUNTY]" <bernard.j.kelly at nasa.gov> wrote:

Hi.

I'm running a vacuum BHB evolution with a larger-than-usual set of inner
refinement regions (levels 8, 9, 10, and 11 have radii of 12M, 8M, 6M, and
4M, respectively), and consequently the memory usage is a bit higher than
normal. But I'm finding that it jumps up almost 100% after the first
regridding, and stays there.

My diagnostic for this is the result of top on each of the nodes (via
"qtop.pl", a script on the machine I'm using). Sampled before the first
regridding, it shows each core using ~ 1.5% of the node's total memory,
while after regridding, it's more like 2.9% (these are Sandy Bridge nodes,
with 16 available cores).

However, the periodic output message from Carpet reporting the Grid
structure etc. shows regions only marginally larger than before, and ---
crucially for me --- has a marginally larger "Total required memory" (164
GB -> 167 GB, for instance).

So (a) what's using the extra memory, and (b) why isn't Carpet reporting
it? How seriously should I be taking that "Total required memory" message?

I see this with executables generated from both the last (ET_2012_11) and
current (ET_2013_05) stable releases, BTW. I'm attaching the current
parameter file and SCROUT from a run.

I know there was a problem related to drastically increased memory usage, but I thought that this was introduced to the trunk after ET_2013_05, and that Erik had already fixed it.  There was another problem in (I believe) ET_2012_11 where Carpet was always collapsing multiple grids on a refinement level into the smallest enclosing box, leading to huge memory usage, but that was also fixed, and I believe backported to ET_2012_11.  Have you tried using the SystemStatistics thorn to monitor memory usage?  This should be easier than using top on the nodes.
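For reference, a minimal parameter-file fragment for monitoring memory with the SystemStatistics thorn might look like the following. This is a sketch, assuming the Carpet I/O thorns (CarpetIOBasic, CarpetIOScalar) are already active as in typical Einstein Toolkit parameter files; the output cadence and choice of reductions are illustrative:

```
ActiveThorns = "SystemStatistics"

# Show the per-process resident set size (MB) in the periodic info output
IOBasic::outInfo_vars = "SystemStatistics::maxrss_mb"

# Write min/max/average memory across processes to scalar output files
IOScalar::outScalar_vars       = "SystemStatistics::process_memory_mb"
IOScalar::outScalar_reductions = "minimum maximum average"
IOScalar::outScalar_every      = 128
```

Comparing the maximum of maxrss_mb against Carpet's "Total required memory" message should show directly how much memory is being used beyond what Carpet accounts for.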

How is the grid distributed among the nodes?  Even if the total required memory is roughly constant, it's possible that the grid is distributed unevenly between nodes.  Is every node showing the increased memory usage? Given that you are still using a very small amount of memory, it's possible that Carpet is just "overallocating" on the first regridding as it anticipates that it might need more memory later, and memory allocation might be expensive.  The amount of overallocation is presumably small in comparison to the total available memory, but might be of the order of 1%, as you are seeing.

I wouldn't worry about this small amount of increased memory usage unless you can reproduce the problem on a more heavily loaded system.  From your description, I suspect you are not using OpenMP.  Why is that?  Using pure MPI leads to an unnecessary memory overhead.

--
Ian Hinder
http://numrel.aei.mpg.de/people/hinder
