[ET Trac] [Einstein Toolkit] #1854: Very large global grid points leading termination
Einstein Toolkit
trac-noreply at einsteintoolkit.org
Mon Feb 8 04:23:30 CST 2016
#1854: Very large global grid points leading termination
--------------------------------------------+-------------------------------
Reporter: himanshukharkwal765@… | Owner: himanshukharkwal765@…
Type: defect | Status: new
Priority: major | Milestone: ET_2015_11
Component: Cactus | Version: development version
Resolution: | Keywords: grid global terminate
--------------------------------------------+-------------------------------
Comment (by hinder):
Thanks for the output. The file "top1.png", from n = 350, already shows
that the simulation is using much more memory than is available on the node.
You see that there is 99 GB total memory, and 99 GB of this are "used".
Similarly, the swap usage is very high, at 8 GB. As far as I can see,
there is no evidence of there being anything wrong with this simulation;
it is just too large to fit on the machine. Note that "350 global grid
points" means 350^3 because this is 3D. When you increase that to 700,
the memory requirements increase by a factor of 2^3 = 8. So if the n=350
run doesn't fit, the n=700 run has no chance.
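
The scaling argument above can be sketched in a few lines of Python. This
is only an illustration: the function name is made up for this example, and
the 99 GB baseline is simply the "used" figure from top1.png, taken as a
rough lower bound for the n=350 run rather than a measured requirement. It
also assumes memory is dominated by 3D grid functions, so that it scales
with the cube of the number of points per dimension.

```python
def memory_scale_factor(n_old, n_new, dims=3):
    """Factor by which grid storage grows when the number of points
    per dimension goes from n_old to n_new on a dims-dimensional grid."""
    return (n_new / n_old) ** dims


# Going from 350 to 700 points per dimension in 3D.
factor = memory_scale_factor(350, 700)

# Assumed baseline: roughly what "top" showed as used for n = 350.
baseline_gb = 99.0
estimate_gb = baseline_gb * factor

print(f"scale factor: {factor:.0f}")             # scale factor: 8
print(f"n=700 estimate: {estimate_gb:.0f} GB")   # n=700 estimate: 792 GB
```

Under these assumptions the n=700 run would want on the order of 800 GB,
far beyond a single 99 GB node.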
According to the output1.out file, SystemStatistics is reporting that each
process is using about 15 GB (that is, this much data is currently
resident in physical memory; the run may well need more than this, with
the rest in swap). You have 6 processes, so the total resident memory is
roughly 6 x 15 = 90 GB, which is about the same as
the total memory available (accounting for other processes, overheads, and
general subtleties in measuring memory usage). I'm a bit confused about
why SystemStatistics is reporting a swap usage of 0, when the "top" output
shows 8 GB. Was the "top" output recorded at the same time as the
simulation which generated "output1.out"?
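
As a quick check of the arithmetic above, here is a small Python sketch.
The numbers are the ones quoted in this comment (about 15 GB resident per
process, 6 processes, 99 GB on the node); the variable names are just for
illustration and do not correspond to anything in SystemStatistics.

```python
# Figures quoted above; approximate, read off output1.out and top1.png.
per_process_resident_gb = 15.0
n_processes = 6
node_total_gb = 99.0

# Total resident memory across all processes on the node.
total_resident_gb = per_process_resident_gb * n_processes

# Fraction of the node's physical memory taken by the simulation alone.
fraction_of_node = total_resident_gb / node_total_gb

print(f"total resident: {total_resident_gb:.0f} GB")  # total resident: 90 GB
print(f"fraction of node: {fraction_of_node:.0%}")    # fraction of node: 91%
```

This counts only resident memory, so anything the run has pushed into swap
would come on top of the 90 GB.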
To summarise, the machine on which you are running does not have enough
memory for the size of job you are trying to run. You either need to
restrict to a smaller number of grid points, or run on more nodes.
--
Ticket URL: <https://trac.einsteintoolkit.org/ticket/1854#comment:4>
Einstein Toolkit <http://einsteintoolkit.org>
The Einstein Toolkit