[ET Trac] [Einstein Toolkit] #1854: Very large global grid points leading termination

Einstein Toolkit trac-noreply at einsteintoolkit.org
Mon Feb 8 04:23:30 CST 2016


#1854: Very large global grid points leading termination
--------------------------------------------+-------------------------------
  Reporter:  himanshukharkwal765@…          |       Owner:  himanshukharkwal765@…        
      Type:  defect                         |      Status:  new                          
  Priority:  major                          |   Milestone:  ET_2015_11                   
 Component:  Cactus                         |     Version:  development version          
Resolution:                                 |    Keywords:  grid global terminate        
--------------------------------------------+-------------------------------

Comment (by hinder):

 Thanks for the output.  The file "top1.png", from the n = 350 run,
 already shows that the simulation is using far more memory than is
 available on the node: there is 99 GB of total memory, and all 99 GB of
 it is "used".  Similarly, swap usage is very high, at 8 GB.  As far as
 I can see, there is no evidence of anything being wrong with this
 simulation; it is simply too large to fit on the machine.  Note that
 "350 global grid points" means 350^3 points, because the grid is 3D.
 When you increase that to 700, the memory requirement increases by a
 factor of 2^3 = 8.  So if the n=350 run doesn't fit, the n=700 run has
 no chance.
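 The scaling argument above can be sketched in a few lines of Python
 (only the point counts and the ratio are meaningful here; the actual
 bytes per point depend on the number of grid functions, ghost zones,
 refinement levels, and so on):

```python
# Memory-scaling estimate for a 3D uniform grid:
# "n" global grid points per dimension means n**3 points in total,
# so doubling n multiplies the memory requirement by 2**3 = 8.

def grid_points(n):
    """Total number of points in an n x n x n grid."""
    return n ** 3

def memory_scale_factor(n_old, n_new):
    """Factor by which memory grows when the resolution changes."""
    return grid_points(n_new) / grid_points(n_old)

print(grid_points(350))               # 42875000 points
print(memory_scale_factor(350, 700))  # 8.0
```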

 According to the output1.out file, SystemStatistics reports that each
 process is using about 15 GB (this is the amount of data currently
 resident in physical memory; it is quite possible that the run needs
 even more, with the rest in swap).  You have 6 processes, so the total
 memory usage is about 90 GB, roughly the same as the total memory
 available (allowing for other processes, overheads, and the general
 subtleties of measuring memory usage).  I am a bit puzzled that
 SystemStatistics reports a swap usage of 0 when the "top" output shows
 8 GB.  Was the "top" output recorded at the same time as the simulation
 that generated "output1.out"?
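 As a back-of-the-envelope check with the figures from this ticket
 (6 processes, ~15 GB resident each, 99 GB of RAM on the node):

```python
# Does the job fit on one node?  Figures taken from the ticket.
processes = 6
gb_per_process = 15   # resident set size reported by SystemStatistics
node_ram_gb = 99      # total memory shown by "top"

total_gb = processes * gb_per_process   # 90 GB
headroom_gb = node_ram_gb - total_gb    # only 9 GB left for everything else
print(total_gb, headroom_gb)
```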

 To summarise: the machine on which you are running does not have enough
 memory for the size of job you are trying to run.  You either need to
 use a smaller number of grid points or run on more nodes.

-- 
Ticket URL: <https://trac.einsteintoolkit.org/ticket/1854#comment:4>
Einstein Toolkit <http://einsteintoolkit.org>
The Einstein Toolkit
