[ET Trac] [Einstein Toolkit] #2008: NaNs when running static tov on >40 cores

Wed Feb 22 07:52:50 CST 2017

#2008: NaNs when running static tov on >40 cores
------------------------------------+---------------------------------------
  Reporter:  allgwy001@…            |       Owner:  knarf     
      Type:  defect                 |      Status:  assigned  
  Priority:  unset                  |   Milestone:            
 Component:  Other                  |     Version:  ET_2016_05
Resolution:                         |    Keywords:            
------------------------------------+---------------------------------------

Comment (by hinder):

 This is a small example, and I don't think the results you are seeing are
 surprising.  Take the number of points in the grid on each refinement
 level and divide it by the number of cores; each core has to evolve that
 many points.  If you are using standard MPI without OpenMP, each such
 block of points is surrounded by a shell of ghost points which need to be
 synchronised with the other processes.  As the size of the component on
 each process decreases, the ratio of ghost points which need to be
 communicated to real points which need to be evolved increases.
 Eventually, you have so few points evolved on each process that the
 communication cost dominates (and communication is not expected to scale
 well with the number of cores).  I suggest that you calculate how many
 points are evolved on each process in your tests.

 One way to improve scalability may be to use OpenMP parallelisation in
 addition to MPI.  Are you doing this already?

 Note that the scaling of this example is probably a separate issue from
 the problem of it crashing on >40 cores.

 Also, the fact that this specific example doesn't scale effectively
 beyond 20 cores says nothing about other examples, or about the cases
 that you want to run for science.

-- 
Ticket URL: <https://trac.einsteintoolkit.org/ticket/2008#comment:3>
Einstein Toolkit <http://einsteintoolkit.org>