[ET Trac] [Einstein Toolkit] #2008: NaNs when running static tov on >40 cores

Einstein Toolkit trac-noreply at einsteintoolkit.org
Wed Feb 22 15:56:53 CST 2017


#2008: NaNs when running static tov on >40 cores
------------------------------------+---------------------------------------
  Reporter:  allgwy001@…            |       Owner:  knarf     
      Type:  defect                 |      Status:  assigned  
  Priority:  major                  |   Milestone:            
 Component:  Carpet                 |     Version:  ET_2016_05
Resolution:                         |    Keywords:            
------------------------------------+---------------------------------------

Comment (by allgwy001@…):

 Thank you very much for looking into it!

 We can't seem to disable OpenMP in the build; however, it is effectively
 disabled by setting OMP_NUM_THREADS=1. Since "OpenMP does not play nicely
 with other software, especially in the hybridized domain of combining
 OpenMPI and OpenMP, where multiple users share nodes", the administrators
 are not willing to enable it for me.

 Excluding boundary points and symmetry points, I find 31 evolved points in
 each spatial direction for the most refined region, i.e. 31^3 = 29 791
 points in three dimensions. For each of the other four regions I find
 25 695 points, giving 29 791 + 4 x 25 695 = 132 571 points in total. Does
 Carpet provide any output I can use to verify this?
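
 As a sanity check, the totals quoted above can be verified directly (a
 minimal sketch; the per-region counts are taken from the text, not from
 Carpet output):

```python
# Verify the hand-counted grid-point totals from the comment above.
finest = 31 ** 3        # 31 evolved points per direction on the finest level
others = 4 * 25_695     # the four coarser regions, counts as quoted
total = finest + others

print(finest)   # 29791
print(total)    # 132571
```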

 My contact writes the following: "I fully understand the [comments about
 the scalability]. However we see a similar decrement with
 cctk_final_time=1000 [he initially tested with smaller times] and hence I
 would assume a larger solution space. Unless your problem is what is
 called embarrassingly parallel you will always be faced with a
 communication issue."

 This is incorrect, right? My understanding is that cctk_final_time only
 sets how long the evolution runs in time; the spatial grid, and hence the
 size of the solution space, remains the same regardless of its value.
 Replying to [comment:3 hinder]:
 > This is a small example, and I don't think the results you are seeing
 > are surprising.  Think of the number of points in the grid on each
 > refinement level, then divide that by the number of cores.  Each core
 > will have to evolve that number of points.  If you are using standard
 > MPI without OpenMP, this block of points will be surrounded by a shell
 > of ghost points which need to be synchronised with the other processes.
 > As the size of the component on each process decreases, the ratio
 > between the number of ghost points which need to be communicated and
 > the number of real points which need to be evolved will increase.
 > Eventually, you will have so few points evolved on each process that
 > the communication cost dominates (and is not expected to scale well
 > with the number of cores).  I suggest that you calculate how many
 > points are evolved on each process in your tests.
 >
 > One way to improve scalability may be to use OpenMP parallelisation in
 > addition to MPI.  Are you doing this already?
 >
 > Note that the scaling of this example is probably a separate issue from
 > the problem of it crashing on >40 cores.
 >
 > Also, just because this specific example doesn't scale effectively
 > beyond 20 cores, that doesn't say anything about other examples, or
 > cases that you want to run for science.
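
 The ghost-zone argument quoted above can be illustrated with a rough
 model (an assumption-laden sketch, not Carpet's actual decomposition: a
 single cubic level of 31^3 points split evenly into cubic components,
 with a ghost-zone width of 3, a common Carpet default):

```python
# Rough estimate of the ghost-to-evolved point ratio per MPI process.
# Assumptions (not from the ticket): one cubic level split evenly into
# cubic components; ghost-zone width 3 on every face.

def ghost_ratio(n_per_dir, n_procs, ghost_width=3):
    """Return (evolved points per process, ghost/evolved ratio)."""
    per_proc = n_per_dir ** 3 / n_procs        # evolved points per process
    side = per_proc ** (1.0 / 3.0)             # side of the local cube
    with_ghosts = (side + 2 * ghost_width) ** 3
    return per_proc, (with_ghosts - per_proc) / per_proc

for procs in (1, 20, 40, 80):
    pts, ratio = ghost_ratio(31, procs)
    print(f"{procs:3d} cores: ~{pts:8.0f} evolved points/process, "
          f"ghost/evolved ratio ~{ratio:.2f}")
```

 Under these assumptions, at 40 cores each process holds only a few
 hundred evolved points while the estimated ghost shell is several times
 larger, consistent with communication cost dominating.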

-- 
Ticket URL: <https://trac.einsteintoolkit.org/ticket/2008#comment:6>
Einstein Toolkit <http://einsteintoolkit.org>