[ET Trac] [Einstein Toolkit] #2008: NaNs when running static tov on >40 cores
Einstein Toolkit
trac-noreply at einsteintoolkit.org
Wed Feb 22 15:56:53 CST 2017
#2008: NaNs when running static tov on >40 cores
------------------------------------+---------------------------------------
Reporter: allgwy001@… | Owner: knarf
Type: defect | Status: assigned
Priority: major | Milestone:
Component: Carpet | Version: ET_2016_05
Resolution: | Keywords:
------------------------------------+---------------------------------------
Comment (by allgwy001@…):
Thank you very much for looking into it!
We can't seem to disable OpenMP at compile time. However, it is effectively
disabled by setting OMP_NUM_THREADS=1. Since "OpenMP does not play nicely
with other software, especially in the hybridized domain of combining
OpenMPI and OpenMP, where multiple users share nodes", the administrators
are not willing to enable it for me.
Excluding boundary and symmetry points, I find 31 evolved points in each
spatial direction on the most refined level, i.e. 31^3 = 29 791 points in
three dimensions. Each of the other four levels contributes 25 695 points,
giving 132 571 points in total. Does Carpet provide any output I can use
to verify this?
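For reference, the counts above can be reproduced with a quick
back-of-envelope script. It assumes five refinement levels with a
refinement factor of 2, so that each coarser level excludes the 16^3
coarse cells covered by the next finer level (31 fine points span about
16 coarse cells); these assumptions are my reading of the setup, not
values taken from the parameter file:

```python
# Back-of-envelope check of the evolved-point counts quoted above.
# Assumed setup: 5 refinement levels, 31 evolved points per direction per
# level, refinement factor 2 (so each coarser level loses the 16^3 coarse
# points covered by the next finer level).

finest = 31**3            # finest level: all 31^3 points are evolved
coarser = 31**3 - 16**3   # each coarser level minus the refined interior
total = finest + 4 * coarser

print(finest)    # 29791
print(coarser)   # 25695
print(total)     # 132571
```

If these assumptions match the parameter file, the quoted numbers are
consistent with each other.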
My contact writes the following: "I fully understand the [comments about
the scalability]. However we see a similar decrement with
cctk_final_time=1000 [he initially tested with smaller times] and hence I
would assume a larger solution space. Unless your problem is what is
called embarrassingly parallel you will always be faced with a
communication issue."
This is incorrect, right? My understanding is that the size of the
solution space is fixed by the grid parameters and is independent of
cctk_final_time; a larger final time only means more time steps, not more
points per step.
Replying to [comment:3 hinder]:
> This is a small example, and I don't think the results you are seeing
are surprising. Think of the number of points in the grid on each
refinement level, then divide that by the number of cores. Each core will
have to evolve that number of points. If you are using standard MPI
without OpenMP, this block of points will be surrounded by a shell of
ghost points which need to be synchronised with the other processes. As
the size of the component on each process decreases, the ratio between the
number of ghost points which need to be communicated and the number of
real points which need to be evolved will increase. Eventually, you will
have so few points evolved on each process that the communication cost
dominates (and is not expected to scale well with the number of cores). I
suggest that you calculate how many points are evolved on each process in
your tests.
>
> One way to improve scalability may be to use OpenMP parallelisation in
addition to MPI. Are you doing this already?
>
> Note that the scaling of this example is probably a separate issue to
the problem of it crashing on >40 cores.
>
> Also, just because this specific example doesn't scale effectively
beyond 20 cores, this doesn't say anything about other examples, or cases
that you want to run for science.
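To make the quoted surface-to-volume argument concrete, here is a rough
sketch of how the ghost-to-evolved ratio grows as a fixed grid is split
across more processes. It assumes roughly cubic per-process blocks and 3
ghost zones per face; both are illustrative guesses on my part, not
values read from the actual run:

```python
# Illustration of the scaling argument in the quoted comment: splitting a
# fixed n^3 grid across more MPI processes shrinks each process's block,
# so the fraction of ghost points (communicated) relative to evolved
# points (computed) grows.

def ghost_fraction(n, procs, g=3):
    """Ghost/evolved point ratio for an n^3 grid split into `procs`
    roughly cubic blocks, with g ghost zones on each face."""
    m = n / procs ** (1.0 / 3.0)        # side length of one block
    evolved = m ** 3
    with_ghosts = (m + 2 * g) ** 3
    return (with_ghosts - evolved) / evolved

for p in (1, 8, 20, 40):
    print(p, ghost_fraction(31, p))     # ratio increases with p
```

With n = 31 the ratio already exceeds the number of evolved points per
process well before 40 processes, which is consistent with the
communication-dominated behaviour described above.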
--
Ticket URL: <https://trac.einsteintoolkit.org/ticket/2008#comment:6>
Einstein Toolkit <http://einsteintoolkit.org>