[ET Trac] [Einstein Toolkit] #2008: NaNs when running static tov on >40 cores
Einstein Toolkit
trac-noreply at einsteintoolkit.org
Wed Feb 22 07:52:50 CST 2017
#2008: NaNs when running static tov on >40 cores
------------------------------------+---------------------------------------
Reporter: allgwy001@… | Owner: knarf
Type: defect | Status: assigned
Priority: unset | Milestone:
Component: Other | Version: ET_2016_05
Resolution: | Keywords:
------------------------------------+---------------------------------------
Comment (by hinder):
This is a small example, and I don't think the results you are seeing are
surprising. Think of the number of points in the grid on each refinement
level, then divide that by the number of cores. Each core will have to
evolve that number of points. If you are using standard MPI without
OpenMP, this block of points will be surrounded by a shell of ghost points
which need to be synchronised with the other processes. As the size of
the component on each process decreases, the ratio between the number of
ghost points which need to be communicated and the number of real points
which need to be evolved will increase. Eventually, you will have so few
points evolved on each process that the communication cost dominates, and
that cost is not expected to scale well with the number of cores. I
suggest that you calculate how many points are evolved on each process in
your tests.
One way to improve scalability may be to use OpenMP parallelisation in
addition to MPI. Are you doing this already?
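
As a rough illustration of the estimate suggested above, here is a minimal
Python sketch. It is not taken from the ticket or from the toolkit: it
assumes a single cubic refinement level that is split evenly into cubic
blocks, one per MPI process, with ghost layers on every face of each block.
The function name, the grid size of 40 points per side and the ghost width
of 3 are all hypothetical placeholders; substitute the values from your own
parameter file.

```python
def ghost_to_real_ratio(n_per_side, n_procs, ghost_width=3):
    """Crude estimate of the ghost/real point ratio per process.

    Assumptions (not from the ticket): a cubic grid of n_per_side**3
    points is divided evenly into n_procs cubic blocks, and every block
    is wrapped in ghost_width layers of ghost points on all six faces.
    """
    real = n_per_side ** 3 / n_procs        # points evolved per process
    side = real ** (1.0 / 3.0)              # edge length of one block
    total = (side + 2 * ghost_width) ** 3   # block plus its ghost shell
    return real, (total - real) / real


if __name__ == "__main__":
    # Hypothetical grid size; the process counts are examples only.
    for procs in (1, 8, 20, 40, 80):
        real, ratio = ghost_to_real_ratio(n_per_side=40, n_procs=procs)
        print(f"{procs:3d} procs: {real:8.0f} real points/proc, "
              f"ghost/real = {ratio:.2f}")
```

The ratio grows as the number of processes increases, which is the effect
described above. With OpenMP, n_procs would be the number of MPI processes
rather than the number of cores, so each block is larger and the ratio is
smaller for the same core count.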
Note that the scaling of this example is probably a separate issue from
the problem of it crashing on >40 cores.
Also, the fact that this specific example doesn't scale effectively beyond
20 cores doesn't say anything about other examples, or cases that you want
to run for science.
--
Ticket URL: <https://trac.einsteintoolkit.org/ticket/2008#comment:3>
Einstein Toolkit <http://einsteintoolkit.org>
The Einstein Toolkit