[ET Trac] [Einstein Toolkit] #2008: NaNs when running static tov on >40 cores
Einstein Toolkit
trac-noreply at einsteintoolkit.org
Thu Feb 23 03:34:08 CST 2017
#2008: NaNs when running static tov on >40 cores
------------------------------------+---------------------------------------
Reporter: allgwy001@… | Owner: knarf
Type: defect | Status: assigned
Priority: major | Milestone:
Component: Carpet | Version: ET_2016_05
Resolution: | Keywords:
------------------------------------+---------------------------------------
Comment (by hinder):
Replying to [comment:6 allgwy001@…]:
> Thank you very much for looking into it!
>
> We can't seem to disable OpenMP in the compile. However, it's
> effectively disabled by means of OMP_NUM_THREADS=1. Since "OpenMP does not
> play nicely with other software, especially in the hybridized domain of
> combining OpenMPI and OpenMP, where multiple users share nodes", they are
> not willing to enable it for me.
I have never run on a system where multiple users share nodes; it doesn't
really fit very well with the sort of application that Cactus is. You
don't want to be worrying about whether other processes are competing with
you for memory, memory bandwidth, or cores. When you have exclusive
access to each node, OpenMP is usually a good idea. By the way: what sort
of interconnect do you have? Gigabit Ethernet, InfiniBand, or
something else? If users are sharing nodes, then I suspect that this
cluster is Gigabit Ethernet only, and you may be limited to small jobs,
since the performance of Gigabit Ethernet will quickly become your
bottleneck. What cluster are you using? From your email address, I'm
guessing that it is one of the ones at http://hpc.uct.ac.za? If so, or
you are using a similar scheduler, then you should be able to do this, as
in their documentation:
    NB3: If your software prefers to use all cores on a computer then make
    sure that you reserve these cores. For example running on an 800 series
    server which has 8 cores per server change the directive line in your
    script as follows:

    #PBS -l nodes=1:ppn=8:series800
Once you are the exclusive user of a node, I don't see a problem with
enabling OpenMP. Also note: OpenMP is not something that needs to be
enabled by the system administrator; it is determined by your compilation
flags (on by default in the ET) and activated with OMP_NUM_THREADS. Is it
possible that there was some confusion, and the admins were talking about
hyperthreading instead? That is a very different thing, and I agree
you probably don't want it enabled (it would have to be enabled by
the admins).
>
> Excluding boundary points and symmetry points, I find 31 evolved points
> in each spatial direction for the most refined region. That gives 29 791
> points in three dimensions. For each of the other four regions I find
> 25 695 points; that's 132 571 in total. Does Carpet provide any output I
> can use to verify this?
Carpet provides a lot of output :) You may get something useful by
setting
CarpetLib::output_bboxes = "yes"
On the most refined region, if we make the approximation that the domain is
divided into N identical cubical regions, then for N = 40 each process would
have 29791/40 ~ 745 ~ 9^3 points, so about 9 evolved points in each direction.
With 3 ghost points on each side, the volume of ghost plus evolved points
would be (9+3+3)^3 = 15^3, so the number of ghost points is 15^3 - 9^3, and
the ratio of ghost to evolved points is (15^3 - 9^3)/9^3 = (15/9)^3 - 1 = 3.6.
So you have 3.6 times as many points being communicated as you have being
evolved. Especially if the interconnect is only Gigabit Ethernet, I'm not
surprised that the scaling flattens off by this point. Note that if you use
OpenMP, this ratio will be much smaller, because OpenMP threads communicate
using shared memory, not ghost zones. Essentially you will have fewer
processes, each with a larger cube, and multiple threads working on that
cube in shared memory.
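
The estimate above can be reproduced with a short Python sketch. It makes the same idealisation as the text (the region is split into N identical cubes, each padded by 3 ghost points on every face), which is not how Carpet actually decomposes the grid. The function name and the choice of 8 processes for the hybrid case are purely illustrative assumptions.

```python
def ghost_to_evolved_ratio(total_points, n_procs, ghost_width=3):
    """Estimate the ghost/evolved point ratio for identical cubical subdomains.

    Assumes total_points is split evenly into n_procs cubes, each padded by
    ghost_width points on every face. This is an idealisation, not Carpet's
    real domain decomposition.
    """
    # Edge length (in points) of the evolved cube owned by one process.
    edge = (total_points / n_procs) ** (1.0 / 3.0)
    # Edge length once ghost points are added on both sides.
    padded = edge + 2 * ghost_width
    return (padded / edge) ** 3 - 1.0


finest = 31 ** 3  # 29791 evolved points on the most refined region

# 40 MPI processes with one thread each, as in the estimate above.
print(round(ghost_to_evolved_ratio(finest, 40), 1))

# Hypothetical hybrid run on the same 40 cores: 8 processes x 5 threads.
print(round(ghost_to_evolved_ratio(finest, 8), 1))
```

With fewer, larger cubes the ratio drops, which is the effect of OpenMP described above.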
> My contact writes the following: "I fully understand the [comments about
> the scalability]. However we see a similar decrement with
> cctk_final_time=1000 [he initially tested with smaller times] and hence I
> would assume a larger solution space. Unless your problem is what is
> called embarrassingly parallel you will always be faced with a
> communication issue."
>
> This is incorrect, right? My understanding is that the size of the
> solution space should remain the same regardless of cctk_final_time.
Yes - it looks like your contact doesn't know that the code is iterative,
and that cctk_final_time only sets how long the evolution runs, i.e. the
number of iterations; it does not change the size of the grid. In order to
test with a larger problem size, you would need to reduce CoordBase::dx,
dy and dz, so that there is higher overall resolution, and hence more
points. I would expect the scalability to improve with larger problem
sizes.
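
As a rough illustration of how the point count grows with resolution, here is a small Python sketch. It assumes a fixed physical domain and a vertex-centred grid, so n points per direction span n - 1 intervals; it ignores boundary, symmetry and ghost points, and the function name is hypothetical.

```python
def scaled_points(points_per_dim, refinement_factor):
    """Points in 3D after dividing dx, dy and dz by refinement_factor.

    Assumes a fixed domain and a vertex-centred grid: n points per
    direction span n - 1 intervals, and each interval is subdivided.
    """
    n = refinement_factor * (points_per_dim - 1) + 1
    return n ** 3


print(scaled_points(31, 1))  # the current most refined region: 31^3 points
print(scaled_points(31, 2))  # the same region with the spacing halved: 61^3
```

Halving the spacing gives roughly eight times as many points, so each process gets a larger cube and a smaller fraction of ghost points.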
--
Ticket URL: <https://trac.einsteintoolkit.org/ticket/2008#comment:7>
Einstein Toolkit <http://einsteintoolkit.org>
The Einstein Toolkit