[Users] Question about GW150914.rpar simulation

ian.hinder at aei.mpg.de
Wed Oct 24 13:01:53 CDT 2018



> On 23 Oct 2018, at 08:57, Benjamin Chardi Marco <bchardim at redhat.com> wrote:
> 
> Dear friends,
> 
> We are trying to use the Einstein Toolkit GW150914.rpar binary black hole merger simulation as a use case to test that our container orchestration product OpenShift can be used for HPC.
> Our test environment only has 30 CPUs, so we need to execute that simulation in a reasonable time.

Hi,

30 CPUs is quite a lot; do you really mean 30 CPUs, or 30 cores?  What CPU are you using, and how many cores does it have?  Also, what is the interconnect between the nodes: InfiniBand, Omni-Path, gigabit Ethernet, etc.?

> Please can you tell us how to modify GW150914.rpar in order to get a less precise simulation executed on a 30-CPU cluster in a reasonable time (~ a few days).

You can run at a lower resolution by changing the

  --define N 28

to something else.  This must be a multiple of 4, and can probably go as low as 20 without the simulation crashing.  [Roland: you mentioned 24 in your email.  Did you try 20 and find that it crashed?  I seem to remember 20 working at one point.]  N is a measure of the number of grid cells across the black holes, so increasing it gives you more cells and higher resolution, and makes the run slower; decreasing it does the opposite.
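
To get a feel for how much a lower N buys you, here is a back-of-the-envelope estimate in Python.  It assumes the cost of the evolution scales roughly like N^4 (N^3 grid cells, and a time step that shrinks like 1/N at fixed Courant factor); the real scaling also depends on mesh refinement, I/O and the analysis thorns, so treat it as an order-of-magnitude guide only:

  # Rough estimate (not from the parameter file) of the relative cost of a
  # run at a lower resolution, assuming cost ~ N^4.
  def relative_cost(n_new, n_old=28):
      """Estimated run time at resolution n_new relative to n_old."""
      return (n_new / n_old) ** 4

  for n in (28, 24, 20):
      print(f"N = {n}: ~{relative_cost(n):.2f} x the cost of the N = 28 run")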

Roland's suggestions are also good, but I would make a couple of changes to what he recommended.  

The original boundary condition in GW150914.rpar sets the time derivative ("rhs") of all fields to 0 on the outer spherical boundary.  This fixes the fields to their initial values, so it can be considered a Dirichlet (or "scalar") boundary condition: ML_BSSN::rhs_boundary_condition = "scalar".  This is in general a bad thing to do (you will get almost perfect reflections of all outgoing waves), but the boundary was placed far enough away that it could not influence the waveform.  Pushing the boundary that far out is very cheap with the spherical outer grid used here, and it was done because, at the time, we had not implemented radiative boundary conditions that worked with the spherical grids in McLachlan.

The improvement that I believe Roland meant to make was to change the boundary condition to radiative (not Robin), which has now been implemented in the code.  This makes the fields obey an advection-type equation on the outer boundary, assuming that all fields are solutions to outgoing radial wave equations.  In Roland's parameter file, he set

NewRad::z_is_radial                 = yes

but this is a technical change to trick NewRad into working with the spherical grids that we use here.  To change the boundary condition itself, you need to set

ML_BSSN::rhs_boundary_condition     = "NewRad"

rather than "scalar".

The other change Roland made was to change final_time to half its current value:

final_time = waveform_length + outermost_detector
->
final_time = 0.5*(waveform_length + outermost_detector)

This doesn't seem correct.  It is true that final_time is used to set sphere_outer_radius, but halving it will not halve the size of the domain.  It will, however, halve the evolved coordinate time, so the simulation will stop before the BHs have merged.  Instead, I would change sphere_outer_radius as follows:

-sphere_outer_radius = int((outermost_detector + final_time)/(i*hr))*i*hr
+sphere_outer_radius = int((1000)/(i*hr))*i*hr

This might make the waveform noisier, but with the changed boundary condition, it shouldn't be too bad.
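
To see why, here is a small illustration in Python (the variable names are the ones from the parameter file, but the numbers below are placeholders I have made up, so don't read anything into the actual values):

  # Placeholder values; the real ones are computed in GW150914.rpar.
  outermost_detector = 500.0   # radius of the outermost wave-extraction sphere
  waveform_length    = 600.0   # length of waveform to be extracted
  i, hr              = 4, 2.0  # grid-spacing factors from the quoted line

  final_time = waveform_length + outermost_detector

  # Original: boundary far enough out that reflections never reach the detector
  r_original = int((outermost_detector + final_time) / (i * hr)) * i * hr

  # Roland's change: the radius shrinks, but by much less than half, while the
  # evolution itself stops at half the time, i.e. before the merger.
  r_halved = int((outermost_detector + 0.5 * final_time) / (i * hr)) * i * hr

  # Suggested change instead: cap the radius at ~1000 and leave final_time alone.
  r_capped = int(1000.0 / (i * hr)) * i * hr

  print(r_original, r_halved, r_capped)   # 1600 1048 1000 with these numbers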

> Now we can run the simulation GW150914.rpar using OpenMPI + the Einstein Toolkit, but it takes too long to execute (~ weeks).

That sounds like quite a long time; too long, in fact.  On the page describing the simulation, https://einsteintoolkit.org/gallery/bbh/index.html, it says that the simulation takes 2.8 days on 128 cores of an Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz (Haswell).  Assuming that you mean you are using 30 cores, and that you have a similar CPU, it should take about 2.8 * 128/30 ≈ 11.9 days.  Is this about what you see?  What speed is reported?  You can see this in the output file GW150914_*.out:

----------------------------------------------------------------------------------------------------------
Iteration      Time | *me_per_hour |              ML_BSSN::phi | *TISTICS::maxrss_mb | *TICS::swap_used_mb
                    |              |      minimum      maximum |   minimum   maximum |   minimum   maximum
----------------------------------------------------------------------------------------------------------
   114640   246.602 |   10.5254966 |    0.0149352    0.9995490 |      3748      5289 |         0         0
   114644   246.611 |   10.5255173 |    0.0144565    0.9995490 |      3748      5289 |         0         0

The third column is the speed of the simulation in coordinate time per hour (the column header is a truncation of "physical_time_per_hour").
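
If you want to turn that number into an estimate of the remaining wall time, something like the following Python sketch should work (it assumes the column layout shown above; the file name and the value of final_time are placeholders, so substitute your own):

  import re

  def last_speed_and_time(path):
      """Return (coordinate time, coordinate time per hour) from the last data row."""
      time = speed = None
      with open(path) as f:
          for line in f:
              parts = line.split("|")
              # Data rows start with "<iteration> <time>" before the first "|"
              if len(parts) >= 2 and re.match(r"\s*\d+\s+[\d.]+\s*$", parts[0]):
                  time = float(parts[0].split()[1])
                  speed = float(parts[1])
      return time, speed

  time, speed = last_speed_and_time("GW150914_28.out")  # placeholder file name
  final_time = 1100.0                                   # placeholder; see the .rpar
  if speed:
      hours_left = (final_time - time) / speed
      print(f"t = {time}, speed = {speed} per hour, ~{hours_left / 24:.1f} days to go")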

It's possible that the OpenMP or MPI configuration is not correct.  Please could you post the standard output file (GW150914_*.out) to https://pastebin.com so we can take a look at it?

> We believe that the Einstein Toolkit GW150914.rpar simulation is a great use case to test OpenShift for HPC, and of course we will reference the Einstein Toolkit in our final report as a use case for OpenShift in HPC mode.

Great; it sounds interesting!  The papers which should be cited if you use this parameter file and code are listed in the instructions at the top of the parameter file:
# Copyright Barry Wardell, Ian Hinder, Eloisa Bentivegna

# We ask that if you make use of the parameter file or the example
# data, then please cite

# Simulation of GW150914 binary black hole merger using the
# Einstein Toolkit - https://doi.org/10.5281/zenodo.155394

# as well as the Einstein Toolkit, the Llama multi-block
# infrastructure, the Carpet mesh-refinement driver, the apparent
# horizon finder AHFinderDirect, the TwoPunctures initial data code,
# QuasiLocalMeasures, Cactus, and the McLachlan spacetime evolution
# code, the Kranc code generation package, and the Simulation Factory.

# An appropriate bibtex file, etgw150914.bib, is provided with this
# parameter file.
and the bibtex file is at https://einsteintoolkit.org/gallery/bbh/etgw150914.bib. 

-- 
Ian Hinder
https://ianhinder.net
