[Users] Question about GW150914.rpar simulation
Benjamin Chardi Marco
bchardim at redhat.com
Wed Oct 24 13:39:38 CDT 2018
Hi Ian,
You are right, I have 30 cores, not 30 CPUs.
They are in fact 30 virtual cores, because I am running the simulation on a
10-VM cluster (each VM has one vCPU with 3 cores), built with RHEL7 +
OpenMPI.
Many thanks for your suggestion; I am going to try it and see if I can
reach 10 M/hr (the current value is 7 M/hr).
Cheers,
Benja
On Wed, Oct 24, 2018 at 8:02 PM <ian.hinder at aei.mpg.de> wrote:
>
>
> On 23 Oct 2018, at 08:57, Benjamin Chardi Marco <bchardim at redhat.com>
> wrote:
>
> Dear friends,
>
> We are trying to use the EinsteinToolKit GW150914.rpar binary black hole
> merger simulation as a use case to test that our container orchestration
> product OpenShift can be used for HPC.
> Our test environment only has 30 CPUs, so we need to execute that
> simulation in a reasonable time.
>
>
> Hi,
>
> 30 CPUs is quite a lot; do you really mean 30 CPUs, or 30 cores? What CPU
> are you using, and how many cores does it have? Also, what is the
> interconnect between the nodes? Infiniband, omnipath, gigabit ethernet,
> etc?
>
> Could you please tell us how to modify GW150914.rpar in order to get a
> less precise simulation that executes on a 30-CPU cluster in a reasonable
> time (~ a few days)?
>
>
> You can run at a lower resolution by changing the
>
> --define N 28
>
> to something else. This must be a multiple of 4, and can probably go as
> low as 20 without the simulation crashing. [Roland: you mentioned 24 in
> your email. Did you try 20 and it crashed? I seem to remember 20 working
> at one point.] This is a measure of the number of grid cells across the
> black holes, so increasing it gives you more cells, higher resolution, and
> the run goes more slowly.
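>
> As a rough guide to what changing N buys you, here is a small Python
> sketch. It assumes that the cost of the run scales roughly like N^4
> (three powers of N from the spatial grid, and one more because the time
> step shrinks with the grid spacing); this is only a rule of thumb, not
> an exact statement about this parameter file.
>
>     # Rough cost-scaling estimate for changing the resolution parameter N.
>     # Assumption (not from GW150914.rpar itself): cost ~ N^4.
>     def relative_cost(n_new, n_ref=28):
>         return (n_new / n_ref) ** 4
>
>     for n in (28, 24, 20):
>         print(f"N = {n}: relative cost {relative_cost(n):.2f} "
>               f"(speedup {1.0 / relative_cost(n):.1f}x)")
>
> If the rule of thumb holds, N = 24 would be roughly 1.9x faster than
> N = 28, and N = 20 roughly 3.8x faster.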
>
> Roland's suggestions are also good, but I would make a couple of changes
> to what he recommended.
>
> The original boundary condition in GW150914.rpar sets the time derivative
> ("rhs") of all fields to 0 on the outer spherical boundary. This fixes the
> fields to their initial value, so can be considered a Dirichlet (or
> "scalar") boundary condition: ML_BSSN::rhs_boundary_condition =
> "scalar". This is in general a bad thing to do (you will get almost
> perfect reflections of all outgoing waves), but the boundary was placed far
> enough away that it could not influence the waveform. This is generally
> very cheap with the spherical outer grid used here, and was done because we
> had not implemented radiative boundary conditions that worked with the
> spherical grids in McLachlan at the time.
>
> The improvement that I believe Roland meant to make was to change the
> boundary condition to radiative (not Robin), which has now been implemented
> in the code. This makes the fields obey an advection-type equation on the
> outer boundary, assuming that all fields are solutions to outgoing radial
> wave equations. In Roland's parameter file, he set
>
> NewRad::z_is_radial = yes
>
> but this is a technical change to trick NewRad into working with the
> spherical grids that we use here. To change the boundary condition itself,
> you need to set
>
> ML_BSSN::rhs_boundary_condition = "NewRad"
>
> rather than "scalar".
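>
> Putting the two together, the boundary-related settings in the generated
> parameter file should end up as (both lines quoted from the discussion
> above):
>
> ML_BSSN::rhs_boundary_condition = "NewRad"
> NewRad::z_is_radial             = yes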
>
> The other change Roland made was to change final_time to half its current
> value:
>
> final_time = waveform_length + outermost_detector
> ->
> final_time = 0.5*(waveform_length + outermost_detector)
>
> This doesn't seem correct. It is true that final_time is used to set
> sphere_outer_radius, but this will not halve the size of the domain.
> Further, it will halve the runtime of the simulation, so the simulation
> will stop before the BHs have merged. Instead, I would change
> sphere_outer_radius as follows:
>
> -sphere_outer_radius = int((outermost_detector + final_time)/(i*hr))*i*hr
> +sphere_outer_radius = int((1000)/(i*hr))*i*hr
>
> This might make the waveform noisier, but with the changed boundary
> condition, it shouldn't be too bad.
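>
> If you want to see what this does numerically, here is a minimal Python
> sketch of the rounding in the rpar file. The values of i, hr,
> outermost_detector and final_time below are placeholders for
> illustration, not the actual values used in GW150914.rpar.
>
>     # Sketch of the sphere_outer_radius rounding from the rpar file.
>     # Placeholder values -- NOT the actual GW150914.rpar numbers.
>     i = 4                       # hypothetical multiplier in the rounding
>     hr = 2.0                    # hypothetical radial grid spacing
>     outermost_detector = 500.0  # hypothetical outermost extraction radius
>     final_time = 1300.0         # hypothetical evolution time
>
>     # Original: boundary far enough away that nothing can reach the
>     # outermost detector from it before final_time, rounded to a
>     # multiple of i*hr.
>     original = int((outermost_detector + final_time) / (i * hr)) * i * hr
>
>     # Suggested change: fix the radius near 1000, and rely on the NewRad
>     # boundary condition to keep reflections small.
>     reduced = int(1000 / (i * hr)) * i * hr
>
>     print(original, reduced)   # 1800.0 vs 1000.0 with these numbers
>
> The saving comes from the smaller outer radius, which means fewer grid
> points in the outer spherical shells.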
>
> We can now run the GW150914.rpar simulation using OpenMPI +
> EinsteinToolKit, but it takes a very long time to execute (~ weeks).
>
>
> That sounds like quite a long time; too long, in fact. On the page
> describing the simulation,
> https://einsteintoolkit.org/gallery/bbh/index.html, it says that the
> simulation takes 2.8 days on 128 cores of an Intel(R) Xeon(R) CPU E5-2630
> v3 @ 2.40GHz (Haswell). Assuming that you mean you are using 30 cores, and
> if you are using a similar CPU, then it should take 2.8 * 128/30 = 11.9
> days. Is this about what you see? What speed is reported? You can see
> this in the output file GW150914_*.out:
>
>
> ------------------------------------------------------------------------------------------------------------
> Iteration      Time | *me_per_hour |       ML_BSSN::phi        | *TISTICS::maxrss_mb | *TICS::swap_used_mb
>                     |              |   minimum       maximum   |  minimum    maximum |  minimum    maximum
> ------------------------------------------------------------------------------------------------------------
>    114640   246.602 |   10.5254966 | 0.0149352     0.9995490   |     3748       5289 |        0          0
>    114644   246.611 |   10.5255173 | 0.0144565     0.9995490   |     3748       5289 |        0          0
>
> The third column is the speed of the simulation in coordinate time per
> hour (the column header is a truncation of "physical_time_per_hour").
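>
> As a quick sanity check, here is a small Python sketch that does the
> core-count scaling estimate from above and also turns a reported speed
> into a rough total walltime. The final_time value is a placeholder, not
> the one computed in GW150914.rpar.
>
>     # Rough walltime estimates, assuming approximately linear scaling
>     # with core count and a CPU comparable to the gallery benchmark.
>     reference_days  = 2.8    # gallery run: 2.8 days on 128 cores
>     reference_cores = 128
>     your_cores      = 30
>
>     estimated_days = reference_days * reference_cores / your_cores
>     print(f"scaled estimate: {estimated_days:.1f} days")   # ~11.9 days
>
>     # Total walltime from the reported speed (third column above).
>     speed_m_per_hour = 10.5      # from the output excerpt above
>     final_time       = 1300.0    # placeholder, not the rpar value
>     total_days = final_time / speed_m_per_hour / 24
>     print(f"total walltime at this speed: {total_days:.1f} days")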
>
> It's possible that the OpenMP or MPI configuration is not correct. Please
> could you post the standard output file (GW150914_*.out) to
> https://pastebin.com so we can take a look at it?
>
> We believe that the EinsteinToolKit GW150914.rpar simulation is a great
> use case to test OpenShift for HPC, and of course we will reference the
> EinsteinToolKit in our final report as a use case for OpenShift in HPC
> mode.
>
>
> Great; it sounds interesting! Instructions for the papers that should be
> cited if you use this parameter file and code are given at the top of the
> parameter file:
>
> # Copyright Barry Wardell, Ian Hinder, Eloisa Bentivegna
>
> # We ask that if you make use of the parameter file or the example
> # data, then please cite
>
> # Simulation of GW150914 binary black hole merger using the
> # Einstein Toolkit - https://doi.org/10.5281/zenodo.155394
>
> # as well as the Einstein Toolkit, the Llama multi-block
> # infrastructure, the Carpet mesh-refinement driver, the apparent
> # horizon finder AHFinderDirect, the TwoPunctures initial data code,
> # QuasiLocalMeasures, Cactus, and the McLachlan spacetime evolution
> # code, the Kranc code generation package, and the Simulation Factory.
>
> # An appropriate bibtex file, etgw150914.bib, is provided with this
> # parameter file.
>
> and the bibtex file is at
> https://einsteintoolkit.org/gallery/bbh/etgw150914.bib.
>
> --
> Ian Hinder
> https://ianhinder.net
>
>
--
Benjamín Chardí Marco
Senior Red Hat Consultant
RHCE #100-107-341
bchardim at redhat.com
Mobile: 0034 654 344 878