[Users] Question about GW150914.rpar simulation
Benjamin Chardi Marco
bchardim at redhat.com
Wed Oct 24 13:39:38 CDT 2018
Hi Ian,
You are right, I have 30 cores, not 30 CPUs.
They are in fact 30 virtual cores, because I am running the simulation on a
10-VM cluster (each VM has one vCPU with 3 cores), built with RHEL7 +
OpenMPI.
Many thanks for your suggestion; I am going to try it and see if I can
reach 10 M/hr (the current value is 7 M/hr).
Cheers,
Benja
On Wed, Oct 24, 2018 at 8:02 PM <ian.hinder at aei.mpg.de> wrote:
>
>
> On 23 Oct 2018, at 08:57, Benjamin Chardi Marco <bchardim at redhat.com>
> wrote:
>
> Dear friends,
>
> We are trying to use the EinsteinToolKit GW150914.rpar binary black hole
> merger simulation as a use case to test that our container orchestration
> product OpenShift can be used for HPC.
> Our test environment only has 30 CPUs, so we need to execute that
> simulation in a reasonable time.
>
>
> Hi,
>
> 30 CPUs is quite a lot; do you really mean 30 CPUs, or 30 cores? What CPU
> are you using, and how many cores does it have? Also, what is the
> interconnect between the nodes? Infiniband, omnipath, gigabit ethernet,
> etc?
>
> Could you please tell us how to modify GW150914.rpar in order to get a
> less precise simulation that executes on a 30-CPU cluster in a reasonable
> time (~ a few days)?
>
>
> You can run at a lower resolution by changing the
>
> --define N 28
>
> to something else. This must be a multiple of 4, and can probably go as
> low as 20 without the simulation crashing. [Roland: you mentioned 24 in
> your email. Did you try 20 and it crashed? I seem to remember 20 working
> at one point.] This is a measure of the number of grid cells across the
> black holes, so increasing it gives you more cells, higher resolution, and
> the run goes more slowly.
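>
> As a rough guide to what changing N buys you, here is a small Python
> sketch. It assumes that the cost of the run scales roughly like N^4
> (three powers of N from the spatial grid, and one more because the time
> step shrinks with the grid spacing); this is only a rule of thumb, not
> an exact statement about this parameter file.
>
>     # Rough cost-scaling estimate for changing the resolution parameter N.
>     # Assumption (not from GW150914.rpar itself): cost ~ N^4.
>     def relative_cost(n_new, n_ref=28):
>         return (n_new / n_ref) ** 4
>
>     for n in (28, 24, 20):
>         print(f"N = {n}: relative cost {relative_cost(n):.2f} "
>               f"(speedup {1.0 / relative_cost(n):.1f}x)")
>
> If the rule of thumb holds, N = 24 would be roughly 1.9x faster than
> N = 28, and N = 20 roughly 3.8x faster.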
>
> Roland's suggestions are also good, but I would make a couple of changes
> to what he recommended.
>
> The original boundary condition in GW150914.rpar sets the time derivative
> ("rhs") of all fields to 0 on the outer spherical boundary. This fixes the
> fields to their initial value, so can be considered a Dirichlet (or
> "scalar") boundary condition: ML_BSSN::rhs_boundary_condition =
> "scalar". This is in general a bad thing to do (you will get almost
> perfect reflections of all outgoing waves), but the boundary was placed far
> enough away that it could not influence the waveform. This is generally
> very cheap with the spherical outer grid used here, and was done because we
> had not implemented radiative boundary conditions that worked with the
> spherical grids in McLachlan at the time.
>
> The improvement that I believe Roland meant to make was to change the
> boundary condition to radiative (not Robin), which has now been implemented
> in the code. This makes the fields obey an advection-type equation on the
> outer boundary, assuming that all fields are solutions to outgoing radial
> wave equations. In Roland's parameter file, he set
>
> NewRad::z_is_radial = yes
>
> but this is a technical change to trick NewRad into working with the
> spherical grids that we use here. To change the boundary condition itself,
> you need to set
>
> ML_BSSN::rhs_boundary_condition = "NewRad"
>
> rather than "scalar".
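>
> Putting the two together, the boundary-related settings in the generated
> parameter file should end up as (both lines quoted from the discussion
> above):
>
> ML_BSSN::rhs_boundary_condition = "NewRad"
> NewRad::z_is_radial             = yes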
>
> The other change Roland made was to change final_time to half its current
> value:
>
> final_time = waveform_length + outermost_detector
> ->
> final_time = 0.5*(waveform_length + outermost_detector)
>
> This doesn't seem correct. It is true that final_time is used to set
> sphere_outer_radius, but this will not halve the size of the domain.
> Further, it will halve the runtime of the simulation, so the simulation
> will stop before the BHs have merged. Instead, I would change
> sphere_outer_radius as follows:
>
> -sphere_outer_radius = int((outermost_detector + final_time)/(i*hr))*i*hr
> +sphere_outer_radius = int((1000)/(i*hr))*i*hr
>
> This might make the waveform noisier, but with the changed boundary
> condition, it shouldn't be too bad.
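>
> If you want to see what this does numerically, here is a minimal Python
> sketch of the rounding in the rpar file. The values of i, hr,
> outermost_detector and final_time below are placeholders for
> illustration, not the actual values used in GW150914.rpar.
>
>     # Sketch of the sphere_outer_radius rounding from the rpar file.
>     # Placeholder values -- NOT the actual GW150914.rpar numbers.
>     i = 4                       # hypothetical multiplier in the rounding
>     hr = 2.0                    # hypothetical radial grid spacing
>     outermost_detector = 500.0  # hypothetical outermost extraction radius
>     final_time = 1300.0         # hypothetical evolution time
>
>     # Original: boundary far enough away that nothing can reach the
>     # outermost detector from it before final_time, rounded to a
>     # multiple of i*hr.
>     original = int((outermost_detector + final_time) / (i * hr)) * i * hr
>
>     # Suggested change: fix the radius near 1000, and rely on the NewRad
>     # boundary condition to keep reflections small.
>     reduced = int(1000 / (i * hr)) * i * hr
>
>     print(original, reduced)   # 1800.0 vs 1000.0 with these numbers
>
> The saving comes from the smaller outer radius, which means fewer grid
> points in the outer spherical shells.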
>
> We can now run the GW150914.rpar simulation using OpenMPI +
> EinsteinToolKit, but it takes a very long time to execute (~ weeks).
>
>
> That sounds like quite a long time; too long, in fact. On the page
> describing the simulation,
> https://einsteintoolkit.org/gallery/bbh/index.html, it says that the
> simulation takes 2.8 days on 128 cores of an Intel(R) Xeon(R) CPU E5-2630
> v3 @ 2.40GHz (Haswell). Assuming that you mean you are using 30 cores, and
> if you are using a similar CPU, then it should take 2.8 * 128/30 = 11.9
> days. Is this about what you see? What speed is reported? You can see
> this in the output file GW150914_*.out:
>
>
> ------------------------------------------------------------------------------------------------------------
> Iteration      Time | *me_per_hour |       ML_BSSN::phi        | *TISTICS::maxrss_mb | *TICS::swap_used_mb
>                     |              |   minimum       maximum   |  minimum    maximum |  minimum    maximum
> ------------------------------------------------------------------------------------------------------------
>    114640   246.602 |   10.5254966 | 0.0149352     0.9995490   |     3748       5289 |        0          0
>    114644   246.611 |   10.5255173 | 0.0144565     0.9995490   |     3748       5289 |        0          0
>
> The third column is the speed of the simulation in coordinate time per
> hour (the column header is a truncation of "physical_time_per_hour").
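>
> As a quick sanity check, here is a small Python sketch that does the
> core-count scaling estimate from above and also turns a reported speed
> into a rough total walltime. The final_time value is a placeholder, not
> the one computed in GW150914.rpar.
>
>     # Rough walltime estimates, assuming approximately linear scaling
>     # with core count and a CPU comparable to the gallery benchmark.
>     reference_days  = 2.8    # gallery run: 2.8 days on 128 cores
>     reference_cores = 128
>     your_cores      = 30
>
>     estimated_days = reference_days * reference_cores / your_cores
>     print(f"scaled estimate: {estimated_days:.1f} days")   # ~11.9 days
>
>     # Total walltime from the reported speed (third column above).
>     speed_m_per_hour = 10.5      # from the output excerpt above
>     final_time       = 1300.0    # placeholder, not the rpar value
>     total_days = final_time / speed_m_per_hour / 24
>     print(f"total walltime at this speed: {total_days:.1f} days")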
>
> It's possible that the OpenMP or MPI configuration is not correct. Please
> could you post the standard output file (GW150914_*.out) to
> https://pastebin.com so we can take a look at it?
>
> We believe that the EinsteinToolKit GW150914.rpar simulation is a great
> use case to test OpenShift for HPC, and of course we will reference the
> EinsteinToolKit in our final report as a use case for OpenShift in HPC
> mode.
>
>
> Great; it sounds interesting! Instructions for the papers that should be
> cited if you use this parameter file and code are given at the top of the
> parameter file:
>
> # Copyright Barry Wardell, Ian Hinder, Eloisa Bentivegna
>
> # We ask that if you make use of the parameter file or the example
> # data, then please cite
>
> # Simulation of GW150914 binary black hole merger using the
> # Einstein Toolkit - https://doi.org/10.5281/zenodo.155394
>
> # as well as the Einstein Toolkit, the Llama multi-block
> # infrastructure, the Carpet mesh-refinement driver, the apparent
> # horizon finder AHFinderDirect, the TwoPunctures initial data code,
> # QuasiLocalMeasures, Cactus, and the McLachlan spacetime evolution
> # code, the Kranc code generation package, and the Simulation Factory.
>
> # An appropriate bibtex file, etgw150914.bib, is provided with this
> # parameter file.
>
> and the bibtex file is at
> https://einsteintoolkit.org/gallery/bbh/etgw150914.bib.
>
> --
> Ian Hinder
> https://ianhinder.net
>
>
--
Benjamín Chardí Marco
Senior Red Hat Consultant
RHCE #100-107-341
bchardim at redhat.com
Mobile: 0034 654 344 878