[Users] Issue with black hole binary evolution employing IGM

Federico Cattorini f.cattorini at campus.unimib.it
Fri Sep 23 08:19:58 CDT 2022


Hi all,

   Thanks for your valuable suggestions.
Following Lorenzo's suggestion, I managed to fix the problem, and my simulation
is currently running smoothly. Luckily, it seems that it was only a problem at
a single point.

> Regarding the original issue reported: Always run the Einstein Toolkit
> executable with the -roe option enabled, so that output from all processors
> is written to files. It's very likely that one MPI process (not process
> zero) is erroring out, and you won't see it on process zero.
>

Thank you, Zach, I will be sure to do so.

Federico

On Thu, Sep 22, 2022 at 17:57 Zach Etienne <zachetie at gmail.com>
wrote:

> Hi Roland,
>
> Leo Werneck created a pull request to address your comment, and I have
> reviewed and merged it:
>
> https://bitbucket.org/zach_etienne/wvuthorns/commits/f3202a79db27673f5083894e6beb3f1a75e08b3d
>
> Regarding the original issue reported: Always run the Einstein Toolkit
> executable with the -roe option enabled, so that output from all processors
> is written to files. It's very likely that one MPI process (not process
> zero) is erroring out, and you won't see it on process zero.
>
> -Zach
>
> *     *     *
> Zachariah Etienne
> Assoc. Prof. of Physics, U. of Idaho
> Adjunct Assoc. Prof. of Physics & Astronomy, West Virginia U.
> https://etienneresearch.com
> https://blackholesathome.net
>
>
> On Thu, Sep 22, 2022 at 6:29 AM Roland Haas <rhaas at illinois.edu> wrote:
>
>> Hello all,
>>
>> The exit(1) really should be a CCTK_ERROR("Failed after Font fix")
>> ideally with output of the offending variable values.
>>
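>> A minimal sketch of what that replacement could look like, reusing a few of
>> the values already printed by the CCTK_VInfo call in the code quoted further
>> down (the CCTK_VERROR macro is the varargs counterpart of CCTK_ERROR; the
>> message text and the particular variables shown here are only illustrative,
>> not the actual IGM patch):
>>
>>   if (check==1) {
>>     // Abort through the flesh instead of a bare exit(1), so that the failure
>>     // location and the offending values end up in the error output:
>>     CCTK_VERROR("Font fix failed at i,j,k = %d %d %d (x,y,z = %e %e %e): "
>>                 "rhostar = %e, Psi6 = %e",
>>                 i, j, k, X[index], Y[index], Z[index],
>>                 rho_star_orig, METRIC_LAP_PSI4[PSI6]);
>>   }
>>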
>> Would one of you mind creating a bug report for IllinoisGRMHD so that
>> the IGM maintainers become aware and can produce a fix before the next
>> release?
>>
>> Yours,
>> Roland
>>
>> > Hello Federico,
>> > I was having the same problem with binary black hole evolutions with
>> > IllinoisGRMHD. In my case, the "exit(1)" statement in
>> > harm_primitives_lowlevel.C, in the piece of code you copied above, silently
>> > killed my runs when the Font fix failed, pretty much as in your run. Just
>> > commenting it out solved that problem for me, so you can give it a try if
>> > you want.
>> >
>> > On Thu, Sep 22, 2022, 07:37 Federico Cattorini <f.cattorini at campus.unimib.it>
>> > wrote:
>> >
>> > > Hello Erik,
>> > >
>> > >    Thanks for your reply.
>> > >
>> > > I followed your suggestions and found two lines that may be indicative
>> > > of what's going on. In the standard output files 'CCTK_Proc14.out' and
>> > > 'CCTK_Proc15.out' the last lines read
>> > >
>> > > INFO (IllinoisGRMHD): Font fix failed!
>> > > INFO (IllinoisGRMHD): i,j,k = 67 63 16, stats.failure_checker = 0 x,y,z = 3.392857e+00 8.892857e+00 2.392857e+00 , index=111739 st_i = -1.002115e+08 2.298583e+08 -1.221746e+08, rhostar = 1.573103e+02, Bi = -1.064528e+03 1.120144e+03 2.972675e+03, gij = 6.643816e+00 5.521615e-01 4.380688e-01 7.244355e+00 -1.685406e-03 6.830374e+00, Psi6 = 1.803534e+01
>> > >
>> > > I assume this means that there are issues in IGM's conservative-to-primitive
>> > > (con2prim) solver. That INFO is printed by harm_primitives_lowlevel.C:
>> > >
>> > >>     // Use the new Font fix subroutine
>> > >>     int font_fix_applied=0;
>> > >>     if(check!=0) {
>> > >>       font_fix_applied=1;
>> > >>       CCTK_REAL u_xl=1e100, u_yl=1e100, u_zl=1e100; // Set to insane values to ensure they are overwritten.
>> > >>       if (gamma_equals2==1) {
>> > >>         check = font_fix_gamma_equals2(u_xl,u_yl,u_zl,CONSERVS,PRIMS,METRIC_PHYS,METRIC_LAP_PSI4,eos);
>> > >>       } else {
>> > >>         check = font_fix_general_gamma(u_xl,u_yl,u_zl,CONSERVS,PRIMS,METRIC_PHYS,METRIC_LAP_PSI4,eos);
>> > >>       }
>> > >>       //Translate to HARM primitive now:
>> > >>       prim[UTCON1] = METRIC_PHYS[GUPXX]*u_xl + METRIC_PHYS[GUPXY]*u_yl + METRIC_PHYS[GUPXZ]*u_zl;
>> > >>       prim[UTCON2] = METRIC_PHYS[GUPXY]*u_xl + METRIC_PHYS[GUPYY]*u_yl + METRIC_PHYS[GUPYZ]*u_zl;
>> > >>       prim[UTCON3] = METRIC_PHYS[GUPXZ]*u_xl + METRIC_PHYS[GUPYZ]*u_yl + METRIC_PHYS[GUPZZ]*u_zl;
>> > >>       if (check==1) {
>> > >>         CCTK_VInfo(CCTK_THORNSTRING,"Font fix failed!");
>> > >>         CCTK_VInfo(CCTK_THORNSTRING,"i,j,k = %d %d %d, stats.failure_checker = %d x,y,z = %e %e %e , index=%d st_i = %e %e %e, rhostar = %e, Bi = %e %e %e, gij = %e %e %e %e %e %e, Psi6 = %e",
>> > >>                    i,j,k,stats.failure_checker,X[index],Y[index],Z[index],index,mhd_st_x_orig,mhd_st_y_orig,mhd_st_z_orig,rho_star_orig,PRIMS[BX_CENTER],PRIMS[BY_CENTER],PRIMS[BZ_CENTER],METRIC_PHYS[GXX],METRIC_PHYS[GXY],METRIC_PHYS[GXZ],METRIC_PHYS[GYY],METRIC_PHYS[GYZ],METRIC_PHYS[GZZ],METRIC_LAP_PSI4[PSI6]);
>> > >>         exit(1);  // Let's exit instead of printing potentially GBs of log files. Uncomment if you really want to deal with a mess.
>> > >>       }
>> > >>     }
>> > >>     stats.failure_checker+=font_fix_applied*10000;
>> > >>     stats.font_fixed=font_fix_applied;
>> > >>
>> > >
>> > >
>> > > Can I do anything that may help pinpoint the cause of this error?
>> > >
>> > > Thanks in advance,
>> > >
>> > > Federico
>> > >
>> > > On Thu, Sep 15, 2022 at 15:30 Erik Schnetter <schnetter at gmail.com>
>> > > wrote:
>> > >
>> > >> Federico
>> > >>
>> > >> Thanks for including the output, that is helpful.
>> > >>
>> > >> There are parameters "Carpet::verbose" and "Carpet::veryverbose". You
>> > >> can set them to "yes" and recover from a checkpoint. This gives more
>> > >> information about what the code is doing, and thus where it crashes.
>> > >>
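>> > >> In Cactus parameter-file syntax this amounts to two extra lines (a sketch;
>> > >> both are Boolean parameters of the Carpet thorn):
>> > >>
>> > >>   Carpet::verbose     = yes
>> > >>   Carpet::veryverbose = yes
>> > >>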
>> > >> The output you attached is only from the first MPI process. Other
>> > >> processes' output might contain a clue. You can add the command line
>> > >> option "-roe" to Cactus when you run the simulation. This will collect
>> > >> output from all processes.
>> > >>
>> > >> -erik
>> > >>
>> > >> On Thu, Sep 15, 2022 at 9:20 AM Federico Cattorini
>> > >> <f.cattorini at campus.unimib.it> wrote:
>> > >> >
>> > >> > Hello everyone,
>> > >> >
>> > >> > I am experiencing an issue in a number of GRMHD simulations of black
>> > >> > hole binaries employing IllinoisGRMHD.
>> > >> >
>> > >> > As an example, I will write about an unequal-mass BHB configuration
>> > >> > (with q = 2) that I'm running.
>> > >> >
>> > >> > After approximately ten orbits, the run stops with no error codes or
>> > >> > any other message that could help me identify the issue. The last
>> > >> > lines of the standard output are
>> > >> >
>> > >> > INFO (IllinoisGRMHD): ***** Iter. # 353949, Lev: 9, Integrating to time: 3.160260e+03 *****
>> > >> > INFO (IllinoisGRMHD): C2P: Lev: 9 NumPts= 569160 | Fixes: Font= 393 VL= 179 rho*= 2 | Failures: 0 InHoriz= 0 / 0 | Error: 7.124e-02, ErrDenom: 4.838e+13 | 4.51 iters/gridpt
>> > >> > INFO (IllinoisGRMHD): ***** Iter. # 353949, Lev: 9, Integrating to time: 3.160269e+03 *****
>> > >> > Simfactory Done at date: Thu 04 Aug 2022 11:43:01 CEST
>> > >> >
>> > >> >
>> > >> >
>> > >> > I tried restarting my simulation from the latest checkpoint, but the
>> > >> > same sudden stop occurred at the same timestep.
>> > >> >
>> > >> > At first, I suspected some problem with IGM. The last INFO is printed
>> > >> > by IllinoisGRMHD_driver_evaluate_MHD_rhs.C, so I put some prints in it
>> > >> > to identify the spot where the error occurs (a sketch of such a print
>> > >> > is shown after the log below).
>> > >> > Unfortunately, I drew a blank, since the stop seems to occur just after
>> > >> > the end of IllinoisGRMHD_driver_evaluate_MHD_rhs:
>> > >> >
>> > >> > INFO (IllinoisGRMHD): ***** line 52: entering IllinoisGRMHD_driver_evaluate_MHD_rhs *****
>> > >> > INFO (IllinoisGRMHD): ***** Iter. # 353949, Lev: 10, Integrating to time: 3.160251e+03 *****
>> > >> > INFO (IllinoisGRMHD): ***** line 100: IllinoisGRMHD_driver_evaluate_MHD_rhs *****
>> > >> > INFO (IllinoisGRMHD): ***** line 204: just before reconstruct_set_of_prims_PPM *****
>> > >> > INFO (IllinoisGRMHD): ***** DEBUG END of IllinoisGRMHD_driver_evaluate_MHD_rhs *****
>> > >> > Simfactory Done at date: Thu 04 Aug 2022 19:44:55 CEST
>> > >> >
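>> > >> > For reference, the marker messages above come from plain CCTK_VInfo calls
>> > >> > added by hand to the driver. A minimal sketch of one such print (the exact
>> > >> > wording and placement are arbitrary choices, not part of the stock thorn):
>> > >> >
>> > >> >   // hand-inserted debug marker near the top of the driver routine
>> > >> >   CCTK_VInfo(CCTK_THORNSTRING,
>> > >> >              "***** line %d: entering IllinoisGRMHD_driver_evaluate_MHD_rhs *****",
>> > >> >              __LINE__);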
>> > >> >
>> > >> > I tried to restart the simulation and run it with pure MPI. It ran for a
>> > >> > few more iterations, then stopped as well:
>> > >> >
>> > >> > INFO (IllinoisGRMHD): ***** line 52: entering IllinoisGRMHD_driver_evaluate_MHD_rhs *****
>> > >> > INFO (IllinoisGRMHD): ***** Iter. # 353565, Lev: 10, Integrating to time: 3.156831e+03 *****
>> > >> > INFO (IllinoisGRMHD): ***** line 100: IllinoisGRMHD_driver_evaluate_MHD_rhs *****
>> > >> > INFO (IllinoisGRMHD): ***** line 204: just before reconstruct_set_of_prims_PPM *****
>> > >> > INFO (IllinoisGRMHD): ***** DEBUG END of IllinoisGRMHD_driver_evaluate_MHD_rhs *****
>> > >> > Simfactory Done at date: Fri 05 Aug 2022 19:00:13 CEST
>> > >> >
>> > >> >
>> > >> > The simulation setup is as follows:
>> > >> >
>> > >> >    Allocated:
>> > >> >       Nodes:                      10
>> > >> >       Cores per node:             48
>> > >> >    SLURM setting
>> > >> >       SLURM_NNODES :  10
>> > >> >       SLURM_NPROCS :  20
>> > >> >       SLURM_NTASKS :  20
>> > >> >       SLURM_CPUS_ON_NODE  :  48
>> > >> >       SLURM_CPUS_PER_TASK :  24
>> > >> >       SLURM_TASKS_PER_NODE:  2(x10)
>> > >> >    Running:
>> > >> >       MPI processes:              20
>> > >> >       OpenMP threads per process: 24
>> > >> >       MPI processes per node:     2.0
>> > >> >       OpenMP threads per core:    1.0
>> > >> >       OpenMP threads per node:    48
>> > >> >
>> > >> >
>> > >> > while the pure-MPI setup is
>> > >> >
>> > >> >    Allocated:
>> > >> >       Nodes:                      10
>> > >> >       Cores per node:             48
>> > >> >    SLURM setting
>> > >> >       SLURM_NNODES :  10
>> > >> >       SLURM_NPROCS :  480
>> > >> >       SLURM_NTASKS :  480
>> > >> >       SLURM_CPUS_ON_NODE  :  48
>> > >> >       SLURM_CPUS_PER_TASK :  1
>> > >> >       SLURM_TASKS_PER_NODE:  48(x10)
>> > >> >    Running:
>> > >> >       MPI processes:              480
>> > >> >       OpenMP threads per process: 1
>> > >> >       MPI processes per node:     48.0
>> > >> >       OpenMP threads per core:    1.0
>> > >> >       OpenMP threads per node:    48
>> > >> >
>> > >> >
>> > >> > I am using the Lorentz version of the Einstein Toolkit.
>> > >> >
>> > >> > I've had this issue in two binary BH simulations, both unequal-mass
>> > >> > with q = 2. My colleague Giacomo Fedrigo experienced the same problem
>> > >> > running an equal-mass simulation.
>> > >> >
>> > >> > I attach the q = 2 (s_UUmis_Q2) parameter file and the ET config-info
>> > >> > file. Also, I attach the standard error and output of my q = 2 run and
>> > >> > of Giacomo's run (b1_UUmis_a12b_pol3_r56_gauss_9). The standard outputs
>> > >> > were truncated for readability.
>> > >> >
>> > >> > Can someone please help me with this?
>> > >> >
>> > >> > Thanks in advance,
>> > >> >
>> > >> > Federico
>> > >> > _______________________________________________
>> > >> > Users mailing list
>> > >> > Users at einsteintoolkit.org
>> > >> > http://lists.einsteintoolkit.org/mailman/listinfo/users
>> > >>
>> > >>
>> > >>
>> > >> --
>> > >> Erik Schnetter <schnetter at gmail.com>
>> > >> http://www.perimeterinstitute.ca/personal/eschnetter/
>> > >>
>> > > _______________________________________________
>> > > Users mailing list
>> > > Users at einsteintoolkit.org
>> > > http://lists.einsteintoolkit.org/mailman/listinfo/users
>> > >
>>
>>
>>
>> --
>> My email is as private as my paper mail. I therefore support encrypting
>> and signing email messages. Get my PGP key from http://keys.gnupg.net.
>> _______________________________________________
>> Users mailing list
>> Users at einsteintoolkit.org
>> http://lists.einsteintoolkit.org/mailman/listinfo/users
>>
>