[Users] Issue with black hole binary evolution employing IGM

Zach Etienne zachetie at gmail.com
Thu Sep 22 10:56:50 CDT 2022


Hi Roland,

Leo Werneck created a pull request to address your comment, and I have
reviewed and merged it:
https://bitbucket.org/zach_etienne/wvuthorns/commits/f3202a79db27673f5083894e6beb3f1a75e08b3d

Regarding the original issue reported: always run the Einstein Toolkit
executable with the -roe command-line option, so that the output of every
MPI process is written to its own file. It's very likely that one MPI
process (not process zero) is erroring out, and its messages never appear
in the output of process zero.
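For example (a sketch only; the executable and parameter-file names here
are placeholders for your own):

    mpirun -np 480 ./exe/cactus_sim my_bbh_run.par -roe

Each MPI process then writes its output to its own CCTK_Proc*.out file,
so the final messages of the failing process are preserved (the
CCTK_Proc14.out and CCTK_Proc15.out files quoted below are exactly this
kind of output).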

-Zach

*     *     *
Zachariah Etienne
Assoc. Prof. of Physics, U. of Idaho
Adjunct Assoc. Prof. of Physics & Astronomy, West Virginia U.
https://etienneresearch.com
https://blackholesathome.net


On Thu, Sep 22, 2022 at 6:29 AM Roland Haas <rhaas at illinois.edu> wrote:

> Hello all,
>
> The exit(1) really should be a CCTK_ERROR("Failed after Font fix"),
> ideally with output of the offending variable values.
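>
> Something along these lines, perhaps (just a sketch; the exact message
> and variable list is up to the IGM maintainers):
>
>   if (check==1) {
>     // Abort through the flesh instead of calling exit(1), so the
>     // failure is reported with file/line info on whichever process
>     // hits it:
>     CCTK_VError(__LINE__, __FILE__, CCTK_THORNSTRING,
>                 "Font fix failed at i,j,k = %d %d %d (x,y,z = %e %e %e)",
>                 i, j, k, X[index], Y[index], Z[index]);
>   }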
>
> Would one of you mind creating a bug report for IllinoisGRMHD, so that
> the IGM maintainers become aware and can produce a fix before the next
> release?
>
> Yours,
> Roland
>
> > Hello Federico,
> > I was having the same problem with binary black hole evolutions with
> > IllinoisGRMHD. In my case, the "exit(1)" statement in
> > harm_primitives_lowlevel.C, in the piece of code you copied above, silently
> > killed my runs when the Font fix failed, much as in your run. Simply
> > commenting it out solved that problem for me, so you can give it a try if
> > you want.
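> >
> > Concretely, the one-line workaround in the excerpt quoted further below
> > is just:
> >
> >   // exit(1);  // commented out: a failed Font fix no longer aborts the run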
> >
> > On Thu, Sep 22, 2022, 07:37 Federico Cattorini <f.cattorini at campus.unimib.it> wrote:
> >
> > > Hello Erik,
> > >
> > >    Thanks for your reply.
> > >
> > > I followed your suggestions and found two lines that may be indicative
> > > of what's going on. In the standard outputs 'CCTK_Proc14.out' and
> > > 'CCTK_Proc15.out', the last lines read
> > >
> > > INFO (IllinoisGRMHD): Font fix failed!
> > > INFO (IllinoisGRMHD): i,j,k = 67 63 16, stats.failure_checker = 0
> > > x,y,z = 3.392857e+00 8.892857e+00 2.392857e+00 , index=111739
> > > st_i = -1.002115e+08 2.298583e+08 -1.221746e+08, rhostar = 1.573103e+02,
> > > Bi = -1.064528e+03 1.120144e+03 2.972675e+03, gij = 6.643816e+00
> > > 5.521615e-01 4.380688e-01 7.244355e+00 -1.685406e-03 6.830374e+00,
> > > Psi6 = 1.803534e+01
> > >
> > > I assume this means that there are issues in the conservative-to-primitive
> > > (con2prim) solver of IGM. That INFO is printed by harm_primitives_lowlevel.C:
> > >
> > >>     // Use the new Font fix subroutine
> > >>     int font_fix_applied=0;
> > >>     if(check!=0) {
> > >>       font_fix_applied=1;
> > >>       CCTK_REAL u_xl=1e100, u_yl=1e100, u_zl=1e100; // Set to insane values to ensure they are overwritten.
> > >>       if (gamma_equals2==1) {
> > >>         check = font_fix_gamma_equals2(u_xl,u_yl,u_zl,CONSERVS,PRIMS,METRIC_PHYS,METRIC_LAP_PSI4,eos);
> > >>       } else {
> > >>         check = font_fix_general_gamma(u_xl,u_yl,u_zl,CONSERVS,PRIMS,METRIC_PHYS,METRIC_LAP_PSI4,eos);
> > >>       }
> > >>       //Translate to HARM primitive now:
> > >>       prim[UTCON1] = METRIC_PHYS[GUPXX]*u_xl + METRIC_PHYS[GUPXY]*u_yl + METRIC_PHYS[GUPXZ]*u_zl;
> > >>       prim[UTCON2] = METRIC_PHYS[GUPXY]*u_xl + METRIC_PHYS[GUPYY]*u_yl + METRIC_PHYS[GUPYZ]*u_zl;
> > >>       prim[UTCON3] = METRIC_PHYS[GUPXZ]*u_xl + METRIC_PHYS[GUPYZ]*u_yl + METRIC_PHYS[GUPZZ]*u_zl;
> > >>       if (check==1) {
> > >>         CCTK_VInfo(CCTK_THORNSTRING,"Font fix failed!");
> > >>         CCTK_VInfo(CCTK_THORNSTRING,"i,j,k = %d %d %d, stats.failure_checker = %d x,y,z = %e %e %e , index=%d st_i = %e %e %e, rhostar = %e, Bi = %e %e %e, gij = %e %e %e %e %e %e, Psi6 = %e",
> > >>                    i,j,k,stats.failure_checker,X[index],Y[index],Z[index],index,
> > >>                    mhd_st_x_orig,mhd_st_y_orig,mhd_st_z_orig,rho_star_orig,
> > >>                    PRIMS[BX_CENTER],PRIMS[BY_CENTER],PRIMS[BZ_CENTER],
> > >>                    METRIC_PHYS[GXX],METRIC_PHYS[GXY],METRIC_PHYS[GXZ],
> > >>                    METRIC_PHYS[GYY],METRIC_PHYS[GYZ],METRIC_PHYS[GZZ],
> > >>                    METRIC_LAP_PSI4[PSI6]);
> > >>         exit(1);  // Let's exit instead of printing potentially GBs of log files. Uncomment if you really want to deal with a mess.
> > >>       }
> > >>     }
> > >>     stats.failure_checker+=font_fix_applied*10000;
> > >>     stats.font_fixed=font_fix_applied;
> > >>
> > >
> > >
> > > Can I do anything that may help pinpoint the cause of this error?
> > >
> > > Thanks in advance,
> > >
> > > Federico
> > >
> > > On Thu, Sep 15, 2022 at 3:30 PM Erik Schnetter <schnetter at gmail.com> wrote:
> > >
> > >> Federico
> > >>
> > >> Thanks for including the output; that is helpful.
> > >>
> > >> There are parameters "Carpet::verbose" and "Carpet::veryverbose". You
> > >> can set them to "yes" and recover from a checkpoint. This gives more
> > >> information about what the code is doing, and thus where it crashes.
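> > >>
> > >> For example, in the parameter file (standard parameter syntax):
> > >>
> > >>   Carpet::verbose     = "yes"
> > >>   Carpet::veryverbose = "yes"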
> > >>
> > >> The output you attached is only from the first MPI process. Other
> > >> processes' output might contain a clue. You can add the command line
> > >> option "-roe" to Cactus when you run the simulation. This will collect
> > >> output from all processes.
> > >>
> > >> -erik
> > >>
> > >> On Thu, Sep 15, 2022 at 9:20 AM Federico Cattorini
> > >> <f.cattorini at campus.unimib.it> wrote:
> > >> >
> > >> > Hello everyone,
> > >> >
> > >> > I am experiencing an issue in a number of GRMHD simulations of black
> > >> > hole binaries employing IllinoisGRMHD.
> > >> >
> > >> > As an example, I will describe an unequal-mass BHB configuration
> > >> > (with q = 2) that I'm running.
> > >> >
> > >> > After approximately ten orbits, the run stops with no error codes or
> > >> > any other message that could help me identify the issue. The last
> > >> > lines of the standard output are
> > >> >
> > >> > INFO (IllinoisGRMHD): ***** Iter. # 353949, Lev: 9, Integrating to time: 3.160260e+03 *****
> > >> > INFO (IllinoisGRMHD): C2P: Lev: 9 NumPts= 569160 | Fixes: Font= 393 VL= 179 rho*= 2 | Failures: 0 InHoriz= 0 / 0 | Error: 7.124e-02, ErrDenom: 4.838e+13 | 4.51 iters/gridpt
> > >> > INFO (IllinoisGRMHD): ***** Iter. # 353949, Lev: 9, Integrating to time: 3.160269e+03 *****
> > >> > Simfactory Done at date: gio 04 ago 2022 11:43:01 CEST
> > >> >
> > >> >
> > >> >
> > >> > I tried restarting my simulation from the latest checkpoint, but the
> > >> > same sudden stop occurred at the same timestep.
> > >> >
> > >> > At first, I thought about some problem with IGM. The last INFO is
> > >> > printed by IllinoisGRMHD_driver_evaluate_MHD_rhs.C, so I put some
> > >> > prints in it to identify the spot where the error occurs.
> > >> > Unfortunately, I drew a blank, since the stop seems to occur just
> > >> > after the end of IllinoisGRMHD_driver_evaluate_MHD_rhs:
> > >> >
> > >> > INFO (IllinoisGRMHD): ***** line 52: entering IllinoisGRMHD_driver_evaluate_MHD_rhs *****
> > >> > INFO (IllinoisGRMHD): ***** Iter. # 353949, Lev: 10, Integrating to time: 3.160251e+03 *****
> > >> > INFO (IllinoisGRMHD): ***** line 100: IllinoisGRMHD_driver_evaluate_MHD_rhs *****
> > >> > INFO (IllinoisGRMHD): ***** line 204: just before reconstruct_set_of_prims_PPM *****
> > >> > INFO (IllinoisGRMHD): ***** DEBUG END of IllinoisGRMHD_driver_evaluate_MHD_rhs *****
> > >> > Simfactory Done at date: gio 04 ago 2022 19:44:55 CEST
> > >> >
> > >> >
> > >> > I tried to restart the simulation and run it with pure MPI. It ran
> > >> > for a few more iterations, then stopped as well:
> > >> >
> > >> > INFO (IllinoisGRMHD): ***** line 52: entering IllinoisGRMHD_driver_evaluate_MHD_rhs *****
> > >> > INFO (IllinoisGRMHD): ***** Iter. # 353565, Lev: 10, Integrating to time: 3.156831e+03 *****
> > >> > INFO (IllinoisGRMHD): ***** line 100: IllinoisGRMHD_driver_evaluate_MHD_rhs *****
> > >> > INFO (IllinoisGRMHD): ***** line 204: just before reconstruct_set_of_prims_PPM *****
> > >> > INFO (IllinoisGRMHD): ***** DEBUG END of IllinoisGRMHD_driver_evaluate_MHD_rhs *****
> > >> > Simfactory Done at date: ven 05 ago 2022 19:00:13 CEST
> > >> >
> > >> >
> > >> > The simulation setup is as follows:
> > >> >
> > >> >    Allocated:
> > >> >       Nodes:                      10
> > >> >       Cores per node:             48
> > >> >    SLURM setting
> > >> >       SLURM_NNODES :  10
> > >> >       SLURM_NPROCS :  20
> > >> >       SLURM_NTASKS :  20
> > >> >       SLURM_CPUS_ON_NODE  :  48
> > >> >       SLURM_CPUS_PER_TASK :  24
> > >> >       SLURM_TASKS_PER_NODE:  2(x10)
> > >> >    Running:
> > >> >       MPI processes:              20
> > >> >       OpenMP threads per process: 24
> > >> >       MPI processes per node:     2.0
> > >> >       OpenMP threads per core:    1.0
> > >> >       OpenMP threads per node:    48
> > >> >
> > >> >
> > >> > while the pure-MPI setup is
> > >> >
> > >> >    Allocated:
> > >> >       Nodes:                      10
> > >> >       Cores per node:             48
> > >> >    SLURM setting
> > >> >       SLURM_NNODES :  10
> > >> >       SLURM_NPROCS :  480
> > >> >       SLURM_NTASKS :  480
> > >> >       SLURM_CPUS_ON_NODE  :  48
> > >> >       SLURM_CPUS_PER_TASK :  1
> > >> >       SLURM_TASKS_PER_NODE:  48(x10)
> > >> >    Running:
> > >> >       MPI processes:              480
> > >> >       OpenMP threads per process: 1
> > >> >       MPI processes per node:     48.0
> > >> >       OpenMP threads per core:    1.0
> > >> >       OpenMP threads per node:    48
> > >> >
> > >> >
> > >> > I am using the Lorentz version of the Einstein Toolkit.
> > >> >
> > >> > I've had this issue for two binary BH simulations, both unequal-mass
> > >> > with q = 2. My colleague Giacomo Fedrigo experienced the same problem
> > >> > running an equal-mass simulation.
> > >> >
> > >> > I attach the q = 2 (s_UUmis_Q2) parameter file and the ET config-info
> > >> > file. I also attach the standard error and output of my q = 2 run and
> > >> > of Giacomo's run (b1_UUmis_a12b_pol3_r56_gauss_9). The standard outputs
> > >> > were cut for readability.
> > >> >
> > >> > Can someone please help me with this?
> > >> >
> > >> > Thanks in advance,
> > >> >
> > >> > Federico
> > >> > _______________________________________________
> > >> > Users mailing list
> > >> > Users at einsteintoolkit.org
> > >> > http://lists.einsteintoolkit.org/mailman/listinfo/users
> > >>
> > >>
> > >>
> > >> --
> > >> Erik Schnetter <schnetter at gmail.com>
> > >>
> > >> http://www.perimeterinstitute.ca/personal/eschnetter/
>
> > >>
> > > _______________________________________________
> > > Users mailing list
> > > Users at einsteintoolkit.org
> > > http://lists.einsteintoolkit.org/mailman/listinfo/users
> > >
>
>
>
> --
> My email is as private as my paper mail. I therefore support encrypting
> and signing email messages. Get my PGP key from http://keys.gnupg.net.
> _______________________________________________
> Users mailing list
> Users at einsteintoolkit.org
> http://lists.einsteintoolkit.org/mailman/listinfo/users
>

