<div dir="ltr"><div>Hi all,</div><div><br></div><div> Thanks for your precious suggestions.</div><div>Following Lorenzo's suggestion I managed to fix the problem and currently my simulation is running smoothly. Luckily, it seems that it was only a problem of a single point.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><span style="color:rgb(53,28,117)">Regarding the original issue reported: Always run the Einstein Toolkit
executable with the -roe option enabled, so that output from all
processors is written to files. It's very likely that one MPI process
(not process zero) is erroring out, and you won't see it on process
zero.</span></div></blockquote><div><br></div><div>Thank you Zach, I will be sure to do that.</div><div><br></div><div>Federico <br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Sep 22, 2022 at 5:57 PM Zach Etienne <<a href="mailto:zachetie@gmail.com">zachetie@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Hi Roland,<div><br></div><div>Leo Werneck created a pull request to address your comment, and I have reviewed and merged it:</div><div><a href="https://bitbucket.org/zach_etienne/wvuthorns/commits/f3202a79db27673f5083894e6beb3f1a75e08b3d" target="_blank">https://bitbucket.org/zach_etienne/wvuthorns/commits/f3202a79db27673f5083894e6beb3f1a75e08b3d</a><br clear="all"><div><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div style="font-size:12.8px"><br></div><div style="font-size:12.8px">Regarding the original issue reported: Always run the Einstein Toolkit executable with the -roe option enabled, so that output from all processors is written to files. It's very likely that one MPI process (not process zero) is erroring out, and you won't see it on process zero.</div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px">-Zach</div><div style="font-size:12.8px"><br></div><span style="font-size:12.8px">* * *</span><br style="font-size:12.8px"><span style="font-size:12.8px">Zachariah Etienne</span></div><div><span style="font-size:12.8px">Assoc. Prof. of Physics, U. of Idaho</span></div><div><span style="font-size:12.8px">Adjunct Assoc. Prof. of Physics & Astronomy, West Virginia U.</span></div><div dir="ltr"><div><a href="https://etienneresearch.com" target="_blank">https://etienneresearch.com</a></div><div><a href="https://blackholesathome.net/" target="_blank">https://blackholesathome.net</a><br></div></div></div></div></div></div></div></div></div></div></div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Sep 22, 2022 at 6:29 AM Roland Haas <<a href="mailto:rhaas@illinois.edu" target="_blank">rhaas@illinois.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hello all,<br>
<br>
The exit(1) really should be a CCTK_ERROR("Failed after Font fix")<br>
ideally with output of the offending variable values.<br>
<br>
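A minimal sketch of what that could look like (illustrative only, reusing the variable names from the CCTK_VInfo call in the excerpt quoted below; not a tested patch):<br>
<br>
  // Sketch: abort through the flesh so the failing grid point is reported<br>
  // in the log before the run terminates, instead of a bare exit(1).<br>
  CCTK_VError(__LINE__, __FILE__, CCTK_THORNSTRING,<br>
              "Font fix failed at i,j,k = %d %d %d, x,y,z = %e %e %e, rhostar = %e",<br>
              i, j, k, X[index], Y[index], Z[index], rho_star_orig);<br>
<br>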
Would one of you mind creating a bug report for IllinoisGRMHD so that<br>
the IGM maintainers become aware and can produce a fix before the next<br>
release?<br>
<br>
Yours,<br>
Roland<br>
<br>
> Hello Federico,<br>
> I was having the same problem in binary black hole evolutions with<br>
> IllinoisGRMHD. In my case, the "exit(1)" statement in<br>
> harm_primitives_lowlevel.C in the piece of code you copied above silently<br>
> killed my runs when the Font fix failed, pretty much as in your run. Just<br>
> commenting it out solved that problem for me, so you can give it a try if<br>
> you want.<br>
> <br>
> On Thu, Sep 22, 2022, 07:37 Federico Cattorini <<a href="mailto:f.cattorini@campus.unimib.it" target="_blank">f.cattorini@campus.unimib.it</a>><br>
> wrote:<br>
> <br>
> > Hello Erik,<br>
> ><br>
> > Thanks for your reply.<br>
> ><br>
> > I followed your suggestions and found two strings that may be indicative<br>
> > of what's going on. In the standard outputs 'CCTK_Proc14.out' and<br>
> > 'CCTK_Proc15.out' the last lines read<br>
> ><br>
> > INFO (IllinoisGRMHD): Font fix failed!<br>
> > INFO (IllinoisGRMHD): i,j,k = 67 63 16, stats.failure_checker = 0 x,y,z =<br>
> > 3.392857e+00 8.892857e+00 2.392857e+00 , index=111739 st_i = -1.002115e+08<br>
> > 2.298583e+08 -1.221746e+08, rhostar = 1.573103e+02, Bi = -1.064528e+03<br>
> > 1.120144e+03 2.972675e+03, gij = 6.643816e+00 5.521615e-01 4.380688e-01<br>
> > 7.244355e+00 -1.685406e-03 6.830374e+00, Psi6 = 1.803534e+01<br>
> ><br>
> > I assume this means that there are issues in the con2prim routine of IGM. That<br>
> > INFO is printed by harm_primitives_lowlevel.C:<br>
> ><br>
> >> // Use the new Font fix subroutine<br>
> >> int font_fix_applied=0;<br>
> >> if(check!=0) {<br>
> >>   font_fix_applied=1;<br>
> >>   CCTK_REAL u_xl=1e100, u_yl=1e100, u_zl=1e100; // Set to insane values to ensure they are overwritten.<br>
> >>   if (gamma_equals2==1) {<br>
> >>     check = font_fix_gamma_equals2(u_xl,u_yl,u_zl,CONSERVS,PRIMS,METRIC_PHYS,METRIC_LAP_PSI4,eos);<br>
> >>   } else {<br>
> >>     check = font_fix_general_gamma(u_xl,u_yl,u_zl,CONSERVS,PRIMS,METRIC_PHYS,METRIC_LAP_PSI4,eos);<br>
> >>   }<br>
> >>   //Translate to HARM primitive now:<br>
> >>   prim[UTCON1] = METRIC_PHYS[GUPXX]*u_xl + METRIC_PHYS[GUPXY]*u_yl + METRIC_PHYS[GUPXZ]*u_zl;<br>
> >>   prim[UTCON2] = METRIC_PHYS[GUPXY]*u_xl + METRIC_PHYS[GUPYY]*u_yl + METRIC_PHYS[GUPYZ]*u_zl;<br>
> >>   prim[UTCON3] = METRIC_PHYS[GUPXZ]*u_xl + METRIC_PHYS[GUPYZ]*u_yl + METRIC_PHYS[GUPZZ]*u_zl;<br>
> >>   if (check==1) {<br>
> >>     CCTK_VInfo(CCTK_THORNSTRING,"Font fix failed!");<br>
> >>     CCTK_VInfo(CCTK_THORNSTRING,<br>
> >>                "i,j,k = %d %d %d, stats.failure_checker = %d x,y,z = %e %e %e , index=%d st_i = %e %e %e, rhostar = %e, Bi = %e %e %e, gij = %e %e %e %e %e %e, Psi6 = %e",<br>
> >>                i,j,k,stats.failure_checker,X[index],Y[index],Z[index],index,<br>
> >>                mhd_st_x_orig,mhd_st_y_orig,mhd_st_z_orig,rho_star_orig,<br>
> >>                PRIMS[BX_CENTER],PRIMS[BY_CENTER],PRIMS[BZ_CENTER],<br>
> >>                METRIC_PHYS[GXX],METRIC_PHYS[GXY],METRIC_PHYS[GXZ],<br>
> >>                METRIC_PHYS[GYY],METRIC_PHYS[GYZ],METRIC_PHYS[GZZ],<br>
> >>                METRIC_LAP_PSI4[PSI6]);<br>
> >>     exit(1); // Let's exit instead of printing potentially GBs of log files. Uncomment if you really want to deal with a mess.<br>
> >>   }<br>
> >> }<br>
> >> stats.failure_checker+=font_fix_applied*10000;<br>
> >> stats.font_fixed=font_fix_applied;<br>
> >> <br>
> ><br>
> ><br>
> > Can I do anything that may help pinpoint the cause of this error?<br>
> ><br>
> > Thanks in advance,<br>
> ><br>
> > Federico<br>
> ><br>
> > On Thu, Sep 15, 2022 at 3:30 PM Erik Schnetter < <br>
> > <a href="mailto:schnetter@gmail.com" target="_blank">schnetter@gmail.com</a>> wrote: <br>
> > <br>
> >> Federico<br>
> >><br>
> >> Thanks for including the output, that is helpful.<br>
> >><br>
> >> There are parameters "Carpet::verbose" and "Carpet::veryverbose". You<br>
> >> can set them to "yes" and recover from a checkpoint. This gives more<br>
> >> information about what the code is doing, and thus where it crashes.<br>
> >><br>
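> >> For example, an illustrative parameter-file fragment (to be merged into the<br>
> >> parfile used for the restart):<br>
> >><br>
> >>   # enable extra Carpet diagnostics for the restart<br>
> >>   Carpet::verbose     = "yes"<br>
> >>   Carpet::veryverbose = "yes"<br>
> >><br>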
> >> The output you attached is only from the first MPI process. Other<br>
> >> processes' output might contain a clue. You can add the command line<br>
> >> option "-roe" to Cactus when you run the simulation. This will collect<br>
> >> output from all processes.<br>
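> >> For example, if you start the executable by hand rather than through<br>
> >> simfactory, something along these lines (executable and parameter-file names<br>
> >> here are only placeholders):<br>
> >><br>
> >>   mpirun -np 480 ./cactus_sim -roe s_UUmis_Q2.par<br>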
> >><br>
> >> -erik<br>
> >><br>
> >> On Thu, Sep 15, 2022 at 9:20 AM Federico Cattorini<br>
> >> <<a href="mailto:f.cattorini@campus.unimib.it" target="_blank">f.cattorini@campus.unimib.it</a>> wrote: <br>
> >> ><br>
> >> > Hello everyone,<br>
> >> ><br>
> >> > I am experiencing an issue in a number of GRMHD simulations of black <br>
> >> hole binaries employing IllinoisGRMHD. <br>
> >> ><br>
> >> > As an example, I will write about an unequal-mass BHB configuration <br>
> >> (with q = 2) that I'm running. <br>
> >> ><br>
> >> > After approximately ten orbits, the run stops with no error codes or <br>
> >> any other message that could help me identify the issue. The last lines of<br>
> >> the standard output are <br>
> >> ><br>
> >> > INFO (IllinoisGRMHD): ***** Iter. # 353949, Lev: 9, Integrating to <br>
> >> time: 3.160260e+03 ***** <br>
> >> > INFO (IllinoisGRMHD): C2P: Lev: 9 NumPts= 569160 | Fixes: Font= 393 VL= <br>
> >> 179 rho*= 2 | Failures: 0 InHoriz= 0 / 0 | Error: 7.124e-02, ErrDenom:<br>
> >> 4.838e+13 | 4.51 iters/gridpt <br>
> >> > INFO (IllinoisGRMHD): ***** Iter. # 353949, Lev: 9, Integrating to <br>
> >> time: 3.160269e+03 ***** <br>
> >> > Simfactory Done at date: gio 04 ago 2022 11:43:01 CEST<br>
> >> ><br>
> >> ><br>
> >> ><br>
> >> > I tried restarting my simulation from the latest checkpoint, but the <br>
> >> same sudden stop occurred at the same timestep. <br>
> >> ><br>
> >> > At first, I suspected a problem with IGM. The last INFO is <br>
> >> printed by IllinoisGRMHD_driver_evaluate_MHD_rhs.C, so I put some prints in<br>
> >> it to identify the spot where the error occurs. <br>
> >> > Unfortunately, I drew a blank, since the stop seems to occur just after <br>
> >> the end of IllinoisGRMHD_driver_evaluate_MHD_rhs: <br>
> >> ><br>
> >> > INFO (IllinoisGRMHD): ***** line 52: entering <br>
> >> IllinoisGRMHD_driver_evaluate_MHD_rhs ***** <br>
> >> > INFO (IllinoisGRMHD): ***** Iter. # 353949, Lev: 10, Integrating to <br>
> >> time: 3.160251e+03 ***** <br>
> >> > INFO (IllinoisGRMHD): ***** line 100: <br>
> >> IllinoisGRMHD_driver_evaluate_MHD_rhs ***** <br>
> >> > INFO (IllinoisGRMHD): ***** line 204: just before <br>
> >> reconstruct_set_of_prims_PPM ***** <br>
> >> > INFO (IllinoisGRMHD): ***** DEBUG END of <br>
> >> IllinoisGRMHD_driver_evaluate_MHD_rhs ***** <br>
> >> > Simfactory Done at date: gio 04 ago 2022 19:44:55 CEST<br>
> >> ><br>
> >> ><br>
> >> > I tried to restart the simulation and run it with pure MPI. It ran for a <br>
> >> few more iterations, then stopped as well: <br>
> >> ><br>
> >> > INFO (IllinoisGRMHD): ***** line 52: entering <br>
> >> IllinoisGRMHD_driver_evaluate_MHD_rhs ***** <br>
> >> > INFO (IllinoisGRMHD): ***** Iter. # 353565, Lev: 10, Integrating to <br>
> >> time: 3.156831e+03 ***** <br>
> >> > INFO (IllinoisGRMHD): ***** line 100: <br>
> >> IllinoisGRMHD_driver_evaluate_MHD_rhs ***** <br>
> >> > INFO (IllinoisGRMHD): ***** line 204: just before <br>
> >> reconstruct_set_of_prims_PPM ***** <br>
> >> > INFO (IllinoisGRMHD): ***** DEBUG END of <br>
> >> IllinoisGRMHD_driver_evaluate_MHD_rhs ***** <br>
> >> > Simfactory Done at date: ven 05 ago 2022 19:00:13 CEST<br>
> >> ><br>
> >> ><br>
> >> > The simulation setup is as follows:<br>
> >> ><br>
> >> > Allocated:<br>
> >> > Nodes: 10<br>
> >> > Cores per node: 48<br>
> >> > SLURM setting<br>
> >> > SLURM_NNODES : 10<br>
> >> > SLURM_NPROCS : 20<br>
> >> > SLURM_NTASKS : 20<br>
> >> > SLURM_CPUS_ON_NODE : 48<br>
> >> > SLURM_CPUS_PER_TASK : 24<br>
> >> > SLURM_TASKS_PER_NODE: 2(x10)<br>
> >> > Running:<br>
> >> > MPI processes: 20<br>
> >> > OpenMP threads per process: 24<br>
> >> > MPI processes per node: 2.0<br>
> >> > OpenMP threads per core: 1.0<br>
> >> > OpenMP threads per node: 48<br>
> >> ><br>
> >> ><br>
> >> > while the pure-MPI setup is<br>
> >> ><br>
> >> > Allocated:<br>
> >> > Nodes: 10<br>
> >> > Cores per node: 48<br>
> >> > SLURM setting<br>
> >> > SLURM_NNODES : 10<br>
> >> > SLURM_NPROCS : 480<br>
> >> > SLURM_NTASKS : 480<br>
> >> > SLURM_CPUS_ON_NODE : 48<br>
> >> > SLURM_CPUS_PER_TASK : 1<br>
> >> > SLURM_TASKS_PER_NODE: 48(x10)<br>
> >> > Running:<br>
> >> > MPI processes: 480<br>
> >> > OpenMP threads per process: 1<br>
> >> > MPI processes per node: 48.0<br>
> >> > OpenMP threads per core: 1.0<br>
> >> > OpenMP threads per node: 48<br>
> >> ><br>
> >> ><br>
> >> > I am using the Lorentz release of the Einstein Toolkit.<br>
> >> ><br>
> >> > I've had this issue in two binary BH simulations, both unequal-mass <br>
> >> with q = 2. My colleague Giacomo Fedrigo experienced the same problem<br>
> >> running an equal-mass simulation. <br>
> >> ><br>
> >> > I attach the q = 2 (s_UUmis_Q2) parameter file and the ET config-info <br>
> >> file. Also, I attach the standard error and output of my q = 2 run and of<br>
> >> Giacomo's run (b1_UUmis_a12b_pol3_r56_gauss_9). The standard outputs were<br>
> >> truncated for readability. <br>
> >> ><br>
> >> > Can someone please help me with this?<br>
> >> ><br>
> >> > Thanks in advance,<br>
> >> ><br>
> >> > Federico<br>
> >> > _______________________________________________<br>
> >> > Users mailing list<br>
> >> > <a href="mailto:Users@einsteintoolkit.org" target="_blank">Users@einsteintoolkit.org</a><br>
> >> > <a href="https://urldefense.com/v3/__http://lists.einsteintoolkit.org/mailman/listinfo/users__;!!DZ3fjg!7hy50UiCZQBt5fMM3BkYAFxBF_QtxwDruO1lrCkhZyGvaCqcGuHp9Eg9PprMhD3OmG4NmwDCZZRZhS79OILzwtdeSg$" rel="noreferrer" target="_blank">https://urldefense.com/v3/__http://lists.einsteintoolkit.org/mailman/listinfo/users__;!!DZ3fjg!7hy50UiCZQBt5fMM3BkYAFxBF_QtxwDruO1lrCkhZyGvaCqcGuHp9Eg9PprMhD3OmG4NmwDCZZRZhS79OILzwtdeSg$</a> <br>
> >><br>
> >><br>
> >><br>
> >> --<br>
> >> Erik Schnetter <<a href="mailto:schnetter@gmail.com" target="_blank">schnetter@gmail.com</a>><br>
> >> <a href="https://urldefense.com/v3/__http://www.perimeterinstitute.ca/personal/eschnetter/__;!!DZ3fjg!7hy50UiCZQBt5fMM3BkYAFxBF_QtxwDruO1lrCkhZyGvaCqcGuHp9Eg9PprMhD3OmG4NmwDCZZRZhS79OIJtSmdJRA$" rel="noreferrer" target="_blank">https://urldefense.com/v3/__http://www.perimeterinstitute.ca/personal/eschnetter/__;!!DZ3fjg!7hy50UiCZQBt5fMM3BkYAFxBF_QtxwDruO1lrCkhZyGvaCqcGuHp9Eg9PprMhD3OmG4NmwDCZZRZhS79OIJtSmdJRA$</a> <br>
> >> <br>
> > _______________________________________________<br>
> > Users mailing list<br>
> > <a href="mailto:Users@einsteintoolkit.org" target="_blank">Users@einsteintoolkit.org</a><br>
> > <a href="https://urldefense.com/v3/__http://lists.einsteintoolkit.org/mailman/listinfo/users__;!!DZ3fjg!7hy50UiCZQBt5fMM3BkYAFxBF_QtxwDruO1lrCkhZyGvaCqcGuHp9Eg9PprMhD3OmG4NmwDCZZRZhS79OILzwtdeSg$" rel="noreferrer" target="_blank">https://urldefense.com/v3/__http://lists.einsteintoolkit.org/mailman/listinfo/users__;!!DZ3fjg!7hy50UiCZQBt5fMM3BkYAFxBF_QtxwDruO1lrCkhZyGvaCqcGuHp9Eg9PprMhD3OmG4NmwDCZZRZhS79OILzwtdeSg$</a> <br>
> > <br>
<br>
<br>
<br>
-- <br>
My email is as private as my paper mail. I therefore support encrypting<br>
and signing email messages. Get my PGP key from <a href="http://keys.gnupg.net" rel="noreferrer" target="_blank">http://keys.gnupg.net</a>.<br>
_______________________________________________<br>
Users mailing list<br>
<a href="mailto:Users@einsteintoolkit.org" target="_blank">Users@einsteintoolkit.org</a><br>
<a href="http://lists.einsteintoolkit.org/mailman/listinfo/users" rel="noreferrer" target="_blank">http://lists.einsteintoolkit.org/mailman/listinfo/users</a><br>
</blockquote></div>
</blockquote></div>