<div dir="ltr"><div dir="ltr"><div>Dear Roland,</div><div><br></div>Thank you so much for the help. I incorporated your suggestions and reran the TOV example on 16 cores with increased resolution, and it completed successfully. I have submitted a BNSM simulation with similar changes and am awaiting its result.</div><div dir="ltr"><br></div><div dir="ltr">Also, is there any way other than trial and error to determine how many MPI ranks are too many for a simulation?<br><div><br></div><div>Regards,</div><div>Spandan Sarma</div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Mar 20, 2023 at 9:45 PM Roland Haas &lt;<a href="mailto:rhaas@illinois.edu" target="_blank">rhaas@illinois.edu</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hello Spandan Sarma,<br>
<br>
Not having looked very carefully yet, one thing that has turned out to<br>
be an issue recently is that the gallery example (see<br>
<a href="http://einsteintoolkit.org/gallery/bns/index.html" rel="noreferrer" target="_blank">http://einsteintoolkit.org/gallery/bns/index.html</a>) is "small" and set<br>
up to run (see the web page) for 24 hours on 12 cores. Running on many<br>
more cores (MPI ranks, really) can lead to these issues.<br>
<br>
So the first step would be to make sure that you run small enough (I<br>
would try no more than 24 or so MPI ranks, and usually more than 8<br>
threads per MPI rank does not help) and verify that the example works.<br>
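As a rough illustration of the rank/thread advice above, a layout could be picked like this (a minimal sketch; the 24-rank and 8-thread caps are the rules of thumb from this email, not hard limits, and the function name is hypothetical):<br>
<br>
```python
def suggest_layout(nodes, cores_per_node, max_ranks=24, max_threads=8):
    """Suggest an (MPI ranks, OpenMP threads per rank) split for a run,
    keeping the total rank count below the rule-of-thumb cap."""
    total_cores = nodes * cores_per_node
    # Thread across each node's cores first, up to the per-rank cap.
    threads = min(max_threads, cores_per_node)
    ranks = total_cores // threads
    if ranks > max_ranks:
        ranks = max_ranks  # cap the rank count; extra cores go unused
    return ranks, threads

# e.g. two 16-core nodes
print(suggest_layout(2, 16))
```
<br>
For a large allocation the cap dominates: sixteen 16-core nodes would still be run with only 24 ranks of 8 threads under this rule of thumb.<br>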
<br>
Then you can increase the resolution (the dx, dy, dz parameters in the<br>
parameter file *.par) to make sure that the neutron stars are resolved<br>
well (resolution on the refinement level that contains them better<br>
than, say, 200 m at least) and slowly scale up the number of cores<br>
until you reach an acceptable run speed.<br>
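For concreteness, the coarse-grid spacing is set through the CoordBase thorn in the parameter file; a sketch of what such lines look like (the values here are illustrative, not the gallery defaults, and spacings are in geometric units where 1 M_sun is roughly 1.477 km, so 200 m is about 0.135 M_sun):<br>
<br>
```
# Coarse-grid spacing; each Carpet refinement level halves it,
# so with 6 levels the finest spacing is dx / 2^5.
CoordBase::dx = 2.0
CoordBase::dy = 2.0
CoordBase::dz = 2.0
```
<br>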
<br>
Based on your log files, there were 16 MPI ranks for the TOV example<br>
(which last ran on 5 MPI ranks) and 144 MPI ranks for the BNS (which<br>
last ran on 12 MPI ranks). The latter in particular is "too many",<br>
and I suspect the error is due to that.<br>
<br>
Yours,<br>
Roland<br>
<br>
> Hello,<br>
> <br>
> I was trying to run the BNSM simulation from the ET gallery on the<br>
> institute cluster KANAD at IISER Bhopal in the short queue (max nodes: 16;<br>
> walltime: 24 hrs) of our queuing system, but the following error came up:<br>
> <br>
> The grid structure is inconsistent. It is impossible to continue.<br>
> <br>
> WARNING level 0 from host n16 process 0<br>
> <br>
> in thorn CarpetLib, file<br>
> /home2/shamims/ET_short1/Cactus/arrangements/Carpet/CarpetLib/src/dh.cc:2105:<br>
> <br>
> -> The grid structure is inconsistent. It is impossible to continue. <br>
> <br>
> cactus_sim:<br>
> /home2/shamims/ET_short1/Cactus/arrangements/Carpet/Carpet/src/helpers.cc:275:<br>
> int Carpet::Abort(const cGH*, int): Assertion `0' failed.<br>
> <br>
> Rank 0 with PID 4473 received signal 6<br>
> <br>
> Writing backtrace to nsnstohmns1/backtrace.0.txt<br>
> <br>
> WARNING level 0 from host n63 process 128<br>
> <br>
> in thorn CarpetLib, file<br>
> /home2/shamims/ET_short1/Cactus/arrangements/Carpet/CarpetLib/src/dh.cc:2105:<br>
> <br>
> -> The grid structure is inconsistent. It is impossible to continue. <br>
> <br>
> cactus_sim:<br>
> /home2/shamims/ET_short1/Cactus/arrangements/Carpet/Carpet/src/helpers.cc:275:<br>
> int Carpet::Abort(const cGH*, int): Assertion `0' failed.<br>
> <br>
> Rank 128 with PID 1350 received signal 6<br>
> <br>
> Writing backtrace to nsnstohmns1/backtrace.128.txt<br>
> <br>
> WARNING level 0 from host n63 process 141<br>
> <br>
> in thorn CarpetLib, file<br>
> /home2/shamims/ET_short1/Cactus/arrangements/Carpet/CarpetLib/src/dh.cc:2105:<br>
> <br>
> -> The grid structure is inconsistent. It is impossible to continue. <br>
> <br>
> cactus_sim:<br>
> /home2/shamims/ET_short1/Cactus/arrangements/Carpet/Carpet/src/helpers.cc:275:<br>
> int Carpet::Abort(const cGH*, int): Assertion `0' failed.<br>
> <br>
> <br>
> After this issue, I tried running the simulation with the same<br>
> parameter file in the debug queue (max: 1 node), and it worked fine. But<br>
> when I tried the TOV simulation example in the debug queue, the same<br>
> error appeared:<br>
> <br>
> <br>
> WARNING level 0 from host n85 process 0<br>
> <br>
> in thorn CarpetLib, file<br>
> /home2/shamims/ET_debug/Cactus/arrangements/Carpet/CarpetLib/src/dh.cc:2105:<br>
> <br>
> -> The grid structure is inconsistent. It is impossible to continue. <br>
> <br>
> WARNING level 0 from host n85 process 0<br>
> <br>
> in thorn CarpetLib, file<br>
> /home2/shamims/ET_debug/Cactus/arrangements/Carpet/CarpetLib/src/dh.cc:2105:<br>
> <br>
> -> The grid structure is inconsistent. It is impossible to continue. <br>
> <br>
> cactus_sim:<br>
> /home2/shamims/ET_debug/Cactus/arrangements/Carpet/Carpet/src/helpers.cc:275:<br>
> int Carpet::Abort(const cGH*, int): Assertion `0' failed.<br>
> <br>
> <br>
> I am unable to understand what the issue is. I have attached the<br>
> parameter files, the runscript, and the output files for both<br>
> simulations (TOV and BNSM) for reference. Thanks in advance for the help.<br>
> <br>
> Regards,<br>
<br>
<br>
-- <br>
My email is as private as my paper mail. I therefore support encrypting<br>
and signing email messages. Get my PGP key from <a href="http://pgp.mit.edu" rel="noreferrer" target="_blank">http://pgp.mit.edu</a> .<br>
</blockquote></div><br clear="all"><div><br></div><span>-- </span><br><div dir="ltr"><div dir="ltr">Spandan Sarma<br>BS-MS' 19<div>Department of Physics (4th Year),<br>IISER Bhopal</div></div></div>