[Users] BNSM/TOV simulation error

Spandan Sarma 19306 spandan19 at iiserb.ac.in
Wed Mar 22 00:46:07 CDT 2023


Dear Roland,

Thank you so much for the help. I followed your suggestions and ran the TOV
example on 16 cores at increased resolution, and it completed successfully.
I have submitted a BNSM simulation with similar changes and am awaiting the
result.

Also, is there any way other than trial and error to determine how many MPI
ranks are too many for a simulation?
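A rough way to estimate this before submitting is to look at how many grid points each MPI rank would own on the smallest refinement level. The sketch below is a hypothetical rule of thumb, not a criterion that Carpet itself uses. It assumes that each refinement level is split evenly across all ranks, that the level is a cube with `n_points_side` points per side, and that each rank's patch should be at least `min_ratio` times the ghost-zone width. The defaults `ghost_size=3` and `min_ratio=4` are assumed values, not taken from the gallery parameter files.

```python
def points_per_rank_side(n_points_side, n_ranks):
    """Approximate edge length, in grid points, of the cube each MPI rank owns
    when a cubic level of n_points_side**3 points is split evenly among n_ranks."""
    return n_points_side / n_ranks ** (1.0 / 3.0)


def too_many_ranks(n_points_side, n_ranks, ghost_size=3, min_ratio=4):
    """Hypothetical rule of thumb (an assumption, not a Carpet criterion):
    flag the layout if each rank's patch is smaller than min_ratio ghost widths,
    i.e. a large fraction of each patch would be ghost zones."""
    return points_per_rank_side(n_points_side, n_ranks) < min_ratio * ghost_size


if __name__ == "__main__":
    # Hypothetical finest refinement level with 60 points per side.
    for ranks in (12, 144):
        side = points_per_rank_side(60, ranks)
        print(f"{ranks:4d} ranks: ~{side:.1f} points per side per rank, "
              f"too many: {too_many_ranks(60, ranks)}")
```

With these assumed numbers, 144 ranks would be flagged and 12 would not. The real limit also depends on buffer zones, the number of refinement levels, and the decomposition Carpet actually chooses, so this can only narrow down the range to try.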

Regards,
Spandan Sarma

On Mon, Mar 20, 2023 at 9:45 PM Roland Haas <rhaas at illinois.edu> wrote:

> Hello Spandan Sarma,
>
> Not having looked very carefully yet, one thing that has turned out to be
> an issue recently is that the gallery example (see
> http://einsteintoolkit.org/gallery/bns/index.html) is "small" and is set
> up to run for 24 hours on 12 cores (see the web page). Running on many
> more cores (MPI ranks, really) can lead to these issues.
>
> So the first step would be to make sure that you run small enough (I
> would try no more than 24 or so MPI ranks; more than 8 threads per MPI
> rank usually does not help) and verify that the example works.
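Following this guideline, a fixed number of cores can be spent on fewer MPI ranks with more OpenMP threads each, instead of one rank per core. The helper below is a hypothetical sketch, not part of the Einstein Toolkit or Simfactory. It picks the largest thread count, up to 8, that divides the core count evenly, which keeps the number of MPI ranks as small as possible. The threads-per-rank value would then be set through OMP_NUM_THREADS.

```python
def hybrid_layout(total_cores, max_threads_per_rank=8):
    """Return (mpi_ranks, threads_per_rank) that uses all cores with as few
    MPI ranks as possible, without exceeding max_threads_per_rank threads."""
    if total_cores < 1:
        raise ValueError("total_cores must be positive")
    for threads in range(max_threads_per_rank, 0, -1):
        if total_cores % threads == 0:
            return total_cores // threads, threads
    # Only reached if max_threads_per_rank < 1: fall back to pure MPI.
    return total_cores, 1


if __name__ == "__main__":
    for cores in (12, 16, 144):
        ranks, threads = hybrid_layout(cores)
        note = "" if ranks <= 24 else "  (more than ~24 ranks)"
        print(f"{cores} cores -> {ranks} MPI ranks x {threads} threads{note}")
```

For example, the 144 cores from the BNS run would become 18 ranks of 8 threads each, instead of 144 single-threaded ranks.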
>
> Then you can increase the resolution (the dx, dy, dz parameters in the
> parameter file *.par) to make sure that the NSs are well resolved (a grid
> spacing finer than, say, 200 m on the refinement level that contains
> them) and slowly scale up the number of cores until you reach an
> acceptable run speed.
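To check the 200 m criterion, the grid spacing of each refinement level can be converted from code units to metres. This is a minimal sketch with three assumptions. First, the parameter file uses geometric units with M_sun = 1, so one code length unit is G M_sun / c^2, about 1476.6 m. Second, the refinement factor between levels is 2. Third, the stars sit on the finest level. `dx_coarse` and `num_levels` are illustrative inputs and are not read from the gallery parameter file.

```python
# Length of one code unit when G = c = M_sun = 1: G * M_sun / c**2 in metres.
MSUN_IN_METRES = 1476.625


def level_spacings_m(dx_coarse, num_levels, refinement_factor=2):
    """Grid spacing in metres on each refinement level, coarsest first.
    dx_coarse is the coarse-grid spacing (e.g. dx) in code units."""
    return [dx_coarse / refinement_factor**level * MSUN_IN_METRES
            for level in range(num_levels)]


def coarse_dx_for_target(target_m, num_levels, refinement_factor=2):
    """Coarse-grid spacing (code units) giving target_m metres on the finest level."""
    return target_m / MSUN_IN_METRES * refinement_factor ** (num_levels - 1)


if __name__ == "__main__":
    # Hypothetical setup: dx = 8 code units on the coarsest of 6 levels.
    for level, h in enumerate(level_spacings_m(8.0, 6)):
        print(f"level {level}: {h:8.1f} m")
    print("coarse dx for 200 m on the finest level:",
          round(coarse_dx_for_target(200.0, 6), 3))
```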
>
> Based on your log files, there were 16 MPI ranks for the TOV example
> (which was last run on 5 MPI ranks) and 144 MPI ranks for BNS (which was
> last run on 12 MPI ranks). The latter in particular is "too many", and I
> suspect that is what causes the error.
>
> Yours,
> Roland
>
> > Hello,
> >
> > I was trying to run the BNSM simulation from the ET gallery on the
> > institute cluster KANAD at IISER Bhopal in the short queue (max nodes: 16;
> > walltime: 24 hrs) of our queuing system, but the following error came up:
> >
> > The grid structure is inconsistent.  It is impossible to continue.
> >
> > WARNING level 0 from host n16 process 0
> >   in thorn CarpetLib, file /home2/shamims/ET_short1/Cactus/arrangements/Carpet/CarpetLib/src/dh.cc:2105:
> >   -> The grid structure is inconsistent.  It is impossible to continue.
> > cactus_sim: /home2/shamims/ET_short1/Cactus/arrangements/Carpet/Carpet/src/helpers.cc:275: int Carpet::Abort(const cGH*, int): Assertion `0' failed.
> > Rank 0 with PID 4473 received signal 6
> > Writing backtrace to nsnstohmns1/backtrace.0.txt
> >
> > WARNING level 0 from host n63 process 128
> >   in thorn CarpetLib, file /home2/shamims/ET_short1/Cactus/arrangements/Carpet/CarpetLib/src/dh.cc:2105:
> >   -> The grid structure is inconsistent.  It is impossible to continue.
> > cactus_sim: /home2/shamims/ET_short1/Cactus/arrangements/Carpet/Carpet/src/helpers.cc:275: int Carpet::Abort(const cGH*, int): Assertion `0' failed.
> > Rank 128 with PID 1350 received signal 6
> > Writing backtrace to nsnstohmns1/backtrace.128.txt
> >
> > WARNING level 0 from host n63 process 141
> >   in thorn CarpetLib, file /home2/shamims/ET_short1/Cactus/arrangements/Carpet/CarpetLib/src/dh.cc:2105:
> >   -> The grid structure is inconsistent.  It is impossible to continue.
> > cactus_sim: /home2/shamims/ET_short1/Cactus/arrangements/Carpet/Carpet/src/helpers.cc:275: int Carpet::Abort(const cGH*, int): Assertion `0' failed.
> >
> > After this, I ran the simulation with the same parameter file in the
> > debug queue (max: 1 node), and it worked fine. However, when I tried the
> > TOV simulation example in the debug queue, the same error appeared:
> >
> > WARNING level 0 from host n85 process 0
> >   in thorn CarpetLib, file /home2/shamims/ET_debug/Cactus/arrangements/Carpet/CarpetLib/src/dh.cc:2105:
> >   -> The grid structure is inconsistent.  It is impossible to continue.
> > cactus_sim: /home2/shamims/ET_debug/Cactus/arrangements/Carpet/Carpet/src/helpers.cc:275: int Carpet::Abort(const cGH*, int): Assertion `0' failed.
> >
> > I am unable to understand what the issue is. I have attached parameter
> > files, the runscript, and the output files for both the simulations (TOV
> > and BNSM) for reference. Thanks in advance for the help.
> >
> > Regards,
>
>
> --
> My email is as private as my paper mail. I therefore support encrypting
> and signing email messages. Get my PGP key from http://pgp.mit.edu .
>


-- 
Spandan Sarma
BS-MS' 19
Department of Physics (4th Year),
IISER Bhopal