[Users] Thorn setup taking too much time in cluster

Roland Haas rhaas at illinois.edu
Tue Jun 6 08:26:23 CDT 2023


Hello Shamim Haque,

Ok.

Yours,
Roland

> Hello Roland,
> 
> Thanks for pointing me to the links. I'll check out the resources.
> 
> Regards
> Shamim Haque
> Senior Research Fellow (SRF)
> Department of Physics
> IISER Bhopal
> 
> On Fri, Jun 2, 2023 at 3:24 AM Roland Haas <rhaas at illinois.edu> wrote:
> 
> > Hello Shamim Haque,
> >
> > "the grid structure inconsistent. Impossible to continue" is a fairly
> > generic error that Carpet outputs if it detects that the grid structure
> > has become inconsistent and it is impossible to continue the
> > simulation.
> >
> > This condition is detected by a low-level routine in Carpet that no
> > longer has access to the higher-level information that was passed to
> > Carpet and that actually caused the inconsistent grid structure.
> >
> > The mailing list has a couple of similar reports that you could take a
> > look at and that may point to a possible solution.
> >
> > For this sort of error, more details would be needed to do any useful
> > diagnosis. At least the full log file (stdout and stderr) of the
> > simulation up to the point when it aborted would be needed along with
> > possibly some more log files that record the grid structure.
> >
> > Here are some possibly relevant email threads:
> >
> > https://lists.einsteintoolkit.org/pipermail/users/2023-March/008881.html
> >
> > https://lists.einsteintoolkit.org/pipermail/users/2021-February/007792.html
> >
> > maybe also:
> >
> > https://bitbucket.org/einsteintoolkit/tickets/issues/2599/nsnstohmns-cannot-be-reproduced-using
> >
> > https://bitbucket.org/einsteintoolkit/tickets/issues/2516/different-simulation-result-with-different
> >
> > Yours,
> > Roland
> >  
> > > Dear Steve,
> > >
> > > Thank you for your reply. I tried the same simulation with a finer grid,
> > > and it started working, albeit very slowly (apparently due to slow
> > > inter-node communication), but it did work. I could see a few
> > > iterations during the final couple of hours of the wall time.
> > >
> > > It turns out that a simulation with GRHydro, in such cases (where the
> > > grid needs to be finer), ends with the error "*the grid structure
> > > inconsistent. Impossible to continue*". A simulation with IllinoisGRMHD,
> > > on the other hand, stops abruptly during the thorn setup (somewhere
> > > around the SpaceMask and AHFinderDirect setup).
> > >
> > > Later I tried to see whether I could speed up the simulation, but it
> > > looks like the inter-node communication on the HPC is very slow, which
> > > may be an inherent problem since it is a very old system.
> > >
> > > Regards
> > > Shamim Haque
> > > Senior Research Fellow (SRF)
> > > Department of Physics
> > > IISER Bhopal
> > >
> > > On Tue, May 23, 2023 at 10:08 PM Steven R. Brandt <sbrandt at cct.lsu.edu>
> > > wrote:
> > >  
> > > > Sorry that no one has replied to you in a while. Are you still
> > > > experiencing this difficulty?
> > > >
> > > > --Steve
> > > > On 4/4/2023 3:08 AM, Shamim Haque 1910511 wrote:
> > > >
> > > > Dear Steven,
> > > >
> > > > I can assure you that this was the first time I submitted the
> > > > simulation. I used "sim create-submit" to submit it, which would not
> > > > submit the job if a simulation with the same name had already been
> > > > created.
> > > >
> > > > Secondly, the same message also appears in the output files from the
> > > > debug queue (1 node, with GRHydro) and the high-memory queue (3 nodes,
> > > > with IllinoisGRMHD), where the simulation ran successfully. I have
> > > > attached the output files for reference.
> > > >
> > > > Regards
> > > > Shamim Haque
> > > > Senior Research Fellow (SRF)
> > > > Department of Physics
> > > > IISER Bhopal
> > > >
> > > > On Tue, Apr 4, 2023 at 12:35 AM Steven R. Brandt <sbrandt at cct.lsu.edu>
> > > > wrote:
> > > >  
> > > >> I see this error message in your output:
> > > >>  
> > > >>   -> No HDF5 checkpoint files with basefilename 'checkpoint.chkpt' and
> > > >>   file extension '.h5' found in recovery directory
> > > >>   'nsns_toy1.2_DDME2BPS_quark_1.2vs1.6M_40km_g25'
> > > >>
> > > >> I suspect you did a "sim submit" for a job, got a failure, and did a
> > > >> second "sim submit" without purging. That immediately triggered the
> > > >> error. Then, for some reason, MPI didn't shut down cleanly and the
> > > >> processes hung doing nothing until they used up the walltime.
> > > >>
> > > >> --Steve
> > > >> On 4/2/2023 5:16 AM, Shamim Haque 1910511 wrote:
> > > >>
> > > >> Hello,
> > > >>
> > > >> I am trying to run a BNSM simulation using IllinoisGRMHD on the HPC
> > > >> cluster Kanad at IISER Bhopal. Although the parfile runs fine on the
> > > >> debug queue (1 node) and the high-memory queue (3 nodes), I am unable
> > > >> to run the simulation in a queue with 9 nodes (144 cores).
> > > >>
> > > >> The output file suggests that the setup of the listed thorns does not
> > > >> complete within 24 hours, which is the maximum walltime for this queue.
> > > >>
> > > >> Is there a way to sort out this issue? I have attached the parfile and
> > > >> outfile for reference.
> > > >>
> > > >> Regards
> > > >> Shamim Haque
> > > >> Senior Research Fellow (SRF)
> > > >> Department of Physics
> > > >> IISER Bhopal
> > > >>
> > > >> _______________________________________________
> > > >> Users mailing list
> > > >> Users at einsteintoolkit.org
> > > >> http://lists.einsteintoolkit.org/mailman/listinfo/users
> > > >  
> >
> > --
> > My email is as private as my paper mail. I therefore support encrypting
> > and signing email messages. Get my PGP key from http://pgp.mit.edu .
> >  



-- 
My email is as private as my paper mail. I therefore support encrypting
and signing email messages. Get my PGP key from http://pgp.mit.edu .

