[Users] Thorn setup taking too much time in cluster

Steven R. Brandt sbrandt at cct.lsu.edu
Mon Apr 3 14:04:16 CDT 2023


I see this error message in your output:

   -> No HDF5 checkpoint files with basefilename 'checkpoint.chkpt'
      and file extension '.h5' found in recovery directory
      'nsns_toy1.2_DDME2BPS_quark_1.2vs1.6M_40km_g25'

I suspect you ran "sim submit" for a job, it failed, and you then ran 
"sim submit" a second time without purging the failed simulation first. 
The second submission presumably tried to recover from a checkpoint, 
found none, and hit this error immediately. Then, for some reason, MPI 
did not shut down cleanly, and the processes hung, doing nothing, until 
they used up the walltime.
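
One way to check this before resubmitting is to look in the recovery 
directory for files that match the names in the error message. The 
sketch below is only an illustration, not part of Cactus or simfactory: 
the function names are made up, and it assumes a checkpoint file is any 
regular file whose name starts with the basefilename and ends with the 
extension reported in the message.

```python
from pathlib import Path


def find_checkpoints(recover_dir, basefilename="checkpoint.chkpt", extension=".h5"):
    """Return the sorted checkpoint files found directly in recover_dir.

    Assumption: a checkpoint file is a regular file whose name starts with
    `basefilename` and ends with `extension`, as in the error message.
    """
    directory = Path(recover_dir)
    if not directory.is_dir():
        # A missing directory simply means there is nothing to recover from.
        return []
    return sorted(
        path
        for path in directory.iterdir()
        if path.is_file()
        and path.name.startswith(basefilename)
        and path.name.endswith(extension)
    )


def recovery_would_fail(recover_dir, **kwargs):
    """True if no matching checkpoint files exist in recover_dir."""
    return not find_checkpoints(recover_dir, **kwargs)


if __name__ == "__main__":
    import sys

    # Directory to inspect; defaults to the current directory.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    files = find_checkpoints(target)
    if files:
        for path in files:
            print(path)
    else:
        print(f"No checkpoint files found in '{target}'")
```

If this reports no checkpoint files for the directory named in the 
error, the resubmitted job has nothing to recover from, which is 
consistent with the failure above.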

--Steve

On 4/2/2023 5:16 AM, Shamim Haque 1910511 wrote:
> Hello,
>
> I am trying to run a BNSM simulation using IllinoisGRMHD on the HPC 
> cluster Kanad at IISER Bhopal. I have confirmed that the parfile runs 
> fine on the debug queue (1 node) and the high-memory queue (3 nodes), 
> but I am unable to run the simulation in a queue with 9 nodes (144 cores).
>
> The output file suggests that the setup of the listed thorns does not 
> complete within 24 hours, which is the maximum walltime for this queue.
>
> Is there a way to sort out this issue? I have attached the parfile and 
> outfile for reference.
>
> Regards
> Shamim Haque
> Senior Research Fellow (SRF)
> Department of Physics
> IISER Bhopal