[ET Trac] [Einstein Toolkit] #534: Checkpointing fails on LoneStar

Einstein Toolkit trac-noreply at einsteintoolkit.org
Fri Sep 9 22:57:10 CDT 2011


#534: Checkpointing fails on LoneStar
---------------------+------------------------------------------------------
  Reporter:  hinder  |       Owner:  eschnett    
      Type:  defect  |      Status:  new         
  Priority:  major   |   Milestone:              
 Component:  Carpet  |     Version:              
Resolution:          |    Keywords:  CarpetIOHDF5
---------------------+------------------------------------------------------

Comment (by knarf):

 Looking closer at my runs, I see that they sometimes (about once every one
 or two days of runtime) stall for a while. I just caught one of these
 stalls, and attaching gdb revealed that the processes were stuck in MPI
 calls within reductions called from IOASCII. Typically the runs eventually
 resume after some time, but the "lost" time is clearly visible in an M/h
 plot. I am not sure whether this is I/O or MPI hanging; both are involved
 in your case and in mine. I don't seem to have problems accessing the
 files, though. Maybe it's worth trying Open MPI after all. Ian: would you
 be interested in providing such an option list?

-- 
Ticket URL: <https://trac.einsteintoolkit.org/ticket/534#comment:13>
Einstein Toolkit <http://einsteintoolkit.org>
The Einstein Toolkit

