[ET Trac] [Einstein Toolkit] #344: Checkpointing + new cleanup procedure

Einstein Toolkit trac-noreply at einsteintoolkit.org
Sat Mar 19 17:51:32 CDT 2011


#344: Checkpointing + new cleanup procedure
--------------------------+-------------------------------------------------
  Reporter:  mthomas      |       Owner:  mthomas
      Type:  enhancement  |      Status:  new    
  Priority:  blocker      |   Milestone:         
 Component:  SimFactory   |     Version:         
Resolution:               |    Keywords:         
--------------------------+-------------------------------------------------

Comment (by hinder):

 I have been using the previous patch, which only fixed checkpointing, so
 I can only comment on that; I haven't tried this one.  With the previous
 patch, recovery works if the job is not currently in the queue.  If the
 job *is* in the queue and running, the new job is chained to the old
 one.  However, if the job in the queue is only in the Q state, it is not
 detected, and the new job is also put into the Q state rather than the H
 state.  I suspect that the logic deciding whether to chain or queue the
 job depends on whether the old job has already run, which I think is
 incorrect.

 Barry: can you check if the following four cases work correctly with this
 patch?

 1. If no jobs for that simulation are in the queue;
 2. If a job is in the queue in the Q state;
 3. If a job is in the queue in the R state;
 4. If a job is in the queue in the H state (this should only happen if
 case 2 also holds, but it is best to check).
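
 The four cases boil down to one rule: if any job for the simulation is
 still in the queue, whatever its state, the new job should be chained to
 it (submitted in the H state); otherwise it should simply be queued.  A
 minimal Python sketch of that rule follows, next to the rule suspected
 above (chain only once the old job has started running), which
 reproduces the case 2 failure.  `JobState`, `should_chain`,
 `suspected_should_chain` and `submit_state` are hypothetical names, not
 SimFactory's actual code, and the single-letter states assume a
 PBS/Torque-style queue.

```python
from enum import Enum
from typing import Sequence


class JobState(Enum):
    # PBS/Torque-style single-letter states (assumption for illustration)
    QUEUED = "Q"
    RUNNING = "R"
    HELD = "H"
    COMPLETED = "C"  # finished job still listed by the queue; not "in the queue"


# States that count as "a job for this simulation is still in the queue"
IN_QUEUE = (JobState.QUEUED, JobState.RUNNING, JobState.HELD)


def should_chain(states: Sequence[JobState]) -> bool:
    """Hypothetical intended rule: chain if ANY job is still in the queue."""
    return any(s in IN_QUEUE for s in states)


def suspected_should_chain(states: Sequence[JobState]) -> bool:
    """Hypothetical buggy rule: chain only if a job has started running."""
    return any(s is JobState.RUNNING for s in states)


def submit_state(states: Sequence[JobState], rule=should_chain) -> JobState:
    """State the newly submitted job starts in: held if chained, else queued."""
    return JobState.HELD if rule(states) else JobState.QUEUED


# The four cases from the list above, keyed by case number
CASES = {
    1: [],
    2: [JobState.QUEUED],
    3: [JobState.RUNNING],
    4: [JobState.QUEUED, JobState.HELD],
}

if __name__ == "__main__":
    for number, states in CASES.items():
        intended = submit_state(states).value
        suspected = submit_state(states, suspected_should_chain).value
        print(f"case {number}: intended={intended} suspected={suspected}")
```

 Under the suspected rule, case 2 yields Q instead of H, which matches
 the behaviour reported above on Kraken.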

 My own tests are on Kraken, which can have long turn-around times, so you
 might want to test somewhere else like Damiana or Datura.  If it works on
 those machines, then the code is probably OK, and at worst the Kraken
 machine entry might need to be fixed up.

-- 
Ticket URL: <https://trac.einsteintoolkit.org/ticket/344#comment:3>
Einstein Toolkit <http://einsteintoolkit.org>
The Einstein Toolkit
