[ET Trac] [Einstein Toolkit] #344: Checkpointing + new cleanup procedure
Einstein Toolkit
trac-noreply at einsteintoolkit.org
Sat Mar 19 17:51:32 CDT 2011
#344: Checkpointing + new cleanup procedure
--------------------------+-------------------------------------------------
Reporter: mthomas | Owner: mthomas
Type: enhancement | Status: new
Priority: blocker | Milestone:
Component: SimFactory | Version:
Resolution: | Keywords:
--------------------------+-------------------------------------------------
Comment (by hinder):
I have only been using the previous patch, which fixed checkpointing
alone, so I can only comment on that one; I haven't tried this one. With
the previous patch, recovery works if no job for the simulation is
currently in the queue. If a job *is* in the queue and is running, the new
job is chained to the old one. If the job in the queue is only in the Q
state, however, it is not detected, and the new job is also put into the Q
state rather than the H state. It might be that the decision whether to
chain or queue the new job depends on whether the existing job has started
running, which I think is not correct.
Barry: can you check if the following four cases work correctly with this
patch?
1. If no jobs for that simulation are in the queue;
2. If a job is in the queue in the Q state;
3. If a job is in the queue in the R state;
4. If a job is in the queue in the H state (this should only be the case
if 2. is true as well, but it is best to check).
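
To make the expected behaviour in these four cases concrete, here is a
minimal sketch of the decision I have in mind. It is not SimFactory's
actual code: the names should_chain, describe_submission and ACTIVE_STATES
are hypothetical, and treating an H-state job the same as a Q or R job is
my assumption. The point is only that the choice to chain should depend on
whether any job for the simulation is in the queue at all, not on whether
that job has started running.

```python
# Hypothetical sketch of the chain-or-queue decision; not SimFactory code.
# Assumption: any existing job in state Q (queued), R (running) or
# H (held) counts as "in the queue" for the simulation.
ACTIVE_STATES = {"Q", "R", "H"}


def should_chain(existing_states):
    """Return True if the new job should be chained behind an existing job.

    existing_states: queue states of the jobs already submitted for this
    simulation, e.g. ["Q"] or ["R", "H"]. An empty list means no jobs.
    """
    return any(state in ACTIVE_STATES for state in existing_states)


def describe_submission(existing_states):
    """Describe how the new job should enter the queue."""
    if should_chain(existing_states):
        return "chain to existing job (new job starts in H)"
    return "queue directly (new job starts in Q)"


# The four cases listed above.
CASES = [
    ("1. no jobs in the queue", []),
    ("2. a job in the Q state", ["Q"]),
    ("3. a job in the R state", ["R"]),
    ("4. jobs in the Q and H states", ["Q", "H"]),
]

for label, states in CASES:
    print(f"{label}: {describe_submission(states)}")
```

With this logic only case 1 queues the new job directly; cases 2 to 4 all
chain it, which is the behaviour the previous patch misses for case 2.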
My own tests are on Kraken, which can have long turn-around times, so you
might want to test somewhere else like Damiana or Datura. If it works on
those machines, then the code is probably OK, and at worst the Kraken
machine entry might need to be fixed up.
--
Ticket URL: <https://trac.einsteintoolkit.org/ticket/344#comment:3>
Einstein Toolkit <http://einsteintoolkit.org>