[ET Trac] [Einstein Toolkit] #64: Refactor/redesign archiving
Einstein Toolkit
trac-noreply at einsteintoolkit.org
Sat Jul 30 05:35:31 CDT 2011
#64: Refactor/redesign archiving
-------------------------+--------------------------------------------------
Reporter: mthomas | Owner: mthomas
Type: defect | Status: new
Priority: minor | Milestone:
Component: SimFactory | Version:
Resolution: | Keywords:
-------------------------+--------------------------------------------------
Comment (by hinder):
Archiving needs to take the following into account:
* On some systems, archiving cannot wait until the end of the simulation,
because the first restarts might be purged before then. Kraken, for
example, has a 30-day purge policy, and some of our simulations have
taken months.
* Archiving can take a long time - longer than an interactive session on
the login node can be expected to last. Some systems, e.g. Kraken,
provide a dedicated archiving queue. We should use such a queue if it is
available, or we could use "screen" on the login node if not.
* There could be both a manual and an automatic archiving method.
Here is a possible implementation:
If a simulation is created with the --archive option, then each time a
restart runs, simfactory checks whether there are any previous restarts
which have not been archived and are not currently being archived. If
there are any, it submits a job to the archive queue which archives each
of them. Each restart would be tar/gzipped independently. This is
necessary because the simulation might not have finished yet. It would be
very convenient to be able to add the "archive" option to an existing
simulation, so that subsequent restarts will archive all earlier restarts
of the simulation as well. You often don't know which simulations are
going to end up being long-lived until after a few restarts.
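The check described above could be sketched roughly as follows. This is
only an illustration, not existing simfactory code: the zero-padded
"output-NNNN" restart directory names, the ".archived" and ".archiving"
marker files, and both function names are assumptions made for the
example, and the submission to an archive queue is left out.

```python
import os
import tarfile


def restarts_needing_archive(sim_dir, current_restart):
    """List restarts older than current_restart that have no marker file.

    Assumes restart directories are named "output-NNNN" (zero-padded, so
    string comparison matches numeric order) and that hypothetical marker
    files "<restart>.archived" / "<restart>.archiving" sit next to them.
    """
    pending = []
    for name in sorted(os.listdir(sim_dir)):
        path = os.path.join(sim_dir, name)
        if not (os.path.isdir(path) and name.startswith("output-")):
            continue  # not a restart directory (e.g. a marker file)
        if name >= current_restart:
            continue  # only previous restarts are candidates
        if os.path.exists(path + ".archived") or os.path.exists(path + ".archiving"):
            continue  # already archived, or another job is working on it
        pending.append(name)
    return pending


def archive_restart(sim_dir, name, archive_dir):
    """Tar/gzip a single restart independently of all the others."""
    path = os.path.join(sim_dir, name)
    in_progress = path + ".archiving"
    open(in_progress, "w").close()  # tell other jobs this restart is taken
    try:
        os.makedirs(archive_dir, exist_ok=True)
        target = os.path.join(archive_dir, name + ".tar.gz")
        with tarfile.open(target, "w:gz") as tar:
            tar.add(path, arcname=name)
        open(path + ".archived", "w").close()  # written only on success
    finally:
        os.remove(in_progress)
    return target
```

Because the state lives in marker files next to each restart, turning
archiving on for an existing simulation needs no migration step in this
sketch: the next check simply finds every earlier restart unmarked.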
For simulations which are not archived automatically, simfactory could
provide an archive command which performs the archiving on demand.
There could be variants which run it either immediately or through the
queueing system. It's probably best to archive individual restarts here
as well, to keep the code as simple as possible and to only need to
support a single archiving convention.
There could also be a "restore" command which restored all the restarts of
a simulation. This again might have to be run in an archive queue.
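A restore step for per-restart archives might look something like the
sketch below. It assumes one "output-NNNN.tar.gz" file per restart in a
single archive directory; that layout and the function name are
hypothetical and only serve the illustration. Running it through an
archive queue is not shown.

```python
import glob
import os
import tarfile


def restore_simulation(archive_dir, sim_dir):
    """Unpack every per-restart archive into sim_dir.

    Restarts whose directory already exists in sim_dir are skipped, so
    the function can be re-run after an interrupted restore.
    Returns the names of the restarts that were unpacked.
    """
    os.makedirs(sim_dir, exist_ok=True)
    restored = []
    suffix = ".tar.gz"
    pattern = os.path.join(archive_dir, "output-*" + suffix)
    for archive in sorted(glob.glob(pattern)):
        name = os.path.basename(archive)[:-len(suffix)]
        if os.path.isdir(os.path.join(sim_dir, name)):
            continue  # this restart is already on disk
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(sim_dir)
        restored.append(name)
    return restored
```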
There should be options to exclude specific files from archiving. By
default only checkpoint files would be excluded, but we could provide
templates for 3D output files as well, as these often are not needed.
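One way to express such exclusions is a list of filename patterns applied
while the tar file is written. The following is a minimal sketch; the
pattern strings ("checkpoint.chkpt.*" for checkpoints, "*.xyz.h5" for 3D
output) and the function names are assumptions made for the example and
would need to match the actual output file names.

```python
import fnmatch
import os
import tarfile

# Hypothetical pattern sets; the real file names may differ.
DEFAULT_EXCLUDES = ("checkpoint.chkpt.*",)
TEMPLATE_3D = ("*.xyz.h5",)


def make_exclude_filter(patterns):
    """Build a tarfile filter that drops members matching any pattern."""
    def _filter(tarinfo):
        base = os.path.basename(tarinfo.name)
        if any(fnmatch.fnmatch(base, p) for p in patterns):
            return None  # returning None leaves the member out
        return tarinfo
    return _filter


def archive_with_excludes(src_dir, target, patterns=DEFAULT_EXCLUDES):
    """Tar/gzip src_dir into target, skipping files matching patterns."""
    arcname = os.path.basename(os.path.normpath(src_dir))
    with tarfile.open(target, "w:gz") as tar:
        tar.add(src_dir, arcname=arcname, filter=make_exclude_filter(patterns))
    return target
```

Passing DEFAULT_EXCLUDES + TEMPLATE_3D as the patterns would skip both
the checkpoint files and the 3D output.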
--
Ticket URL: <https://trac.einsteintoolkit.org/ticket/64#comment:2>
Einstein Toolkit <http://einsteintoolkit.org>
The Einstein Toolkit