[ET Trac] [Einstein Toolkit] #64: Refactor/redesign archiving

Einstein Toolkit trac-noreply at einsteintoolkit.org
Sat Jul 30 05:35:31 CDT 2011


#64: Refactor/redesign archiving
-------------------------+--------------------------------------------------
  Reporter:  mthomas     |       Owner:  mthomas
      Type:  defect      |      Status:  new    
  Priority:  minor       |   Milestone:         
 Component:  SimFactory  |     Version:         
Resolution:              |    Keywords:         
-------------------------+--------------------------------------------------

Comment (by hinder):

 Archiving needs to take the following into account:

 * On some systems, archiving cannot wait until the end of the simulation,
 because the earliest restarts might be purged before then.  Kraken has a
 30-day purge policy, and some of our simulations have run for months.

 * Archiving can take a long time - longer than an interactive session on
 the login node can be expected to last.  Some systems, e.g. Kraken,
 provide a dedicated archiving queue.  We should use such a queue if it is
 available, or we could use "screen" on the login node if not.

 * There could be both a manual and an automatic archiving method.
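
 The choice between an archive queue and "screen" could be sketched roughly
 as follows.  This is only an illustration: the function and its parameters
 (archive_queue, job_script, session) are hypothetical names, not existing
 simfactory settings, and the queue branch assumes a PBS-style "qsub".

```python
from typing import List, Optional


def archive_launch_command(
    archive_cmd: List[str],
    archive_queue: Optional[str] = None,
    job_script: str = "archive.sh",
    session: str = "simfactory-archive",
) -> List[str]:
    """Return the command that would start archiving in the background.

    All parameter names are hypothetical; they are not simfactory options.
    """
    if archive_queue:
        # Assumption: a PBS-style scheduler, with job_script wrapping
        # archive_cmd for submission to the dedicated archive queue.
        return ["qsub", "-q", archive_queue, job_script]
    # No archive queue: start a detached screen session on the login node
    # so that the archiving outlives the interactive session.
    return ["screen", "-dmS", session] + archive_cmd


tar_cmd = ["tar", "-czf", "output-0000.tar.gz", "output-0000"]
print(archive_launch_command(tar_cmd))
print(archive_launch_command(tar_cmd, archive_queue="archive"))
```

 The function only builds the command, so the decision can be checked
 without touching the queueing system.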

 Here is a possible implementation:

 If a simulation is created with the --archive option, then each time a
 restart runs, simfactory checks whether there are any previous restarts
 which have neither been archived nor are currently being archived.  If
 there are any, it submits a job to the archive queue which archives each
 of them.  Each restart would be tar/gzipped independently.  This is
 necessary because the simulation might not have finished yet when
 archiving starts.  It would be very convenient to be able to add the
 "archive" option to an existing simulation, so that subsequent restarts
 trigger archiving of the whole simulation.  You often don't know which
 simulations are going to end up being long-lived until after a few
 restarts.
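
 A minimal sketch of the selection and per-restart archiving described
 above.  It assumes that restarts live in directories named like
 "output-0000" and that the sets of archived and in-progress restarts are
 tracked somewhere and passed in by the caller; both are assumptions for
 illustration, and the function names are hypothetical.

```python
import tarfile
import tempfile
from pathlib import Path
from typing import Iterable, List, Set


def restarts_needing_archive(
    restarts: Iterable[str],
    archived: Set[str],
    in_progress: Set[str],
    current: str,
) -> List[str]:
    """Previous restarts that are neither archived nor being archived."""
    return sorted(
        r for r in restarts
        if r != current and r not in archived and r not in in_progress
    )


def archive_restart(sim_dir: Path, restart: str, archive_dir: Path) -> Path:
    """Tar/gzip a single restart directory into its own archive file."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / f"{restart}.tar.gz"
    with tarfile.open(target, "w:gz") as tar:
        tar.add(sim_dir / restart, arcname=restart)
    return target


# Small demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    sim = Path(tmp) / "sim"
    names = ["output-0000", "output-0001", "output-0002"]
    for name in names:
        (sim / name).mkdir(parents=True)
        (sim / name / "data.txt").write_text(name)
    pending = restarts_needing_archive(
        names, archived={"output-0000"}, in_progress=set(),
        current="output-0002",
    )
    for restart in pending:
        print(archive_restart(sim, restart, Path(tmp) / "archive").name)
```

 Because each restart gets its own archive file, nothing has to be rewritten
 when later restarts are added.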

 For simulations which are not archived automatically, simfactory could
 provide an archive command to archive them on demand.  There could be
 variants to do this either immediately or using the queueing system.
 It's probably best to again archive individual restarts, to keep the code
 as simple as possible and to only need to support a single archiving
 convention.

 There could also be a "restore" command which restored all the restarts of
 a simulation. This again might have to be run in an archive queue.
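
 With one archive per restart, a restore could be as simple as unpacking
 every archive belonging to the simulation.  The sketch below assumes the
 archives are "*.tar.gz" files in a single directory and are trusted (we
 created them ourselves); restore_simulation is a hypothetical name.

```python
import tarfile
import tempfile
from pathlib import Path
from typing import List


def restore_simulation(archive_dir: Path, sim_dir: Path) -> List[str]:
    """Unpack every per-restart archive in archive_dir into sim_dir."""
    sim_dir.mkdir(parents=True, exist_ok=True)
    restored = []
    for archive in sorted(archive_dir.glob("*.tar.gz")):
        with tarfile.open(archive, "r:gz") as tar:
            # Archives are assumed to be trusted, self-created files.
            tar.extractall(sim_dir)
        restored.append(archive.name[: -len(".tar.gz")])
    return restored


# Small demonstration: archive one restart, then restore it elsewhere.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "sim" / "output-0000"
    src.mkdir(parents=True)
    (src / "data.txt").write_text("hello")
    archive_dir = Path(tmp) / "archive"
    archive_dir.mkdir()
    with tarfile.open(archive_dir / "output-0000.tar.gz", "w:gz") as tar:
        tar.add(src, arcname="output-0000")
    print(restore_simulation(archive_dir, Path(tmp) / "restored"))
```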

 There should be options to exclude specific files from archiving.  By
 default only checkpoint files would be excluded, but we could provide
 exclusion templates for 3D output files as well, as these are often not
 needed.
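
 The exclusion could be done with filename patterns applied while the
 archive is written.  In this sketch the patterns themselves
 ("checkpoint.chkpt*" and "*.xyz.h5") are guesses at typical file names and
 would need to be checked against real output; the function and constant
 names are hypothetical as well.

```python
import fnmatch
import tarfile
import tempfile
from pathlib import Path
from typing import Callable, Optional, Sequence

# Assumed patterns, for illustration only.
DEFAULT_EXCLUDES = ("checkpoint.chkpt*",)
OUTPUT_3D_TEMPLATE = ("*.xyz.h5",)


def make_exclude_filter(
    patterns: Sequence[str],
) -> Callable[[tarfile.TarInfo], Optional[tarfile.TarInfo]]:
    """Build a tarfile filter that drops members matching any pattern."""
    def _filter(info: tarfile.TarInfo) -> Optional[tarfile.TarInfo]:
        basename = Path(info.name).name
        if any(fnmatch.fnmatch(basename, p) for p in patterns):
            return None  # returning None leaves the member out
        return info
    return _filter


def archive_with_excludes(
    restart_dir: Path,
    target: Path,
    patterns: Sequence[str] = DEFAULT_EXCLUDES,
) -> None:
    """Tar/gzip restart_dir into target, skipping excluded files."""
    with tarfile.open(target, "w:gz") as tar:
        tar.add(restart_dir, arcname=restart_dir.name,
                filter=make_exclude_filter(patterns))


# Small demonstration with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    restart = Path(tmp) / "output-0000"
    restart.mkdir()
    for name in ("checkpoint.chkpt.it_0.h5", "rho.xyz.h5", "rho.x.asc"):
        (restart / name).write_text("")
    target = Path(tmp) / "output-0000.tar.gz"
    archive_with_excludes(restart, target,
                          DEFAULT_EXCLUDES + OUTPUT_3D_TEMPLATE)
    with tarfile.open(target) as tar:
        print(sorted(tar.getnames()))
```

 Matching on the file name rather than the full path keeps the patterns
 independent of where the restart directory lives.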

-- 
Ticket URL: <https://trac.einsteintoolkit.org/ticket/64#comment:2>
Einstein Toolkit <http://einsteintoolkit.org>
The Einstein Toolkit

