[ET Trac] [Einstein Toolkit] #30: Implement a "get" command

Fri Feb 22 01:28:43 CST 2013

#30: Implement a "get" command
--------------------------+-------------------------------------------------
  Reporter:  hinder       |       Owner:  mthomas
      Type:  enhancement  |      Status:  new    
  Priority:  minor        |   Milestone:         
 Component:  SimFactory   |     Version:         
Resolution:               |    Keywords:         
--------------------------+-------------------------------------------------

Comment (by hinder):

 We will need some way to ensure that the retrieved data is in a consistent
 state.  Truncated ASCII files can be dealt with, though this is not ideal,
 but partially-written HDF5 files cannot.  This can be a serious problem if
 several HDF5 files are being synced, because after each sync there is a
 high probability that at least one of them is incompletely written.  Some
 options:

 1. SimFactory (on the remote machine) writes a control file (either into
 the simulation directory or somewhere else) which tells the simulation
 not to open any new files for writing.  Once the simulation has closed any
 currently open file as part of normal operation, it records this in the
 control file, continues running, and only writes new files once the
 control file tells it to.  This has the disadvantage of locking the entire
 simulation for the duration of the transfer of the active restart.  For
 slow data transfers, this could be a significant amount of time.  A
 further disadvantage is that the different files will not be in a
 consistent state; e.g. one output file may have the current iteration but
 another may not.

 2. Before writing a file, Cactus would move it to a new location
 (file.tmp), and only move it back when it was fully written.  SimFactory
 would not sync tmp files.  Any files which were found renamed to *.tmp in
 the first pass would then be synced in a separate pass using their
 original names, if they exist.  This is repeated until all files are
 synced.  This solution also does not maintain a consistent state across
 multiple files.  It does not require write access to the simulation
 directory, so it could also be used by collaborators who do not own the
 simulation.

 3. Similar to (1), but only performed at the end of an iteration.
 SimFactory would indicate to Cactus to pause the simulation at the end of
 the current iteration, when all files are presumably valid on disk.
 Cactus would indicate that the simulation had paused in a control file,
 and SimFactory would then transfer the data, and unpause the simulation
 when it was finished.  This would guarantee that the synced data was in a
 consistent state.  We might want to have some mechanism to ensure that
 simulations do not remain paused forever, perhaps by requiring SimFactory
 to update the control file periodically while it is still syncing.
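
 As an illustration of the multi-pass sync in option (2), here is a minimal
 Python sketch.  It assumes a flat output directory and a plain local copy
 in place of the real transfer; the names sync_once, sync_until_done and
 TMP_SUFFIX are hypothetical and not part of SimFactory or Cactus.  It does
 not guard against a file being renamed between the directory listing and
 the copy, which a real tool would have to handle.

```python
import shutil
import time
from pathlib import Path

TMP_SUFFIX = ".tmp"  # assumed marker for files that are currently being written


def sync_once(src, dst, names=None):
    """Copy complete files from src to dst.

    Returns the original names of files that were skipped because they are
    still being written (only a *.tmp copy exists, or the file is missing).
    """
    dst.mkdir(parents=True, exist_ok=True)
    present = {p.name for p in src.iterdir() if p.is_file()}
    in_progress = {n[:-len(TMP_SUFFIX)] for n in present if n.endswith(TMP_SUFFIX)}
    if names is None:
        # First pass: consider every file, referring to tmp files by original name.
        names = {n for n in present if not n.endswith(TMP_SUFFIX)} | in_progress
    deferred = set()
    for name in sorted(names):
        if name in in_progress or name not in present:
            deferred.add(name)  # retry under the original name in a later pass
            continue
        shutil.copy2(src / name, dst / name)
    return deferred


def sync_until_done(src, dst, max_passes=10, wait_seconds=1.0):
    """Repeat passes over deferred files until none remain or max_passes is hit."""
    deferred = sync_once(src, dst)
    for _ in range(max_passes - 1):
        if not deferred:
            break
        time.sleep(wait_seconds)
        deferred = sync_once(src, dst, deferred)
    return deferred  # non-empty if some files never became complete
```

 Only the reading side is shown; the writer is assumed to follow the
 rename convention described in (2).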

 All of the above apply only to the active restart.  I think (3) is the
 simplest and most robust.  It is also the most expensive in SUs, since the
 simulation is paused for the whole transfer.  The control file location
 could be customisable, and placed somewhere that all collaborators have
 write access.

-- 
Ticket URL: <https://trac.einsteintoolkit.org/ticket/30#comment:1>
Einstein Toolkit <http://einsteintoolkit.org>
The Einstein Toolkit