[ET Trac] [Einstein Toolkit] #30: Implement a "get" command
Einstein Toolkit
trac-noreply at einsteintoolkit.org
Fri Feb 22 01:28:43 CST 2013
#30: Implement a "get" command
--------------------------+-------------------------------------------------
Reporter: hinder | Owner: mthomas
Type: enhancement | Status: new
Priority: minor | Milestone:
Component: SimFactory | Version:
Resolution: | Keywords:
--------------------------+-------------------------------------------------
Comment (by hinder):
We will need some way to ensure that the retrieved data is in a consistent
state. Truncated ASCII files can be dealt with, though this is not ideal,
but partially-written HDF5 files cannot. This can be a serious problem if
several HDF5 files are being synced, because after each sync there is a
high probability that at least one of them is incompletely written. Some
options:
1. SimFactory (on the remote machine) writes a control file (either into
the simulation directory or somewhere else) which tells the simulation not
to open any new files for writing. Once the simulation has closed any
currently open file as part of normal operation, it records this in the
control file, continues running, and only writes new files once the
control file tells it to. This has the disadvantage of locking the entire
simulation for the duration of the transfer of the active restart; for
slow data transfers, this could be a significant amount of time. A second
disadvantage is that the different files will not be in a consistent
state; e.g. one output file may have the current iteration but another may
not.
2. Before writing a file, Cactus would move it to a new location
(file.tmp), and only move it back when it was fully written. SimFactory
would not sync tmp files. Any files which were found renamed to *.tmp in
the first pass would then be synced in a separate pass using their
original names, if they exist. Repeat until all files are synced. This
solution also does not maintain a consistent state across multiple files.
It does not require write access to the simulation directory, so it could
also be used by collaborators who do not own the simulation.
3. Similar to (1), but only performed at the end of an iteration.
SimFactory would indicate to Cactus to pause the simulation at the end of
the current iteration, when all files are presumably valid on disk.
Cactus would indicate that the simulation had paused in a control file,
and SimFactory would then transfer the data, and unpause the simulation
when it was finished. This would guarantee that the synced data was in a
consistent state. We might want to have some mechanism to ensure that
simulations do not remain paused forever, perhaps by requiring SimFactory
to update the control file periodically while it is still syncing.
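
The timeout idea in (3) could be sketched roughly as follows. This is only
an illustration, not existing SimFactory or Cactus code: the JSON layout,
the function names write_control and should_stay_paused, and the timeout
parameter are all assumptions made for the example, and the step in which
Cactus acknowledges that it has paused is left out.

```python
import json
import os
import time


def write_control(path, paused, now=None):
    """Syncing side (hypothetical): record the pause request and a heartbeat.

    Calling this repeatedly with paused=True while the transfer is running
    refreshes the heartbeat; calling it with paused=False releases the pause.
    """
    now = time.time() if now is None else now
    tmp = path + ".new"
    with open(tmp, "w") as f:
        json.dump({"paused": paused, "heartbeat": now}, f)
    # Rename into place so a reader does not see a half-written control file.
    os.replace(tmp, path)


def should_stay_paused(path, timeout, now=None):
    """Simulation side (hypothetical): honour a pause only while it is fresh."""
    now = time.time() if now is None else now
    try:
        with open(path) as f:
            control = json.load(f)
    except FileNotFoundError:
        return False  # no control file, so nobody has asked for a pause
    if not control["paused"]:
        return False
    # A stale heartbeat means the syncing side has gone away: resume running.
    return (now - control["heartbeat"]) <= timeout
```

With a timeout of, say, a few minutes, a sync that dies without releasing
the pause would cost at most that much idle time. The path is an argument
so that the control file can live wherever is convenient.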
All of the above apply only to the active restart. I think (3) is the
simplest and most robust. It is also the most expensive in SUs (service
units), since the simulation sits idle while it is paused. The control
file location could be customisable, and placed somewhere to which all
collaborators have write access.
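
For comparison, the multi-pass sync of option (2) might look something like
the sketch below. It assumes the ".tmp" suffix convention described in (2)
and uses a local copy as a stand-in for the real remote transfer; the
function name sync_skipping_tmp and its parameters are invented for the
example. The check-then-copy step is not atomic, so a file could still be
renamed while it is being copied.

```python
import shutil
import time
from pathlib import Path

TMP_SUFFIX = ".tmp"  # suffix a file carries while it is being written


def sync_skipping_tmp(src, dst, max_passes=10, wait=1.0):
    """Copy files from src to dst, deferring any that are currently *.tmp.

    Returns the set of original file names that were still being written
    after max_passes retry passes.
    """
    src, dst = Path(src), Path(dst)
    dst.mkdir(parents=True, exist_ok=True)

    # First pass: copy complete files and remember the ones being written.
    pending = set()
    for path in sorted(src.iterdir()):
        if not path.is_file():
            continue
        if path.name.endswith(TMP_SUFFIX):
            pending.add(path.name[:-len(TMP_SUFFIX)])
        else:
            shutil.copy2(path, dst / path.name)

    # Later passes: retry deferred files under their original names.
    for _ in range(max_passes):
        if not pending:
            break
        time.sleep(wait)
        for name in sorted(pending):
            original = src / name
            still_writing = (src / (name + TMP_SUFFIX)).exists()
            if original.is_file() and not still_writing:
                shutil.copy2(original, dst / name)
                pending.discard(name)
    return pending
```

A caller would inspect the returned set: if it is not empty, those files
were still being written when the retries ran out and are missing from the
copy.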
--
Ticket URL: <https://trac.einsteintoolkit.org/ticket/30#comment:1>
Einstein Toolkit <http://einsteintoolkit.org>
The Einstein Toolkit