[ET Trac] [Einstein Toolkit] #1283: Missing data in HDF5 files
Einstein Toolkit
trac-noreply at einsteintoolkit.org
Mon Mar 11 09:53:52 CDT 2013
#1283: Missing data in HDF5 files
---------------------+------------------------------------------------------
Reporter: hinder | Owner:
Type: defect | Status: new
Priority: major | Milestone:
Component: Cactus | Version:
Resolution: | Keywords:
---------------------+------------------------------------------------------
Comment (by hinder):
How about just introducing a convention that files should be named
<file>.tmp.<extension> while they are being written? If any such files
are present at restart, they are very likely to be corrupt, and
simfactory can make its decisions accordingly. We need to be careful to
avoid a conflict with any "safe file write" filename convention, as those
files really are temporary, and their existence does not imply that
anything important is corrupt.
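
To make the proposed naming convention concrete, here is a minimal Python
sketch of how a writer could derive the in-progress name and how a restart
script could look for leftover files. The function names
(in_progress_name, is_in_progress, find_in_progress_files) are hypothetical
and are not part of Cactus or simfactory; the sketch also assumes every
output file has an extension.

```python
from pathlib import Path


def in_progress_name(final_path):
    """Return '<file>.tmp.<extension>' for a final output path.

    Hypothetical helper: assumes the file has an extension (e.g. '.h5').
    """
    p = Path(final_path)
    if not p.suffix:
        raise ValueError(f"{final_path!r} has no extension")
    return p.with_name(f"{p.stem}.tmp{p.suffix}")


def is_in_progress(path):
    """True if the name matches '<file>.tmp.<extension>'."""
    suffixes = Path(path).suffixes
    # '.tmp' must be the second-to-last suffix, so a plain 'foo.tmp'
    # (a common "safe file write" name) is deliberately NOT matched.
    return len(suffixes) >= 2 and suffixes[-2] == ".tmp"


def find_in_progress_files(restart_dir):
    """List files under restart_dir that were still being written."""
    return sorted(
        p for p in Path(restart_dir).rglob("*")
        if p.is_file() and is_in_progress(p)
    )
```

A writer would create the file under in_progress_name(...) and rename it
to the final name once it has been closed cleanly; anything returned by
find_in_progress_files(...) at restart is then a candidate for being
treated as corrupt.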
Do you mean "checkpoints" or "restarts" where you have written "backups"?
Are you proposing to archive checkpoint files? Does anybody do this? For
a 500-core job with 2 GB/core, this is 1 TB of data per checkpoint
iteration (maybe we don't checkpoint everything, so it could be about half
that). I don't think the archive services would be very happy with us
regularly archiving terabytes of checkpoint data.
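
The size estimate above can be written out as a small Python sketch. The
function name and the fraction_checkpointed parameter are hypothetical,
and decimal units (1 TB = 1000 GB) are assumed.

```python
def checkpoint_size_tb(cores, gb_per_core, fraction_checkpointed=1.0):
    """Rough size of one checkpoint iteration in TB (1 TB = 1000 GB).

    fraction_checkpointed is an assumed knob for "we don't checkpoint
    everything"; 1.0 means all memory is written out.
    """
    return cores * gb_per_core * fraction_checkpointed / 1000.0


full = checkpoint_size_tb(500, 2)        # everything checkpointed
half = checkpoint_size_tb(500, 2, 0.5)   # about half checkpointed
print(full, half)
```

With the numbers from this comment, that gives 1 TB per checkpoint
iteration, or 0.5 TB if only half of the data is checkpointed.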
--
Ticket URL: <https://trac.einsteintoolkit.org/ticket/1283#comment:11>
Einstein Toolkit <http://einsteintoolkit.org>
The Einstein Toolkit