[Users] Cactus & HDF5: checkpoint recovery failure with ?
Bernard Kelly
physicsbeany at gmail.com
Mon Feb 1 14:39:25 CST 2016
Hi all. I'm having checkpoint/recovery issues with a particular simulation:
An initial short run stopped some time after iteration 32000, leaving
me with checkpoints at iterations 30000 and 32000. I found I couldn't recover
from the later of these, but as the earlier one *did* allow recovery,
I didn't worry too much about it.
Now the recovered run went until some time after iteration 124000. I again
have two sets of checkpoint data, from iterations 122000 and 124000. *Neither*
of these works. I could imagine the later one being corrupted somehow
because of disk space issues, but both?
In each case, the error output on STDERR consists of multiple
instances of the message below.
* Is this likely due to file corruption?
* What's the best way to check CarpetIOHDF5 files for corruption?
* Can I do anything about this particular run, apart from starting
(again) from the "good" 30000 checkpoint?
Thanks,
Bernard
----------------------------------------------------------------------------------------
HDF5-DIAG: Error detected in HDF5 (1.8.8) thread 0:
  #000: H5Gdeprec.c line 875 in H5Gget_objinfo(): cannot stat object
    major: Invalid arguments to routine
    minor: Unable to initialize object
  #001: H5Gdeprec.c line 1002 in H5G_get_objinfo(): name doesn't exist
    major: Symbol table
    minor: Object already exists
  #002: H5Gtraverse.c line 861 in H5G_traverse(): internal path traversal failed
    major: Symbol table
    minor: Object not found
  #003: H5Gtraverse.c line 641 in H5G_traverse_real(): traversal operator failed
    major: Symbol table
    minor: Callback failed
  #004: H5Gdeprec.c line 926 in H5G_get_objinfo_cb(): unable to get object info
    major: Object header
    minor: Can't get value
  #005: H5O.c line 2789 in H5O_get_info(): unable to load object header
    major: Object header
    minor: Unable to protect metadata
  #006: H5O.c line 1682 in H5O_protect(): unable to load object header
    major: Object header
    minor: Unable to protect metadata
  #007: H5AC.c line 1322 in H5AC_protect(): H5C_protect() failed.
    major: Object cache
    minor: Unable to protect metadata
  #008: H5C.c line 3567 in H5C_protect(): can't load entry
    major: Object cache
    minor: Unable to load metadata into cache
  #009: H5C.c line 7957 in H5C_load_entry(): unable to load entry
    major: Object cache
    minor: Unable to load metadata into cache
  #010: H5Ocache.c line 275 in H5O_load(): bad object header version number
    major: Object header
    minor: Wrong version number
WARNING level 1 from host r421i1n0.p4.nas.nasa.gov process 20
while executing schedule bin CCTK_RECOVER_VARIABLES, routine
IOUtil::IOUtil_RecoverGH
in thorn CarpetIOHDF5, file
/nobackupp8/bjkelly1/codes/Cactus.ET_2015_05/configs/vacuum/build/CarpetIOHDF5/Input.cc:1235:
-> HDF5 call 'H5Gget_objinfo (group, objectname, 0, &object_info)'
returned error code -1
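
On the question of checking the files for corruption: the trace above
fails while loading an object header, so one rough first pass is to
confirm each checkpoint file still starts with a valid HDF5 signature
and then ask the stock h5ls tool to visit every object in it. The
sketch below is only that, a sketch. The function names are made up
for illustration, the caller supplies the file paths, and treating a
nonzero exit status or any stderr output from h5ls as "damaged" is an
assumption rather than documented behaviour.

```python
import shutil
import subprocess

# Eight-byte signature that opens every HDF5 superblock.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def has_hdf5_signature(path, max_offset=1 << 20):
    """Return True if the signature sits at offset 0, 512, 1024, 2048, ..."""
    with open(path, "rb") as f:
        offset = 0
        while offset <= max_offset:
            f.seek(offset)
            if f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return True
            # The superblock may start at 0 or at 512 doubled repeatedly.
            offset = 512 if offset == 0 else offset * 2
    return False


def walk_with_h5ls(path):
    """Visit every object via `h5ls -r`; None means h5ls is not installed."""
    if shutil.which("h5ls") is None:
        return None
    result = subprocess.run(
        ["h5ls", "-r", str(path)],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,
    )
    # Assumption: damage shows up as a nonzero exit status or stderr output.
    return result.returncode == 0 and not result.stderr.strip()


def check_checkpoint_files(paths):
    """Map each path to a short verdict string (hypothetical helper)."""
    report = {}
    for path in paths:
        if not has_hdf5_signature(path):
            report[path] = "bad signature"
            continue
        walked = walk_with_h5ls(path)
        if walked is None:
            report[path] = "signature ok (h5ls unavailable)"
        else:
            report[path] = "ok" if walked else "bad objects"
    return report
```

Calling check_checkpoint_files() on the per-process files from
iterations 122000 and 124000, and on the 30000 set that is known to
recover, would show whether the damage is confined to a few files. A
clean report does not prove the data inside the datasets is intact.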
--------------------------------------------------------------------------------------------------