[Users] Error to write PittNull during checkpointing

Erik Schnetter schnetter at cct.lsu.edu
Fri Jul 27 11:45:01 CDT 2012


Jakob

I have not heard about such a problem before.

When an HDF5 file is not properly closed, its content may be corrupted.
(This will be addressed in the next major release.) There may be two
reasons for this: either the file is not closed (which would be an error in
the code), or there is a write error (e.g. you run out of disk space). The
latter is the major reason for people encountering corrupted HDF5 files.
Since you don't see error messages, this is either not the case, or these
HDF5 output routines suppress these errors.

The thorn SphericalHarmonicDecomp implements its own HDF5 output routines
and does not use Cactus. I see that it uses a non-standard way to determine
whether the file exists, and that it does not check for errors when writing
or closing. I think that HDF5 errors should cause prominent warnings in
stdout and stderr (did you check?), and if you don't see these, the writing
should have succeeded.
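
To illustrate what checking for errors when writing and closing could look like, here is a minimal sketch in Python using only the standard library. It is not the thorn's code; the function name checked_append and the warning text are hypothetical. In the HDF5 C API the analogous step would be to test the return values of calls such as H5Dwrite and H5Fclose, which are negative on failure.

```python
import os
import sys


def checked_append(path, payload):
    """Append payload to path; return True only if open, write and close all succeed.

    Hypothetical helper for illustration: every failure is reported on stderr
    instead of being silently ignored.
    """
    try:
        f = open(path, "ab")
    except OSError as err:
        print(f"WARNING: cannot open {path}: {err}", file=sys.stderr)
        return False

    ok = True
    try:
        f.write(payload)
        f.flush()
        # Force the data to disk so that e.g. a full disk is noticed here.
        os.fsync(f.fileno())
    except OSError as err:
        print(f"WARNING: write to {path} failed: {err}", file=sys.stderr)
        ok = False

    try:
        f.close()
    except OSError as err:
        print(f"WARNING: closing {path} failed: {err}", file=sys.stderr)
        ok = False
    return ok
```

A caller would then treat a False return value as a failed output step rather than continuing as if the data had been written.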

You mention checkpointing. Are you experiencing these problems right after
recovery, i.e. during the first SphericalHarmonicDecomp HDF5 output
afterwards? In this case, did you maybe switch to a new directory where
this file doesn't exist?

If not, then it may be the non-standard way in which the code determines
whether the file already exists, combined with something that may be
special about your file system.

(The "standard" way operates as follows: open the file as if it existed; if
this fails, create it. The code works differently: it opens the file as a
plain binary file. If this fails, the HDF5 file is created; if it succeeds,
the file is closed and re-opened as an HDF5 file. Maybe the quick
closing-then-reopening causes problems?)
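
The "standard" open-then-create pattern described above can be sketched as follows. This is a generic illustration with plain files in Python, not HDF5 code and not the thorn's implementation; the name open_or_create is hypothetical. With the HDF5 C API, the corresponding calls would presumably be H5Fopen followed, on failure, by H5Fcreate.

```python
def open_or_create(path):
    """Open path for update if it exists, otherwise create it.

    Returns a tuple (file_object, created), where created tells the caller
    whether a new file had to be made.
    """
    try:
        # First attempt: behave as if the file already exists.
        return open(path, "r+b"), False
    except FileNotFoundError:
        # Fallback: the file is missing, so create it.
        return open(path, "w+b"), True
```

The file is opened only once per call, so there is no separate existence test followed by a close and an immediate reopen.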

-erik

On Friday, July 27, 2012, Jakob Hansen wrote:

> Hi,
>
> I'm experimenting a bit with using the PittNull / SphericalHarmonicDecomp
> thorn but I've been experiencing a few errors in writing to its output
> metric_obs_0_Decomp.h5 file.
>
> On some occasions it seems that this output file cannot be opened,
> which leaves gaps in the data file. This always occurs when Carpet is
> doing checkpointing. Everything else runs well, though, and the
> simulation finishes fine. However, when trying to fftwfilter the
> metric_obs_0_Decomp.h5 file, I notice the missing data points.
>
> I've checked the system logs and there seems to be no hardware failure.
> Also, as far as I can see, I'm not overusing memory allocation either.
>
> Has anyone else experienced similar issues?
>
> Here is the output from the error file:
>
> -----------------------------------------------------------------------------------------
>
> HDF5-DIAG: Error detected in HDF5 (1.8.5-patch1) thread 0:
> #000: H5F.c line 1509 in H5Fopen(): unable to open file
> major: File accessability
> minor: Unable to open file
> #001: H5F.c line 1300 in H5F_open(): unable to read superblock
> major: File accessability
> minor: Read failed
> #002: H5Fsuper.c line 324 in H5F_super_read(): unable to load superblock
> major: Object cache
> minor: Unable to protect metadata
> #003: H5AC.c line 1597 in H5AC_protect(): H5C_protect() failed.
> major: Object cache
> minor: Unable to protect metadata
> #004: H5C.c line 3333 in H5C_protect(): can't load entry
> major: Object cache
> minor: Unable to load metadata into cache
> #005: H5C.c line 8177 in H5C_load_entry(): unable to load entry
> major: Object cache
> minor: Unable to load metadata into cache
> #006: H5Fsuper_cache.c line 469 in H5F_sblock_load(): truncated file
> major: File accessability
> minor: File has been truncated
> HDF5-DIAG: Error detected in HDF5 (1.8.5-patch1) thread 0:
> #000: H5Gdeprec.c line 214 in H5Gcreate1(): not a location
> major: Invalid arguments to routine
> minor: Inappropriate type
> #001: H5Gloc.c line 253 in H5G_loc(): invalid object ID
> major: Invalid arguments to routine
> minor: Bad value
> HDF5-DIAG: Error detected in HDF5 (1.8.5-patch1) thread 0:
> #000: H5Adeprec.c line 153 in H5Acreate1(): not a location
> major: Invalid arguments to routine
> minor: Inappropriate type
> #001: H5Gloc.c line 253 in H5G_loc(): invalid object ID
> major: Invalid arguments to routine
> minor: Bad value
> [etc. etc. etc. ...... ]
> -------------------------------------------
>
> Cheers,
> Jakob
>


-- 
Erik Schnetter <schnetter at cct.lsu.edu>
http://www.perimeterinstitute.ca/personal/eschnetter/