[Users] Error writing PittNull output during checkpointing

Yosef Zlochower yosef at astro.rit.edu
Fri Jul 27 16:39:20 CDT 2012


On 07/27/2012 12:45 PM, Erik Schnetter wrote:
> Jakob
>
> I have not heard about such a problem before.
>
> When an HDF5 file is not properly closed, its content may be corrupted.
> (This will be addressed in the next major HDF5 release.) There are two
> possible reasons for a file not being closed properly: either the file
> is never closed (which would be an error in the code), or there is a
> write error (e.g. you ran out of disk space). The latter is the most
> common reason people encounter corrupted HDF5 files. Since you don't
> see error messages, either this is not the case, or these HDF5 output
> routines suppress the errors.
>
> The thorn SphericalHarmonicDecomp implements its own HDF5 output
> routines and does not use the standard Cactus I/O infrastructure. I see
> that it uses a non-standard way to determine whether the file exists,
> and that it does not check for errors when writing or closing. I would
> expect HDF5 errors to cause prominent warnings on stdout and stderr
> (did you check?); if you don't see any, the writes should have
> succeeded.
>
> You mention checkpointing. Are you experiencing these problems right
> after recovery, i.e. during the first SphericalHarmonicDecomp HDF5
> output afterwards? If so, did you perhaps switch to a new directory
> where this file doesn't exist?

When I ran CCE, I always restarted in a new directory and recombined
the HDF5 files after the run finished.

>
> If not, then it may be the non-standard way in which the code determines
> whether the file already exists, combined with something that may be
> special about your file system.
>
> (The "standard" way operates as follows: open the file as if it existed;
> if this fails, open it by creating it. The code works differently: it
> opens the file as binary file. If this fails, the HDF5 file is created;
> if it succeeds, the file is closed and re-openend as HDF5 file. Maybe
> the quick closing-then-reopening causes problems?)
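
For reference, the "standard" pattern described above would look
roughly like this; an illustrative sketch, not the thorn's actual code
(the function name is mine):

  #include <hdf5.h>

  /* "Standard" open-or-create: try to open the file as HDF5 first,
     and only create it if opening fails. */
  static hid_t open_standard(const char *filename)
  {
    hid_t file;
    /* Suppress HDF5's error stack while probing; a missing file is
       expected here and not an error. */
    H5E_BEGIN_TRY {
      file = H5Fopen(filename, H5F_ACC_RDWR, H5P_DEFAULT);
    } H5E_END_TRY;
    if (file < 0) {
      /* H5F_ACC_EXCL fails instead of truncating if the file appears
         in the meantime. */
      file = H5Fcreate(filename, H5F_ACC_EXCL, H5P_DEFAULT,
                       H5P_DEFAULT);
    }
    return file;
  }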

I probably should fix this. BTW, CarpetIOHDF5 doesn't do this.
Instead, it checks H5Fis_hdf5(filename) > 0. I'll work on cleaning
up the code along the lines of the sketch below.
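
Something like this, as an untested sketch (the function name and
warning text are mine, not CarpetIOHDF5's actual code):

  #include <hdf5.h>
  #include <stdio.h>

  /* Open filename as an HDF5 file if it already is one; otherwise
     create it. Warn loudly on failure so that write errors (e.g. a
     full disk) are not silently swallowed. */
  static hid_t open_or_create(const char *filename)
  {
    hid_t file;
    htri_t is_hdf5;

    H5E_BEGIN_TRY {
      is_hdf5 = H5Fis_hdf5(filename); /* negative if file is absent */
    } H5E_END_TRY;

    if (is_hdf5 > 0) {
      file = H5Fopen(filename, H5F_ACC_RDWR, H5P_DEFAULT);
    } else {
      file = H5Fcreate(filename, H5F_ACC_TRUNC, H5P_DEFAULT,
                       H5P_DEFAULT);
    }
    if (file < 0) {
      fprintf(stderr, "WARNING: could not open or create '%s'\n",
              filename);
    }
    return file;
  }

The same goes for every write and close: checking the herr_t return
value of calls like H5Fclose and printing a warning would make a
disk-full error show up immediately in stderr.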

>
> -erik
>
> On Friday, July 27, 2012, Jakob Hansen wrote:
>
>     Hi,
>
>     I'm experimenting a bit with the PittNull /
>     SphericalHarmonicDecomp thorn, but I've been experiencing a few
>     errors in writing to its output file metric_obs_0_Decomp.h5.
>
>     On some occasions this output file cannot be opened, which leaves
>     gaps in the data file. This always occurs while Carpet is
>     checkpointing. Everything else runs well, though, and the
>     simulation finishes fine. However, when trying to fftwfilter the
>     metric_obs_0_Decomp.h5 file, I notice the missing data points.
>
>     I've checked the system logs, and there seems to be no hardware
>     failure. Also, as far as I can see, I'm not exceeding my memory
>     allocation either.
>
>     Anyone else experienced similar issues?
>
>     Here follows the output from the error file:
>     -----------------------------------------------------------------------------------------
>
>     HDF5-DIAG: Error detected in HDF5 (1.8.5-patch1) thread 0:
>       #000: H5F.c line 1509 in H5Fopen(): unable to open file
>         major: File accessability
>         minor: Unable to open file
>       #001: H5F.c line 1300 in H5F_open(): unable to read superblock
>         major: File accessability
>         minor: Read failed
>       #002: H5Fsuper.c line 324 in H5F_super_read(): unable to load superblock
>         major: Object cache
>         minor: Unable to protect metadata
>       #003: H5AC.c line 1597 in H5AC_protect(): H5C_protect() failed.
>         major: Object cache
>         minor: Unable to protect metadata
>       #004: H5C.c line 3333 in H5C_protect(): can't load entry
>         major: Object cache
>         minor: Unable to load metadata into cache
>       #005: H5C.c line 8177 in H5C_load_entry(): unable to load entry
>         major: Object cache
>         minor: Unable to load metadata into cache
>       #006: H5Fsuper_cache.c line 469 in H5F_sblock_load(): truncated file
>         major: File accessability
>         minor: File has been truncated
>     HDF5-DIAG: Error detected in HDF5 (1.8.5-patch1) thread 0:
>       #000: H5Gdeprec.c line 214 in H5Gcreate1(): not a location
>         major: Invalid arguments to routine
>         minor: Inappropriate type
>       #001: H5Gloc.c line 253 in H5G_loc(): invalid object ID
>         major: Invalid arguments to routine
>         minor: Bad value
>     HDF5-DIAG: Error detected in HDF5 (1.8.5-patch1) thread 0:
>       #000: H5Adeprec.c line 153 in H5Acreate1(): not a location
>         major: Invalid arguments to routine
>         minor: Inappropriate type
>       #001: H5Gloc.c line 253 in H5G_loc(): invalid object ID
>         major: Invalid arguments to routine
>         minor: Bad value
>     [etc. etc. etc. ...... ]
>     -------------------------------------------
>
>     Cheers,
>     Jakob
>
>
>
> --
> Erik Schnetter <schnetter at cct.lsu.edu>
> http://www.perimeterinstitute.ca/personal/eschnetter/
>
>


-- 
Dr. Yosef Zlochower
Center for Computational Relativity and Gravitation
Assistant Professor
School of Mathematical Sciences
Rochester Institute of Technology
85 Lomb Memorial Drive
Rochester, NY 14623

Office: 74-2067
Phone: +1 585-475-6103

yosef at astro.rit.edu


