[Users] Error to write PittNull during checkpointing

Yosef Zlochower yosef at astro.rit.edu
Fri Jul 27 08:26:11 CDT 2012


Hi,

  SphericalHarmonicDecomp should not be writing output at the
same time as a checkpoint. Are you using an NFS mount? I have noticed
issues with NFS servers becoming unresponsive (due to a large number
of blocking I/O operations) during a checkpoint. Perhaps right after
a checkpoint, the server is still too busy.

This certainly sounds like an issue that needs to be fixed, although
I am not sure how. Perhaps a failed I/O operation should be a fatal error
that kills the run.
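
As a rough illustration of that idea (this is not what the thorn
currently does), here is a minimal Python sketch that retries a failed
write a few times and then aborts instead of silently skipping the
output. The names write_with_retry and FatalIOError are hypothetical,
and the retry step is an added assumption on top of the "fatal error"
suggestion above.

```python
import time


class FatalIOError(RuntimeError):
    """Raised when output cannot be written even after retrying."""


def write_with_retry(write_fn, attempts=3, delay=1.0):
    """Call write_fn(); retry on OSError, then abort instead of skipping.

    write_fn is any zero-argument callable that performs the output
    (hypothetical stand-in for the real write routine).
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return write_fn()
        except OSError as err:
            last_error = err
            if attempt < attempts:
                # Give a busy file server some time to recover.
                time.sleep(delay)
    # All attempts failed: stop the run rather than leave a gap in the data.
    raise FatalIOError(f"output failed after {attempts} attempts") from last_error
```

With this policy a temporarily busy server costs a short delay, while a
persistent failure stops the run rather than leaving gaps in the file.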

Could you try scheduling the output for
metric_obs_0_Decomp.h5 so that it doesn't fall on the iteration
immediately before or after a checkpoint?
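
To check a given choice of intervals, something like the following
Python sketch may help. It lists the output iterations that land on or
next to a checkpoint iteration. The names out_every and
checkpoint_every are hypothetical stand-ins for the actual parameter
settings, and the sketch assumes checkpoints happen at every multiple
of checkpoint_every (including iteration 0, which is deliberately
conservative). Checkpoints triggered some other way, e.g. by wall-clock
time, are not covered.

```python
def clashing_outputs(out_every, checkpoint_every, last_iteration, margin=1):
    """Return output iterations within `margin` iterations of a checkpoint.

    Assumes output at every multiple of out_every and a checkpoint at
    every multiple of checkpoint_every (both hypothetical settings).
    """
    clashes = []
    for it in range(0, last_iteration + 1, out_every):
        r = it % checkpoint_every
        # Distance from this output iteration to the nearest checkpoint.
        distance = min(r, checkpoint_every - r)
        if distance <= margin:
            clashes.append(it)
    return clashes


if __name__ == "__main__":
    print(clashing_outputs(32, 512, 1024))
    print(clashing_outputs(32, 500, 1024))
```

An empty list (or one containing only iteration 0) means the two
schedules never come within one iteration of each other over the range
checked.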



On 07/27/2012 03:59 AM, Jakob Hansen wrote:
> Hi,
>
> I'm experimenting a bit with using the PittNull /
> SphericalHarmonicDecomp thorn but I've been experiencing a few errors in
> writing to its output metric_obs_0_Decomp.h5 file.
>
> On some occasions it seems that this output file cannot be opened,
> leaving gaps in the data file. This always occurs when Carpet is
> checkpointing.  Everything else runs well, though, and the
> simulation finishes fine. However, when trying to fftwfilter the
> metric_obs_0_Decomp.h5 file, I notice the missing data points.
>
> I've checked the system logs and there seems to be no hardware failure. Also,
> as far as I can see, I'm not running out of memory either.
>
> Anyone else experienced similar issues?
>
> Here is the output from the error file:
> -----------------------------------------------------------------------------------------
>
> HDF5-DIAG: Error detected in HDF5 (1.8.5-patch1) thread 0:
> #000: H5F.c line 1509 in H5Fopen(): unable to open file
> major: File accessability
> minor: Unable to open file
> #001: H5F.c line 1300 in H5F_open(): unable to read superblock
> major: File accessability
> minor: Read failed
> #002: H5Fsuper.c line 324 in H5F_super_read(): unable to load superblock
> major: Object cache
> minor: Unable to protect metadata
> #003: H5AC.c line 1597 in H5AC_protect(): H5C_protect() failed.
> major: Object cache
> minor: Unable to protect metadata
> #004: H5C.c line 3333 in H5C_protect(): can't load entry
> major: Object cache
> minor: Unable to load metadata into cache
> #005: H5C.c line 8177 in H5C_load_entry(): unable to load entry
> major: Object cache
> minor: Unable to load metadata into cache
> #006: H5Fsuper_cache.c line 469 in H5F_sblock_load(): truncated file
> major: File accessability
> minor: File has been truncated
> HDF5-DIAG: Error detected in HDF5 (1.8.5-patch1) thread 0:
> #000: H5Gdeprec.c line 214 in H5Gcreate1(): not a location
> major: Invalid arguments to routine
> minor: Inappropriate type
> #001: H5Gloc.c line 253 in H5G_loc(): invalid object ID
> major: Invalid arguments to routine
> minor: Bad value
> HDF5-DIAG: Error detected in HDF5 (1.8.5-patch1) thread 0:
> #000: H5Adeprec.c line 153 in H5Acreate1(): not a location
> major: Invalid arguments to routine
> minor: Inappropriate type
> #001: H5Gloc.c line 253 in H5G_loc(): invalid object ID
> major: Invalid arguments to routine
> minor: Bad value
> [etc. etc. etc. ...... ]
> -------------------------------------------
>
> Cheers,
> Jakob


-- 
Dr. Yosef Zlochower
Center for Computational Relativity and Gravitation
Assistant Professor
School of Mathematical Sciences
Rochester Institute of Technology
85 Lomb Memorial Drive
Rochester, NY 14623

Office:74-2067
Phone: +1 585-475-6103

yosef at astro.rit.edu


