[Users] Recovery fails with out-of-memory
Ian Hinder
ian.hinder at aei.mpg.de
Sat Sep 21 11:13:23 CDT 2013
Hi,
Does anyone have a suggestion for how I can make a run recover if HDF5 claims to be out of memory? The original run was on 1056 cores of Stampede. Recovery fails both on the same number of cores and on 2048.

Would a newer version of HDF5 help? Is there some parameter I can set? I am already using CarpetIOHDF5::open_one_input_file_at_a_time = yes and CarpetIOHDF5::compression_level = 9. I wonder if the problem is made worse by having to decompress when reading the file. Logically, I don't see any reason why HDF5 should need more memory to recover than the original run used.
> HDF5-DIAG: Error detected in HDF5 (1.8.10-patch1) thread 0:
> #000: H5Dio.c line 174 in H5Dread(): can't read data
> major: Dataset
> minor: Read failed
> #001: H5Dio.c line 449 in H5D__read(): can't read data
> major: Dataset
> minor: Read failed
> #002: H5Dchunk.c line 1735 in H5D__chunk_read(): unable to read raw data chunk
> major: Low-level I/O
> minor: Read failed
> #003: H5Dchunk.c line 2766 in H5D__chunk_lock(): data pipeline read failed
> major: Data filters
> minor: Filter operation failed
> #004: H5Z.c line 1120 in H5Z_pipeline(): filter returned failure during read
> major: Data filters
> minor: Read failed
> #005: H5Zdeflate.c line 136 in H5Z_filter_deflate(): memory allocation failed for deflate uncompression
> major: Resource unavailable
> minor: No space available for allocation
> WARNING[L1,P254] (CarpetIOHDF5): :
> HDF5 call 'H5Dread (dataset, datatype, memspace, filespace, xfer, cctkGH->data[patch->vindex][timelevel])' returned error code -1
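
To make the decompression question concrete, here is a minimal Python sketch using only the standard zlib module. It is not HDF5 or Carpet code; the function name and the synthetic chunk of doubles are made up for illustration. It shows only that inflating a deflate-compressed chunk needs the compressed bytes and the uncompressed output in memory at the same time, so a compressed read carries some transient overhead on top of the destination buffer. Whether that overhead is what pushes this recovery over the limit is an assumption that has not been verified.

```python
import array
import zlib


def transient_bytes_for_chunk(n_doubles, level=9):
    """Compress and re-inflate one synthetic chunk of doubles.

    Returns (compressed_size, raw_size) in bytes. Hypothetical helper for
    illustration only; it does not reproduce HDF5's internal buffering.
    """
    # Synthetic stand-in for one chunk of grid-function data.
    raw = array.array("d", (0.001 * i for i in range(n_doubles))).tobytes()

    # Level 9 matches the compression_level quoted in the message.
    compressed = zlib.compress(raw, level)

    # While inflating, both `compressed` and the inflated result are alive.
    restored = zlib.decompress(compressed)
    if restored != raw:
        raise ValueError("round trip through zlib changed the data")

    return len(compressed), len(raw)


if __name__ == "__main__":
    comp, raw = transient_bytes_for_chunk(32 * 32 * 32)
    print(f"compressed: {comp} B, raw: {raw} B, held together: {comp + raw} B")
```

The numbers depend entirely on the synthetic data; the point is only the sum in the last line, which is the memory a single chunk occupies during inflation in this sketch.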
--
Ian Hinder
http://numrel.aei.mpg.de/people/hinder