[Users] Recovery fails with out-of-memory

Erik Schnetter schnetter at cct.lsu.edu
Sat Sep 21 12:44:02 CDT 2013


On 2013-09-21, at 12:13 , Ian Hinder <ian.hinder at aei.mpg.de> wrote:

> Hi,
> 
> Does anyone have a suggestion for how I can make a run recover if HDF5 claims to be out of memory?  The original run was on 1056 cores of Stampede.  Recovery fails both on the same number of cores and on 2048.  Would a newer version of HDF5 help?  Is there some parameter I can set?  I am already using CarpetIOHDF5::open_one_input_file_at_a_time = yes.  I am using CarpetIOHDF5::compression_level = 9.  I wonder if the problem is made worse by having to decompress when reading the file.  Logically, I don't see any reason why HDF5 should need more memory to recover than the original run used.
> 
>> HDF5-DIAG: Error detected in HDF5 (1.8.10-patch1) thread 0:
>>  #000: H5Dio.c line 174 in H5Dread(): can't read data
>>    major: Dataset
>>    minor: Read failed
>>  #001: H5Dio.c line 449 in H5D__read(): can't read data
>>    major: Dataset
>>    minor: Read failed
>>  #002: H5Dchunk.c line 1735 in H5D__chunk_read(): unable to read raw data chunk
>>    major: Low-level I/O
>>    minor: Read failed
>>  #003: H5Dchunk.c line 2766 in H5D__chunk_lock(): data pipeline read failed
>>    major: Data filters
>>    minor: Filter operation failed
>>  #004: H5Z.c line 1120 in H5Z_pipeline(): filter returned failure during read
>>    major: Data filters
>>    minor: Read failed
>>  #005: H5Zdeflate.c line 136 in H5Z_filter_deflate(): memory allocation failed for deflate uncompression
>>    major: Resource unavailable
>>    minor: No space available for allocation
>> WARNING[L1,P254] (CarpetIOHDF5): :
>> HDF5 call 'H5Dread (dataset, datatype, memspace, filespace, xfer, cctkGH->data[patch->vindex][timelevel])' returned error code -1


What is the size of this chunk? Each chunk needs to be decompressed in one piece, so using a smaller chunk size may help. You may be able to modify the chunk size via h5repack, or remove the compression filter altogether; see the example below.
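
For example, something along these lines might work (the checkpoint file names are placeholders, and you may need to adjust the chunk rank and dimensions to match the datasets in your file):

  # re-chunk the datasets into smaller chunks
  h5repack -l CHUNK=16x16x16 checkpoint.h5 checkpoint-rechunked.h5

  # or strip the compression filter entirely
  h5repack -f NONE checkpoint.h5 checkpoint-plain.h5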

I think there are also options or properties one can set to determine how much memory HDF5 uses for its caches; there is a sketch below. I don't know much about these; the HDF5 mailing list or a support ticket may tell you more.
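
This is an untested sketch of what I mean, using the HDF5 C API and the file access property list (the cache numbers are arbitrary, the helper function is only for illustration, and whether this helps with the deflate allocation failure above is a separate question):

  #include <hdf5.h>

  /* Open a file with an enlarged raw-data chunk cache. */
  hid_t open_with_chunk_cache(const char *filename)
  {
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    /* args: fapl, mdc_nelmts (ignored in HDF5 1.8), rdcc_nslots,
       rdcc_nbytes (chunk cache size in bytes), rdcc_w0 (preemption policy) */
    H5Pset_cache(fapl, 0, 521, (size_t)64 * 1024 * 1024, 0.75);
    hid_t file = H5Fopen(filename, H5F_ACC_RDONLY, fapl);
    H5Pclose(fapl);
    return file;
  }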

-erik

-- 
Erik Schnetter <schnetter at cct.lsu.edu>
http://www.perimeterinstitute.ca/personal/eschnetter/

My email is as private as my paper mail. I therefore support encrypting
and signing email messages. Get my PGP key from http://pgp.mit.edu/.
