[Users] Another restart from checkpoint failure

Oleg Korobkin korobkin at phys.lsu.edu
Thu Feb 17 10:01:18 CST 2011


Hi Jakob,

I had the same problem some time ago. Since your HDF5 files are
compressed, the data has to be decompressed when Cactus reads it, and
this operation is apparently very memory-intensive. Try repacking your
checkpoint files with compression disabled before restarting. The
standard h5repack utility can do that. Here's a small script that
uncompresses all checkpoint files of one iteration and writes them to
the $OUTDIR directory:

----------------
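#!/bin/bash

ITER=5555            # which iteration to use
OUTDIR=$SCRATCH/tmp  # output directory
mkdir -p "$OUTDIR"   # make sure the output directory exists
for f in checkpoint.chkpt.it_${ITER}.file_*.h5; do
  echo "processing $f..."
  # -f NONE removes all filters, including deflate compression
  h5repack -i "$f" -o "$OUTDIR/$f" -f NONE
done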
----------------
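One way to confirm that the repack really removed the compression is to
inspect the dataset filters with h5dump, whose -H option prints only the
header and -p adds each dataset's filter pipeline. The sketch below is a
hypothetical helper (the names still_compressed and check_repacked are
made up for illustration); it assumes the repacked files sit in one
directory such as $SCRATCH/tmp and that deflate-compressed datasets show
up as DEFLATE in the h5dump -p output.

```shell
#!/bin/bash
# Hypothetical helper: succeeds (exit 0) if the given HDF5 file still
# contains a deflate-compressed dataset.
# -H prints only the header (no data), -p adds filter/storage properties.
still_compressed() {
  h5dump -p -H "$1" | grep -q DEFLATE
}

# Hypothetical helper: report every checkpoint file in directory $1 that
# still carries a deflate filter; returns 1 if any such file is found.
check_repacked() {
  local dir=$1 status=0 f
  for f in "$dir"/checkpoint.chkpt.it_*.file_*.h5; do
    [ -e "$f" ] || continue          # glob matched nothing
    if still_compressed "$f"; then
      echo "still compressed: $f"
      status=1
    fi
  done
  return $status
}
```

For example, `check_repacked "$SCRATCH/tmp"` prints any file that still
has a deflate filter and returns a non-zero status in that case.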

This will make your checkpoint files 5-10 times larger, but because they
don't need to be unpacked in RAM, less memory is required to read them.
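Because the uncompressed copies can be 5-10 times larger, it may be worth
checking that the output directory has room before repacking. This is a
minimal sketch, not part of the original advice: the function names and
the FACTOR variable are hypothetical, the factor defaults to the
pessimistic end (10) of the estimate above, and free space is read from
the POSIX `df -P` output format.

```shell
#!/bin/bash
# Total size in bytes of the files given as arguments.
total_bytes() {
  local sum=0 f
  for f in "$@"; do
    sum=$(( sum + $(wc -c < "$f") ))
  done
  echo "$sum"
}

# Free space in bytes on the filesystem holding directory $1.
# In df -P output, column 4 of line 2 is the available 1024-byte blocks;
# the multiplication is done in bash's 64-bit integer arithmetic.
free_bytes() {
  local kib
  kib=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
  echo $(( kib * 1024 ))
}

# Succeeds if directory $1 can hold the remaining arguments (files) after
# they grow by FACTOR (hypothetical variable, default 10).
enough_space() {
  local dir=$1; shift
  local factor=${FACTOR:-10}
  local need=$(( $(total_bytes "$@") * factor ))
  [ "$(free_bytes "$dir")" -ge "$need" ]
}
```

A call such as `enough_space "$OUTDIR" checkpoint.chkpt.it_5555.file_*.h5`
would then guard the repack loop.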

Also, try the following:
1. set:
   CarpetIOHDF5::open_one_input_file_at_a_time = yes
   IO::abort_on_io_errors = yes
2. use fewer cores per node.
3. try running on a larger number of processors.
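Suggestion 2 helps because a node's RAM is shared by all processes
running on it, so fewer processes per node leave more memory for each
one. A back-of-the-envelope sketch of that arithmetic follows; the
function name and the 32 GiB node size are hypothetical, chosen only for
illustration, and OS/MPI overhead is ignored.

```shell
#!/bin/bash
# Memory (in MiB) available to each MPI process when PPN processes share
# a node with NODE_MIB MiB of RAM (ignores OS and MPI overhead).
mem_per_process() {
  local node_mib=$1 ppn=$2
  echo $(( node_mib / ppn ))
}

# Illustrative node with 32 GiB (32768 MiB) of RAM:
for ppn in 16 8 4; do
  echo "$ppn processes/node -> $(mem_per_process 32768 "$ppn") MiB each"
done
```

Halving the processes per node doubles the memory each one can use while
decompressing its part of the checkpoint.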

Cheers,
 - Oleg Korobkin

Frank Loeffler wrote:
> Hi,
> 
> On Thu, Feb 17, 2011 at 03:24:04PM +0900, Jakob Hansen wrote:
>>   #005: H5Zdeflate.c line 133 in H5Z_filter_deflate(): memory allocation
>> failed for deflate uncompression
>>     major: Resource unavailable
>>     minor: No space available for allocation
> 
>> Any ideas for possible cause and solution to this?
> 
> Looking at these error messages, I suggest you first run h5ls/h5dump on
> the files to see whether they are actually OK. If they are, my next
> suspicion would be that reading the files takes more memory than is
> available, as the message above indicates. In that case I suggest trying
> the workarounds mentioned in this thread:
> 
> http://lists.einsteintoolkit.org/pipermail/users/2011-February/000852.html
> 
> If this also doesn't help, I would suggest trying again with more
> memory available, e.g. by using more nodes.
> 
> Frank
> 
> _______________________________________________
> Users mailing list
> Users at einsteintoolkit.org
> http://lists.einsteintoolkit.org/mailman/listinfo/users
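Frank's first suggestion, checking the files with h5ls before anything
else, can be scripted for a whole set of checkpoint files. This is a
minimal sketch: the function name find_unreadable is hypothetical, it
expects to be run in the directory holding the checkpoint files, and it
assumes h5ls exits with a non-zero status when it cannot open or traverse
a file.

```shell
#!/bin/bash
# Hypothetical helper: list every checkpoint file of iteration $1 (in the
# current directory) that h5ls cannot read. h5ls -r walks all groups
# recursively; a non-zero exit status is taken as a sign of a damaged or
# truncated file. Returns 1 if any unreadable file was found.
find_unreadable() {
  local iter=$1 bad=0 f
  for f in checkpoint.chkpt.it_"${iter}".file_*.h5; do
    [ -e "$f" ] || continue          # glob matched nothing
    if ! h5ls -r "$f" > /dev/null 2>&1; then
      echo "unreadable: $f"
      bad=1
    fi
  done
  return $bad
}
```

Running `find_unreadable 5555` in the checkpoint directory prints nothing
and returns 0 if every file of that iteration could be read.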


