[Users] Another restart from checkpoint failure
Oleg Korobkin
korobkin at cct.lsu.edu
Thu Feb 17 10:02:10 CST 2011
Hi Jakob,
I had the same problem some time ago. Since your HDF5 files are
compressed, Cactus tries to unpack them before reading, and this
operation is somehow very memory-intensive. Try repacking your
checkpoint files before restarting, so that they are stored without
compression. You can use the standard h5repack utility for that. Here's
a small script that uncompresses all checkpoint files and writes them to
the $OUTDIR directory:
----------------
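#!/bin/bash
ITER=5555              # which iteration to use
OUTDIR="$SCRATCH/tmp"  # output directory
mkdir -p "$OUTDIR"     # make sure the output directory exists
for f in checkpoint.chkpt.it_${ITER}.file_*.h5; do
    echo "processing $f..."
    # -f NONE removes all filters, including compression
    h5repack -i "$f" -o "$OUTDIR/$f" -f NONE
done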
----------------
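To double-check that the repacked files really are uncompressed,
something like the following should work. It is only a sketch: it
assumes that "h5dump -p -H" lists each dataset's filters in its header
output and that a deflate-compressed dataset shows up as a line
containing "COMPRESSION DEFLATE". The function name count_deflate and
the script name check_filters.sh are made up for this example.

```shell
#!/bin/bash
# Hypothetical helper: count the lines on stdin that mention the deflate
# filter (assumed to be printed as "COMPRESSION DEFLATE" by h5dump -p -H).
# grep -c exits non-zero when the count is 0, hence the "|| true".
count_deflate() {
    grep -c "COMPRESSION DEFLATE" || true
}

# Report every file given on the command line, e.g.
#   ./check_filters.sh $OUTDIR/checkpoint.chkpt.it_5555.file_*.h5
for f in "$@"; do
    echo "$f: $(h5dump -p -H "$f" | count_deflate) compressed dataset(s)"
done
```

A count of 0 for every file would indicate that no deflate filter is
left.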
This will make your checkpoint files 5-10 times larger, but because they
don't need to be unpacked in RAM, less memory is required to read them.
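Because of the size increase, it may be worth checking the free space in
$OUTDIR before repacking. A rough sketch, assuming the worst case of a
10x growth from the estimate above; the function names and the default
factor are only illustrative, and du reports disk usage rather than the
exact file size:

```shell
#!/bin/bash
# Hypothetical helper: total disk usage in KiB of the given files.
total_kb() {
    du -kc "$@" | tail -n 1 | cut -f1
}

# Hypothetical helper: free space in KiB on the filesystem holding a directory.
free_kb() {
    df -Pk "$1" | awk 'NR==2 {print $4}'
}

# enough_space CURRENT_KB AVAILABLE_KB [FACTOR]
# Succeeds if CURRENT_KB * FACTOR (default 10, an assumption) fits into AVAILABLE_KB.
enough_space() {
    local needed=$(( $1 * ${3:-10} ))
    [ "$needed" -le "$2" ]
}

# Example use, with ITER and OUTDIR as in the script above:
# if ! enough_space "$(total_kb checkpoint.chkpt.it_${ITER}.file_*.h5)" \
#                   "$(free_kb "$OUTDIR")"; then
#     echo "probably not enough space in $OUTDIR" >&2
# fi
```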
Also, try the following:
1. set:
CarpetIOHDF5::open_one_input_file_at_a_time = yes
IO::abort_on_io_errors = yes
2. use fewer cores per node.
3. try running on a larger number of processors.
Cheers,
- Oleg Korobkin
Frank Loeffler wrote:
> Hi,
>
> On Thu, Feb 17, 2011 at 03:24:04PM +0900, Jakob Hansen wrote:
>> #005: H5Zdeflate.c line 133 in H5Z_filter_deflate(): memory allocation
>> failed for deflate uncompression
>> major: Resource unavailable
>> minor: No space available for allocation
>
>> Any ideas for possible cause and solution to this?
>
> Looking at these error messages I suggest you first do an h5ls/h5dump to
> see if the files are actually ok. If that is so, what I would suspect next
> is that reading the files takes more memory than is available, as indicated
> by the message above. In this case I suggest to try the workarounds
> mentioned in this thread:
>
> http://lists.einsteintoolkit.org/pipermail/users/2011-February/000852.html
>
> If this also doesn't help I would suggest to try again using more
> available memory, e.g. using more nodes.
>
> Frank