[ET Trac] [Einstein Toolkit] #2200: some tests may be non-deterministic

Einstein Toolkit trac-noreply at einsteintoolkit.org
Tue Oct 9 10:42:59 CDT 2018


#2200: some tests may be non-deterministic
--------------------------+---------------------------------
  Reporter:  Roland Haas  |      Owner:  (none)
      Type:  defect       |     Status:  new
  Priority:  major        |  Milestone:
 Component:  Cactus       |    Version:  development version
Resolution:               |   Keywords:
--------------------------+---------------------------------

Comment (by Roland Haas):

 On Comet, the log file for the failed test contains:
 {{{
 INFO (CarpetIOHDF5): reading grid variables on mglevel 0 reflevel 0
 HDF5-DIAG: Error detected in HDF5 (1.8.14) MPI-process 0:
   #000: H5Dio.c line 173 in H5Dread(): can't read data
     major: Dataset
     minor: Read failed
   #001: H5Dio.c line 550 in H5D__read(): can't read data
     major: Dataset
     minor: Read failed
   #002: H5Dchunk.c line 1872 in H5D__chunk_read(): unable to read raw data chunk
     major: Low-level I/O
     minor: Read failed
   #003: H5Dchunk.c line 2902 in H5D__chunk_lock(): data pipeline read failed
     major: Data filters
     minor: Filter operation failed
   #004: H5Z.c line 1382 in H5Z_pipeline(): filter returned failure during read
     major: Data filters
     minor: Read failed
   #005: H5Zdeflate.c line 136 in H5Z_filter_deflate(): memory allocation failed for deflate uncompression
     major: Resource unavailable
     minor: No space available for allocation
 WARNING[L1,P0] (CarpetIOHDF5): HDF5 call 'H5Dread(dataset, datatype, memspace, filespace, xfer, cctkGH->data[patch->vindex][timelevel])' returned error code -1
 }}}
 indicating an issue with the HDF5 library. There should certainly be no
 issue with running out of memory, since the total dataset size in the file
 in question (checkpointML-EE/checkpoint.chkpt.it_1.h5) is quite small:
 {{{
 h5ls -v checkpointML-EE/checkpoint.chkpt.it_1.h5 | gawk '/logical bytes/{sum += $2} END{print sum/1e6}'
 19.9976
 }}}
 i.e. only about 20 MB. It seems more likely that there is (again) a bug in
 HDF5's gzip code (see #1878).
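
 One way to test this independently of Cactus is to read back every dataset
 in the checkpoint file with a small standalone program, which exercises the
 same H5Dread()/deflate path that fails in the log above. A minimal sketch
 (not code from the ticket; it assumes the HDF5 1.8 C API and the file name
 quoted above, and can be compiled with e.g. h5cc):
 {{{
 /* Hedged sketch: open the checkpoint and H5Dread() every dataset,
  * exercising the same deflate read path that fails in the log.
  * Assumes the HDF5 1.8 C API.  Build: h5cc -o readall readall.c */
 #include <hdf5.h>
 #include <stdio.h>
 #include <stdlib.h>

 /* H5Ovisit callback: read one object if it is a dataset */
 static herr_t read_one(hid_t loc, const char *name,
                        const H5O_info_t *info, void *op_data)
 {
   (void)op_data;
   if (info->type != H5O_TYPE_DATASET)
     return 0;                          /* skip groups, named types, ... */

   hid_t ds = H5Dopen2(loc, name, H5P_DEFAULT);
   if (ds < 0)
     return -1;

   hid_t ftype = H5Dget_type(ds);       /* read using the file datatype, */
   hid_t space = H5Dget_space(ds);      /* so no conversion is involved  */
   hssize_t n = H5Sget_simple_extent_npoints(space);
   size_t nbytes = (size_t)n * H5Tget_size(ftype);

   void *buf = malloc(nbytes > 0 ? nbytes : 1);
   herr_t err = buf == NULL ? -1 :
     H5Dread(ds, ftype, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf);
   printf("%-60s %s\n", name, err < 0 ? "FAILED" : "ok");

   free(buf);
   H5Sclose(space);
   H5Tclose(ftype);
   H5Dclose(ds);
   return err < 0 ? -1 : 0;             /* stop the walk on first failure */
 }

 int main(void)
 {
   hid_t file = H5Fopen("checkpointML-EE/checkpoint.chkpt.it_1.h5",
                        H5F_ACC_RDONLY, H5P_DEFAULT);
   if (file < 0)
     return 1;
   herr_t err = H5Ovisit(file, H5_INDEX_NAME, H5_ITER_NATIVE,
                         read_one, NULL);
   H5Fclose(file);
   return err < 0 ? 1 : 0;
 }
 }}}
 Running this a few times in a row on the Comet scratch file system should
 help distinguish a reproducible decompression bug from a transient I/O
 problem.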

 We have had issues with Comet's file system in the past (#2073) related to
 writing files and then immediately reading them back, which is more or less
 what happens with the test-suite data, though that does not seem to be
 related here.
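
 For completeness, that write-then-read pattern can be probed directly with
 plain POSIX I/O; a minimal sketch (hypothetical file name, to be run on the
 scratch file system in question):
 {{{
 /* Hedged sketch: write a small file, flush it, and immediately read it
  * back -- roughly the access pattern behind #2073.  The file name is
  * hypothetical. */
 #include <fcntl.h>
 #include <stdio.h>
 #include <string.h>
 #include <unistd.h>

 int main(void)
 {
   const char *path = "fs-check.tmp";   /* hypothetical test file */
   const char msg[] = "write-then-read check\n";
   char buf[sizeof msg];

   int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0644);
   if (fd < 0 || write(fd, msg, sizeof msg) != (ssize_t)sizeof msg)
     return 1;
   fsync(fd);                           /* force data out to the file system */
   close(fd);

   fd = open(path, O_RDONLY);           /* immediate read-back */
   if (fd < 0 || read(fd, buf, sizeof buf) != (ssize_t)sizeof buf)
     return 1;
   close(fd);

   printf("read-back %s\n",
          memcmp(buf, msg, sizeof msg) == 0 ? "matches" : "DIFFERS");
   return 0;
 }
 }}}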

-- 
Ticket URL: <https://trac.einsteintoolkit.org/ticket/2200#comment:4>
Einstein Toolkit <http://einsteintoolkit.org>