[Users] std::out_of_range error while checkpointing

Erik Schnetter schnetter at cct.lsu.edu
Thu Aug 6 08:54:36 CDT 2015


Miguel

Can you look at <https://trac.einsteintoolkit.org/ticket/1800>? Can you try
replacing "reflevel" in line 816 of the file CarpetIOHDF5/src/Output.cc
with "refinementlevel", and see whether this avoids the problem?
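
For context, the exception in the quoted log below is what std::vector::at()
raises when the index is not smaller than the vector's size (here index 1
against size 1). A minimal sketch of that failure mode, assuming a
hypothetical per-level table; LevelTable and lookup are illustrative names
only and are not taken from the Carpet sources:

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a table with one entry per refinement level.
// This is NOT the actual Carpet data structure.
struct LevelTable {
  std::vector<int> entries_per_level;
};

// Returns the entry stored for `level`, or -1 if `level` is out of range.
// std::vector::at() does the bounds check and throws std::out_of_range
// when level >= entries_per_level.size().
int lookup(const LevelTable& table, std::size_t level) {
  try {
    return table.entries_per_level.at(level);
  } catch (const std::out_of_range&) {
    return -1;  // the index did not match the table's size
  }
}
```

If a table holds a single entry, only index 0 is valid; passing a level
variable that holds 1 fails the range check. Whether that is what happens in
Output.cc is exactly what the change above is meant to test.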

-erik


On Wed, Aug 5, 2015 at 8:11 PM, Miguel Zilhão <mzilhao at ffn.ub.es> wrote:

> hi all,
>
> i'm running latest ET Hilbert on openSUSE tumbleweed and i'm having the
> following issue. upon trying to run a simple head-on collision
> configuration with McLachlan (attached parameter file), i get the error
>
>   INFO (CarpetIOHDF5): ---------------------------------------------------------
>   INFO (CarpetIOHDF5): Dumping initial checkpoint at iteration 0, simulation time 0
>   INFO (CarpetIOHDF5): ---------------------------------------------------------
>   terminate called after throwing an instance of 'std::out_of_range'
>     what():  vector::_M_range_check: __n (which is 1) >= this->size() (which is 1)
>     Rank 1 with PID 5958 received signal 6
>
> when writing the checkpoint file.
> this only happens if i run with more than one MPI process; with a single
> processor it runs fine.
>
> i'm compiling with gcc-5, but i find the same problem with gcc-4.8. i was
> running this very same configuration just fine a couple of months ago, so
> it must have been some update i've made in the meantime (either to my OS or
> to ET).
> i've also tried with different configurations and the outcome is the same.
>
> i've run this through gdb, here's the relevant output:
>
>
> #6  0x00007ffff4f6d4d5 in std::__throw_out_of_range_fmt(char const*, ...) ()
>    from /usr/lib64/libstdc++.so.6
> #7  0x00000000005bb398 in _M_range_check (__n=<optimized out>,
>     this=<optimized out>) at /usr/include/c++/5/bits/stl_vector.h:803
> #8  at (__n=<optimized out>, this=<optimized out>)
>     at /usr/include/c++/5/bits/stl_vector.h:824
> #9  CarpetIOHDF5::AddAttributes (cctkGH=cctkGH@entry=0x1b507d0,
>     fullname=fullname@entry=0x3f2434a0 "ML_BSSN::cA", vdim=3,
>     refinementlevel=refinementlevel@entry=0, request=request@entry=0x3df96370,
>     bbox=..., dataset=83886080, is_index=false)
>     at /home/mzilhao/Trabalho/projectos/ET/Cactus/arrangements/Carpet/CarpetIOHDF5/src/Output.cc:899
> #10 0x00000000005bdea4 in CarpetIOHDF5::WriteVarChunkedParallel (
>     cctkGH=cctkGH@entry=0x1b507d0, outfile=outfile@entry=16777216,
>     io_bytes=@0x7fffffffc980: 1110772, request=0x3df96370,
>     called_from_checkpoint=called_from_checkpoint@entry=true, indexfile=indexfile@entry=-1)
>     at /home/mzilhao/Trabalho/projectos/ET/Cactus/arrangements/Carpet/CarpetIOHDF5/src/Output.cc:706
> #11 0x00000000005a233e in CarpetIOHDF5::Checkpoint (cctkGH=0x1b507d0,
>     called_from=0)
>     at /home/mzilhao/Trabalho/projectos/ET/Cactus/arrangements/Carpet/CarpetIOHDF5/src/CarpetIOHDF5.cc:1277
> #12 0x000000000041f0d5 in CCTK_CallFunction (
>     function=function@entry=0x5a2da0 <CarpetIOHDF5::CarpetIOHDF5_InitialDataCheckpoint(cGH*)>,
>     fdata=fdata@entry=0x1b4a4e8, data=data@entry=0x1b507d0)
>     at /home/mzilhao/Trabalho/projectos/ET/Cactus/src/main/ScheduleInterface.c:312
> #13 0x0000000000ef6499 in Carpet::CallScheduledFunction (
>     time_and_mode=time_and_mode@entry=0x1174842 "Meta mode",
>     function=function@entry=0x5a2da0 <CarpetIOHDF5::CarpetIOHDF5_InitialDataCheckpoint(cGH*)>,
>     attribute=attribute@entry=0x1b4a4e8, data=data@entry=0x1b507d0,
>     user_timer=...)
>     at /home/mzilhao/Trabalho/projectos/ET/Cactus/arrangements/Carpet/Carpet/src/CallFunction.cc:380
>
>
> so the relevant bits of code seem to be in CarpetIOHDF5/src/Output.cc:706
> and CarpetIOHDF5/src/Output.cc:899
>
> this seems to be triggered when writing hdf5 output in parallel. if i
> remove checkpointing the run goes fine, and i do get regular 2D hdf5
> output. this does not seem to be written in parallel, though, as i get only
> one file per grid function/group. so it seems to be the parallel output
> that triggers the crash.
>
> i have also tried removing all my hdf5 libs and configuring ET with
> HDF5_DIR=BUILD, but the outcome was the same.
>
> has anyone seen such an error before? anything else i could provide to
> help diagnose this?
>
> thanks,
> Miguel
>


-- 
Erik Schnetter <schnetter at cct.lsu.edu>
http://www.perimeterinstitute.ca/personal/eschnetter/