[Users] std::out_of_range error while checkpointing

Erik Schnetter schnetter at cct.lsu.edu
Fri Aug 7 09:08:11 CDT 2015


Miguel

Thanks for tracking this down. I believe this code already looks different
on the current master. Can you just leave it disabled, as you did? The set of
active grid points is important for some very new post-processing and
visualization tools, but you probably don't need it, and you don't need it
in a checkpoint file unless you want to visualize the data stored there.

-erik


On Fri, Aug 7, 2015 at 8:31 AM, Miguel Zilhão <mzilhao at ffn.ub.es> wrote:

> Erik,
>
> some more info. i briefly went through git history and found the offending
> commit. up to Carpet commit d52042b867eea1e0770b6de87ff3929ab7b9d297
> everything works fine for me. from commit
> 69c73fb12d2ee41f01928122cd178d1bae9f8e13 onward i get the error i mentioned.
>
> indeed, in the current state, if i remove writing of the "active"
> attribute everything works just fine:
>
> diff --git a/CarpetIOHDF5/src/Output.cc b/CarpetIOHDF5/src/Output.cc
> index ec9b090..30c017d 100644
> --- a/CarpetIOHDF5/src/Output.cc
> +++ b/CarpetIOHDF5/src/Output.cc
> @@ -894,10 +894,11 @@ static int AddAttributes (const cGH *const cctkGH, const char *fullname,
>      HDF5_ERROR (H5Awrite (attr, H5T_NATIVE_INT, &ioffset[0]));
>           HDF5_ERROR (H5Aclose (attr));
>
> -    ostringstream buf;
> -    buf << (vdd.at(Carpet::map)->
> -            local_boxes.at(mglevel).at(refinementlevel).at(component).active);
> -    WriteAttribute(dataset, "active", buf.str().c_str());
> +  //   ostringstream buf;
> +  //   buf << (vdd.at(Carpet::map)->
> +  //           local_boxes.at(mglevel).at(refinementlevel).at(component).active);
> +  //   WriteAttribute(dataset, "active", buf.str().c_str());
> +
>    }
>
>    if (is_index) {
>
>
> i can't tell why it breaks once that bit of code is in, though...
>
> thanks,
> Miguel
>
>
>
> On 06/08/15 15:43, Miguel Zilhão wrote:
>
>> hi Erik,
>>
>>> Can you look at <https://trac.einsteintoolkit.org/ticket/1800>? Can you try
>>> replacing "reflevel" in line 816 of the file CarpetIOHDF5/src/Output.cc with
>>> "refinementlevel", and see whether this avoids the problem?
>>>
>>
>> thanks, but this does not seem to help... i still get the same error and
>> backtrace in gdb:
>>
>> #6  0x00007ffff4f6d4d5 in std::__throw_out_of_range_fmt(char const*, ...) ()
>>      from /usr/lib64/libstdc++.so.6
>> #7  0x0000000000c70402 in _M_range_check (__n=<optimized out>,
>>       this=<optimized out>) at /usr/include/c++/5/bits/stl_vector.h:803
>> #8  at (__n=<optimized out>, this=<optimized out>)
>>       at /usr/include/c++/5/bits/stl_vector.h:824
>> #9  CarpetIOHDF5::AddAttributes (cctkGH=cctkGH@entry=0x1b1b930,
>>       fullname=fullname@entry=0x3e52d760 "ML_BSSN::cA", vdim=3,
>>       refinementlevel=refinementlevel@entry=0, request=request@entry=0x3df4c990,
>>       bbox=..., dataset=83886080, is_index=false)
>>       at /home/mzilhao/Trabalho/projectos/ET/Cactus/arrangements/Carpet/CarpetIOHDF5/src/Output.cc:899
>> #10 0x0000000000c72e5a in CarpetIOHDF5::WriteVarChunkedParallel (
>>       cctkGH=cctkGH@entry=0x1b1b930, outfile=outfile@entry=167
>>
>>
>> Miguel
>>
>>
>>> On Wed, Aug 5, 2015 at 8:11 PM, Miguel Zilhão <mzilhao at ffn.ub.es> wrote:
>>>
>>>      hi all,
>>>
>>>      i'm running latest ET Hilbert on openSUSE tumbleweed and i'm having the
>>>      following issue. upon trying to run a simple head-on collision
>>>      configuration with McLachlan (attached parameter file), i get the error
>>>
>>>         INFO (CarpetIOHDF5): ---------------------------------------------------------
>>>         INFO (CarpetIOHDF5): Dumping initial checkpoint at iteration 0, simulation time 0
>>>         INFO (CarpetIOHDF5): ---------------------------------------------------------
>>>         terminate called after throwing an instance of 'std::out_of_range'
>>>           what():  vector::_M_range_check: __n (which is 1) >= this->size() (which is 1)
>>>           Rank 1 with PID 5958 received signal 6
>>>
>>>      when writing the checkpoint file.
>>>      this only happens if i run with more than one MPI process; with a single
>>>      processor it runs fine.
>>>
>>>      i'm compiling with gcc-5, but i find the same problem with gcc-4.8. i was
>>>      running this very same configuration just fine a couple of months ago, so
>>>      it must have been some update i've made in the meantime (either to my OS
>>>      or to ET).
>>>      i've also tried with different configurations and the outcome is the same.
>>>
>>>      i've run this through gdb, here's the relevant output:
>>>
>>>
>>>      #6  0x00007ffff4f6d4d5 in std::__throw_out_of_range_fmt(char const*, ...) ()
>>>          from /usr/lib64/libstdc++.so.6
>>>      #7  0x00000000005bb398 in _M_range_check (__n=<optimized out>,
>>>           this=<optimized out>) at /usr/include/c++/5/bits/stl_vector.h:803
>>>      #8  at (__n=<optimized out>, this=<optimized out>)
>>>           at /usr/include/c++/5/bits/stl_vector.h:824
>>>      #9  CarpetIOHDF5::AddAttributes (cctkGH=cctkGH@entry=0x1b507d0,
>>>           fullname=fullname@entry=0x3f2434a0 "ML_BSSN::cA", vdim=3,
>>>           refinementlevel=refinementlevel@entry=0, request=request@entry=0x3df96370,
>>>           bbox=..., dataset=83886080, is_index=false)
>>>           at /home/mzilhao/Trabalho/projectos/ET/Cactus/arrangements/Carpet/CarpetIOHDF5/src/Output.cc:899
>>>      #10 0x00000000005bdea4 in CarpetIOHDF5::WriteVarChunkedParallel (
>>>           cctkGH=cctkGH@entry=0x1b507d0, outfile=outfile@entry=16777216,
>>>           io_bytes=@0x7fffffffc980: 1110772, request=0x3df96370,
>>>           called_from_checkpoint=called_from_checkpoint@entry=true, indexfile=indexfile@entry=-1)
>>>           at /home/mzilhao/Trabalho/projectos/ET/Cactus/arrangements/Carpet/CarpetIOHDF5/src/Output.cc:706
>>>      #11 0x00000000005a233e in CarpetIOHDF5::Checkpoint (cctkGH=0x1b507d0,
>>>           called_from=0)
>>>           at /home/mzilhao/Trabalho/projectos/ET/Cactus/arrangements/Carpet/CarpetIOHDF5/src/CarpetIOHDF5.cc:1277
>>>      #12 0x000000000041f0d5 in CCTK_CallFunction (
>>>           function=function@entry=0x5a2da0 <CarpetIOHDF5::CarpetIOHDF5_InitialDataCheckpoint(cGH*)>,
>>>           fdata=fdata@entry=0x1b4a4e8, data=data@entry=0x1b507d0)
>>>           at /home/mzilhao/Trabalho/projectos/ET/Cactus/src/main/ScheduleInterface.c:312
>>>      #13 0x0000000000ef6499 in Carpet::CallScheduledFunction (
>>>           time_and_mode=time_and_mode@entry=0x1174842 "Meta mode",
>>>           function=function@entry=0x5a2da0 <CarpetIOHDF5::CarpetIOHDF5_InitialDataCheckpoint(cGH*)>,
>>>           attribute=attribute@entry=0x1b4a4e8, data=data@entry=0x1b507d0,
>>>           user_timer=...)
>>>           at /home/mzilhao/Trabalho/projectos/ET/Cactus/arrangements/Carpet/Carpet/src/CallFunction.cc:380
>>>
>>>
>>>      so the relevant bits of code seem to be in CarpetIOHDF5/src/Output.cc:706
>>>      and CarpetIOHDF5/src/Output.cc:899
>>>
>>>      this seems to be triggered when writing hdf5 output in parallel. if i
>>>      remove checkpointing the run goes fine, and i do get regular 2D hdf5
>>>      output. this does not seem to be written in parallel, though, as i get
>>>      only one file per grid function/group. so it seems to be the parallel
>>>      output that triggers the crash.
>>>
>>>      i have also tried removing all my hdf5 libs and configuring ET with
>>>      HDF5_DIR=BUILD, but the outcome was the same.
>>>
>>>      has anyone seen such an error before? anything else i could provide to
>>>      help diagnose this?
>>>
>>>      thanks,
>>>      Miguel
>>>
>>>      _______________________________________________
>>>      Users mailing list
>>>      Users at einsteintoolkit.org
>>>      http://lists.einsteintoolkit.org/mailman/listinfo/users
>>>
>>>
>>>
>>>
>>> --
>>> Erik Schnetter <schnetter at cct.lsu.edu>
>>> http://www.perimeterinstitute.ca/personal/eschnetter/
>>>


-- 
Erik Schnetter <schnetter at cct.lsu.edu>
http://www.perimeterinstitute.ca/personal/eschnetter/