[Users] std::out_of_range error while checkpointing

Miguel Zilhão mzilhao at ffn.ub.es
Wed Aug 5 19:11:12 CDT 2015

hi all,

i'm running the latest ET release (Hilbert) on openSUSE Tumbleweed and i'm having the following issue. upon trying 
to run a simple head-on collision configuration with McLachlan (attached parameter file), i get the following

   INFO (CarpetIOHDF5): ---------------------------------------------------------
   INFO (CarpetIOHDF5): Dumping initial checkpoint at iteration 0, simulation time 0
   INFO (CarpetIOHDF5): ---------------------------------------------------------
   terminate called after throwing an instance of 'std::out_of_range'
     what():  vector::_M_range_check: __n (which is 1) >= this->size() (which is 1)
     Rank 1 with PID 5958 received signal 6

when writing the checkpoint file.
this only happens if i run with more than one MPI process; with a single process it runs fine.

i'm compiling with gcc-5, but i see the same problem with gcc-4.8. i was running this very same 
configuration just fine a couple of months ago, so it must be due to some update i've made in the 
meantime (either to my OS or to ET).
i've also tried different configurations and the outcome is the same.

i've run this through gdb; here's the relevant output:

#6  0x00007ffff4f6d4d5 in std::__throw_out_of_range_fmt(char const*, ...) ()
    from /usr/lib64/libstdc++.so.6
#7  0x00000000005bb398 in _M_range_check (__n=<optimized out>,
     this=<optimized out>) at /usr/include/c++/5/bits/stl_vector.h:803
#8  at (__n=<optimized out>, this=<optimized out>)
     at /usr/include/c++/5/bits/stl_vector.h:824
#9  CarpetIOHDF5::AddAttributes (cctkGH=cctkGH@entry=0x1b507d0,
     fullname=fullname@entry=0x3f2434a0 "ML_BSSN::cA", vdim=3,
     refinementlevel=refinementlevel@entry=0, request=request@entry=0x3df96370,
     bbox=..., dataset=83886080, is_index=false)
     at /home/mzilhao/Trabalho/projectos/ET/Cactus/arrangements/Carpet/CarpetIOHDF5/src/Output.cc:899
#10 0x00000000005bdea4 in CarpetIOHDF5::WriteVarChunkedParallel (
     cctkGH=cctkGH@entry=0x1b507d0, outfile=outfile@entry=16777216,
     io_bytes=@0x7fffffffc980: 1110772, request=0x3df96370,
     called_from_checkpoint=called_from_checkpoint@entry=true, indexfile=indexfile@entry=-1)
     at /home/mzilhao/Trabalho/projectos/ET/Cactus/arrangements/Carpet/CarpetIOHDF5/src/Output.cc:706
#11 0x00000000005a233e in CarpetIOHDF5::Checkpoint (cctkGH=0x1b507d0,
#12 0x000000000041f0d5 in CCTK_CallFunction (
     function=function@entry=0x5a2da0 <CarpetIOHDF5::CarpetIOHDF5_InitialDataCheckpoint(cGH*)>,
     fdata=fdata@entry=0x1b4a4e8, data=data@entry=0x1b507d0)
     at /home/mzilhao/Trabalho/projectos/ET/Cactus/src/main/ScheduleInterface.c:312
#13 0x0000000000ef6499 in Carpet::CallScheduledFunction (
     time_and_mode=time_and_mode@entry=0x1174842 "Meta mode",
     function=function@entry=0x5a2da0 <CarpetIOHDF5::CarpetIOHDF5_InitialDataCheckpoint(cGH*)>,
     attribute=attribute@entry=0x1b4a4e8, data=data@entry=0x1b507d0,
     at /home/mzilhao/Trabalho/projectos/ET/Cactus/arrangements/Carpet/Carpet/src/CallFunction.cc:380

so the relevant bits of code seem to be in CarpetIOHDF5/src/Output.cc:706 (WriteVarChunkedParallel) and Output.cc:899 (AddAttributes).

this seems to be triggered when writing HDF5 output in parallel. if i disable checkpointing, the run 
goes fine and i do get the regular 2D HDF5 output. that output does not seem to be written in parallel, 
though, since i get only one file per grid function/group. so it seems to be the parallel (chunked) 
output path that triggers the crash.

i have also tried removing all my hdf5 libs and configuring ET with HDF5_DIR=BUILD, but the outcome 
was the same.

has anyone seen such an error before? anything else i could provide to help diagnose this?

-------------- next part --------------

# Basic Cactus parameters

Cactus::cctk_run_title                  = $parfile

Cactus::cctk_full_warnings              = yes
Cactus::highlight_warning_messages      = no

# Cactus::terminate                       = "any"
Cactus::terminate                       = "time"
# Cactus::cctk_final_time                 = 100.0
Cactus::cctk_final_time                 = 16.0
# Cactus::max_runtime                     = 5     #  5.0 min

# Basic thorns necessary to run
ActiveThorns = "AEILocalInterp"
ActiveThorns = "Fortran GSL HDF5"
ActiveThorns = "GenericFD"
ActiveThorns = "LocalInterp LoopControl Slab"
ActiveThorns = "InitBase IOUtil"

InitBase::initial_data_setup_method = "init_all_levels"

# Grid setup
ActiveThorns = "Boundary CartGrid3D CoordBase"
ActiveThorns = "ReflectionSymmetry SymBase"
#ActiveThorns = "RotatingSymmetry180"

CoordBase::domainsize                 = "minmax"

# make sure all (xmax - xmin)/dx are integers!
CoordBase::xmin                       =    0.00
CoordBase::ymin                       =    0.00
CoordBase::zmin                       =    0.00
CoordBase::xmax                       =  +16.2849
CoordBase::ymax                       =  +16.2849
CoordBase::zmax                       =  +16.2849
CoordBase::dx                         =    0.2857
CoordBase::dy                         =    0.2857
CoordBase::dz                         =    0.2857

CoordBase::boundary_size_x_lower      = 3
CoordBase::boundary_size_y_lower      = 3
CoordBase::boundary_size_z_lower      = 3
CoordBase::boundary_size_x_upper      = 3
CoordBase::boundary_size_y_upper      = 3
CoordBase::boundary_size_z_upper      = 3

CoordBase::boundary_shiftout_x_lower  = 1
CoordBase::boundary_shiftout_y_lower  = 1
CoordBase::boundary_shiftout_z_lower  = 1

CartGrid3D::type = "coordbase"

ReflectionSymmetry::reflection_x      = yes
ReflectionSymmetry::reflection_y      = yes
ReflectionSymmetry::reflection_z      = yes
ReflectionSymmetry::avoid_origin_x    = no
ReflectionSymmetry::avoid_origin_y    = no
ReflectionSymmetry::avoid_origin_z    = no

# Some Carpet parameters
ActiveThorns = "Carpet CarpetLib CarpetInterp CarpetReduce CarpetSlab"

Carpet::verbose                  = no
Carpet::veryverbose              = no
Carpet::schedule_barriers        = no
Carpet::storage_verbose          = no
#Carpet::timers_verbose           = no
CarpetLib::output_bboxes         = no

Carpet::domain_from_coordbase    = yes
Carpet::max_refinement_levels    = 5

driver::ghost_size               = 3
Carpet::use_buffer_zones         = yes

Carpet::prolongation_order_space = 5
Carpet::prolongation_order_time  = 2

Carpet::convergence_level        = 0

Carpet::init_fill_timelevels     = yes
Carpet::init_3_timelevels        = no

Carpet::poison_new_timelevels    = yes
CarpetLib::poison_new_memory     = yes

Carpet::output_timers_every      = 32
CarpetLib::print_timestats_every = 5120
CarpetLib::print_memstats_every  = 5120

Carpet::grid_structure_filename   = "carpet-grid-structure"
Carpet::grid_coordinates_filename = "carpet-grid-coordinates"

# Check for NaNs
ActiveThorns = "NaNChecker"
NaNChecker::check_every     =  512
#NaNChecker::verbose         = "all"
#NaNChecker::action_if_found = "just warn"
NaNChecker::action_if_found = "terminate"
NaNChecker::check_vars      = "

# Puncture tracking and regrid
ActiveThorns = "SphericalSurface"
ActiveThorns = "ADMBase CarpetRegrid2 PunctureTracker"
ActiveThorns = "CarpetTracker"

SphericalSurface::nsurfaces = 5

CarpetTracker::surface[0] = 0
CarpetTracker::surface[1] = 1

PunctureTracker::track                      [0] = "yes"
PunctureTracker::initial_z                  [0] = 3.001
PunctureTracker::which_surface_to_store_info[0] = 0
PunctureTracker::track                      [1] = "yes"
PunctureTracker::initial_z                  [1] = -3.001
PunctureTracker::which_surface_to_store_info[1] = 1
PunctureTracker::verbose                        = "no"

CarpetRegrid2::regrid_every            = 64
CarpetRegrid2::freeze_unaligned_levels = yes
CarpetRegrid2::symmetry_rotating180    = no
CarpetRegrid2::verbose                 = no

CarpetRegrid2::num_centres = 2

CarpetRegrid2::num_levels_1         =  5
CarpetRegrid2::position_z_1         =  3.0
CarpetRegrid2::radius_1[ 1]         =  8.0
CarpetRegrid2::radius_1[ 2]         =  2.0
CarpetRegrid2::radius_1[ 3]         =  1.0
CarpetRegrid2::radius_1[ 4]         =  0.5
CarpetRegrid2::movement_threshold_1 =   0.16

CarpetRegrid2::num_levels_2         =  5
CarpetRegrid2::position_z_2         = -3.0
CarpetRegrid2::radius_2[ 1]         =  8.0
CarpetRegrid2::radius_2[ 2]         =  2.0
CarpetRegrid2::radius_2[ 3]         =  1.0
CarpetRegrid2::radius_2[ 4]         =  0.5
CarpetRegrid2::movement_threshold_2 =   0.16

# ActiveThorns = "CarpetMask"

# CarpetMask::verbose = no

# CarpetMask::excluded_surface       [0] = 0
# CarpetMask::excluded_surface_factor[0] = 1.0

# CarpetMask::excluded_surface       [1] = 1
# CarpetMask::excluded_surface_factor[1] = 1.0

# CarpetMask::excluded_surface       [2] = 2
# CarpetMask::excluded_surface_factor[2] = 1.0

# Integration method
ActiveThorns = "MoL Time"

MoL::ODE_Method                 = "RK4"
MoL::MoL_Intermediate_Steps     = 4
MoL::MoL_Num_Scratch_Levels     = 1

Carpet::time_refinement_factors = "[1, 2, 4, 8, 16, 32]"

Time::dtfac                     = 0.5

# Initial data
ActiveThorns = "ADMCoupling ADMMacros CoordGauge SpaceMask StaticConformal TmunuBase"
ActiveThorns = "TwoPunctures"

ADMMacros::spatial_order     = 4

ADMBase::metric_type         = "physical"

ADMBase::initial_data        = "twopunctures"
ADMBase::initial_lapse       = "twopunctures-averaged"
ADMBase::initial_shift       = "zero"
ADMBase::initial_dtlapse     = "zero"
ADMBase::initial_dtshift     = "zero"

# needed for AHFinderDirect
ADMBase::metric_timelevels   = 3

TwoPunctures::swap_xz        = "yes"

TwoPunctures::par_b          =  3.001
TwoPunctures::par_m_plus     =  0.5
TwoPunctures::par_m_minus    =  0.5
TwoPunctures::par_P_plus [1] =  -0.0
TwoPunctures::par_P_minus[1] =  0.0

TwoPunctures::TP_epsilon     = 1.0e-10
TwoPunctures::TP_Tiny        = 1.0e-10

# TwoPunctures::verbose = yes

# Thorns for the evolution
ActiveThorns = "ML_BSSN ML_BSSN_Helper NewRad"
ActiveThorns = "ML_ADMConstraints"

ADMBase::evolution_method         = "ML_BSSN"
ADMBase::lapse_evolution_method   = "ML_BSSN"
ADMBase::shift_evolution_method   = "ML_BSSN"
ADMBase::dtlapse_evolution_method = "ML_BSSN"
ADMBase::dtshift_evolution_method = "ML_BSSN"

ML_BSSN::harmonicN                = 1      # 1+log
ML_BSSN::harmonicF                = 2.0    # 1+log
ML_BSSN::ShiftGammaCoeff          = 0.75
ML_BSSN::BetaDriver               = 1.0
ML_BSSN::LapseAdvectionCoeff      = 1.0
ML_BSSN::ShiftAdvectionCoeff      = 1.0

ML_BSSN::MinimumLapse             = 1.0e-8

ML_BSSN::my_initial_boundary_condition = "extrapolate-gammas"
ML_BSSN::my_rhs_boundary_condition     = "NewRad"
Boundary::radpower                     = 2

ML_BSSN::ML_log_confac_bound = "none"
ML_BSSN::ML_metric_bound     = "none"
ML_BSSN::ML_Gamma_bound      = "none"
ML_BSSN::ML_trace_curv_bound = "none"
ML_BSSN::ML_curv_bound       = "none"
ML_BSSN::ML_lapse_bound      = "none"
ML_BSSN::ML_dtlapse_bound    = "none"
ML_BSSN::ML_shift_bound      = "none"
ML_BSSN::ML_dtshift_bound    = "none"

# Numerical dissipation
ActiveThorns = "Dissipation"

Dissipation::order = 5
Dissipation::vars  = "

# Further necessary thorns
ActiveThorns = "SummationByParts"
SummationByParts::order = 4

# # Wave extraction
# #------------------------------------------------------------------------------
# ActiveThorns = "WeylScal4 Multipole"
# #------------------------------------------------------------------------------
# WeylScal4::offset                    = 1e-8
# WeylScal4::fd_order                  = "4th"
# WeylScal4::verbose                   = 0

# Multipole::nradii    = 3
# Multipole::out_every = 128
# Multipole::radius[0] = 60
# Multipole::radius[1] = 80
# Multipole::radius[2] = 100
# Multipole::variables = "WeylScal4::Psi4r{sw=-2 cmplx='WeylScal4::Psi4i' name='Psi4'}"
# Multipole::l_max = 4
# #Multipole::m_mode = 4

# Horizon thorns
ActiveThorns = "AHFinderDirect"

AHFinderDirect::find_every = 128


AHFinderDirect::move_origins            = yes
AHFinderDirect::reshape_while_moving    = yes
AHFinderDirect::predict_origin_movement = yes

AHFinderDirect::geometry_interpolator_name = "Lagrange polynomial interpolation"
AHFinderDirect::geometry_interpolator_pars = "order=4"
AHFinderDirect::surface_interpolator_name  = "Lagrange polynomial interpolation"
AHFinderDirect::surface_interpolator_pars  = "order=4"

AHFinderDirect::output_h_every = 0

AHFinderDirect::N_horizons = 3

AHFinderDirect::origin_z                             [1] = +3.0
AHFinderDirect::initial_guess__coord_sphere__z_center[1] = +3.0
AHFinderDirect::initial_guess__coord_sphere__radius  [1] =  0.25
AHFinderDirect::which_surface_to_store_info          [1] = 2
AHFinderDirect::reset_horizon_after_not_finding      [1] = no
AHFinderDirect::track_origin_from_grid_scalar        [1] = yes
AHFinderDirect::track_origin_source_x                [1] = "PunctureTracker::pt_loc_x[0]"
AHFinderDirect::track_origin_source_y                [1] = "PunctureTracker::pt_loc_y[0]"
AHFinderDirect::track_origin_source_z                [1] = "PunctureTracker::pt_loc_z[0]"
AHFinderDirect::max_allowable_horizon_radius         [1] = 3
#AHFinderDirect::dont_find_after_individual_time      [1] = 30.0

AHFinderDirect::origin_z                             [2] = -3.0
AHFinderDirect::initial_guess__coord_sphere__z_center[2] = -3.0
AHFinderDirect::initial_guess__coord_sphere__radius  [2] =  0.25
AHFinderDirect::which_surface_to_store_info          [2] = 3
AHFinderDirect::reset_horizon_after_not_finding      [2] = no
AHFinderDirect::track_origin_from_grid_scalar        [2] = yes
AHFinderDirect::track_origin_source_x                [2] = "PunctureTracker::pt_loc_x[1]"
AHFinderDirect::track_origin_source_y                [2] = "PunctureTracker::pt_loc_y[1]"
AHFinderDirect::track_origin_source_z                [2] = "PunctureTracker::pt_loc_z[1]"
AHFinderDirect::max_allowable_horizon_radius         [2] = 3
#AHFinderDirect::dont_find_after_individual_time      [2] = 30.0

AHFinderDirect::origin_x                             [3] = 0
AHFinderDirect::find_after_individual_time           [3] = 20.0
AHFinderDirect::initial_guess__coord_sphere__z_center[3] = 0
AHFinderDirect::initial_guess__coord_sphere__radius  [3] = 1.2
AHFinderDirect::which_surface_to_store_info          [3] = 4
AHFinderDirect::reset_horizon_after_not_finding      [3] = no
AHFinderDirect::max_allowable_horizon_radius         [3] = 6

# I/O thorns
ActiveThorns = "CarpetIOBasic"
ActiveThorns = "CarpetIOScalar"
ActiveThorns = "CarpetIOASCII"
ActiveThorns = "CarpetIOHDF5"
IOBasic::outInfo_every          = 8
# IOBasic::outInfo_reductions     = "norm2"
IOBasic::outInfo_reductions     = "norm2 minimum maximum"
IOBasic::outInfo_vars           = "

IO::out_dir = $parfile

# # for scalar reductions of 3D grid functions
# IOScalar::one_file_per_group    = no
# IOScalar::outScalar_every       = 128
# IOScalar::outScalar_vars        = "
#         ADMBase::lapse
# "

IOASCII::one_file_per_group     = no

# IOASCII::output_symmetry_points = no
# IOASCII::out3D_ghosts           = no

# IOASCII::out1D_every             = 128
IOASCII::out1D_every             = 1
IOASCII::out1D_d                 = "no" 
IOASCII::out1D_vars              = "ADMBase::shift

IOHDF5::out2D_every            = 256
IOHDF5::out2D_vars             = "

# IOHDF5::use_checksums          = yes
# IOHDF5::compression_level      = 1
# IOHDF5::one_file_per_group     = yes

# IOHDF5::output_symmetry_points = no
# IOHDF5::out3D_ghosts           = no

# Checkpointing and recovery
# -----------------------------------------------------------------------------
IOHDF5::checkpoint                  = yes
IO::checkpoint_dir                  = $parfile
IO::checkpoint_ID                   = yes
IO::checkpoint_every_walltime_hours = 6.0
IO::checkpoint_on_terminate         = yes

IO::recover                         = "autoprobe"
IO::recover_dir                     = $parfile

# Formaline and TimerReport
# -----------------------------------------------------------------------------
# ActiveThorns = "Formaline"
ActiveThorns = "TimerReport"
# -----------------------------------------------------------------------------
TimerReport::out_every                  = 5120
TimerReport::out_filename               = "TimerReport"
TimerReport::output_all_timers_together = yes
TimerReport::output_all_timers_readable = yes
TimerReport::n_top_timers               = 20
