[Users] std::out_of_range error while checkpointing

Miguel Zilhão mzilhao at ffn.ub.es
Wed Aug 5 19:11:12 CDT 2015


hi all,

i'm running latest ET Hilbert on openSUSE tumbleweed and i'm having the following issue. upon trying 
to run a simple head-on collision configuration with McLachlan (attached parameter file), i get the 
error

   INFO (CarpetIOHDF5): ---------------------------------------------------------
   INFO (CarpetIOHDF5): Dumping initial checkpoint at iteration 0, simulation time 0
   INFO (CarpetIOHDF5): ---------------------------------------------------------
   terminate called after throwing an instance of 'std::out_of_range'
     what():  vector::_M_range_check: __n (which is 1) >= this->size() (which is 1)
     Rank 1 with PID 5958 received signal 6

when writing the checkpoint file.
this only happens if i run with more than one MPI process; with a single processor it runs fine.
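
for readers who have not met this exception before: the what() text above is the message libstdc++ produces when std::vector::at is called with an index that is not smaller than size(). below is a minimal sketch of that behaviour, independent of Carpet. probe_at is a made-up helper name used only for illustration, it is not part of the ET.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper, for illustration only (not part of Cactus/Carpet).
// Returns the what() text of the exception thrown by v.at(n),
// or an empty string if index n is in range.
std::string probe_at(const std::vector<int>& v, std::size_t n) {
  try {
    (void)v.at(n);  // at() is bounds-checked, unlike operator[]
  } catch (const std::out_of_range& e) {
    return e.what();
  }
  return "";
}
```

with a one-element vector v, probe_at(v, 0) returns an empty string and probe_at(v, 1) returns the range-check message. the exact wording of that message depends on the standard library in use.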

i'm compiling with gcc-5, but i find the same problem with gcc-4.8. i was running this very same 
configuration just fine a couple of months ago, so it must have been some update i've made in the 
meantime (either to my OS or to ET).
i've also tried with different configurations and the outcome is the same.

i've run this through gdb, here's the relevant output:


#6  0x00007ffff4f6d4d5 in std::__throw_out_of_range_fmt(char const*, ...) ()
    from /usr/lib64/libstdc++.so.6
#7  0x00000000005bb398 in _M_range_check (__n=<optimized out>,
     this=<optimized out>) at /usr/include/c++/5/bits/stl_vector.h:803
#8  at (__n=<optimized out>, this=<optimized out>)
     at /usr/include/c++/5/bits/stl_vector.h:824
#9  CarpetIOHDF5::AddAttributes (cctkGH=cctkGH@entry=0x1b507d0,
     fullname=fullname@entry=0x3f2434a0 "ML_BSSN::cA", vdim=3,
     refinementlevel=refinementlevel@entry=0, request=request@entry=0x3df96370,
     bbox=..., dataset=83886080, is_index=false)
     at /home/mzilhao/Trabalho/projectos/ET/Cactus/arrangements/Carpet/CarpetIOHDF5/src/Output.cc:899
#10 0x00000000005bdea4 in CarpetIOHDF5::WriteVarChunkedParallel (
     cctkGH=cctkGH@entry=0x1b507d0, outfile=outfile@entry=16777216,
     io_bytes=@0x7fffffffc980: 1110772, request=0x3df96370,
     called_from_checkpoint=called_from_checkpoint@entry=true, indexfile=indexfile@entry=-1)
     at /home/mzilhao/Trabalho/projectos/ET/Cactus/arrangements/Carpet/CarpetIOHDF5/src/Output.cc:706
#11 0x00000000005a233e in CarpetIOHDF5::Checkpoint (cctkGH=0x1b507d0,
     called_from=0)
     at /home/mzilhao/Trabalho/projectos/ET/Cactus/arrangements/Carpet/CarpetIOHDF5/src/CarpetIOHDF5.cc:1277
#12 0x000000000041f0d5 in CCTK_CallFunction (
     function=function@entry=0x5a2da0 <CarpetIOHDF5::CarpetIOHDF5_InitialDataCheckpoint(cGH*)>,
     fdata=fdata@entry=0x1b4a4e8, data=data@entry=0x1b507d0)
     at /home/mzilhao/Trabalho/projectos/ET/Cactus/src/main/ScheduleInterface.c:312
#13 0x0000000000ef6499 in Carpet::CallScheduledFunction (
     time_and_mode=time_and_mode@entry=0x1174842 "Meta mode",
     function=function@entry=0x5a2da0 <CarpetIOHDF5::CarpetIOHDF5_InitialDataCheckpoint(cGH*)>,
     attribute=attribute@entry=0x1b4a4e8, data=data@entry=0x1b507d0,
     user_timer=...)
     at /home/mzilhao/Trabalho/projectos/ET/Cactus/arrangements/Carpet/Carpet/src/CallFunction.cc:380


so the relevant bits of code seem to be in CarpetIOHDF5/src/Output.cc:706 and 
CarpetIOHDF5/src/Output.cc:899
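
the numbers in the message (index 1, size 1, raised on rank 1) would fit a table that is expected to hold one entry per MPI process but was filled with only one entry. this is a guess based on the message alone, not a reading of Output.cc. the sketch below only illustrates how such a lookup could be guarded so that the failure names the rank and the table involved; entry_for_rank and its arguments are hypothetical names and do not exist in Carpet.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (NOT part of Carpet): look up the entry belonging to
// `rank` in a table that is assumed to hold one entry per MPI process.
// Throws std::out_of_range with a descriptive message if the table is too short.
template <typename T>
const T& entry_for_rank(const std::vector<T>& per_rank, int rank,
                        const std::string& what) {
  if (rank < 0 || static_cast<std::size_t>(rank) >= per_rank.size()) {
    std::ostringstream msg;
    msg << what << ": rank " << rank << " requested, but only "
        << per_rank.size() << " entries exist";
    throw std::out_of_range(msg.str());
  }
  return per_rank[static_cast<std::size_t>(rank)];
}
```

a check of this kind does not fix anything by itself, but it would turn the anonymous _M_range_check failure into a message that says which table was too short for which rank.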

this seems to be triggered when writing hdf5 output in parallel. if i remove checkpointing the run 
goes fine, and i do get regular 2D hdf5 output. this does not seem to be written in parallel, 
though, as i get only one file per grid function/group. so it seems to be the parallel output that 
triggers the crash.

i have also tried removing all my hdf5 libs and configuring ET with HDF5_DIR=BUILD, but the outcome 
was the same.

has anyone seen such an error before? anything else i could provide to help diagnose this?

thanks,
Miguel
-------------- next part --------------

# Basic Cactus parameters
#------------------------------------------------------------------------------

Cactus::cctk_run_title                  = $parfile

Cactus::cctk_full_warnings              = yes
Cactus::highlight_warning_messages      = no

# Cactus::terminate                       = "any"
Cactus::terminate                       = "time"
# Cactus::cctk_final_time                 = 100.0
Cactus::cctk_final_time                 = 16.0
# Cactus::max_runtime                     = 5     #  5.0 min


# Basic thorns necessary to run
#------------------------------------------------------------------------------
ActiveThorns = "AEILocalInterp"
ActiveThorns = "Fortran GSL HDF5"
ActiveThorns = "GenericFD"
ActiveThorns = "LocalInterp LoopControl Slab"
ActiveThorns = "InitBase IOUtil"
#------------------------------------------------------------------------------

InitBase::initial_data_setup_method = "init_all_levels"


# Grid setup
#------------------------------------------------------------------------------
ActiveThorns = "Boundary CartGrid3D CoordBase"
ActiveThorns = "ReflectionSymmetry SymBase"
#ActiveThorns = "RotatingSymmetry180"
#------------------------------------------------------------------------------

CoordBase::domainsize                 = "minmax"

# make sure all (xmax - xmin)/dx are integers!
CoordBase::xmin                       =    0.00
CoordBase::ymin                       =    0.00
CoordBase::zmin                       =    0.00
CoordBase::xmax                       =  +16.2849
CoordBase::ymax                       =  +16.2849
CoordBase::zmax                       =  +16.2849
CoordBase::dx                         =    0.2857
CoordBase::dy                         =    0.2857
CoordBase::dz                         =    0.2857

CoordBase::boundary_size_x_lower      = 3
CoordBase::boundary_size_y_lower      = 3
CoordBase::boundary_size_z_lower      = 3
CoordBase::boundary_size_x_upper      = 3
CoordBase::boundary_size_y_upper      = 3
CoordBase::boundary_size_z_upper      = 3

CoordBase::boundary_shiftout_x_lower  = 1
CoordBase::boundary_shiftout_y_lower  = 1
CoordBase::boundary_shiftout_z_lower  = 1

CartGrid3D::type = "coordbase"

ReflectionSymmetry::reflection_x      = yes
ReflectionSymmetry::reflection_y      = yes
ReflectionSymmetry::reflection_z      = yes
ReflectionSymmetry::avoid_origin_x    = no
ReflectionSymmetry::avoid_origin_y    = no
ReflectionSymmetry::avoid_origin_z    = no


# Some Carpet parameters
#------------------------------------------------------------------------------
ActiveThorns = "Carpet CarpetLib CarpetInterp CarpetReduce CarpetSlab"
#------------------------------------------------------------------------------

Carpet::verbose                  = no
Carpet::veryverbose              = no
Carpet::schedule_barriers        = no
Carpet::storage_verbose          = no
#Carpet::timers_verbose           = no
CarpetLib::output_bboxes         = no

Carpet::domain_from_coordbase    = yes
Carpet::max_refinement_levels    = 5

driver::ghost_size               = 3
Carpet::use_buffer_zones         = yes

Carpet::prolongation_order_space = 5
Carpet::prolongation_order_time  = 2

Carpet::convergence_level        = 0

Carpet::init_fill_timelevels     = yes
Carpet::init_3_timelevels        = no

Carpet::poison_new_timelevels    = yes
CarpetLib::poison_new_memory     = yes

Carpet::output_timers_every      = 32
CarpetLib::print_timestats_every = 5120
CarpetLib::print_memstats_every  = 5120

Carpet::grid_structure_filename   = "carpet-grid-structure"
Carpet::grid_coordinates_filename = "carpet-grid-coordinates"


# Check for NaNs
#------------------------------------------------------------------------------
ActiveThorns = "NaNChecker"
#------------------------------------------------------------------------------
NaNChecker::check_every     =  512
#NaNChecker::verbose         = "all"
#NaNChecker::action_if_found = "just warn"
NaNChecker::action_if_found = "terminate"
NaNChecker::check_vars      = "
        ADMBase::metric
        ADMBase::curv
        ADMBase::lapse
        ADMBase::shift
"

# Puncture tracking and regrid
#------------------------------------------------------------------------------
ActiveThorns = "SphericalSurface"
ActiveThorns = "ADMBase CarpetRegrid2 PunctureTracker"
ActiveThorns = "CarpetTracker"
#------------------------------------------------------------------------------

SphericalSurface::nsurfaces = 5

CarpetTracker::surface[0] = 0
CarpetTracker::surface[1] = 1

PunctureTracker::track                      [0] = "yes"
PunctureTracker::initial_z                  [0] = 3.001
PunctureTracker::which_surface_to_store_info[0] = 0
PunctureTracker::track                      [1] = "yes"
PunctureTracker::initial_z                  [1] = -3.001
PunctureTracker::which_surface_to_store_info[1] = 1
PunctureTracker::verbose                        = "no"

CarpetRegrid2::regrid_every            = 64
CarpetRegrid2::freeze_unaligned_levels = yes
CarpetRegrid2::symmetry_rotating180    = no
CarpetRegrid2::verbose                 = no

CarpetRegrid2::num_centres = 2

CarpetRegrid2::num_levels_1         =  5
CarpetRegrid2::position_z_1         =  3.0
CarpetRegrid2::radius_1[ 1]         =  8.0
CarpetRegrid2::radius_1[ 2]         =  2.0
CarpetRegrid2::radius_1[ 3]         =  1.0
CarpetRegrid2::radius_1[ 4]         =  0.5
CarpetRegrid2::movement_threshold_1 =   0.16

CarpetRegrid2::num_levels_2         =  5
CarpetRegrid2::position_z_2         = -3.0
CarpetRegrid2::radius_2[ 1]         =  8.0
CarpetRegrid2::radius_2[ 2]         =  2.0
CarpetRegrid2::radius_2[ 3]         =  1.0
CarpetRegrid2::radius_2[ 4]         =  0.5
CarpetRegrid2::movement_threshold_2 =   0.16



# ActiveThorns = "CarpetMask"

# CarpetMask::verbose = no

# CarpetMask::excluded_surface       [0] = 0
# CarpetMask::excluded_surface_factor[0] = 1.0

# CarpetMask::excluded_surface       [1] = 1
# CarpetMask::excluded_surface_factor[1] = 1.0

# CarpetMask::excluded_surface       [2] = 2
# CarpetMask::excluded_surface_factor[2] = 1.0


# Integration method
#------------------------------------------------------------------------------
ActiveThorns = "MoL Time"
#------------------------------------------------------------------------------

MoL::ODE_Method                 = "RK4"
MoL::MoL_Intermediate_Steps     = 4
MoL::MoL_Num_Scratch_Levels     = 1

Carpet::time_refinement_factors = "[1, 2, 4, 8, 16, 32]"

Time::dtfac                     = 0.5


# Initial data
#------------------------------------------------------------------------------
ActiveThorns = "ADMCoupling ADMMacros CoordGauge SpaceMask StaticConformal TmunuBase"
ActiveThorns = "TwoPunctures"
#------------------------------------------------------------------------------

ADMMacros::spatial_order     = 4

ADMBase::metric_type         = "physical"

ADMBase::initial_data        = "twopunctures"
ADMBase::initial_lapse       = "twopunctures-averaged"
ADMBase::initial_shift       = "zero"
ADMBase::initial_dtlapse     = "zero"
ADMBase::initial_dtshift     = "zero"

# needed for AHFinderDirect
ADMBase::metric_timelevels   = 3

TwoPunctures::swap_xz        = "yes"

TwoPunctures::par_b          =  3.001
TwoPunctures::par_m_plus     =  0.5
TwoPunctures::par_m_minus    =  0.5
TwoPunctures::par_P_plus [1] =  -0.0
TwoPunctures::par_P_minus[1] =  0.0

TwoPunctures::TP_epsilon     = 1.0e-10
TwoPunctures::TP_Tiny        = 1.0e-10

# TwoPunctures::verbose = yes


# Thorns for the evolution
#------------------------------------------------------------------------------
ActiveThorns = "ML_BSSN ML_BSSN_Helper NewRad"
ActiveThorns = "ML_ADMConstraints"
#------------------------------------------------------------------------------

ADMBase::evolution_method         = "ML_BSSN"
ADMBase::lapse_evolution_method   = "ML_BSSN"
ADMBase::shift_evolution_method   = "ML_BSSN"
ADMBase::dtlapse_evolution_method = "ML_BSSN"
ADMBase::dtshift_evolution_method = "ML_BSSN"

ML_BSSN::harmonicN                = 1      # 1+log
ML_BSSN::harmonicF                = 2.0    # 1+log
ML_BSSN::ShiftGammaCoeff          = 0.75
ML_BSSN::BetaDriver               = 1.0
ML_BSSN::LapseAdvectionCoeff      = 1.0
ML_BSSN::ShiftAdvectionCoeff      = 1.0

ML_BSSN::MinimumLapse             = 1.0e-8

ML_BSSN::my_initial_boundary_condition = "extrapolate-gammas"
ML_BSSN::my_rhs_boundary_condition     = "NewRad"
Boundary::radpower                     = 2

ML_BSSN::ML_log_confac_bound = "none"
ML_BSSN::ML_metric_bound     = "none"
ML_BSSN::ML_Gamma_bound      = "none"
ML_BSSN::ML_trace_curv_bound = "none"
ML_BSSN::ML_curv_bound       = "none"
ML_BSSN::ML_lapse_bound      = "none"
ML_BSSN::ML_dtlapse_bound    = "none"
ML_BSSN::ML_shift_bound      = "none"
ML_BSSN::ML_dtshift_bound    = "none"


# Numerical dissipation
#------------------------------------------------------------------------------
ActiveThorns = "Dissipation"
#------------------------------------------------------------------------------

Dissipation::order = 5
Dissipation::vars  = "
        ML_BSSN::ML_log_confac
        ML_BSSN::ML_metric
        ML_BSSN::ML_trace_curv
        ML_BSSN::ML_curv
        ML_BSSN::ML_Gamma
        ML_BSSN::ML_lapse
        ML_BSSN::ML_shift
        ML_BSSN::ML_dtlapse
        ML_BSSN::ML_dtshift
"


# Further necessary thorns
#------------------------------------------------------------------------------
ActiveThorns = "SummationByParts"
#------------------------------------------------------------------------------
SummationByParts::order = 4


# # Wave extraction
# #------------------------------------------------------------------------------
# ActiveThorns = "WeylScal4 Multipole"
# #------------------------------------------------------------------------------
# WeylScal4::offset                    = 1e-8
# WeylScal4::fd_order                  = "4th"
# WeylScal4::verbose                   = 0

# Multipole::nradii    = 3
# Multipole::out_every = 128
# Multipole::radius[0] = 60
# Multipole::radius[1] = 80
# Multipole::radius[2] = 100
# Multipole::variables = "WeylScal4::Psi4r{sw=-2 cmplx='WeylScal4::Psi4i' name='Psi4'}"
# Multipole::l_max = 4
# #Multipole::m_mode = 4


# Horizon thorns
#------------------------------------------------------------------------------
ActiveThorns = "AHFinderDirect"
#------------------------------------------------------------------------------

AHFinderDirect::find_every = 128

AHFinderDirect::run_at_CCTK_POST_RECOVER_VARIABLES = no

AHFinderDirect::move_origins            = yes
AHFinderDirect::reshape_while_moving    = yes
AHFinderDirect::predict_origin_movement = yes

AHFinderDirect::geometry_interpolator_name = "Lagrange polynomial interpolation"
AHFinderDirect::geometry_interpolator_pars = "order=4"
AHFinderDirect::surface_interpolator_name  = "Lagrange polynomial interpolation"
AHFinderDirect::surface_interpolator_pars  = "order=4"

AHFinderDirect::output_h_every = 0

AHFinderDirect::N_horizons = 3

AHFinderDirect::origin_z                             [1] = +3.0
AHFinderDirect::initial_guess__coord_sphere__z_center[1] = +3.0
AHFinderDirect::initial_guess__coord_sphere__radius  [1] =  0.25
AHFinderDirect::which_surface_to_store_info          [1] = 2
AHFinderDirect::reset_horizon_after_not_finding      [1] = no
AHFinderDirect::track_origin_from_grid_scalar        [1] = yes
AHFinderDirect::track_origin_source_x                [1] = "PunctureTracker::pt_loc_x[0]"
AHFinderDirect::track_origin_source_y                [1] = "PunctureTracker::pt_loc_y[0]"
AHFinderDirect::track_origin_source_z                [1] = "PunctureTracker::pt_loc_z[0]"
AHFinderDirect::max_allowable_horizon_radius         [1] = 3
#AHFinderDirect::dont_find_after_individual_time      [1] = 30.0

AHFinderDirect::origin_z                             [2] = -3.0
AHFinderDirect::initial_guess__coord_sphere__z_center[2] = -3.0
AHFinderDirect::initial_guess__coord_sphere__radius  [2] =  0.25
AHFinderDirect::which_surface_to_store_info          [2] = 3
AHFinderDirect::reset_horizon_after_not_finding      [2] = no
AHFinderDirect::track_origin_from_grid_scalar        [2] = yes
AHFinderDirect::track_origin_source_x                [2] = "PunctureTracker::pt_loc_x[1]"
AHFinderDirect::track_origin_source_y                [2] = "PunctureTracker::pt_loc_y[1]"
AHFinderDirect::track_origin_source_z                [2] = "PunctureTracker::pt_loc_z[1]"
AHFinderDirect::max_allowable_horizon_radius         [2] = 3
#AHFinderDirect::dont_find_after_individual_time      [2] = 30.0

AHFinderDirect::origin_x                             [3] = 0
AHFinderDirect::find_after_individual_time           [3] = 20.0
AHFinderDirect::initial_guess__coord_sphere__z_center[3] = 0
AHFinderDirect::initial_guess__coord_sphere__radius  [3] = 1.2
AHFinderDirect::which_surface_to_store_info          [3] = 4
AHFinderDirect::reset_horizon_after_not_finding      [3] = no
AHFinderDirect::max_allowable_horizon_radius         [3] = 6


# I/O thorns
#------------------------------------------------------------------------------
ActiveThorns = "CarpetIOBasic"
ActiveThorns = "CarpetIOScalar"
ActiveThorns = "CarpetIOASCII"
ActiveThorns = "CarpetIOHDF5"
#------------------------------------------------------------------------------
IOBasic::outInfo_every          = 8
# IOBasic::outInfo_reductions     = "norm2"
IOBasic::outInfo_reductions     = "norm2 minimum maximum"
IOBasic::outInfo_vars           = "
        Carpet::physical_time_per_hour
        ADMBase::lapse
        ML_ADMConstraints::H
"

IO::out_dir = $parfile

# # for scalar reductions of 3D grid functions
# IOScalar::one_file_per_group    = no
# IOScalar::outScalar_every       = 128
# IOScalar::outScalar_vars        = "
#         ADMBase::lapse
# "

IOASCII::one_file_per_group     = no

# IOASCII::output_symmetry_points = no
# IOASCII::out3D_ghosts           = no


# IOASCII::out1D_every             = 128
IOASCII::out1D_every             = 1
IOASCII::out1D_d                 = "no" 
IOASCII::out1D_vars              = "ADMBase::shift
                                    ADMBase::lapse
                                    ML_BSSN::phi
                                    ML_BSSN::ML_metric
                                    ML_BSSN::ML_curv
                                    ML_BSSN::ML_trace_curv
                                    ML_BSSN::ML_Gamma
"

IOHDF5::out2D_every            = 256
IOHDF5::out2D_vars             = "
        ADMBase::metric
        ADMBase::curv
        ADMBase::lapse
        ADMBase::shift
        ML_ADMConstraints::ML_Ham
        ML_ADMConstraints::ML_mom
        ML_BSSN::phi
"


# IOHDF5::use_checksums          = yes
# IOHDF5::compression_level      = 1
# IOHDF5::one_file_per_group     = yes

# IOHDF5::output_symmetry_points = no
# IOHDF5::out3D_ghosts           = no



# Checkpointing and recovery
# -----------------------------------------------------------------------------
IOHDF5::checkpoint                  = yes
IO::checkpoint_dir                  = $parfile
IO::checkpoint_ID                   = yes
IO::checkpoint_every_walltime_hours = 6.0
IO::checkpoint_on_terminate         = yes

IO::recover                         = "autoprobe"
IO::recover_dir                     = $parfile


# Formaline and TimerReport
# -----------------------------------------------------------------------------
# ActiveThorns = "Formaline"
ActiveThorns = "TimerReport"
# -----------------------------------------------------------------------------
TimerReport::out_every                  = 5120
TimerReport::out_filename               = "TimerReport"
TimerReport::output_all_timers_together = yes
TimerReport::output_all_timers_readable = yes
TimerReport::n_top_timers               = 20

