[ET Trac] [Einstein Toolkit] #2023: CarpetInterp hangs with openmpi

Einstein Toolkit trac-noreply at einsteintoolkit.org
Fri Mar 17 07:46:52 CDT 2017


#2023: CarpetInterp hangs with openmpi
-----------------------------+----------------------------------------------
 Reporter:  anonymous        |       Owner:                     
     Type:  defect           |      Status:  new                
 Priority:  unset            |   Milestone:                     
Component:  Other            |     Version:  development version
 Keywords:  OpenMPI, Carpet  |  
-----------------------------+----------------------------------------------
 On three occasions, in two different simulations (TOV, BNS), the code
 hung. In two of these cases I could attach a debugger to the running
 process, and both times the backtrace looked like this:
 {{{
 #0  0x00007f4ecfd5729a in __GI___pthread_mutex_lock (mutex=0xc40d730) at ../nptl/pthread_mutex_lock.c:79
 #1  0x00007f4ec7224807 in ?? () from /usr/lib/openmpi/lib/openmpi/mca_btl_openib.so
 #2  0x00007f4ecdae734a in opal_progress () from /usr/lib/libmpi.so.1
 #3  0x00007f4ecda2d3b4 in ompi_request_default_wait_all () from /usr/lib/libmpi.so.1
 #4  0x00007f4ec61cb7b7 in ompi_coll_tuned_sendrecv_actual () from /usr/lib/openmpi/lib/openmpi/mca_coll_tuned.so
 #5  0x00007f4ec61d0df6 in ompi_coll_tuned_alltoallv_intra_pairwise () from /usr/lib/openmpi/lib/openmpi/mca_coll_tuned.so
 #6  0x00007f4ecda3a00f in PMPI_Alltoallv () from /usr/lib/libmpi.so.1
 #7  0x00000000015458fe in CarpetInterp::Carpet_DriverInterpolate
     (cctkGH_=<optimized out>, N_dims=<optimized out>, local_interp_handle=3,
     param_table_handle=6421, coord_system_handle=0, N_interp_points=224,
     interp_coords_type_code=130, coords_list=<optimized out>,
     N_input_arrays=6, input_array_variable_indices=0x7fff38fbf640,
     N_output_arrays=24, output_array_type_codes=0x7fff38fbf660,
     output_arrays=0x7fff38fbfb00)
     at /home/wolfgang.kastaun/ET/Payne/Cactus/arrangements/Carpet/CarpetInterp/src/interp.cc:645
 #8  0x000000000071ee4f in SymBase_SymmetryInterpolateFaces
     (cctkGH_=0xccb1be0, N_dims=<optimized out>, local_interp_handle=3,
     param_table_handle=6421, coord_system_handle=0, N_interp_points=224,
     interp_coords_type=130, interp_coords=0x7fff38fbe1a0, N_input_arrays=6,
     input_array_indices=0x7fff38fbf640, N_output_arrays=24,
     output_array_types=0x7fff38fbf660, output_arrays=0x7fff38fbfb00, faces=0)
     at /home/wolfgang.kastaun/ET/Payne/Cactus/arrangements/CactusBase/SymBase/src/Interpolation.c:381
 }}}

 I'm not sure whether this is a problem with Carpet or with my OpenMPI
 installation. The hang is rare and only shows up after a day or so.
 I'm using ET version Payne, gcc 4.9.2, and OpenMPI 1.6.5 on an InfiniBand
 interconnect.
 The code was compiled with OpenMP support and run with 4 threads per
 process.

 Looking at various variables in Carpet_DriverInterpolate with the
 debugger, I noticed that the vector tmp was reported with size -488221088.
 This could simply be gdb misreporting the value, though, since the code
 was compiled with -O3.

-- 
Ticket URL: <https://trac.einsteintoolkit.org/ticket/2023>
Einstein Toolkit <http://einsteintoolkit.org>
The Einstein Toolkit

