[Users] config file for lonestar

Yosef Zlochower yosef at astro.rit.edu
Fri Mar 18 15:38:59 CDT 2011


Hi,
   I tried recompiling with the config file you sent. The code
still hangs on a 600-core scaling-test parfile that evolves
successfully on fewer cores.
   I was able to attach a debugger to two of the frozen simulations.
In one case the code hung in an MPI_Gather called from one of my
local thorns; in the other, the hang was in a Carpet call. Here is
a partial backtrace from the latter case.

MPIDI_CH3I_MRAILI_Get_next_vbuf (vc_ptr=0x7fffab2c6520,
     vbuf_ptr=0x7fffab2c6528) at ibv_channel_manager.c:356
356     ibv_channel_manager.c: No such file or directory.
         in ibv_channel_manager.c
(gdb) backtrace
#0  MPIDI_CH3I_MRAILI_Get_next_vbuf (vc_ptr=0x7fffab2c6520,
     vbuf_ptr=0x7fffab2c6528) at ibv_channel_manager.c:356
#1  0x00002acd4dcbee5c in MPIDI_CH3I_read_progress (vc_pptr=0x7fffab2c6520,
     v_ptr=0x7fffab2c6528, is_blocking=318849944) at ch3_read_progress.c:130
#2  0x00002acd4dcbcfcd in MPIDI_CH3I_Progress (is_blocking=-1423153888,
     state=0x7fffab2c6528) at ch3_progress.c:206
#3  0x00002acd4ddd01f3 in PMPI_Waitall (count=-1423153888,
     array_of_requests=0x7fffab2c6528, array_of_statuses=0x13014398)
     at waitall.c:195
#4  0x0000000001084b89 in comm_state::AllPostedCommunicationsFinished (
     this=0x7fffab2c6520, $P2=<value optimized out>)
     at /work/00951/tg801986/Cactus/arrangements/Carpet/CarpetLib/src/commstate.cc:350
#5  0x00000000010834a8 in comm_state::step (this=0x7fffab2c6520,
     $R9=<value optimized out>)
     at /work/00951/tg801986/Cactus/arrangements/Carpet/CarpetLib/src/commstate.cc:235
#6  0x00000000010005c2 in Carpet::SyncGroups (cctkGH=0x7fffab2c6520,
     groups=..., $c9=<value optimized out>, $d0=<value optimized out>)
     at /work/00951/tg801986/Cactus/arrangements/Carpet/Carpet/src/Comm.cc:246
#7  0x0000000000fffe81 in Carpet::SyncProlongateGroups (cctkGH=0x7fffab2c6520,
     groups=..., $87=<value optimized out>, $88=<value optimized out>)
     at /work/00951/tg801986/Cactus/arrangements/Carpet/Carpet/src/Comm.cc:197
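
When backtraces are collected from several hung ranks, it can help to
reduce each one to the first frame outside the MPI library and count how
often each such frame occurs. The following is only a rough sketch in
Python: the helper names and the list of MPI name prefixes are
assumptions made for illustration, not part of gdb, Carpet, or MVAPICH2.

```python
import re
from collections import Counter

# Matches the start of a gdb frame line such as
#   "#3  0x00002acd4ddd01f3 in PMPI_Waitall (count=..."
# and captures the frame number and the function name.
FRAME_RE = re.compile(
    r"^#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?([^\s(]+)\s*\(", re.MULTILINE
)

# Assumption: function names with these prefixes belong to the MPI library.
MPI_PREFIXES = ("MPI_", "PMPI_", "MPID", "MPIR_")

# Frame headers copied from the backtrace above (argument lists shortened).
SAMPLE = """#0  MPIDI_CH3I_MRAILI_Get_next_vbuf (vc_ptr=0x7fffab2c6520,
     vbuf_ptr=0x7fffab2c6528) at ibv_channel_manager.c:356
#1  0x00002acd4dcbee5c in MPIDI_CH3I_read_progress (vc_pptr=0x7fffab2c6520,
#2  0x00002acd4dcbcfcd in MPIDI_CH3I_Progress (is_blocking=-1423153888,
#3  0x00002acd4ddd01f3 in PMPI_Waitall (count=-1423153888,
#4  0x0000000001084b89 in comm_state::AllPostedCommunicationsFinished (
#5  0x00000000010834a8 in comm_state::step (this=0x7fffab2c6520,
"""


def frames(backtrace):
    """Return the function names in a gdb backtrace, innermost first."""
    return [m.group(2) for m in FRAME_RE.finditer(backtrace)]


def first_application_frame(backtrace):
    """Return the innermost frame that is not inside the MPI library."""
    for name in frames(backtrace):
        if not name.startswith(MPI_PREFIXES):
            return name
    return None


def summarize(backtraces):
    """Count how many backtraces share each first application frame."""
    return Counter(first_application_frame(bt) for bt in backtraces)


if __name__ == "__main__":
    print(first_application_frame(SAMPLE))
```

If most ranks wait in the same Carpet call while a few sit elsewhere, for
example in the MPI_Gather from a local thorn, the minority group is
usually the more interesting place to look.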


Erik Schnetter wrote:
> That probably means it's not a startup problem... but it could still
> be something strange with MPI, e.g. during regridding.
> 
> Another issue I just remembered is running out of memory.
> 
> -erik
> 
> On Fri, Mar 18, 2011 at 12:23 PM, Yosef Zlochower <yosef at astro.rit.edu> wrote:
>> Thanks,
>>
>> Erik Schnetter wrote:
>>
>>> Yosef
>>>
>>> Yes, there is a "standard" config file, but I have not tested it on
>>> large numbers of cores. I attach it below.
>>>
>>> If there is a problem on large numbers of cores, then it is typically
>>> a problem with MPI, either an inconsistency with the MPI libraries
>>> (compiling, linking, and running have to use the same version), or a
>>> problem with the startup mechanism that MPI uses (there may be a
>>> timeout). In the latter case, I hope the system administrators will be
>>> able to help.
>>>
>> The run starts and even produces output twice (on the
>> 32nd iteration), but it then just stalls. I am going to try with the
>> configuration file you sent, and if that doesn't work, I'll submit a
>> ticket.
>>
>>> -erik
>>>
>>> On Fri, Mar 18, 2011 at 11:35 AM, Yosef Zlochower <yosef at astro.rit.edu> wrote:
>>>> Hi,
>>>>
>>>>  I am trying some simulations on lonestar using the config file
>>>> below. I noticed that my simulations hang when I choose large
>>>> numbers of cores. I haven't yet determined where
>>>> they hang (the same executable and parfile combination does
>>>> not hang on fewer cores). Is there a standard config file
>>>> for lonestar?
>>>>
>>>> CPP = cpp
>>>> FPP = cpp
>>>> CC  = /opt/apps/intel/11.1/bin/intel64/icc
>>>> CXX = /opt/apps/intel/11.1/bin/intel64/icpc
>>>> F77 = /opt/apps/intel/11.1/bin/intel64/ifort
>>>> F90 = /opt/apps/intel/11.1/bin/intel64/ifort
>>>>
>>>> # -inline-debug-info leads to compiler crashes when used with optimisation
>>>> CPPFLAGS = -DMPICH_IGNORE_CXX_SEEK
>>>> FPPFLAGS = -traditional
>>>> #CFLAGS   = -g -debug all -traceback -align -std=gnu99 -ansi_alias
>>>> CFLAGS   = -g -debug all -traceback -align -std=c99 -ansi_alias
>>>> CXXFLAGS = -g -debug all -traceback -align -restrict
>>>> F77FLAGS = -g -debug all -traceback -align -pad -w95 -cm
>>>> F90FLAGS = -g -debug all -traceback -align -pad -w95 -cm
>>>>
>>>> C_LINE_DIRECTIVES = yes
>>>> F_LINE_DIRECTIVES = yes
>>>>
>>>> LDFLAGS = -Wl,-rpath,/opt/apps/intel11_1/mvapich2/1.6/lib -Wl,-rpath,/opt/apps/intel/11.1/lib/intel64/lib
>>>>
>>>> # Include hypre, SPAI, Chaco, spooles, SuperLU_DIST, MUMPS, ParMetis, SCALAPACK, blacs, g2c, and X11 for PETSc
>>>> LIBDIRS = /opt/apps/intel/11.1/lib/intel64/lib
>>>>
>>>> # -check-uninit aborts for uninitialised variables, which is too strict
>>>> DEBUG           = no
>>>> CPP_DEBUG_FLAGS = -DCARPET_DEBUG
>>>> FPP_DEBUG_FLAGS = -DCARPET_DEBUG
>>>> C_DEBUG_FLAGS   = -O0 # -check-uninit
>>>> CXX_DEBUG_FLAGS = -O0 # -check-uninit
>>>> F77_DEBUG_FLAGS = -O0 -check bounds -check format # -check all
>>>> F90_DEBUG_FLAGS = -O0 -check bounds -check format # -check all
>>>>
>>>> # -O3 miscompiles parts of CarpetLib
>>>> # -ip leads to wrong physics when OpenMP is used
>>>> OPTIMISE           = yes
>>>> CPP_OPTIMISE_FLAGS = # -DCARPET_OPTIMISE -DNDEBUG
>>>> FPP_OPTIMISE_FLAGS = # -DCARPET_OPTIMISE -DNDEBUG
>>>> C_OPTIMISE_FLAGS   = -O2 -xW
>>>> CXX_OPTIMISE_FLAGS = -O2 -xW
>>>> F77_OPTIMISE_FLAGS = -O2 -xW
>>>> F90_OPTIMISE_FLAGS = -O2 -xW
>>>>
>>>> PROFILE           = no
>>>> CPP_PROFILE_FLAGS =
>>>> FPP_PROFILE_FLAGS =
>>>> C_PROFILE_FLAGS   = -pg
>>>> CXX_PROFILE_FLAGS = -pg
>>>> F77_PROFILE_FLAGS = -pg
>>>> F90_PROFILE_FLAGS = -pg
>>>>
>>>> OPENMP           = yes
>>>> CPP_OPENMP_FLAGS = -openmp
>>>> FPP_OPENMP_FLAGS = -D_OPENMP
>>>> C_OPENMP_FLAGS   = -openmp
>>>> CXX_OPENMP_FLAGS = -openmp
>>>> F77_OPENMP_FLAGS = -openmp
>>>> F90_OPENMP_FLAGS = -openmp
>>>>
>>>> WARN           = yes
>>>> CPP_WARN_FLAGS =
>>>> FPP_WARN_FLAGS =
>>>> C_WARN_FLAGS   =
>>>> CXX_WARN_FLAGS =
>>>> F77_WARN_FLAGS =
>>>> F90_WARN_FLAGS =
>>>>
>>>> BLAS_DIR  =
>>>>
>>>> HDF5     = yes
>>>> HDF5_DIR = BUILD
>>>>
>>>> #Previously:
>>>> #GSL_DIR = /opt/apps/gsl/1.11
>>>> GSL_DIR = BUILD
>>>>
>>>> LAPACK_DIR  =
>>>> MPI          = CUSTOM
>>>> MPI_INC_DIRS = /opt/apps/intel11_1/mvapich2/1.6/include/
>>>> MPI_LIB_DIRS = /opt/apps/intel11_1/mvapich2/1.6/lib/ /opt/ofed/lib64/
>>>> MPI_LIBS     = mpich rt rdmacm ibverbs ibumad
>>>>
>>>> PTHREADS = yes
>>>>
>>>>
>>>>
>>>> --
>>>> Dr. Yosef Zlochower
>>>> Center for Computational Relativity and Gravitation
>>>> Assistant Professor
>>>> School of Mathematical Sciences
>>>> Rochester Institute of Technology
>>>> 85 Lomb Memorial Drive
>>>> Rochester, NY 14623
>>>>
>>>> Office:74-2067
>>>> Phone: +1 585-475-6103
>>>>
>>>> yosef at astro.rit.edu
>>>>
>>>
>>>
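
Erik's point that compiling, linking, and running have to use the same
MPI version can be partly checked against an option list like the one
quoted above. The Python sketch below uses hypothetical helper names and
assumes the MPI installation follows a <prefix>/include and <prefix>/lib
layout. It only inspects the build options, so it cannot tell which MPI
library a job actually loads at run time.

```python
import posixpath
import re

# The MPI-related lines from the quoted option list.
OPTIONS = """
LDFLAGS = -Wl,-rpath,/opt/apps/intel11_1/mvapich2/1.6/lib -Wl,-rpath,/opt/apps/intel/11.1/lib/intel64/lib
MPI          = CUSTOM
MPI_INC_DIRS = /opt/apps/intel11_1/mvapich2/1.6/include/
MPI_LIB_DIRS = /opt/apps/intel11_1/mvapich2/1.6/lib/ /opt/ofed/lib64/
MPI_LIBS     = mpich rt rdmacm ibverbs ibumad
"""


def parse_options(text):
    """Parse 'NAME = value' lines, skipping blank lines and '#' comments."""
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        opts[key.strip()] = value.strip()
    return opts


def rpaths(ldflags):
    """Extract the directories passed as -Wl,-rpath,<dir>."""
    return [posixpath.normpath(p) for p in re.findall(r"-Wl,-rpath,(\S+)", ldflags)]


def mpi_prefix(inc_dirs):
    """Guess the install prefix from the first include dir (<prefix>/include)."""
    first = posixpath.normpath(inc_dirs.split()[0])
    return posixpath.dirname(first)


def check_mpi_consistency(opts):
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    prefix = mpi_prefix(opts["MPI_INC_DIRS"])
    lib_dirs = [posixpath.normpath(d) for d in opts.get("MPI_LIB_DIRS", "").split()]
    mpi_libs = [d for d in lib_dirs if d.startswith(prefix + "/")]
    if not mpi_libs:
        problems.append("no MPI_LIB_DIRS entry under " + prefix)
    if not any(r in mpi_libs for r in rpaths(opts.get("LDFLAGS", ""))):
        problems.append("no -rpath in LDFLAGS points at the MPI library directory")
    return problems


if __name__ == "__main__":
    for problem in check_mpi_consistency(parse_options(OPTIONS)):
        print("WARNING:", problem)
```

A run-time counterpart would be to inspect the executable with ldd on a
compute node and compare the resolved MPI library path with the same
prefix.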

