[Users] Using Stampede2 SKX

James Healy jchsma at rit.edu
Sat Jan 20 09:21:18 CST 2018


Hello all,

I am trying to run on the new Skylake (SKX) processors on Stampede2, and while
the run speeds we are obtaining are very good, we are concerned that we
aren't optimizing properly when it comes to OpenMP. For instance, we
see the best speeds when we use 8 MPI processes per node (with 6
threads each, for a total of 48 threads/node). Based on the
architecture (two 24-core sockets per node), we were expecting to see
the best speeds with 2 MPI processes/node.
Here is what I have tried:

 1. Using the simfactory files for stampede2-skx (config file, run and
    submit scripts, and modules loaded), I compiled a version of
    ET_2017_06 with LazEv (RIT's evolution thorn) and McLachlan and
    submitted a series of runs that vary both the number of nodes
    used and how the 48 threads/node are distributed among MPI processes.
 2. I use a standard low-resolution grid, with no I/O or regridding.
    Parameter file attached.
 3. Run speeds are measured from Carpet::physical_time_per_hour at
    iteration 256.
 4. I tried both with and without hwloc/SystemTopology.
 5. For both McLachlan and LazEv, I see similar results, with 2 MPI/node
    giving the worst results (see attached plot for McLachlan) and a
    slight preference for 8 MPI/node.
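The layouts being compared can be written down explicitly. The Python sketch below is only an illustration: it assumes 48 cores per node with one OpenMP thread per core (matching the 48 threads/node used above), enumerates every (MPI processes per node, threads per process) pair that fills a node, and converts each into the totals we understand Simfactory's `--procs` (total cores) and `--num-threads` options to expect. The node count in the example is hypothetical.

```python
# Enumerate MPI x OpenMP layouts that use every core of one node.
# Assumption: 48 cores per Stampede2 SKX node, one OpenMP thread per core.
CORES_PER_NODE = 48


def layouts(cores_per_node=CORES_PER_NODE):
    """Return (mpi_per_node, threads_per_process) pairs whose product is cores_per_node."""
    return [
        (mpi, cores_per_node // mpi)
        for mpi in range(1, cores_per_node + 1)
        if cores_per_node % mpi == 0
    ]


def simfactory_counts(nodes, threads_per_process, cores_per_node=CORES_PER_NODE):
    """Values for Simfactory's --procs (total cores) and --num-threads options."""
    return nodes * cores_per_node, threads_per_process


if __name__ == "__main__":
    nodes = 4  # hypothetical node count, for illustration only
    for mpi, threads in layouts():
        procs, nthreads = simfactory_counts(nodes, threads)
        print(f"{mpi:2d} MPI/node x {threads:2d} threads: "
              f"--procs={procs} --num-threads={nthreads}")
```

With 4 nodes, the 8 MPI/node case corresponds to `--procs=192 --num-threads=6` and the 2 MPI/node case to `--procs=192 --num-threads=24`; only the split between processes and threads changes.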

So my questions are:

 1. Have any tests been run by other users on Stampede2 SKX?
 2. Should we expect 2 MPI/node to be the optimal choice?
 3. If so, are there any other configurations we could try to help
    optimize the runs?
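One common reason for expecting fewer MPI processes per node to be faster is ghost-zone overhead: each MPI process stores and exchanges its own ghost zones, while OpenMP threads share their process's data. The idealized Python sketch below estimates that overhead, assuming a cubic block of N^3 points cut into P equal cubes (P a perfect cube) with `driver::ghost_size = 4` points on every face, as in the attached parameter file. Carpet's actual decomposition, the refinement-level buffer zones, and OpenMP efficiency all differ from this model, so it only indicates a trend.

```python
# Idealized ghost-zone overhead when one cubic block is split among MPI processes.
# Assumptions: an n_total^3 block, split into `procs` equal cubes (procs a perfect
# cube), each carrying `ghost` points on all six faces. Not Carpet's real layout.
GHOST_SIZE = 4  # driver::ghost_size in the attached parameter file


def ghost_overhead(n, ghost=GHOST_SIZE):
    """Extra stored points per owned point for an n^3 component with ghost zones."""
    return ((n + 2 * ghost) ** 3 - n ** 3) / n ** 3


def overhead_for_split(n_total, procs, ghost=GHOST_SIZE):
    """Overhead after dividing an n_total^3 block into `procs` equal cubes."""
    side = round(procs ** (1 / 3))
    if side ** 3 != procs:
        raise ValueError("this sketch only handles perfect-cube process counts")
    return ghost_overhead(n_total / side, ghost)


if __name__ == "__main__":
    for procs in (1, 8, 27, 64):
        print(f"{procs:3d} pieces of a 64^3 block: "
              f"overhead = {overhead_for_split(64, procs):.3f}")
```

For a 64^3 block, a single piece carries about 0.42 extra points per owned point, while eight 32^3 pieces carry about 0.95, which is why measured results favouring more MPI processes point toward other effects (for example thread placement or OpenMP scaling).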

Thanks in advance!

Jim Healy

-------------- next part --------------
Attachment: mclachlan.png (image/png, 44038 bytes)
URL: http://lists.einsteintoolkit.org/pipermail/users/attachments/20180120/e87cdc37/attachment-0001.png
-------------- next part --------------

# rl	dx		Resolution	Res:m-		Res:m+		rm		rp		NNm	NNp
# 0	2.38095238	M/0.4200	m-/0.1930	m+/0.2270	400.000000	400.000000	168	168
# 1	1.19047619	M/0.8400	m-/0.3859	m+/0.4541	225.000000	225.000000	189	189
# 2	0.59523810	M/1.6800	m-/0.7719	m+/0.9081	150.000000	150.000000	252	252
# 3	0.29761905	M/3.3600	m-/1.5438	m+/1.8162	35.000000	35.000000	117	117
# 4	0.14880952	M/6.7200	m-/3.0876	m+/3.6324	10.000000	10.000000	67	67
# 5	0.07440476	M/13.4400	m-/6.1751	m+/7.2649	4.800000	4.800000	64	64
# 6	0.03720238	M/26.8800	m-/12.3503	m+/14.5297	2.400000	2.400000	64	64
# 7	0.01860119	M/53.7600	m-/24.7005	m+/29.0595	1.200000	1.200000	64	64
# 8	0.00930060	M/107.5200	m-/49.4011	m+/58.1189	0.700000	0.700000	75	75
# 9	0.00465030	M/215.0400	m-/98.8022	m+/116.2378	0.500000	0.250000	107	53
# 10	0.00232515	M/430.0800	m-/197.6043	m+/232.4757	0.250000	0.125000	107	53
# dt = 0.000775049603174603
# m+ = 0.54054054054054
# m- = 0.459459459459459
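The header table can be regenerated from a few inputs, which makes it easy to recheck when the grid changes. The Python sketch below assumes what the numbers suggest: the spacing halves on each level (`Carpet::refinement_factor = 2`), "Resolution" is 1/dx in units of M (and of m- and m+), rm/rp are the box radii around m- and m+, NNm/NNp are radius/dx rounded down, and dt is `Time::dtfac` times the finest dx. These column meanings are inferred; the file does not document them.

```python
# Regenerate the refinement-level table from the parfile header.
# Column meanings are inferred from the numbers, not documented in the file.
import math

DX0 = 2.38095238095238          # CoordBase::dx, coarsest spacing
DTFAC = 0.333333333333333       # Time::dtfac
M_PLUS = 0.54054054054054       # "m+" in the header comments
M_MINUS = 0.459459459459459     # "m-" in the header comments
# (rm, rp): box radii around m- and m+ on refinement levels 0..10
RADII = [(400, 400), (225, 225), (150, 150), (35, 35), (10, 10), (4.8, 4.8),
         (2.4, 2.4), (1.2, 1.2), (0.7, 0.7), (0.5, 0.25), (0.25, 0.125)]


def level_table(dx0=DX0, radii=RADII):
    rows = []
    for rl, (rm, rp) in enumerate(radii):
        dx = dx0 / 2 ** rl                      # refinement factor 2 per level
        rows.append({
            "rl": rl,
            "dx": dx,
            "res": 1.0 / dx,                    # "Resolution": dx = M/res
            "res_m": M_MINUS / dx,              # dx = m-/res_m
            "res_p": M_PLUS / dx,               # dx = m+/res_p
            "NNm": math.floor(rm / dx + 1e-9),  # grid spacings within radius rm
            "NNp": math.floor(rp / dx + 1e-9),  # grid spacings within radius rp
        })
    return rows


if __name__ == "__main__":
    rows = level_table()
    for r in rows:
        print(f"{r['rl']:2d} {r['dx']:.8f} M/{r['res']:.4f} m-/{r['res_m']:.4f} "
              f"m+/{r['res_p']:.4f} {r['NNm']:4d} {r['NNp']:4d}")
    print(f"dt = {DTFAC * rows[-1]['dx']:.15g}")
```

The NNm/NNp columns and the dt line reproduce the header, which is a quick way to confirm that edits to the radii or to `CoordBase::dx` stay consistent.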

ActiveThorns = "admbase admcoupling admmacros coordgauge spacemask StaticConformal  boundary time cartgrid3d ioutil CoordBase aeilocalinterp Slab SphericalSurface LocalReduce MoL Carpet CarpetInterp CarpetIOASCII CarpetLib CarpetReduce CarpetSlab CarpetRegrid2 CarpetIOHDF5 CarpetIOScalar TwoPunctures InitBase SymBase LoopControl GSL ReflectionSymmetry CarpetIOBasic"

#############################################################
# Grid
#############################################################

CartGrid3D::type                        = "coordbase"
CartGrid3D::domain                      = "full"
CartGrid3D::avoid_origin                = "no"

CoordBase::domainsize                   = minmax
CoordBase::xmin                         = -400
CoordBase::ymin                         = -400
CoordBase::zmin                         = 0
CoordBase::xmax                         = 400
CoordBase::ymax                         = 400
CoordBase::zmax                         = 400
CoordBase::dx                           = 2.38095238095238
CoordBase::dy                           = 2.38095238095238
CoordBase::dz                           = 2.38095238095238
CoordBase::boundary_size_x_lower        = 4
CoordBase::boundary_size_y_lower        = 4
CoordBase::boundary_size_z_lower        = 4
CoordBase::boundary_shiftout_x_lower    = 0
CoordBase::boundary_shiftout_y_lower    = 0
CoordBase::boundary_shiftout_z_lower    = 1
CoordBase::boundary_size_x_upper        = 4
CoordBase::boundary_size_y_upper        = 4
CoordBase::boundary_size_z_upper        = 4
CoordBase::boundary_shiftout_x_upper    = 0
CoordBase::boundary_shiftout_y_upper    = 0
CoordBase::boundary_shiftout_z_upper    = 0

#############################################################
# Symmetries
#############################################################

ReflectionSymmetry::reflection_x        = "no"
ReflectionSymmetry::reflection_y        = "no"
ReflectionSymmetry::reflection_z        = "yes"
ReflectionSymmetry::avoid_origin_x      = "no"
ReflectionSymmetry::avoid_origin_y      = "no"
ReflectionSymmetry::avoid_origin_z      = "no"

#RotatingSymmetry180::poison_boundaries  = "yes"

#############################################################
# Run statistics
#############################################################

#TimerReport::out_every       = 1024
#TimerReport::out_filename    = "TimerReport"

#############################################################
# CarpetRegrid2
#############################################################

CarpetRegrid2::regrid_every = 2048 #2048
Carpet::grid_coordinates_filename = "grid.asc" 
CarpetRegrid2::symmetry_rotating180   = "no"

CarpetRegrid2::num_centres = 3

CarpetRegrid2::num_levels_1 = 11
CarpetRegrid2::position_x_1 = 5.97297297297297
CarpetRegrid2::position_y_1 = 0
CarpetRegrid2::position_z_1 = 0
CarpetRegrid2::radius_1[ 1] = 0
CarpetRegrid2::radius_1[ 2] = 0
CarpetRegrid2::radius_1[ 3] = 35
CarpetRegrid2::radius_1[ 4] = 10
CarpetRegrid2::radius_1[ 5] = 4.8
CarpetRegrid2::radius_1[ 6] = 2.4
CarpetRegrid2::radius_1[ 7] = 1.2
CarpetRegrid2::radius_1[ 8] = 0.7
CarpetRegrid2::radius_1[ 9] = 0.25
CarpetRegrid2::radius_1[10] = 0.125

CarpetRegrid2::num_levels_2 = 11
CarpetRegrid2::position_x_2 = -7.02702702702703
CarpetRegrid2::position_y_2 = 0
CarpetRegrid2::position_z_2 = 0
CarpetRegrid2::radius_2[ 1] = 0
CarpetRegrid2::radius_2[ 2] = 0
CarpetRegrid2::radius_2[ 3] = 35
CarpetRegrid2::radius_2[ 4] = 10
CarpetRegrid2::radius_2[ 5] = 4.8
CarpetRegrid2::radius_2[ 6] = 2.4
CarpetRegrid2::radius_2[ 7] = 1.2
CarpetRegrid2::radius_2[ 8] = 0.7
CarpetRegrid2::radius_2[ 9] = 0.5
CarpetRegrid2::radius_2[10] = 0.25

CarpetRegrid2::num_levels_3  =  3
CarpetRegrid2::position_x_3  =  0
CarpetRegrid2::position_y_3  =  0
CarpetRegrid2::position_z_3  =  0
CarpetRegrid2::radius_3[1]  =  225
CarpetRegrid2::radius_3[2]  =  150

#LazRegrid2::num_levels_on_recover[0]=10
#LazRegrid2::num_levels_on_recover[1]=10

#############################################################
# SphericalSurface
#############################################################

SphericalSurface::nsurfaces = 3
SphericalSurface::maxntheta = 39
SphericalSurface::maxnphi   = 76

SphericalSurface::ntheta      [0] = 39
SphericalSurface::nphi        [0] = 76
SphericalSurface::nghoststheta[0] = 2
SphericalSurface::nghostsphi  [0] = 2

SphericalSurface::ntheta      [1] = 39
SphericalSurface::nphi        [1] = 76
SphericalSurface::nghoststheta[1] = 2
SphericalSurface::nghostsphi  [1] = 2

SphericalSurface::ntheta      [2] = 39
SphericalSurface::nphi        [2] = 76
SphericalSurface::nghoststheta[2] = 2
SphericalSurface::nghostsphi  [2] = 2

#############################################################
# Carpet
#############################################################

driver::ghost_size                      = 4
Carpet::domain_from_coordbase           = "yes"
Carpet::prolongation_order_space        = 5
Carpet::prolongation_order_time         = 2
Carpet::max_refinement_levels           = 11
Carpet::use_buffer_zones                = "yes"
#Carpet::num_integrator_substeps         = 1
#Carpet::additional_buffer_zones         = 2
Carpet::verbose                         = "no"
Carpet::veryverbose                     = "no"
Carpet::schedule_barriers               = "no"

Carpet::init_3_timelevels               = "yes"
Carpet::init_each_timelevel             = "no"
Carpet::init_fill_timelevels            = "no"
Carpet::enable_all_storage              = "no"
Carpet::regrid_during_recovery          = "no"

Carpet::refinement_factor              = 2
#Carpet::time_refinement_factors        = "[1,1,1,2,4,8,16,32,64,128]"
#Carpet::poison_new_timelevels          = "yes"
#Carpet::check_for_poison               = "no"
#Carpet::poison_value                   = 113
#Carpet::use_tapered_grids              = "no"
#Carpet::output_timers_every             = 1024
#Carpet::print_timestats_every           = 0


#############################################################
# CarpetLib
#############################################################

CarpetLib::output_bboxes             = no
CarpetLib::check_bboxes              = no
CarpetLib::interleave_communications = yes
CarpetLib::combine_sends             = yes
CarpetLib::print_memstats_every      = 1024
#CarpetLib::max_memory_size_MB        = 3100
#CarpetLib::poison_new_memory            = "yes"
#CarpetLib::poison_value                 = 114

#############################################################
# Time integration
#############################################################

Cactus::terminate                     = "any"
Cactus::max_runtime                   = 2850
Cactus::cctk_final_time               = 4999
Cactus::cctk_itlast                   = 256
Cactus::cctk_timer_output             = "full"
Cactus::highlight_warning_messages    = "no"

Time::dtfac                           = 0.333333333333333

MethodOfLines::ode_method             = "RK4"
MethodOfLines::MoL_NaN_Check          = "no"
MethodOfLines::MoL_Intermediate_Steps = 4
MethodOfLines::MoL_Num_Scratch_Levels = 1

#############################################################
# Initial data
#############################################################

initbase::initial_data_setup_method = init_all_levels

ADMBase::initial_data = "twopunctures"
ADMBase::metric_type  = "Physical"
ADMBase::initial_lapse   = "twopunctures-averaged"
ADMBase::initial_shift   = "zero"
ADMBase::initial_dtlapse = "zero"
ADMBase::initial_dtshift = "zero"

# Uncomment these for fast but very inaccurate initial data
#       TwoPunctures::npoints_A = 6
#       TwoPunctures::npoints_B = 6
#       TwoPunctures::npoints_phi = 6

TwoPunctures::verbose           = "yes"
TwoPunctures::keep_u_around     = no

#TwoPunctures::npoints_A         = 70
#TwoPunctures::npoints_B         = 70
#TwoPunctures::npoints_phi       = 70

TwoPunctures::par_b             = 6.5
TwoPunctures::center_offset[0]  = -0.527027027027027

TwoPunctures::par_m_plus        = 0.529929328970489
TwoPunctures::par_P_plus[0]     = -0.000385046023339
TwoPunctures::par_P_plus[1]     = 0.079132079120352
TwoPunctures::par_P_plus[2]     = 0
TwoPunctures::par_S_plus[0]     = 0
TwoPunctures::par_S_plus[1]     = 0
TwoPunctures::par_S_plus[2]     = 0

TwoPunctures::par_m_minus       = 0.371848649946643
TwoPunctures::par_P_minus[0]    = 0.000385046023339
TwoPunctures::par_P_minus[1]    = -0.079132079120352
TwoPunctures::par_P_minus[2]    = 0
TwoPunctures::par_S_minus[0]    = 0
TwoPunctures::par_S_minus[1]    = 0
TwoPunctures::par_S_minus[2]    = 0.126661796932067

TwoPunctures::Newton_maxit = 10
TwoPunctures::Newton_tol = 7.0e-10
TwoPunctures::grid_setup_method = "evaluation"
TwoPunctures::TP_Tiny = 1e-6
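The refinement centres and the initial-data parameters should agree. The Python sketch below assumes TwoPunctures places the punctures at x = +par_b and x = -par_b and then shifts both by center_offset[0]; under that assumption the punctures land on `CarpetRegrid2::position_x_1` and `position_x_2`. It also shows that center_offset[0] equals -par_b (m+ - m-)/(m+ + m-) for the m+ and m- in the header comments, i.e. the mass-weighted centre sits at the origin; that this is how the offset was chosen is our inference.

```python
# Compare TwoPunctures puncture locations with the CarpetRegrid2 centres.
# Assumption: punctures sit at x = +par_b and x = -par_b, shifted by center_offset[0].
PAR_B = 6.5                           # TwoPunctures::par_b
CENTER_OFFSET_X = -0.527027027027027  # TwoPunctures::center_offset[0]
M_PLUS = 0.54054054054054             # "m+" in the header comments
M_MINUS = 0.459459459459459           # "m-" in the header comments
POSITION_X_1 = 5.97297297297297       # CarpetRegrid2::position_x_1
POSITION_X_2 = -7.02702702702703      # CarpetRegrid2::position_x_2


def puncture_positions(b=PAR_B, offset=CENTER_OFFSET_X):
    """Return (x_plus, x_minus) after applying the centre offset."""
    return b + offset, -b + offset


def mass_centred_offset(b=PAR_B, m_plus=M_PLUS, m_minus=M_MINUS):
    """Offset that puts m_plus*x_plus + m_minus*x_minus at the origin."""
    return -b * (m_plus - m_minus) / (m_plus + m_minus)


if __name__ == "__main__":
    x_plus, x_minus = puncture_positions()
    print(f"x+ = {x_plus:.14f} (position_x_1 = {POSITION_X_1})")
    print(f"x- = {x_minus:.14f} (position_x_2 = {POSITION_X_2})")
    print(f"mass-centred offset = {mass_centred_offset():.15f}")
```

Centre 1 is therefore the m+ puncture and centre 2 the m- puncture, consistent with the finest radii (0.25/0.125 versus 0.5/0.25) matching the rp and rm columns of the header table.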

#############################################################
# Evolution system
#############################################################

ActiveThorns = "ML_BSSN ML_BSSN_Helper NewRad"

ADMBase::evolution_method         = "ML_BSSN"
ADMBase::lapse_evolution_method   = "ML_BSSN"
ADMBase::shift_evolution_method   = "ML_BSSN"
ADMBase::dtlapse_evolution_method = "ML_BSSN"
ADMBase::dtshift_evolution_method = "ML_BSSN"

ML_BSSN::fdOrder             = 6
ML_BSSN::harmonicN           = 1      # 1+log
ML_BSSN::harmonicF           = 2.0    # 1+log
ML_BSSN::ShiftGammaCoeff     = 0.75
ML_BSSN::BetaDriver          = 1.0
ML_BSSN::advectLapse         = 1
ML_BSSN::advectShift         = 1

ML_BSSN::MinimumLapse        = 1.0e-8

ML_BSSN::initial_boundary_condition = "extrapolate-gammas"
ML_BSSN::rhs_boundary_condition     = "NewRad"
Boundary::radpower                     = 2

ML_BSSN::ML_log_confac_bound = "none"
ML_BSSN::ML_metric_bound     = "none"
ML_BSSN::ML_Gamma_bound      = "none"
ML_BSSN::ML_trace_curv_bound = "none"
ML_BSSN::ML_curv_bound       = "none"
ML_BSSN::ML_lapse_bound      = "none"
ML_BSSN::ML_dtlapse_bound    = "none"
ML_BSSN::ML_shift_bound      = "none"
ML_BSSN::ML_dtshift_bound    = "none"



ActiveThorns = "Dissipation"

Dissipation::order = 5
Dissipation::vars  = "
        ML_BSSN::ML_log_confac
        ML_BSSN::ML_metric
        ML_BSSN::ML_trace_curv
        ML_BSSN::ML_curv
        ML_BSSN::ML_Gamma
        ML_BSSN::ML_lapse
        ML_BSSN::ML_shift
        ML_BSSN::ML_dtlapse
        ML_BSSN::ML_dtshift
"



ActiveThorns = "ML_ADMConstraints TmunuBase"


#############################################################
# Output
#############################################################

IO::out_dir                          = $parfile
IO::out_fileinfo                     = "all"

CarpetIOBasic::outInfo_every         = 128
CarpetIOBasic::outInfo_vars          = "ML_ADMConstraints::ML_Ham Carpet::physical_time_per_hour"
CarpetIOBasic::real_max              = 1e6
CarpetIOBasic::int_width             = 12

CarpetIOScalar::outScalar_every      = 0
CarpetIOScalar::outScalar_reductions = "norm2 minimum"
CarpetIOScalar::outScalar_vars       = ""

CarpetIOASCII::out1D_every           = 0
CarpetIOASCII::out1D_x               = "yes"
CarpetIOASCII::out1D_y               = "no"
CarpetIOASCII::out1D_z               = "no"
CarpetIOASCII::out1D_d               = "no"
CarpetIOASCII::out1D_vars            = ""

#CarpetIOASCII::out2D_every           = 0
#CarpetIOASCII::out2D_vars            = ""
#Carpetioascii::out3D_ghosts          = "yes"
#CarpetIOASCII::out2D_xz              = "no"
#CarpetIOASCII::out2D_yz              = "no"
#CarpetIOASCII::out_precision         = 19

#CarpetIOHDF5::out_every              = 0
#CarpetIOHDF5::out_vars               = ""

CarpetIOHDF5::out2D_every            = 0
CarpetIOHDF5::out2D_vars             = ""

#############################################################
# Checkpoint and recovery
#############################################################

CarpetIOHDF5::checkpoint       = "no"
IO::checkpoint_every_walltime_hours = 6
IO::checkpoint_keep            = 2
IO::checkpoint_dir             = "test_checks"
IO::checkpoint_on_terminate    = "no"

IO::recover                    = "autoprobe"
IO::recover_dir                = "test_checks"
IO::recover_and_remove         = "no"

