[Users] setting up an unsupported cluster

Eric West ewest at d.umn.edu
Wed Feb 7 21:29:16 CST 2018


Hi All,

I am trying to set up ET on a new cluster (hosted by the Minnesota 
Supercomputing Institute at U of Minnesota, Twin Cities). I have been 
using the mdb file bluewaters.ini as a template, replacing bluewaters 
specs with my specs where necessary. I've attached my mdb file for 
reference. I am able to build simfactory using the --machine=mesabi 
flag. I am able to run the testsuite with no failed tests (although 
several tests are "unrunnable", which I am assuming is ok?). I can submit 
jobs, and they seem to run to completion just fine.

However, every time I log in, I am on a different login node, which 
triggers an "unknown machine name" error, unless I have previously built 
simfactory on that particular node. I have tried to mimic bluewaters' 
aliaspattern line in hopes that it would do the trick. But I must be 
doing something wrong. What do I need to include in my mdb file to force 
the system to recognize that all of the login nodes are on the same machine?

For a bit more background: MSI uses a two-step login process. First you 
ssh into a login machine. Then you ssh from there into one of the 
clusters. The machine I eventually reach is named mesabi, and the login 
hosts are named ln000[1-6].
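
One way to check the aliaspattern in the attached file against these host 
names is a short Python sketch like the one below. It assumes the pattern 
is applied as an ordinary Python regular expression to whatever string the 
login node reports as its host name; how simfactory actually obtains and 
compares that string is not confirmed here. The fully qualified form 
(ln0006.msi.umn.edu) and the variant pattern with an optional domain 
suffix are illustrative assumptions, not something taken from a working 
setup.

```python
import re

# aliaspattern exactly as written in the attached mesabi entry
ATTACHED_PATTERN = r"^ln000[1-6](\.msi\.umn\.edu)$"

# Hypothetical variant: the trailing "?" makes the domain suffix optional
OPTIONAL_DOMAIN_PATTERN = r"^ln000[1-6](\.msi\.umn\.edu)?$"

# Strings a login node might report; the short names come from the message,
# the fully qualified form is an assumption based on the pattern itself
CANDIDATE_HOSTNAMES = ["ln0001", "ln0006.msi.umn.edu", "mesabi"]


def matching_hosts(pattern, hostnames):
    """Return the host names in `hostnames` that `pattern` matches."""
    compiled = re.compile(pattern)
    return [name for name in hostnames if compiled.search(name)]


if __name__ == "__main__":
    for label, pattern in [
        ("attached pattern", ATTACHED_PATTERN),
        ("optional-domain variant", OPTIONAL_DOMAIN_PATTERN),
    ]:
        print(label, "->", matching_hosts(pattern, CANDIDATE_HOSTNAMES))
```

In this sketch the attached pattern only accepts the fully qualified name, 
because the domain group is mandatory; a bare name such as ln0001 is 
accepted only by the variant. Comparing this against the output of 
`hostname` on a login node would show which form is actually needed.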

Any help is greatly appreciated.

Thanks,
Eric

-- 
Eric J West
Assistant Professor
Department of Physics and Astronomy
University of Minnesota Duluth

-------------- next part --------------
[mesabi]

# last-tested-on: 2018-02-07
# last-tested-by: Eric West <ewest at d.umn.edu>

# Machine description
nickname        = mesabi
name            = Mesabi
location        = University of Minnesota
description     = HP Linux cluster at MSI
webpage         = https://www.msi.umn.edu/help-documentation
status          = experimental

# Access to this machine
hostname        = mesabi

envsetup        = <<EOT
    source /etc/profile
    module load gcc/7.2.0
    module load ompi/3.0.0
EOT
aliaspattern   = ^ln000[1-6](\.msi\.umn\.edu)$

# Source tree management
sourcebasedir   = /home/ewest/@USER@
optionlist      = mesabi.cfg
submitscript    = mesabi.sub
runscript       = mesabi.run
make            = make -j4

# Simulation management
basedir         = /home/ewest/@USER@/simulations
nodes           = 719 #number of nodes
#max-num-smt     = ??? #max threads per core
#num-smt         = ??? #suggested threads per core
max-num-threads = 24 #max threads per process
num-threads     = 24 #threads per process
ppn             = 24  #cores per node
min-ppn         = 1   #min allowed ppn
memory          = 61920 #memory per node in MB
allocation      = NO_ALLOCATION
queue           = small
maxwalltime     = 96:00:00
submit          = qsub @SCRIPTFILE@
getstatus       = qstat @JOB_ID@
stop            = qdel @JOB_ID@
submitpattern   = (\d+)
statuspattern   = ^@JOB_ID@\D
queuedpattern   = " Q "
runningpattern  = " R "
holdingpattern  = " H "
#scratchbasedir  = ???
stdout          = cat @SIMULATION_NAME@.out
stderr          = cat @SIMULATION_NAME@.err
stdout-follow   = tail -n 100 -f @SIMULATION_NAME@.out @SIMULATION_NAME@.err

