[Users] setting up an unsupported cluster
Eric West
ewest at d.umn.edu
Wed Feb 7 21:29:16 CST 2018
Hi All,
I am trying to set up ET on a new cluster (hosted by the Minnesota
Supercomputing Institute at U of Minnesota, Twin Cities). I have been
using the mdb file bluewaters.ini as a template, replacing the bluewaters
specs with my own where necessary. I've attached my mdb file for
reference. I am able to build simfactory using the --machine=mesabi
flag. I am able to run the testsuite with no failed tests (although
several tests are reported as "unrunnable", which I am assuming is OK).
I can submit jobs, and they seem to run to completion just fine.
However, every time I log in, I land on a different login node, which
triggers an "unknown machine name" error unless I have previously built
simfactory on that particular node. I have tried to mimic bluewaters'
aliaspattern line in the hope that it would do the trick, but I must be
doing something wrong. What do I need to include in my mdb file to make
simfactory recognize that all of the login nodes belong to the same machine?
For a bit more background: MSI uses a two-step login process. First you
ssh into a login machine. Then you ssh from there into one of the
clusters. The machine I eventually reach is named mesabi, and the login
hosts are named ln000[1-6].
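To make the question concrete, here is a minimal Python sketch of how the
aliaspattern in the attached file behaves against the possible host names.
It assumes that the pattern is applied as an ordinary regular expression to
whatever name the node reports for itself; I have not confirmed whether the
login nodes report the short name (ln0001) or a fully qualified one, and the
".msi.umn.edu" suffix and the "relaxed" variant below are illustrative only.

```python
import re

# Pattern exactly as written in the attached mdb file.
strict = re.compile(r"^ln000[1-6](\.msi\.umn\.edu)$")

# Hypothetical variant: the trailing "?" makes the domain suffix optional.
relaxed = re.compile(r"^ln000[1-6](\.msi\.umn\.edu)?$")

# Candidate names a node might report (which form is used is an assumption).
candidates = ["ln0001", "ln0006", "ln0001.msi.umn.edu", "mesabi"]

strict_hits = {name: bool(strict.search(name)) for name in candidates}
relaxed_hits = {name: bool(relaxed.search(name)) for name in candidates}

for name in candidates:
    print(f"{name:22s} strict={strict_hits[name]!s:5s} relaxed={relaxed_hits[name]}")
```

With the pattern as written, the short names ln0001 through ln0006 do not
match because the domain group is mandatory, so this may be part of what I
am doing wrong if the nodes report short names.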
Any help is greatly appreciated.
Thanks,
Eric
--
Eric J West
Assistant Professor
Department of Physics and Astronomy
University of Minnesota Duluth
-------------- next part --------------
[mesabi]
# last-tested-on: 2018-02-07
# last-tested-by: Eric West <ewest at d.umn.edu>
# Machine description
nickname = mesabi
name = Mesabi
location = University of Minnesota
description = HP Linux cluster at MSI
webpage = https://www.msi.umn.edu/help-documentation
status = experimental
# Access to this machine
hostname = mesabi
envsetup = <<EOT
source /etc/profile
module load gcc/7.2.0
module load ompi/3.0.0
EOT
aliaspattern = ^ln000[1-6](\.msi\.umn\.edu)$
# Source tree management
sourcebasedir = /home/ewest/@USER@
optionlist = mesabi.cfg
submitscript = mesabi.sub
runscript = mesabi.run
make = make -j4
# Simulation management
basedir = /home/ewest/@USER@/simulations
nodes = 719 #number of nodes
#max-num-smt = ??? #max threads per core
#num-smt = ??? #suggested threads per core
max-num-threads = 24 #max threads per process
num-threads = 24 #threads per process
ppn = 24 #cores per node
min-ppn = 1 #min allowed ppn
memory = 61920 #memory per node in MB
allocation = NO_ALLOCATION
queue = small
maxwalltime = 96:00:00
submit = qsub @SCRIPTFILE@
getstatus = qstat @JOB_ID@
stop = qdel @JOB_ID@
submitpattern = (\d+)
statuspattern = ^@JOB_ID@\D
queuedpattern = " Q "
runningpattern = " R "
holdingpattern = " H "
#scratchbasedir = ???
stdout = cat @SIMULATION_NAME@.out
stderr = cat @SIMULATION_NAME@.err
stdout-follow = tail -n 100 -f @SIMULATION_NAME@.out @SIMULATION_NAME@.err