[Users] Einstein toolkit with Sun Grid Engine

Roland Haas rhaas at illinois.edu
Fri Oct 8 17:13:55 CDT 2021


Hello Chris,

the "way"-ness was a TACC thing I believe.

Occasionally, in particular in old files, you will see constructs such
as:

# list each allocated node once
uniq ${PBS_NODEFILE} > ${MPD_NODEFILE}
for node in $(cat ${MPD_NODEFILE}); do
    # @PPN_USED@ / @NUM_THREADS@ = MPI ranks per node (SimFactory template syntax)
    for ((proc=0; $proc<@(@PPN_USED@ / @NUM_THREADS@)@; proc=$proc+1)); do
        echo ${node}
    done
done > ${MPI_NODEFILE}

mpirun -np @NUM_PROCS@ -machinefile ${MPI_NODEFILE} @EXECUTABLE@ -L 3 @PARFILE@

i.e., one constructs a custom MPI machine file that manually lists each
hostname as many times as needed to start the correct number of MPI
ranks on that host.

SGE has a similar variable, PE_HOSTFILE, and if all else fails you can
likely do the same thing, replacing PBS_NODEFILE with PE_HOSTFILE.
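
A minimal, untested sketch of that adaptation, assuming the usual
PE_HOSTFILE format of one "hostname slots queue processor-range" line
per host (so one takes the first column rather than uniq'ing the file):

awk '{print $1}' ${PE_HOSTFILE} > ${MPD_NODEFILE}   # hostnames only
for node in $(cat ${MPD_NODEFILE}); do
    for ((proc=0; $proc<@(@PPN_USED@ / @NUM_THREADS@)@; proc=$proc+1)); do
        echo ${node}
    done
done > ${MPI_NODEFILE}

mpirun -np @NUM_PROCS@ -machinefile ${MPI_NODEFILE} @EXECUTABLE@ -L 3 @PARFILE@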

Yours,
Roland

> Chris
> 
> I am unfamiliar with the details of SGE; I cannot tell whether this
> approach makes sense.
> 
> -erik
> 
> 
> On Thu, Oct 7, 2021 at 5:19 PM Chris Stevens <chris.stevens at canterbury.ac.nz>
> wrote:
> 
> > Hi Erik,
> >
> > Thanks for your suggestion.
> >
> > I am happy using these in the scripts, but I think the problem is how to
> > pass these expressions to SGE. From what I can tell, the output of
> > @(@PPN_USED@/@NUM_THREADS@)@way is, for example, "6way", given @PPN_USED@=48
> > and @NUM_THREADS@=8. This means that I have requested the parallel
> > environment called 6way with @PROCS_REQUESTED@ slots. If I requested 48
> > slots, then I would use mpirun -np 6. Thus, from what I gather, for this to
> > work, this specific parallel environment 6way needs to exist. I am now
> > figuring out how to configure parallel environments in such a way, most
> > likely by changing the allocation rule.
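> >
> > A hedged sketch of what such a parallel environment could look like
> > (the name and numbers here are hypothetical; "qconf -ap 6way" opens
> > an editor with these fields, and a fixed integer allocation_rule
> > pins the number of slots placed on each node):
> >
> > pe_name            6way
> > slots              9999
> > user_lists         NONE
> > xuser_lists        NONE
> > start_proc_args    /bin/true
> > stop_proc_args     /bin/true
> > allocation_rule    6
> > control_slaves     TRUE
> > job_is_first_task  FALSE
> >
> > The PE would then have to be added to a queue's pe_list (via
> > "qconf -mq <queue>") before "#$ -pe 6way ..." requests can run.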
> >
> > Let me know if you think this is wrong, as it does seem rather stupid
> > to not be able to just set --cpus-per-task like in Slurm in the
> > submission script.
> >
> > Cheers,
> >
> > Chris
> >
> > *Dr Chris Stevens*
> >
> > *Lecturer in Applied Mathematics*
> >
> > Rm 602, Jack Erskine building
> >
> > School of Mathematics and Statistics
> >
> > T: +64 3 369 0396 (Internal 90396)
> >
> > University of Canterbury | Te Whare Wānanga o Waitaha
> >
> > Private Bag 4800, Christchurch 8140, New Zealand
> >
> > http://www.chrisdoesmaths.com
> >
> > *Director*
> > SCRI Ltd
> > http://www.scri.co.nz
> >
> > ------------------------------
> > *From:* Erik Schnetter <schnetter at cct.lsu.edu>
> > *Sent:* 08 October 2021 09:40
> > *To:* Chris Stevens <chris.stevens at canterbury.ac.nz>
> > *Cc:* users at einsteintoolkit.org <users at einsteintoolkit.org>
> > *Subject:* Re: [Users] Einstein toolkit with Sun Grid Engine
> >
> > Chris
> >
> > It might not be necessary to hard-code the number of threads. You can use
> > arbitrary Python expressions via "@( ... )@" in the templates. See e.g. the
> > template for Blue Waters which uses this to choose between CPU and GPU
> > queues.
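> >
> > For illustration (my understanding of the template mechanics): the
> > text between "@(" and ")@" is evaluated as a Python expression after
> > the plain @...@ substitutions have been made, so with @PPN_USED@=48
> > and @NUM_THREADS@=8 a line like
> >
> > #$ -pe @(@PPN_USED@/@NUM_THREADS@)@way @PROCS_REQUESTED@
> >
> > expands to "#$ -pe 6way ..." in the generated submit script.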
> >
> > -erik
> >
> >
> > On Thu, Oct 7, 2021 at 4:04 PM Chris Stevens
> > <chris.stevens at canterbury.ac.nz> wrote:
> >
> > Hi Roland,
> >
> > That's fantastic, thanks for linking those files.
> >
> > It works as expected with only MPI processes. I am careful to compile
> > and run with the same (and only) OpenMPI installation on the cluster,
> > so this should be OK.
> >
> > Looking at a Slurm-to-SGE conversion table, there is no SGE equivalent
> > to Slurm's --cpus-per-task; rather, it is the allocation rule of the
> > given parallel environment that controls this, i.e. the backend.
> >
> > https://srcc.stanford.edu/sge-slurm-conversion
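> >
> > As a rough, illustrative comparison (the PE name here is made up):
> >
> > # Slurm: ranks and threads are separate knobs:
> > #   sbatch --ntasks=6 --cpus-per-task=8 job.sh
> > # SGE: a single slot count; how slots map onto nodes is a property
> > # of the chosen parallel environment's allocation_rule:
> > #   qsub -pe somepe 48 job.sh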
> >
> > Further, in the submit script for Ranger, the crucial line
> >
> > #$ -pe @(@PPN_USED@/@NUM_THREADS@)@way @PROCS_REQUESTED@
> >
> > shows that you request @PROCS_REQUESTED@ slots (as I currently have)
> > and that the name of the parallel environment depends on
> > @NUM_THREADS@. From what I take from this, I need to set up a parallel
> > environment that hardcodes the number of threads I want per MPI
> > process and then use that parallel environment. I'll see how I go
> > there, but it isn't initially obvious how to do this!
> >
> > Cheers,
> >
> > Chris
> >
> >
> >
> > ------------------------------
> > *From:* Roland Haas
> > *Sent:* Thursday, October 07, 2021 06:22
> > *To:* Chris Stevens
> > *Cc:* users at einsteintoolkit.org
> > *Subject:* Re: [Users] Einstein toolkit with Sun Grid Engine
> >
> > Hello Chris,
> >
> > We used SGE a long time ago on some of the TACC machines.
> >
> > You can find an old setup for TACC's Ranger cluster in an old commit
> > like so:
> >
> > git checkout fed9f8d6fae4c52ed2d0a688fcc99e51b94e608e
> >
> > and then look at the "ranger" files in the OUTDATED subdirectories of
> > machines, runscripts, and submitscripts.
> >
> > Having all MPI ranks on a single node might also be caused by using
> > different MPI stacks when compiling and when running, so you must make
> > sure that the "mpirun" (or equivalent command) you use is the one that
> > belongs to the MPI library you linked your code against.
> >
> > Finally, you may also have to check whether this is an issue with
> > threads vs. MPI ranks. I.e., I would check whether things are still
> > wrong when you use only MPI processes and no OpenMP threads at all (in
> > that case you would have to check what SGE counts as a slot: threads
> > (cores) or MPI ranks (processes)).
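> >
> > An untested sketch of such a check (the PE name is hypothetical):
> > submit a pure-MPI job and count how many ranks land on each host:
> >
> > #$ -pe mpi 16
> > mpirun -np 16 hostname | sort | uniq -c
> >
> > If all 16 lines report the same host, the scheduler/MPI integration
> > is placing every rank on one node.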
> >
> > Yours,
> > Roland
> >  
> > > Hi everyone,
> > >
> > > I have set up the Einstein toolkit on a local cluster of 20 nodes
> > > with the SGE scheduler. I have not seen any examples of this
> > > scheduler being used with the Einstein toolkit.
> > >
> > > I have managed to get it working; however, it seems that if I ask
> > > for a certain number of slots that requires more than one node, it
> > > correctly allocates these, yet all processes and threads are run on
> > > a single node, which is then oversubscribed.
> > >
> > > My question is whether anybody has used SGE with the Einstein
> > > toolkit, and whether it can work well. If it is possible, I can send
> > > more details if there are people willing to help solve this
> > > inter-node communication problem.
> > >
> > > Thanks in advance,
> > >
> > > Chris
> > >
> >
> >
> >
> > --
> > My email is as private as my paper mail. I therefore support encrypting
> > and signing email messages. Get my PGP key from http://pgp.mit.edu .
> > _______________________________________________
> > Users mailing list
> > Users at einsteintoolkit.org
> > http://lists.einsteintoolkit.org/mailman/listinfo/users
> >
> >
> >
> > --
> > Erik Schnetter <schnetter at cct.lsu.edu>
> > http://www.perimeterinstitute.ca/personal/eschnetter/
> >
> > _______________________________________________
> > Users mailing list
> > Users at einsteintoolkit.org
> > http://lists.einsteintoolkit.org/mailman/listinfo/users
> >  
> 
> 



-- 
My email is as private as my paper mail. I therefore support encrypting
and signing email messages. Get my PGP key from http://pgp.mit.edu .