[Users] need help running simulation on slurm

Roland Haas rhaas at illinois.edu
Wed Dec 4 09:23:39 CST 2024


Hello Maya,

Sorry for the long delay in responding.

> My name is Maya Baireddy. I am working on my school research project trying
> to run the simulation of BNS merger on amarel supercomputer from my local
> university.

Thanks for contacting me. I will try and see if I can provide some
help. 

> Could you please help me to start my simulation on SLURM. I have followed
> the ETK gallery example for BNS simulation steps 1-5. But I am not able to
> proceed to successfully create a machine to run the simulation.
> 
> I run the following steps
> /home/sb1554/BNS/simfactory/bin/sim create bns --parfile
> /home/sb1554/BNS/bns.par --machine slurmbns
> 
> srun  bns.sh -o slurm.bns.%N.%j.out

Looks ok to me. 

> and got the error:
> 
> **** An error occurred in MPI_Init_thread*** on a NULL communicator***
> MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,***
>  and potentially your MPI job)*

With errors like this my first guess is usually that the MPI stack used
to compile and the one used at runtime are not the same.

SLURM in particular can be dicey in that respect since it can try and
directly interface with MPI.
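
A rough way to check for such a mismatch is sketched below. It only
defines two helper functions, and both names (linked_mpi_libs,
runtime_mpi_info) are made up for this illustration. It assumes the
executable is dynamically linked, so that ldd can list its MPI
libraries.

```shell
# Print the MPI-related shared libraries an executable is linked against.
# A statically linked MPI will not show up here.
linked_mpi_libs() {
    ldd "$1" 2>/dev/null | grep -iE 'libmpi|libpmi' \
        || echo "no MPI shared library found in ldd output for $1"
}

# Show which MPI launcher is first in PATH and which MPI plugin
# types this SLURM installation offers to srun.
runtime_mpi_info() {
    command -v mpirun || echo "mpirun not in PATH"
    if command -v srun >/dev/null 2>&1; then
        srun --mpi=list
    else
        echo "srun not in PATH"
    fi
}
```

Run "linked_mpi_libs /path/to/your/cactus_executable" on the login
node and "runtime_mpi_info" with the modules you use for the job. The
library paths printed by the first should belong to the same MPI
installation as the mpirun reported by the second.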

> I am attaching my machine, submit script, run script, log files.

Thank you.

> I would appreciate any pointers from you. Or if you could point me to the
> right person.
> 
> I was trying to post this on EKT forum, but need one credit to post.

To the Gitter chat? I thought that was open to all with no requirements?

This one:

https://gitter.im/EinsteinToolkit/EinsteinToolkit

The mailing list users at einsteintoolkit.org was unavailable for a while
due to required maintenance. You would have to sign up to post,
otherwise the post will be held for "moderator approval" (which it will
receive, just may take a little bit of time).

Now for the actual question.

The machine.ini file (slurmbns.ini-machine.txt) looks strange (e.g. it
contains two [slurmbns] sections). The line

submit          = sbatch /home/sb1554/BNS/simfactory/mdb/runscripts/slurmbns.run

will always make it use the file
"/home/sb1554/BNS/simfactory/mdb/runscripts/slurmbns.run" as the file
passed to SLURM as the job script. 

The "submit" entry should be just "sbatch" without the extra file name.

There is no "envsetup" section so you will have to make sure yourself
that the same modules (in particular MPI modules) are loaded when you
compile and when you submit the job, otherwise SLURM (and srun) may use
the wrong MPI stack.
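
One way to verify this by hand is to save the list of loaded modules
once at compile time and once inside the job, then compare the two. The
sketch below assumes your cluster's "module" command accepts the terse
switch "-t" (both Lmod and Environment Modules do, as far as I know).
The function and file names are made up for this example.

```shell
# Save the currently loaded modules, one per line and sorted, to file $1.
# "module list" writes to stderr, hence the 2>&1.
snapshot_modules() {
    module -t list 2>&1 | sort > "$1"
}

# Succeed (exit status 0) only if two snapshots are identical;
# otherwise print the differences and return non-zero.
same_modules() {
    diff "$1" "$2"
}
```

Call "snapshot_modules modules.build.txt" right before compiling and
"snapshot_modules modules.run.txt" in the batch script before srun,
then "same_modules modules.build.txt modules.run.txt" shows any
module that differs.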

The run script "slurmbns.ini-runscript.txt" is also strange since it
contains a "/home/sb1554/BNS/simfactory/bin/sim run" which would again
call the runscript. Instead it should contain the "srun" call.

The submitscript is also strange since it should end with a line
calling "sim run".

It may be best to first directly add the srun command to the SLURM
batch file (the "submitscript" mostly since it has the SBATCH headers)
and set the headers by hand, load the modules, and call srun from there.

This should make it look very similar to an MPI+OpenMP "Hybrid"
parallelization example submit script that your cluster admins may
provide.
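
A minimal sketch of such a hand-written batch file follows. It is
wrapped in a "cat" so that pasting it into a shell just writes the file
bns_job.sbatch. The node, task and thread counts, the wall time, the
module names and the executable path are all placeholders chosen for
illustration, not values taken from your attachments. Only the parameter
file path and the output file pattern come from your message.

```shell
# Write a hybrid MPI+OpenMP job script to bns_job.sbatch.
cat > bns_job.sbatch <<'EOF'
#!/bin/bash
#SBATCH --job-name=bns
#SBATCH --nodes=1                  # assumption: one node
#SBATCH --ntasks-per-node=4        # assumption: 4 MPI ranks per node
#SBATCH --cpus-per-task=8          # assumption: 8 OpenMP threads per rank
#SBATCH --time=01:00:00            # assumption: one hour of wall time
#SBATCH --output=slurm.bns.%N.%j.out

# Load the same modules that were loaded when Cactus was compiled.
# assumption: placeholder module names, use the ones from your cluster.
module purge
module load gcc openmpi

# One OpenMP thread per CPU that SLURM assigned to each MPI rank.
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK}"

# assumption: placeholder location of the Cactus executable.
CACTUS_EXE="$HOME/Cactus/exe/cactus_sim"

# Start the MPI ranks through SLURM.
srun --cpus-per-task="${SLURM_CPUS_PER_TASK}" \
    "$CACTUS_EXE" /home/sb1554/BNS/bns.par
EOF
```

Submit it with "sbatch bns_job.sbatch". Once this runs, the same
pieces can be moved back into the simfactory submit script and run
script.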

For SLURM based machines, the machine ini files usually look very
similar. In your case I would suggest taking a look at say the ones for
the Delta cluster at NCSA:

https://bitbucket.org/simfactory/simfactory2/src/master/mdb/machines/delta.ini

https://bitbucket.org/simfactory/simfactory2/src/master/mdb/optionlists/delta.cfg

https://bitbucket.org/simfactory/simfactory2/src/master/mdb/runscripts/delta.run

https://bitbucket.org/simfactory/simfactory2/src/master/mdb/submitscripts/delta.sub

You may also want to call in to the Einstein Toolkit weekly call on
Thursday (the Gitter chat may also work for real time communication).

Yours,
Roland

-- 
My email is as private as my paper mail. I therefore support encrypting
and signing email messages. Get my PGP key from http://pgp.mit.edu .