[Users] Running with SLURM

Roland Haas rhaas at illinois.edu
Tue Aug 9 09:46:31 CDT 2022


Hello Jessica,

Thank you for attaching the complete log files.

> I am trying to get ET setup on the Quartz cluster at Indiana
> University
> (https://kb.iu.edu/d/qrtz
> ).  Simfactory compiles, however, I've run into a problem when a
> simulation begins and keep getting a segmentation fault (one per MPI
> process).  The cluster uses SLURM, which is new to me, and RHEL 8.
> I've attached the relevant portion of an example output file (with
> stderr and stdout merged), the corresponding batch file that produced
> it when submitted, and below are the key settings from the machine
> file and the one line from the generic mpi runscript that I have
> edited.  Any guidance would be much appreciated.

My guess would be that there is a mismatch between the MPI library used
to compile and the one used to run.

You can look at the file

configs/sim/bindings/Configuration/Capabilities/make.MPI.defn

which lists explicitly which libraries are linked against and which
directories were used. These should match the runtime directories
that show up in the error message:

/N/soft/rhel8/openmpi/gnu/4.0.5
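
A quick way to make that comparison is sketched below. It assumes that
the MPI settings in make.MPI.defn are make variables whose names start
with "MPI_" (please check the file itself), and the two helper function
names are made up for this example.

```shell
#!/bin/sh
# Sketch only: compare the build-time MPI settings with the runtime prefix.

# Hypothetical helper: print the MPI-related lines of a make.*.defn file.
# Assumption: the relevant variables start with "MPI_".
show_mpi_settings() {
    grep -E '^[[:space:]]*MPI_' "$1"
}

# Hypothetical helper: succeed if prefix $2 appears in those settings.
defn_mentions_prefix() {
    show_mpi_settings "$1" | grep -qF -- "$2"
}

DEFN=configs/sim/bindings/Configuration/Capabilities/make.MPI.defn
PREFIX=/N/soft/rhel8/openmpi/gnu/4.0.5   # taken from the error message

if [ -f "$DEFN" ]; then
    show_mpi_settings "$DEFN"
    if defn_mentions_prefix "$DEFN" "$PREFIX"; then
        echo "build-time MPI settings mention $PREFIX"
    else
        echo "build-time MPI settings do NOT mention $PREFIX"
    fi
else
    echo "$DEFN not found; run this from the Cactus directory"
fi
```

If the second message is printed, the code was most likely compiled
against a different MPI installation than the one used at run time.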

Sometimes one cannot compile on the compute nodes. Did you check with
the cluster admins whether they support this, or whether compilation
should only be done on the login nodes?

It is possible that the "mpirun" used by the run script is not the one
matching the MPI library used to compile. There is also the possibility
that srun does not use the correct MPI stack. One would hope that
"module load openmpi" would take care of this.
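
To see what a job would actually pick up, something like the following
could be run inside the batch script after the module is loaded. This
is only a sketch: the helper path_under_prefix is made up for this
example, and the executable name exe/cactus_sim assumes that the
configuration is called "sim".

```shell
#!/bin/sh
# Sketch only: report which MPI launcher and libraries are picked up.
PREFIX=/N/soft/rhel8/openmpi/gnu/4.0.5   # taken from the error message

# Hypothetical helper: succeed if path $1 lives below directory $2.
path_under_prefix() {
    case "$1" in
        "$2"/*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Where does the mpirun found first on PATH live?
found=$(command -v mpirun 2>/dev/null || true)
if [ -z "$found" ]; then
    echo "mpirun: not on PATH"
elif path_under_prefix "$found" "$PREFIX"; then
    echo "mpirun: $found (inside $PREFIX)"
else
    echo "mpirun: $found (outside $PREFIX)"
fi

# Which MPI shared libraries does the executable resolve at run time?
EXE=exe/cactus_sim   # assumed executable name
if [ -x "$EXE" ]; then
    ldd "$EXE" | grep -i mpi || echo "no MPI libraries listed by ldd"
fi
```

An mpirun outside the expected prefix, or an ldd line pointing to a
different libmpi, would support the mismatch theory.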

Finally, it would be very helpful to first try a simple "Hello, world"
MPI code such as this one:

https://mpitutorial.com/tutorials/mpi-hello-world/

and to follow one of the cluster's own examples for running it.

To mimic how Cactus compiles you would not use the mpicc compiler
wrapper but instead do something like (after inspecting make.MPI.defn):

gcc -L/N/soft/rhel8/openmpi/gnu/4.0.5/lib \
  -I/N/soft/rhel8/openmpi/gnu/4.0.5/include \
  -Wl,--rpath,/N/soft/rhel8/openmpi/gnu/4.0.5/lib \
  hello.c -lmpi_cxx -lmpi -o hello

then submit a job for hello.
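
A minimal batch script for that test might look like the sketch below.
The job name, task count and time limit are placeholders, and any
site-specific options (account, partition) are left out on purpose, so
please take those from the cluster documentation.

```shell
#!/bin/bash
#SBATCH --job-name=hello-mpi
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --time=00:05:00
# Sketch only: the values above are placeholders, not cluster defaults.

# Load the same MPI module that was used to compile, if "module" exists.
if command -v module >/dev/null 2>&1; then
    module load openmpi || echo "could not load the openmpi module"
fi

# SLURM sets SLURM_NTASKS inside a job; fall back to 1 elsewhere.
NP=${SLURM_NTASKS:-1}

# Launch the test program only if both the launcher and binary exist.
if command -v mpirun >/dev/null 2>&1 && [ -x ./hello ]; then
    mpirun -np "$NP" ./hello
else
    echo "mpirun or ./hello not found; nothing launched"
fi
```

Saved under some name, say hello.sbatch (made up here), it would be
submitted with "sbatch hello.sbatch". If this small test also
segfaults, the problem is in the MPI setup and not in Cactus.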

Yours,
Roland

-- 
My email is as private as my paper mail. I therefore support encrypting
and signing email messages. Get my PGP key from http://pgp.mit.edu .