[Users] Issue with npernode value in MPI

Shamim Haque 1910511 shamims at iiserb.ac.in
Wed May 1 06:54:08 CDT 2024


Hi all,

I am attempting to install the ETK on the KALINGA cluster at NISER, India. This
cluster has 40 processors per node and uses the SLURM workload manager.

I compiled the ETK with gcc-7.5 and openmpi-4.0.5 (the machine file, option
list, submit script and run script are attached). The installation mostly
works: I can run the parameter files for the TOV and BNS merger tests.

I tried to run a simulation with procs=160 (4 nodes) and num-threads=1, but
ran into this error (the error file is also attached):

    + mpiexec -n 640 -npernode 40.0 /home/kamal/simulations/dx25_r500_rg7_t30_p640-1_2/SIMFACTORY/exe/cactus_sim -L 3 /home/kamal/simulations/dx25_r500_rg7_t30_p640-1_2/output-0000/eos20_dx25_r500_rg7.par
    --------------------------------------------------------------------------
    Open MPI has detected that a parameter given to a command line
    option does not match the expected format:

      Option: npernode
      Param:  40.0

    This is frequently caused by omitting to provide the parameter
    to an option that requires one. Please check the command line and try
    again.
    --------------------------------------------------------------------------
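
The value that Open MPI rejects is the string "40.0", which is not an integer. The short Python sketch below is not Open MPI's own option parser; it only illustrates, assuming that -npernode requires a plain integer, how a strict integer check accepts "40" but rejects "40.0", in line with the message above. The function name parse_npernode is hypothetical.

```python
def parse_npernode(param: str) -> int:
    """Strictly parse an -npernode value: only plain decimal integers pass.

    Hypothetical illustration; this is not Open MPI's actual parser.
    """
    # "40".isdigit() is True, "40.0".isdigit() is False because of the dot.
    if not param.isdigit():
        raise ValueError(f"npernode expects an integer, got {param!r}")
    return int(param)


print(parse_npernode("40"))      # 40
try:
    parse_npernode("40.0")
except ValueError as err:
    print(err)                   # npernode expects an integer, got '40.0'
```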

Strangely, this error is intermittent. Most of the time it does not appear
and the simulation runs fine, with no changes to the scripts or to the
simfactory command; in fact, this exact simulation has worked before. Since I
cannot find the source of the problem, I also cannot reproduce the error on
demand, but it does occur occasionally.

The MPI launch command in my run script is:

    time mpiexec -n @NUM_PROCS@ -npernode @(@PPN_USED@ / @NUM_THREADS@)@ @EXECUTABLE@ -L 3 @PARFILE@
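
One possible reading of the "40.0" (an assumption, not checked against the simfactory source) is that the @( ... )@ expression is evaluated as Python after the @NAME@ placeholders are filled in. In Python 3, / is true division and always returns a float, so 40 / 1 gives 40.0, while // is floor division and keeps integers as integers. The hypothetical expand() function below imitates that two-step substitution; it is a sketch, not simfactory's real implementation.

```python
import re


def expand(template: str, values: dict) -> str:
    """Hypothetical stand-in for simfactory's run-script expansion.

    Step 1 replaces @NAME@ placeholders with their values; step 2 evaluates
    each @( expr )@ block as a Python expression. Illustration only.
    """
    # Step 1: substitute simple @NAME@ variables.
    for name, value in values.items():
        template = template.replace(f"@{name}@", str(value))
    # Step 2: evaluate each @( ... )@ block and insert the result as text.
    return re.sub(r"@\((.*?)\)@", lambda m: str(eval(m.group(1))), template)


values = {"PPN_USED": 40, "NUM_THREADS": 1}
print(expand("-npernode @(@PPN_USED@ / @NUM_THREADS@)@", values))   # -npernode 40.0
print(expand("-npernode @(@PPN_USED@ // @NUM_THREADS@)@", values))  # -npernode 40
```

If this reading is right, writing @(@PPN_USED@ // @NUM_THREADS@)@ in the run script would always yield an integer. In Python 2, 40 / 1 is already the integer 40, so if simfactory were sometimes run by a different Python interpreter, that might explain why the error only appears occasionally; this is a guess that would need to be checked on the cluster.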

If I replace @(@PPN_USED@ / @NUM_THREADS@)@ with a hard-coded value, the
script always works. My simfactory command is:


    ./simfactory/bin/sim create-submit dx25_r500_rg7_t30_p640-1_2 \
        --parfile=par-smooth/scale_test/eos20_dx25_r500_rg7.par --queue=large1 \
        --procs=640 --num-threads=1 --walltime=00:45:00
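
For reference, here is the process layout implied by the simfactory command, assuming simfactory's usual convention that --procs counts cores, each MPI rank uses --num-threads cores, and every node contributes all 40 of its cores. The hypothetical layout() helper uses only integer operators (//, %), so the ranks-per-node value it produces is always an integer such as 40, never 40.0.

```python
from __future__ import annotations


def layout(procs: int, num_threads: int, ppn: int) -> tuple[int, int, int]:
    """Return (MPI ranks, nodes, ranks per node) with integer arithmetic only.

    Hypothetical helper; assumes full nodes of `ppn` cores and that procs
    is a multiple of ppn and ppn a multiple of num_threads.
    """
    if procs % ppn != 0 or ppn % num_threads != 0:
        raise ValueError("procs must be a multiple of ppn, and ppn of num_threads")
    ranks = procs // num_threads          # one MPI rank per num_threads cores
    nodes = procs // ppn                  # number of whole nodes requested
    ranks_per_node = ppn // num_threads   # integer, as -npernode expects
    return ranks, nodes, ranks_per_node


print(layout(640, 1, 40))  # (640, 16, 40)
print(layout(160, 1, 40))  # (160, 4, 40)
```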

I do not understand how to solve this issue, and any help would be
appreciated. Please let me know if you need more information. Thank you.

Regards
Shamim Haque
Senior Research Fellow (SRF)
Department of Physics
IISER Bhopal
-------------- next part --------------
A non-text attachment was scrubbed...
Name: kalinga_gcc.run
Type: application/octet-stream
Size: 618 bytes
Desc: not available
URL: <http://lists.einsteintoolkit.org/pipermail/users/attachments/20240501/ac78fb05/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: kalinga.ini
Type: application/x-ini
Size: 2815 bytes
Desc: not available
URL: <http://lists.einsteintoolkit.org/pipermail/users/attachments/20240501/ac78fb05/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: kalinga_gcc.cfg
Type: application/octet-stream
Size: 3325 bytes
Desc: not available
URL: <http://lists.einsteintoolkit.org/pipermail/users/attachments/20240501/ac78fb05/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: kalinga.sub
Type: application/octet-stream
Size: 492 bytes
Desc: not available
URL: <http://lists.einsteintoolkit.org/pipermail/users/attachments/20240501/ac78fb05/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dx25_r500_rg7_t30_p640-1_2.err
Type: application/octet-stream
Size: 1215 bytes
Desc: not available
URL: <http://lists.einsteintoolkit.org/pipermail/users/attachments/20240501/ac78fb05/attachment-0003.obj>