[Users] Issue with npernode value in MPI

Shamim Haque 1910511 shamims at iiserb.ac.in
Wed May 1 07:08:39 CDT 2024


Sorry for the typo in my previous email. It should read: *I tried to run a simulation with
procs=640 (nodes 16)*, not procs=160 (nodes 4).
Shamim Haque
Senior Research Fellow (SRF)
Department of Physics
IISER Bhopal


On Wed, May 1, 2024 at 5:24 PM Shamim Haque 1910511 <shamims at iiserb.ac.in>
wrote:

> Hi all,
>
> I am attempting an ETK installation on the KALINGA cluster at NISER, India.
> This cluster has 40 procs per node and uses the SLURM workload manager.
>
> I compiled ETK with gcc-7.5 and openmpi-4.0.5 (the machinefile, optionlist,
> submitscript and runscript are attached). The installation mostly works:
> I can run the test parfiles for TOV and BNS mergers.
>
> I tried to run a simulation with procs=160 (nodes 4) and num-threads=1 but
> ran into this error (error file also attached):
>
> *+ mpiexec -n 640 -npernode 40.0
> /home/kamal/simulations/dx25_r500_rg7_t30_p640-1_2/SIMFACTORY/exe/cactus_sim
> -L 3
> /home/kamal/simulations/dx25_r500_rg7_t30_p640-1_2/output-0000/eos20_dx25_r500_rg7.par
> ----------------------------------------------------------------------------
> Open MPI has detected that a parameter given to a command line
> option does not match the expected format:
>
>   Option: npernode
>   Param:  40.0
>
> This is frequently caused by omitting to provide the parameter
> to an option that requires one. Please check the command line and try
> again.
> ----------------------------------------------------------------------------*
>
> Strangely, this error is intermittent. Most of the time it does not
> appear and the simulation runs fine, with no changes to the scripts or
> the simfactory command; in fact, this exact simulation has run
> successfully before. Since I cannot find the source of the issue, I am
> also unable to reproduce the error on demand, but it does occur
> occasionally.
>
> The MPI launch command in my runscript looks like this:
>
> *time mpiexec -n @NUM_PROCS@ -npernode @(@PPN_USED@ / @NUM_THREADS@)@
> @EXECUTABLE@ -L 3 @PARFILE@*
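One plausible reading of the error, assuming SimFactory evaluates the `@( ... )@` expression as Python arithmetic: in Python 3 the `/` operator is true division, so `40 / 1` produces the float `40.0`, which is exactly the `npernode` value Open MPI rejected, while floor division `//` keeps the result an integer. If different runs happened to pick up different Python interpreters (in Python 2, `/` on two ints returns an int), that could also explain why the failure is intermittent, although this is not confirmed. The sketch below is a hypothetical re-implementation of the expansion step (the `expand` function is invented for illustration), not SimFactory's actual code.

```python
import re

def expand(template, variables):
    """Hypothetical two-step expansion of a SimFactory-style template.

    Not SimFactory's code: it only assumes that @NAME@ placeholders are
    substituted first and that @( ... )@ is then evaluated as Python.
    """
    # Step 1: replace @NAME@ placeholders with their values.
    text = re.sub(r"@([A-Z_]+)@",
                  lambda m: str(variables[m.group(1)]),
                  template)
    # Step 2: evaluate each @( ... )@ expression with Python semantics,
    # with builtins disabled so only plain arithmetic is allowed.
    return re.sub(r"@\((.*?)\)@",
                  lambda m: str(eval(m.group(1), {"__builtins__": {}}, {})),
                  text)

values = {"PPN_USED": 40, "NUM_THREADS": 1}
print(expand("-npernode @(@PPN_USED@ / @NUM_THREADS@)@", values))   # -npernode 40.0
print(expand("-npernode @(@PPN_USED@ // @NUM_THREADS@)@", values))  # -npernode 40
```

If this reading is correct, writing `@(@PPN_USED@ // @NUM_THREADS@)@` in the runscript would be one way to keep the value an integer; it has not been tested on this cluster.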
>
> If I replace *@(@PPN_USED@ / @NUM_THREADS@)@* with a hard-coded value, the
> script always works. My simfactory command looks like this:
>
> *./simfactory/bin/sim create-submit dx25_r500_rg7_t30_p640-1_2
> --parfile=par-smooth/scale_test/eos20_dx25_r500_rg7.par --queue=large1
> --procs=640 --num-threads=1 --walltime=00:45:00*
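For this command (640 MPI processes, 1 thread each, 40 cores per node), the launch parameters can be derived with integer arithmetic alone: 640 / 40 gives 16 nodes and 40 ranks per node, so no fractional value like `40.0` should ever reach `mpiexec`. The helper below is a hypothetical sketch (the name `mpiexec_args` and its checks are illustrative assumptions, and it assumes `NUM_PROCS` counts MPI processes); it is not part of SimFactory.

```python
def mpiexec_args(num_procs, ppn_used, num_threads, executable, parfile):
    """Hypothetical helper: build an mpiexec command using integers only.

    Argument names mirror the runscript variables, but this is not
    SimFactory code. num_procs is assumed to be the MPI process count.
    """
    if ppn_used % num_threads != 0:
        raise ValueError("PPN_USED must be a multiple of NUM_THREADS")
    ranks_per_node = ppn_used // num_threads  # always an int, never "40.0"
    if num_procs % ranks_per_node != 0:
        raise ValueError("NUM_PROCS must fill a whole number of nodes")
    nodes = num_procs // ranks_per_node
    argv = ["mpiexec", "-n", str(num_procs),
            "-npernode", str(ranks_per_node),
            executable, "-L", "3", parfile]
    return nodes, argv

nodes, argv = mpiexec_args(640, 40, 1, "cactus_sim", "eos20_dx25_r500_rg7.par")
print(nodes)           # 16
print(" ".join(argv))  # mpiexec -n 640 -npernode 40 cactus_sim -L 3 eos20_dx25_r500_rg7.par
```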
>
> I do not understand how to solve this issue, so any help would be
> appreciated. Please let me know if you need more information.
> Thank you.
>
> Regards
> Shamim Haque
> Senior Research Fellow (SRF)
> Department of Physics
> IISER Bhopal