[Users] Issue with npernode value in MPI

Steven Brandt sbrandt at cct.lsu.edu
Wed May 1 12:34:53 CDT 2024


Hello Shamim,

The error says that you're calling MPI with the wrong parameters, 
specifically -npernode. Since you're using SLURM, MPI should be smart 
enough that you don't need to pass -n or -npernode at all. How did you 
get a Runscript and Submitscript for this machine? Did you create them 
yourself?
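
A guess at where the 40.0 comes from, as a minimal sketch and not 
SimFactory's actual code: assuming the @(...)@ expression in the 
runscript is evaluated as a Python expression, "/" is true division 
under Python 3 and yields a float even when both operands are integers. 
The expand() helper below is hypothetical and only imitates that 
behaviour.

```python
import re


def expand(template, values):
    """Hypothetical stand-in for runscript expansion (not SimFactory code).

    First substitutes plain @NAME@ variables, then evaluates every
    @( expr )@ block as a Python expression and inserts str(result).
    """
    text = template
    for name, value in values.items():
        text = text.replace("@%s@" % name, str(value))
    # eval() is used for illustration only; never run it on untrusted input.
    return re.sub(r"@\((.*?)\)@", lambda m: str(eval(m.group(1))), text)


values = {"PPN_USED": 40, "NUM_THREADS": 1}

# True division: the result is a float, so it is rendered with ".0".
print(expand("-npernode @(@PPN_USED@ / @NUM_THREADS@)@", values))
# under Python 3: -npernode 40.0

# Floor division: the result stays an integer.
print(expand("-npernode @(@PPN_USED@ // @NUM_THREADS@)@", values))
# under Python 3: -npernode 40
```

If that assumption holds, writing the expression with "//" (or wrapping 
it in int()) would hand mpiexec an integer. It does not by itself 
explain why the error only shows up occasionally.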

--Steve

On 5/1/2024 6:54 AM, Shamim Haque 1910511 wrote:
> Hi all,
>
> I am attempting to install ETK on the KALINGA cluster at NISER, India. 
> This cluster has 40 procs per node and uses the SLURM workload manager.
>
> I compiled ETK with gcc-7.5 and openmpi-4.0.5 (attached the 
> machinefile, optionlist, submitscript and runscript). The installation 
> is mostly alright, as I can run parfiles for test TOV and BNS mergers.
>
> I tried to run a simulation with procs=160 (nodes 4) and num-threads=1 
> but landed with this error (error file also attached):
>
> + mpiexec -n 640 -npernode 40.0 
> /home/kamal/simulations/dx25_r500_rg7_t30_p640-1_2/SIMFACTORY/exe/cactus_sim 
> -L 3 
> /home/kamal/simulations/dx25_r500_rg7_t30_p640-1_2/output-0000/eos20_dx25_r500_rg7.par
> ----------------------------------------------------------------------------
> Open MPI has detected that a parameter given to a command line
> option does not match the expected format:
>
>   Option: npernode
>   Param:  40.0
>
> This is frequently caused by omitting to provide the parameter
> to an option that requires one. Please check the command line and try 
> again.
> ----------------------------------------------------------------------------
>
> Strangely, this error is intermittent. Most of the time it does not 
> appear, and the simulation works just fine (with no changes being made 
> in the scripts or simfactory command). In fact, this exact simulation 
> has worked fine before. Since I am unable to find the source of this 
> issue, I am also unable to reproduce the error on demand. But it does 
> kick in occasionally.
>
> My command for mpi execution in runscript looks like this:
>
> time mpiexec -n @NUM_PROCS@ -npernode @(@PPN_USED@ / @NUM_THREADS@)@ 
> @EXECUTABLE@ -L 3 @PARFILE@
>
> If I replace @(@PPN_USED@ / @NUM_THREADS@)@ with a fixed value, 
> then the script always works. My simfactory command looks like this:
>
> ./simfactory/bin/sim create-submit dx25_r500_rg7_t30_p640-1_2 
> --parfile=par-smooth/scale_test/eos20_dx25_r500_rg7.par --queue=large1 
> --procs=640 --num-threads=1 --walltime=00:45:00
>
> I am unable to understand how to solve this issue. Any help with this 
> issue is appreciated. Please let me know if you need more information. 
> Thank you.
>
> Regards
> Shamim Haque
> Senior Research Fellow (SRF)
> Department of Physics
> IISER Bhopal