[Users] Issue with npernode value in MPI
Steven Brandt
sbrandt at cct.lsu.edu
Wed May 1 12:34:53 CDT 2024
Hello Shamim,
The error says that you're calling MPI with the wrong parameters,
specifically -npernode. Since you're using Slurm, MPI should be smart
enough that you don't need to pass -n or -npernode. How did you get a
Runscript and Submitscript for this machine? Did you create them yourself?
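
One possible explanation for the "40.0", assuming simfactory evaluates
the @( ... )@ expression as a Python expression: under Python 3 the /
operator is true division and always returns a float, whereas Python 2
returned an integer for integer operands. If the job does not always
pick up the same Python version, that could also explain why the error
is intermittent. A minimal sketch (the variable names are illustrative
stand-ins for the substituted values, not simfactory internals):

```python
# Illustrative stand-ins for the values substituted for @PPN_USED@ and
# @NUM_THREADS@; these names are assumptions, not simfactory variables.
ppn_used = 40
num_threads = 1

# Under Python 3, "/" is true division and yields a float.
true_div = str(ppn_used / num_threads)

# "//" is floor division and yields an int for int operands
# under both Python 2 and Python 3.
floor_div = str(ppn_used // num_threads)

print("-npernode " + true_div)   # the form Open MPI rejected
print("-npernode " + floor_div)  # an integer value
```

If that is the cause, writing the expression in the runscript as
@(@PPN_USED@ // @NUM_THREADS@)@ should produce an integer under either
Python version.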
--Steve
On 5/1/2024 6:54 AM, Shamim Haque 1910511 wrote:
> Hi all,
>
> I am attempting to install ETK on the KALINGA cluster at NISER, India.
> This cluster has 40 procs per node and uses the SLURM workload manager.
>
> I compiled ETK with gcc-7.5 and openmpi-4.0.5 (attached the
> machinefile, optionlist, submitscript and runscript). The installation
> is mostly alright, as I can run parfiles for test TOV and BNS mergers.
>
> I tried to run a simulation with procs=160 (nodes 4) and num-threads=1
> but landed with this error (error file also attached):
>
> + mpiexec -n 640 -npernode 40.0
> /home/kamal/simulations/dx25_r500_rg7_t30_p640-1_2/SIMFACTORY/exe/cactus_sim
> -L 3
> /home/kamal/simulations/dx25_r500_rg7_t30_p640-1_2/output-0000/eos20_dx25_r500_rg7.par
> ----------------------------------------------------------------------------
> Open MPI has detected that a parameter given to a command line
> option does not match the expected format:
>
> Option: npernode
> Param: 40.0
>
> This is frequently caused by omitting to provide the parameter
> to an option that requires one. Please check the command line and try
> again.
> ----------------------------------------------------------------------------
>
> Strangely, this error is not at all regular. Mostly, the error won't
> appear, and the simulation works just fine (with no changes being made
> in the scripts or simfactory command). In fact, this exact simulation
> has worked fine before. Since I am unable to find the source of this
> issue, I am also unable to recreate the error on my own. But it does
> kick in occasionally.
>
> My command for mpi execution in runscript looks like this:
>
> time mpiexec -n @NUM_PROCS@ -npernode @(@PPN_USED@ / @NUM_THREADS@)@
> @EXECUTABLE@ -L 3 @PARFILE@
>
> If I replace @(@PPN_USED@ / @NUM_THREADS@)@ with a fixed value,
> then the script always works. My simfactory command looks like this:
>
> ./simfactory/bin/sim create-submit dx25_r500_rg7_t30_p640-1_2
> --parfile=par-smooth/scale_test/eos20_dx25_r500_rg7.par --queue=large1
> --procs=640 --num-threads=1 --walltime=00:45:00
>
> I am unable to understand how to solve this issue. Any help with this
> issue is appreciated. Please let me know if you need more information.
> Thank you.
>
> Regards
> Shamim Haque
> Senior Research Fellow (SRF)
> Department of Physics
> IISER Bhopal
>
> _______________________________________________
> Users mailing list
> Users at einsteintoolkit.org
> http://lists.einsteintoolkit.org/mailman/listinfo/users