[Users] Error in BNS Simulation at Cluster

Steven R. Brandt sbrandt at cct.lsu.edu
Thu Aug 12 09:22:45 CDT 2021


There is something I don't understand. You want to run on 2 nodes, but you
don't seem to be using a batch queue system, so how does MPI know
which two nodes to use? Does this machine have Slurm installed?
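For illustration only, here is a minimal sketch of how a RunScript could tell Open MPI which nodes to use. It assumes Slurm is installed, so that scontrol show hostnames "$SLURM_JOB_NODELIST" prints the allocated nodes one per line, and that each node should receive 16 ranks. The helper name make_hostfile is hypothetical. It writes Open MPI's "hostname slots=N" hostfile format. Inside a Slurm job, an Open MPI build with Slurm support usually detects the allocation by itself, so an explicit hostfile matters mainly when no batch system supplies the node list.

```shell
#!/bin/bash
# Hypothetical helper: turn a list of hostnames (one per line on stdin)
# into an Open MPI hostfile, giving every host the same number of slots.
make_hostfile() {
    local slots="$1"
    local host
    # "|| [ -n "$host" ]" also handles a last line without a trailing newline.
    while IFS= read -r host || [ -n "$host" ]; do
        # Skip empty lines so stray newlines do not create bogus entries.
        [ -n "$host" ] || continue
        printf '%s slots=%s\n' "$host" "$slots"
    done
}

# Example use inside a RunScript, assuming a Slurm job (not verified on
# the IISER Bhopal cluster):
#   scontrol show hostnames "$SLURM_JOB_NODELIST" | make_hostfile 16 > hostfile
#   mpirun -np @NUM_PROCS@ --hostfile hostfile @EXECUTABLE@ -L 3 @PARFILE@
```

Called as printf 'node1\nnode2\n' | make_hostfile 16, it prints one "nodeN slots=16" line per host.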

--Steve

On 8/12/2021 12:28 AM, Shamim Haque wrote:
> Hi Steven,
>
> I used the generic SubmitScript without any changes. The RunScript is
> as follows:
>
> echo "Preparing:"
> set -x                          # Output commands
> set -e                          # Abort on errors
> cd @RUNDIR@-active
> echo "Checking:"
> pwd
> hostname
> date
> echo "Environment:"
> module load /home2/mallick/ET/Cactus/openmpi-x86_64
> export CACTUS_NUM_PROCS=@NUM_PROCS@
> export CACTUS_NUM_THREADS=@NUM_THREADS@
> export GMON_OUT_PREFIX=gmon.out
> export OMP_NUM_THREADS=@NUM_THREADS@
> env | sort > SIMFACTORY/ENVIRONMENT
> echo "Starting:"
> export CACTUS_STARTTIME=$(date +%s)
> mpirun -np @NUM_PROCS@ @EXECUTABLE@ -L 3 @PARFILE@
>
> echo "Stopping:"
> date
> echo "Done."
>
> I have attached these files here as well. All the output and error files
> were attached to my previous email.
>
> Regards
> Shamim Haque
> Junior Research Fellow (JRF)
> Department of Physics
> IISER Bhopal
>
> On Thu, Aug 12, 2021 at 12:48 AM Steven R. Brandt <sbrandt at cct.lsu.edu> wrote:
>
>     What SubmitScript and RunScript are you using? Can you show us?
>     Thanks.
>
>     --Steve
>
>     On 8/10/2021 2:35 AM, Shamim Haque wrote:
>>     Hello,
>>
>>     I am trying to run the BNS simulation on the cluster at IISER
>>     Bhopal. When I use 2 nodes (2 x 16 = 32 cores), my simulation
>>     aborts with this message:
>>     The environment variable CACTUS_NUM_PROCS is set to 32, but
>>     there are 1 MPI processes. This may indicate a severe problem
>>     with the MPI startup mechanism.
>>     APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
>>
>>     The command I pass to SimFactory in my job script is as follows:
>>
>>     time mpirun /home2/mallick/ET/Cactus/simfactory/bin/sim
>>     create-submit nsns30 --basedir=/home2/mallick/simulations
>>     --procs=32 --ppn=16 --num-threads=1 --num-smt=1 --ppn-used=16
>>     --parfile /home2/mallick/ET/Cactus/parfile/nsns_vlr_mass_diff.par
>>     --walltime=24:00:00
>>
>>     I could not figure out the cause. I am also struggling to set up
>>     the machine scripts for this cluster, so I am not sure whether
>>     that is somehow affecting the simulation.
>>
>>     Thanks in advance for helping me with this issue. I have attached
>>     the relevant scripts and output files for reference.
>>
>>     Regards
>>     Shamim Haque
>>     Junior Research Fellow (JRF)
>>     Department of Physics
>>     IISER Bhopal
>>     _______________________________________________
>>     Users mailing list
>>     Users at einsteintoolkit.org
>>     http://lists.einsteintoolkit.org/mailman/listinfo/users
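The abort quoted above ("CACTUS_NUM_PROCS is set to 32, but there are 1 MPI processes") means each Cactus process saw an MPI world of size 1. One common cause, which may or may not apply here, is launching the executable with an mpirun that belongs to a different MPI implementation than the one Cactus was compiled against; each process then starts as an independent one-rank job. The "APPLICATION TERMINATED WITH THE EXIT STRING" line looks like output from MPICH's Hydra launcher (also used by Intel MPI), whereas the RunScript loads an Open MPI module. The sketch below guesses the implementation from the text of "mpirun --version". The function name mpi_flavor and the matched banner strings are assumptions based on typical version output, not an exhaustive list.

```shell
#!/bin/bash
# Hypothetical helper: guess which MPI implementation an mpirun belongs to
# from its version banner (read on stdin). The patterns are assumptions
# based on typical banners and may need adjusting for a given cluster.
mpi_flavor() {
    local banner
    banner="$(cat)"
    case "$banner" in
        *"Open MPI"*)        echo "openmpi" ;;
        # Check Intel before Hydra/MPICH, since Intel MPI is Hydra-based.
        *"Intel(R) MPI"*)    echo "intelmpi" ;;
        *"HYDRA"*|*"MPICH"*) echo "mpich" ;;
        *)                   echo "unknown" ;;
    esac
}

# Example: compare the launcher on PATH with the MPI library the Cactus
# executable links to (the executable path below is a placeholder).
#   mpirun --version 2>&1 | mpi_flavor
#   ldd /path/to/cactus_exe | grep -i libmpi
```

If the two disagree, loading the same MPI module at build time and in the RunScript would be the first thing to check.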

