[Users] Error in BNS Simulation at Cluster
Roland Haas
rhaas at illinois.edu
Thu Aug 12 09:58:58 CDT 2021
Hello all,
these error messages are often caused by using a different MPI stack
when compiling than when running.
Make sure that, e.g., the same modules are loaded in both cases.
You could check that the location of the MPI library (from ldd
cactus_sim) and of mpirun (from which mpirun) are compatible (e.g. in the
same subdirectory tree).
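A minimal sketch of that check, assuming the usual <prefix>/lib/libmpi.so
and <prefix>/bin/mpirun layout. The helper names mpi_prefix and
same_mpi_stack are made up for illustration, and comparing install
prefixes is only a heuristic, not a guarantee that the stacks match:

```shell
#!/bin/bash
# Hypothetical helper: strip the last two path components,
# e.g. /opt/openmpi/lib/libmpi.so.40 -> /opt/openmpi
mpi_prefix() {
    dirname "$(dirname "$1")"
}

# Succeeds if the MPI library and the mpirun launcher share an install prefix.
same_mpi_stack() {
    [ "$(mpi_prefix "$1")" = "$(mpi_prefix "$2")" ]
}

# Run the check only when a Cactus executable is given as first argument.
if [ -n "${1:-}" ]; then
    # ldd prints lines like "libmpi.so.40 => /path/libmpi.so.40 (0x...)"
    lib=$(ldd "$1" | awk '/libmpi/ {print $3; exit}')
    # Resolve symlinks so that wrapper links do not hide the real location.
    launcher=$(readlink -f "$(command -v mpirun)")
    echo "MPI library: $lib"
    echo "mpirun:      $launcher"
    if same_mpi_stack "$lib" "$launcher"; then
        echo "Same prefix: the MPI stacks look compatible."
    else
        echo "Different prefixes: possible MPI mismatch."
    fi
fi
```

Run it inside the job environment (after the module load line), passing
the path to your cactus_sim executable, so that it sees the same mpirun
the RunScript will use.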
Yours,
Roland
> For that, I write a separate jobscript and submit it in the queue. That
> jobscript looks like this:
>
> #!/bin/bash
> #PBS -N nsns2
> #PBS -j oe
> #PBS -V
> #PBS -v TEMP=/scratch1
> #PBS -o job_nsns2.out
> #PBS -e job_nsns2.err
> #PBS -l select=2:ncpus=16
> #PBS -l walltime=24:10:00
> #PBS -q p-queue
> export OMP_NUM_THREADS=1
> cd /home2/mallick/ET/Cactus
> CPUS=`cat $PBS_NODEFILE | wc -l`
> DATE=`date +%c`
> echo Job started at $DATE
> module load /home2/mallick/ET/Cactus/openmpi-x86_64
>
> /home2/mallick/ET/Cactus/simfactory/bin/sim whoami
>
> time mpirun /home2/mallick/ET/Cactus/simfactory/bin/sim create-submit \
>     nsns30 --basedir=/home2/mallick/simulations --procs=32 --ppn=16 \
>     --num-threads=1 --num-smt=1 --ppn-used=16 --parfile \
>     /home2/mallick/ET/Cactus/parfile/nsns_vlr_mass_diff.par \
>     --walltime=24:00:00
> DATE=`date +%c`
> echo Job finished at $DATE
>
> Regards
> Shamim Haque
> Junior Research Fellow (JRF)
> Department of Physics
> IISER Bhopal
>
> On Thu, Aug 12, 2021 at 7:52 PM Steven R. Brandt <sbrandt at cct.lsu.edu>
> wrote:
>
> > Something I'm not understanding. You want to run on 2 nodes, but you don't
> > seem to be using a batch queue system... so how does MPI know which two
> > nodes to use? Does this machine have slurm installed?
> >
> > --Steve
> > On 8/12/2021 12:28 AM, Shamim Haque wrote:
> >
> > Hi Steven,
> >
> > I used the generic Submitscript, no change in that. The Runscript is as
> > follows:
> >
> > echo "Preparing:"
> > set -x                          # Output commands
> > set -e                          # Abort on errors
> >
> > cd @RUNDIR@-active
> >
> > echo "Checking:"
> > pwd
> > hostname
> > date
> >
> > echo "Environment:"
> > module load /home2/mallick/ET/Cactus/openmpi-x86_64
> > export CACTUS_NUM_PROCS=@NUM_PROCS@
> > export CACTUS_NUM_THREADS=@NUM_THREADS@
> > export GMON_OUT_PREFIX=gmon.out
> > export OMP_NUM_THREADS=@NUM_THREADS@
> > env | sort > SIMFACTORY/ENVIRONMENT
> >
> > echo "Starting:"
> > export CACTUS_STARTTIME=$(date +%s)
> > mpirun -np @NUM_PROCS@ @EXECUTABLE@ -L 3 @PARFILE@
> >
> > echo "Stopping:"
> > date
> >
> > echo "Done."
> >
> > I have attached these files here as well. All the output/error files are
> > attached in my previous mail.
> >
> > Regards
> > Shamim Haque
> > Junior Research Fellow (JRF)
> > Department of Physics
> > IISER Bhopal
> >
> > On Thu, Aug 12, 2021 at 12:48 AM Steven R. Brandt <sbrandt at cct.lsu.edu>
> > wrote:
> >
> >> What SubmitScript and RunScript are you using? Can you show us? Thanks.
> >>
> >> --Steve
> >> On 8/10/2021 2:35 AM, Shamim Haque wrote:
> >>
> >> Hello,
> >>
> >> I am trying to run the BNS simulation on the cluster at IISER Bhopal.
> >> Upon using 2 nodes (16x2 cores) my simulation stalled at this message:
> >>
> >> *The environment variable CACTUS_NUM_PROCS is set to 32, but there are 1
> >> MPI processes. This may indicate a severe problem with the MPI startup
> >> mechanism.*
> >>
> >> *APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6) *
> >>
> >> The command fed into the simfactory via jobscript is as follows:
> >>
> >> time mpirun /home2/mallick/ET/Cactus/simfactory/bin/sim create-submit
> >> nsns30 --basedir=/home2/mallick/simulations --procs=32 --ppn=16
> >> --num-threads=1 --num-smt=1 --ppn-used=16 --parfile
> >> /home2/mallick/ET/Cactus/parfile/nsns_vlr_mass_diff.par --walltime=24:00:00
> >>
> >> I could not figure out the issue. I am also struggling with setting up
> >> the machine scripts as per the cluster, so I am not sure if that is somehow
> >> hampering the simulation.
> >>
> >> Thanks in advance for helping me with this issue. I have attached the
> >> concerned scripts and outputs for reference.
> >>
> >> Regards
> >> Shamim Haque
> >> Junior Research Fellow (JRF)
> >> Department of Physics
> >> IISER Bhopal
> >>
> >> _______________________________________________
> >> Users mailing list
> >> Users at einsteintoolkit.org
> >> http://lists.einsteintoolkit.org/mailman/listinfo/users
> >
--
My email is as private as my paper mail. I therefore support encrypting
and signing email messages. Get my PGP key from http://pgp.mit.edu .