[Users] Error in BNS Simulation at Cluster
Steven R. Brandt
sbrandt at cct.lsu.edu
Thu Aug 12 09:22:45 CDT 2021
There's something I'm not understanding. You want to run on 2 nodes, but
you don't seem to be using a batch queue system, so how does MPI know
which two nodes to use? Does this machine have Slurm installed?
--Steve
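
A minimal sketch of the check behind that question: before calling mpirun, a script can report where a node list would come from. The function name describe_node_source and the hostfile argument are hypothetical and not part of simfactory; SLURM_JOB_NODELIST and PBS_NODEFILE are the variables that Slurm and PBS/Torque export inside an allocation.

```shell
#!/bin/bash
# Hypothetical helper (not part of simfactory): report where mpirun
# could learn which nodes to use.
# Argument: $1 = optional path to a hostfile, one hostname per line.
describe_node_source() {
    local hostfile="${1:-}"
    if [ -n "${SLURM_JOB_NODELIST:-}" ]; then
        # Inside a Slurm allocation the scheduler exports the node list.
        echo "slurm:${SLURM_JOB_NODELIST}"
    elif [ -n "${PBS_NODEFILE:-}" ] && [ -r "${PBS_NODEFILE}" ]; then
        # PBS/Torque points to a file listing the allocated nodes.
        echo "pbs:${PBS_NODEFILE}"
    elif [ -n "$hostfile" ] && [ -r "$hostfile" ]; then
        # No scheduler: fall back to an explicit, readable hostfile.
        echo "hostfile:${hostfile}"
    else
        # Nothing tells MPI about a second node.
        echo "none"
    fi
}
```

If this prints "none", mpirun has no way to learn about a second node and will start every process on the local machine. With Open MPI, one way to name the nodes by hand is mpirun --hostfile <file> -np <N>; the hostfile itself is something you would have to write for your cluster.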
On 8/12/2021 12:28 AM, Shamim Haque wrote:
> Hi Steven,
>
> I used the generic SubmitScript without any changes. The RunScript is
> as follows:
>
> echo "Preparing:"
> set -x # Output commands
> set -e # Abort on errors
> cd @RUNDIR@-active
> echo "Checking:"
> pwd
> hostname
> date
> echo "Environment:"
> module load /home2/mallick/ET/Cactus/openmpi-x86_64
> export CACTUS_NUM_PROCS=@NUM_PROCS@
> export CACTUS_NUM_THREADS=@NUM_THREADS@
> export GMON_OUT_PREFIX=gmon.out
> export OMP_NUM_THREADS=@NUM_THREADS@
> env | sort > SIMFACTORY/ENVIRONMENT
> echo "Starting:"
> export CACTUS_STARTTIME=$(date +%s)
> mpirun -np @NUM_PROCS@ @EXECUTABLE@ -L 3 @PARFILE@
>
> echo "Stopping:"
> date
> echo "Done."
>
> I have attached these files here as well. All the output/error files
> are attached in my previous mail.
>
> Regards
> Shamim Haque
> Junior Research Fellow (JRF)
> Department of Physics
> IISER Bhopal
>
> On Thu, Aug 12, 2021 at 12:48 AM Steven R. Brandt <sbrandt at cct.lsu.edu> wrote:
>
> What SubmitScript and RunScript are you using? Can you show us?
> Thanks.
>
> --Steve
>
> On 8/10/2021 2:35 AM, Shamim Haque wrote:
>> Hello,
>>
>> I am trying to run the BNS simulation on the cluster at IISER
>> Bhopal. When I use 2 nodes (2 x 16 cores), my simulation stops at
>> this message:
>> The environment variable CACTUS_NUM_PROCS is set to 32, but
>> there are 1 MPI processes. This may indicate a severe problem
>> with the MPI startup mechanism.
>> APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
>>
>> The command passed to simfactory via the job script is as follows:
>>
>> time mpirun /home2/mallick/ET/Cactus/simfactory/bin/sim
>> create-submit nsns30 --basedir=/home2/mallick/simulations
>> --procs=32 --ppn=16 --num-threads=1 --num-smt=1 --ppn-used=16
>> --parfile /home2/mallick/ET/Cactus/parfile/nsns_vlr_mass_diff.par
>> --walltime=24:00:00
>>
>> I could not figure out the issue. I am also struggling with
>> setting up the machine scripts as per the cluster, so I am not
>> sure if that is somehow hampering the simulation.
>>
>> Thanks in advance for helping me with this issue. I have attached
>> the concerned scripts and outputs for reference.
>>
>> Regards
>> Shamim Haque
>> Junior Research Fellow (JRF)
>> Department of Physics
>> IISER Bhopal
>>
>> _______________________________________________
>> Users mailing list
>> Users at einsteintoolkit.org <mailto:Users at einsteintoolkit.org>
>> http://lists.einsteintoolkit.org/mailman/listinfo/users <http://lists.einsteintoolkit.org/mailman/listinfo/users>