[Users] Error in BNS Simulation at Cluster

Shamim Haque shamims at iiserb.ac.in
Tue Aug 10 02:35:45 CDT 2021


Hello,

I am trying to run a binary neutron star (BNS) simulation on the cluster at
IISER Bhopal. When I use 2 nodes (2 x 16 cores), the simulation aborts with
this message:

The environment variable CACTUS_NUM_PROCS is set to 32, but there are 1
MPI processes. This may indicate a severe problem with the MPI startup
mechanism.

APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
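
The message reports a mismatch between the process count advertised in the environment (CACTUS_NUM_PROCS) and the number of MPI processes that actually started. A minimal Python sketch of that kind of consistency check is given here for illustration. It is not Cactus's own code: the function name `check_mpi_startup` is hypothetical, and the MPI size is passed in as a plain integer rather than queried from an MPI library.

```python
def check_mpi_startup(env, mpi_size):
    """Compare the expected process count with the actual MPI size.

    env      -- mapping of environment variables (e.g. os.environ)
    mpi_size -- number of MPI processes that actually started
    Returns a warning string on mismatch, otherwise None.
    """
    expected = env.get("CACTUS_NUM_PROCS")
    if expected is None:
        return None  # nothing to compare against
    if int(expected) != mpi_size:
        return (
            f"The environment variable CACTUS_NUM_PROCS is set to {expected}, "
            f"but there are {mpi_size} MPI processes."
        )
    return None


if __name__ == "__main__":
    # Values from the error above: 32 processes expected, 1 started.
    print(check_mpi_startup({"CACTUS_NUM_PROCS": "32"}, 1))
```

With the values from this run (32 expected, 1 started) the check reports a mismatch, which suggests that the executable was started as a single process rather than through a 32-process MPI launch.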

The command that my jobscript runs to invoke simfactory is:

time mpirun /home2/mallick/ET/Cactus/simfactory/bin/sim create-submit
nsns30 --basedir=/home2/mallick/simulations --procs=32 --ppn=16
--num-threads=1 --num-smt=1 --ppn-used=16 --parfile
/home2/mallick/ET/Cactus/parfile/nsns_vlr_mass_diff.par --walltime=24:00:00

I could not figure out the cause. I am also struggling to set up the
simfactory machine scripts for this cluster, so I am not sure whether a
misconfiguration there is affecting the simulation.

Thanks in advance for your help. I have attached the relevant scripts and
outputs for reference.

Regards
Shamim Haque
Junior Research Fellow (JRF)
Department of Physics
IISER Bhopal
-------------- next part --------------
[LOG:2021-08-08 13:35:35] restart.create(simulationName, parfile)::Creating simulation nsns30
[LOG:2021-08-08 13:35:35] restart.create(simulationName, parfile)::Simulation directory: /home2/mallick/simulations/nsns30
[LOG:2021-08-08 13:35:35] restart.create(simulationName, parfile)::Simulation Properties:
[LOG:2021-08-08 13:35:35] restart.create(simulationName, parfile)::
[LOG:2021-08-08 13:35:35] restart.create(simulationName, parfile)::[properties]
[LOG:2021-08-08 13:35:35] restart.create(simulationName, parfile)::machine         = n132.hpc.iiserb
[LOG:2021-08-08 13:35:35] restart.create(simulationName, parfile)::simulationid    = simulation-nsns30-n132.hpc.iiserb-n132.hpc.iiserb-mallick-2021.08.08-13.35.35-1131
[LOG:2021-08-08 13:35:35] restart.create(simulationName, parfile)::sourcedir       = /home2/mallick/ET/Cactus
[LOG:2021-08-08 13:35:35] restart.create(simulationName, parfile)::configuration   = sim
[LOG:2021-08-08 13:35:35] restart.create(simulationName, parfile)::configid        = config-sim-kanadlogin1.hpc.iiserb-home2-mallick-ET-Cactus
[LOG:2021-08-08 13:35:35] restart.create(simulationName, parfile)::buildid         = build-sim-kanadlogin1.hpc.iiserb-mallick-2021.01.05-12.51.59-6139
[LOG:2021-08-08 13:35:35] restart.create(simulationName, parfile)::testsuite       = False
[LOG:2021-08-08 13:35:35] restart.create(simulationName, parfile)::executable      = /home2/mallick/simulations/nsns30/SIMFACTORY/exe/cactus_sim
[LOG:2021-08-08 13:35:35] restart.create(simulationName, parfile)::optionlist      = /home2/mallick/simulations/nsns30/SIMFACTORY/cfg/OptionList
[LOG:2021-08-08 13:35:35] restart.create(simulationName, parfile)::submitscript    = /home2/mallick/simulations/nsns30/SIMFACTORY/run/SubmitScript
[LOG:2021-08-08 13:35:35] restart.create(simulationName, parfile)::runscript       = /home2/mallick/simulations/nsns30/SIMFACTORY/run/RunScript
[LOG:2021-08-08 13:35:35] restart.create(simulationName, parfile)::parfile         = /home2/mallick/simulations/nsns30/SIMFACTORY/par/nsns_vlr_mass_diff.par
[LOG:2021-08-08 13:35:35] restart.create(simulationName, parfile)::
[LOG:2021-08-08 13:35:35] restart.create(simulationName, parfile)::Simulation nsns30 created
[LOG:2021-08-08 13:35:35] self.submit(submitScript)::Restart for simulation nsns30 created with restart id 0, long restart id 0000
[LOG:2021-08-08 13:35:35] self.submit(submitScript)::Prepping for submission
[LOG:2021-08-08 13:35:35] self.submit(submitScript)::No previous walltime available to be used, using walltime 24:00:00
[LOG:2021-08-08 13:35:35] self.submit(submitScript)::Defined substituion properties for submission
[LOG:2021-08-08 13:35:35] self.submit(submitScript)::{'SIMULATION_ID': 'simulation-nsns30-n132.hpc.iiserb-n132.hpc.iiserb-mallick-2021.08.08-13.35.35-1131', 'NODE_PROCS': 16, 'PPN_USED': 16, 'PPN': 16, 'ALLOCATION': 'NO_ALLOCATION', 'WALLTIME_HH': '24', 'CPUFREQ': None, 'USER': 'mallick', 'RUNDIR': '/home2/mallick/simulations/nsns30/output-0000', 'NODES': 2, 'SIMULATION_NAME': 'nsns30', 'WALLTIME': '24:00:00', 'NUM_THREADS': 1, 'EXECUTABLE': '/home2/mallick/simulations/nsns30/SIMFACTORY/exe/cactus_sim', 'PROCS_REQUESTED': 32, 'EMAIL': 'mallick', 'RESTART_ID': 0, 'CHAINED_JOB_ID': '', 'FROM_RESTART_COMMAND': '', 'NUM_SMT': 1, 'WALLTIME_SECONDS': 86400, 'SIMFACTORY': '/home2/mallick/ET/Cactus/simfactory/bin/sim', 'PROCS': 32, 'SUBMITSCRIPT': '/home2/mallick/simulations/nsns30/output-0000/SIMFACTORY/SubmitScript', 'WALLTIME_HOURS': 24.0, 'WALLTIME_MM': '00', 'PARFILE': '/home2/mallick/simulations/nsns30/output-0000/nsns_vlr_mass_diff.par', 'WALLTIME_SS': '00', 'QUEUE': 'NOQUEUE', 'CONFIGURATION': 'sim', 'SOURCEDIR': '/home2/mallick/ET/Cactus', 'HOSTNAME': 'n132.hpc.iiserb', 'NUM_PROCS': 32, 'SCRIPTFILE': '/home2/mallick/simulations/nsns30/output-0000/SIMFACTORY/SubmitScript', 'MEMORY': '0', 'WALLTIME_MINUTES': 1440, 'SHORT_SIMULATION_NAME': 'nsns30-0000'}
[LOG:2021-08-08 13:35:35] self.submit(submitScript)::self.Properties: /home2/mallick/simulations/nsns30/output-0000/SIMFACTORY/properties.ini
[LOG:2021-08-08 13:35:35] self.submit(submitScript)::
[LOG:2021-08-08 13:35:35] self.submit(submitScript)::[properties]
[LOG:2021-08-08 13:35:35] self.submit(submitScript)::machine         = n132.hpc.iiserb
[LOG:2021-08-08 13:35:35] self.submit(submitScript)::simulationid    = simulation-nsns30-n132.hpc.iiserb-n132.hpc.iiserb-mallick-2021.08.08-13.35.35-1131
[LOG:2021-08-08 13:35:35] self.submit(submitScript)::sourcedir       = /home2/mallick/ET/Cactus
[LOG:2021-08-08 13:35:35] self.submit(submitScript)::configuration   = sim
[LOG:2021-08-08 13:35:35] self.submit(submitScript)::configid        = config-sim-kanadlogin1.hpc.iiserb-home2-mallick-ET-Cactus
[LOG:2021-08-08 13:35:35] self.submit(submitScript)::buildid         = build-sim-kanadlogin1.hpc.iiserb-mallick-2021.01.05-12.51.59-6139
[LOG:2021-08-08 13:35:35] self.submit(submitScript)::testsuite       = False
[LOG:2021-08-08 13:35:35] self.submit(submitScript)::executable      = /home2/mallick/simulations/nsns30/SIMFACTORY/exe/cactus_sim
[LOG:2021-08-08 13:35:35] self.submit(submitScript)::optionlist      = /home2/mallick/simulations/nsns30/SIMFACTORY/cfg/OptionList
[LOG:2021-08-08 13:35:35] self.submit(submitScript)::submitscript    = /home2/mallick/simulations/nsns30/SIMFACTORY/run/SubmitScript
[LOG:2021-08-08 13:35:35] self.submit(submitScript)::runscript       = /home2/mallick/simulations/nsns30/SIMFACTORY/run/RunScript
[LOG:2021-08-08 13:35:35] self.submit(submitScript)::parfile         = /home2/mallick/simulations/nsns30/SIMFACTORY/par/nsns_vlr_mass_diff.par
[LOG:2021-08-08 13:35:35] self.submit(submitScript)::chainedjobid    = -1
[LOG:2021-08-08 13:35:35] self.submit(submitScript)::ppn             = 16
[LOG:2021-08-08 13:35:35] self.submit(submitScript)::procsrequested  = 32
[LOG:2021-08-08 13:35:35] self.submit(submitScript)::allocation      = NO_ALLOCATION
[LOG:2021-08-08 13:35:35] self.submit(submitScript)::user            = mallick
[LOG:2021-08-08 13:35:35] self.submit(submitScript)::numsmt          = 1
[LOG:2021-08-08 13:35:35] self.submit(submitScript)::walltime        = 24:00:00
[LOG:2021-08-08 13:35:35] self.submit(submitScript)::numprocs        = 32
[LOG:2021-08-08 13:35:35] self.submit(submitScript)::nodeprocs       = 16
[LOG:2021-08-08 13:35:35] self.submit(submitScript)::numthreads      = 1
[LOG:2021-08-08 13:35:35] self.submit(submitScript)::hostname        = n132.hpc.iiserb
[LOG:2021-08-08 13:35:35] self.submit(submitScript)::ppnused         = 16
[LOG:2021-08-08 13:35:35] self.submit(submitScript)::queue           = NOQUEUE
[LOG:2021-08-08 13:35:35] self.submit(submitScript)::cpufreq         = 
[LOG:2021-08-08 13:35:35] self.submit(submitScript)::procs           = 32
[LOG:2021-08-08 13:35:35] self.submit(submitScript)::memory          = 0
[LOG:2021-08-08 13:35:35] self.submit(submitScript)::nodes           = 2
[LOG:2021-08-08 13:35:35] self.submit(submitScript)::pbsSimulationName= nsns30-0000
[LOG:2021-08-08 13:35:35] self.submit(submitScript)::
[LOG:2021-08-08 13:35:35] self.submit(submitScript)::saving substituted submitscript contents to: /home2/mallick/simulations/nsns30/output-0000/SIMFACTORY/SubmitScript
[LOG:2021-08-08 13:35:35] self.submit(submitScript)::Executing submission command: exec nohup /home2/mallick/simulations/nsns30/output-0000/SIMFACTORY/SubmitScript < /dev/null > /home2/mallick/simulations/nsns30/output-0000/nsns30.out 2> /home2/mallick/simulations/nsns30/output-0000/nsns30.err & echo $!
[LOG:2021-08-08 13:35:35] self.makeActive()::Simulation nsns30 with restart-id 0 has been made active
[LOG:2021-08-08 13:35:35] job_id = self.extractJobId(output)::received raw output: 1138
[LOG:2021-08-08 13:35:35] job_id = self.extractJobId(output)::
[LOG:2021-08-08 13:35:35] job_id = self.extractJobId(output)::using submitRegex: (.*)
[LOG:2021-08-08 13:35:35] self.submit(submitScript)::After searching raw output, it was determined that the job_id is: 1138
[LOG:2021-08-08 13:35:35] self.submit(submitScript)::Simulation nsns30, with restart id 0, and job id 1138 has been submitted
[LOG:2021-08-08 13:35:35] self.load(simulationName, restartId)::For simulation nsns30, loaded restart id 0, long restart id 0000
[LOG:2021-08-08 13:35:35] self.run()::Prepping for execution/run
[LOG:2021-08-08 13:35:35] checkpointing = self.PrepareCheckpointing(recover_id)::PrepareCheckpointing: max_restart_id: -1
[LOG:2021-08-08 13:35:35] self.run()::Defined substitution properties for execution/run
[LOG:2021-08-08 13:35:35] self.run()::{'SIMULATION_ID': 'simulation-nsns30-n132.hpc.iiserb-n132.hpc.iiserb-mallick-2021.08.08-13.35.35-1131', 'NODE_PROCS': '16', 'PPN_USED': '16', 'PPN': '16', 'WALLTIME_HH': '24', 'CPUFREQ': '', 'USER': 'mallick', 'RUNDIR': '/home2/mallick/simulations/nsns30/output-0000', 'NODES': '2', 'SIMULATION_NAME': 'nsns30', 'WALLTIME': '24:00:00', 'NUM_THREADS': '1', 'EXECUTABLE': '/home2/mallick/simulations/nsns30/SIMFACTORY/exe/cactus_sim', 'PROCS_REQUESTED': '32', 'RESTART_ID': 0, 'NUM_SMT': '1', 'WALLTIME_SECONDS': 86400, 'CONFIGURATION': 'sim', 'PROCS': '32', 'SUBMITSCRIPT': '/home2/mallick/simulations/nsns30/SIMFACTORY/run/SubmitScript', 'WALLTIME_MM': '00', 'MACHINE': 'n132.hpc.iiserb', 'PARFILE': '/home2/mallick/simulations/nsns30/output-0000/nsns_vlr_mass_diff.par', 'WALLTIME_SS': '00', 'WALLTIME_HOURS': 24.0, 'SOURCEDIR': '/home2/mallick/ET/Cactus', 'HOSTNAME': 'n132.hpc.iiserb', 'RUNDEBUG': 0, 'NUM_PROCS': '32', 'SCRIPTFILE': '/home2/mallick/simulations/nsns30/SIMFACTORY/run/SubmitScript', 'MEMORY': '0', 'WALLTIME_MINUTES': 1440, 'SHORT_SIMULATION_NAME': 'nsns30-0000'}
[LOG:2021-08-08 13:35:35] self.run()::Executing run command: /home2/mallick/simulations/nsns30/output-0000/SIMFACTORY/RunScript
-------------- next part --------------
A non-text attachment was scrubbed...
Name: RunScript
Type: application/octet-stream
Size: 1222 bytes
Desc: not available
Url : http://lists.einsteintoolkit.org/pipermail/users/attachments/20210810/e22eaa65/attachment-0008.obj 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: nsns_vlr_mass_diff.par
Type: application/octet-stream
Size: 21691 bytes
Desc: not available
Url : http://lists.einsteintoolkit.org/pipermail/users/attachments/20210810/e22eaa65/attachment-0009.obj 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: nsns30.err
Type: application/octet-stream
Size: 28219 bytes
Desc: not available
Url : http://lists.einsteintoolkit.org/pipermail/users/attachments/20210810/e22eaa65/attachment-0010.obj 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: nsns30.out
Type: application/octet-stream
Size: 1518026 bytes
Desc: not available
Url : http://lists.einsteintoolkit.org/pipermail/users/attachments/20210810/e22eaa65/attachment-0011.obj 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SubmitScript
Type: application/octet-stream
Size: 475 bytes
Desc: not available
Url : http://lists.einsteintoolkit.org/pipermail/users/attachments/20210810/e22eaa65/attachment-0012.obj 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: machine_n131.hpc.iiserb.ini
Type: application/octet-stream
Size: 1994 bytes
Desc: not available
Url : http://lists.einsteintoolkit.org/pipermail/users/attachments/20210810/e22eaa65/attachment-0013.obj 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: jobscript
Type: application/octet-stream
Size: 1181 bytes
Desc: not available
Url : http://lists.einsteintoolkit.org/pipermail/users/attachments/20210810/e22eaa65/attachment-0014.obj 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: jobscript.out
Type: application/octet-stream
Size: 1399 bytes
Desc: not available
Url : http://lists.einsteintoolkit.org/pipermail/users/attachments/20210810/e22eaa65/attachment-0015.obj 

