[Users] a problem met in running the sample code "Binary, inspiraling neutron stars forming a hypermassive neutron star" on Jimmy's own work station

白济民 beki-cat at sjtu.edu.cn
Mon May 24 07:35:34 CDT 2021


Hi Roland,
Thanks for your advice. I understand that I need more than one MPI rank to run the simulation. I changed the related parameters in my mdb/machines .ini file as follows:
--8<--
# Source tree management
sourcebasedir   = /home/bai/ET
optionlist      = generic.cfg
submitscript    = generic.sub
runscript       = generic.run
make            = make -j@MAKEJOBS@
basedir         = /home/bai/simulations
ppn             = 52
max-num-threads = 26
num-threads     = 26
nodes           = 1
submit          = exec nohup @SCRIPTFILE@ < /dev/null > @RUNDIR@/@SIMULATION_NAME@.out 2> @RUNDIR@/@SIMULATION_NAME@.err & echo $!
getstatus       = ps @JOB_ID@
--8<--
so that I can use the "./simfactory/bin/sim setup-silent" command to run simfactory using the machine's default settings.
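
As a quick sanity check of these values, the short Python sketch below reads the numeric entries and reports how many MPI ranks per node they imply (ppn divided by num-threads). It is only an illustration, not part of simfactory: the section name is assumed from the machine's host name, the helper name ranks_per_node is made up, and the rank count actually used at run time may differ depending on the options passed to sim.

```python
import configparser

# Numeric entries copied from the machine file above. The section name is an
# assumption: simfactory machine files use one section per machine.
MACHINE_INI = """
[dell-Precision-7920-Tower]
ppn             = 52
max-num-threads = 26
num-threads     = 26
nodes           = 1
"""


def ranks_per_node(ini_text):
    """Return ppn // num-threads for the first section of a machine file."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    section = parser[parser.sections()[0]]
    ppn = section.getint("ppn")
    threads = section.getint("num-threads")
    if ppn % threads != 0:
        raise ValueError("ppn is not a multiple of num-threads")
    return ppn // threads


# MPI ranks per node implied by the settings above
print(ranks_per_node(MACHINE_INI))
```

With ppn = 52 and num-threads = 26 this gives 2 ranks, provided all 52 cores are actually requested for the run.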

However, when I run the simulation, it aborts and the same level 0 warning occurs together with the following notice:
--9<--
WARNING level 0 from host dell-Precision-7920-Tower process 0
  while executing schedule bin BoundaryConditions, routine RotatingSymmetry180::Rot180_ApplyBC
  in thorn RotatingSymmetry180, file /home/bai/ET/Cactus/configs/sim/build/RotatingSymmetry180/rotatingsymmetry180.c:492:
  -> TAT/Slab can only be used if there is a single local component per MPI process
cactus_sim: /home/bai/ET/Cactus/configs/sim/build/Carpet/helpers.cc:275: int Carpet::Abort(const cGH*, int): Assertion `0' failed.
Rank 0 with PID 74149 received signal 6
Writing backtrace to nsnstohmns/backtrace.0.txt
-----------------------------------------------------------------------------
It seems that [at least] one of the processes that was started with
mpirun did not invoke MPI_INIT before quitting (it is possible that
more than one process did not invoke MPI_INIT -- mpirun was only
notified of the first one, which was on node n0).

mpirun can *only* be used with MPI programs (i.e., programs that
invoke MPI_INIT and MPI_FINALIZE).  You can use the "lamexec" program
to run non-MPI programs over the lambooted nodes.
-----------------------------------------------------------------------------
--9<--
For clarity, I have attached the machine .ini file.
Yours sincerely,
Jimmy

----- Original Message -----
From: "Roland Haas" <rhaas at illinois.edu>
To: "白济民" <beki-cat at sjtu.edu.cn>
Cc: "users" <users at einsteintoolkit.org>, "1614603292" <1614603292 at qq.com>
Sent: Friday, May 21, 2021 10:02:06 PM
Subject: Re: [Users] a problem met in running the sample code "Binary, inspiraling neutron stars forming a hypermassive neutron star" on Jimmy's own work station

Hello Jimmy,

the error is the level 0 warning at the end of the err file:

--8<--
WARNING level 0 from host dell-Precision-7920-Tower process 0
  while executing schedule bin BoundaryConditions, routine RotatingSymmetry180::Rot180_ApplyBC
  in thorn RotatingSymmetry180, file /home/bai/ET/Cactus/configs/sim/build/RotatingSymmetry180/rotatingsymmetry180.c:492:
  -> TAT/Slab can only be used if there is a single local component per MPI process
cactus_sim: /home/bai/ET/Cactus/configs/sim/build/Carpet/helpers.cc:275: int Carpet::Abort(const cGH*, int): Assertion `0' 
--8<--

namely "TAT/Slab can only be used if there is a single local component
per MPI process". 

To avoid this you will have to use more than one MPI rank (the technical
description is a bit complicated).

When using simulation factory you must ensure that the values for
--procs / --cores (total number of threads created) and --num-threads
(number of threads per MPI rank) are such that there are at least 2 MPI
ranks.

E.g.:

./simfactory/bin/sim submit --cores 12 --num-threads 6 ...

or when using mpirun directly the equivalent would be:

export OMP_NUM_THREADS=6
mpirun -n 2 ...
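
The relation between these options can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not simfactory's own code: the function names mpi_ranks and has_enough_ranks are made up for this example, and it assumes the total core count is an exact multiple of the threads per rank.

```python
def mpi_ranks(cores, num_threads):
    """Number of MPI ranks for a total core count and threads per rank."""
    if cores <= 0 or num_threads <= 0:
        raise ValueError("cores and num_threads must be positive")
    if cores % num_threads != 0:
        raise ValueError("cores must be a multiple of num_threads")
    return cores // num_threads


def has_enough_ranks(cores, num_threads, min_ranks=2):
    """True if the layout yields at least min_ranks MPI ranks."""
    return mpi_ranks(cores, num_threads) >= min_ranks


# The example above: --cores 12 --num-threads 6
print(mpi_ranks(12, 6))
print(has_enough_ranks(12, 6))

# A hypothetical layout that ends up with a single rank and would fail
print(has_enough_ranks(20, 20))
```

The first layout gives 2 ranks and passes the check. The last call uses made-up values (20 cores with 20 threads per rank) to show a layout that produces only 1 rank.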

Yours,
Roland

> Hello,
>     I ran into a problem when running the sample code "Binary, inspiraling neutron stars forming a hypermassive neutron star" in ET's gallery on my own workstation, and I'm looking forward to your help.
>     It aborts unexpectedly after running for a few minutes. The end of the error output file reads as follows:
> 
>     cactus_sim: /home/bai/ET/Cactus/configs/sim/build/Carpet/helpers.cc:275: int Carpet::Abort(const cGH*, int): Assertion `0' failed.
>     Rank 0 with PID 73447 received signal 6
>     Writing backtrace to nsnstohmns/backtrace.0.txt
>     Aborted (core dumped)
> 
>     I have also attached the entire error file for clarity.
> 
>     I built the ET on 64 processors with the following command:
>     simfactory/bin/sim build -j64 --thornlist thornlists/nsnstohmns.th
>     
>     and I ran the simulation on 20 processors with the following command:
>     ./simfactory/bin/sim create-submit bns_merger /home/bai/ET/Cactus/par/nsnstohmns.par 20 24:0:0
>     
> Yours sincerely,
> Jimmy
>         

-- 
My email is as private as my paper mail. I therefore support encrypting
and signing email messages. Get my PGP key from http://pgp.mit.edu .
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dell-Precision-7920-Tower.ini
Type: application/octet-stream
Size: 1972 bytes
Desc: not available
Url : http://lists.einsteintoolkit.org/pipermail/users/attachments/20210524/6585f7ad/attachment.obj 