[Users] a problem met in running the sample code "Binary, inspiraling neutron stars forming a hypermassive neutron star" on Jimmy's own work station

白济民 beki-cat at sjtu.edu.cn
Mon May 24 23:41:03 CDT 2021


Hi Roland,
I'm sorry, there were several typos in the command lines in my previous reply. They should be:
"./simfactory/bin/sim create-submit bns_merger --procs 52 --num-threads 26 --parfile /home/bai/ET/Cactus/par/nsnstohmns.par -Roe --walltime 24:0:0"
which returns: "sim.py: error: no such option: -R"
and,
Instead, I built the ET and ran the simulation via the commands:
--8<--
simfactory/bin/sim build  --procs 52 --num-threads 26 --thornlist thornlists/nsnstohmns.th 
./simfactory/bin/sim create-submit bns_merger_4 --procs 52 --num-threads 26 --parfile /home/bai/ET/Cactus/par/nsnstohmns.par --walltime 24:0:0
--8<--
Yours sincerely,
Jimmy


----- Original Message -----
From: "白济民" <beki-cat at sjtu.edu.cn>
To: "users" <users at einsteintoolkit.org>
Cc: "1614603292" <1614603292 at qq.com>
Date: Tuesday, May 25, 2021 12:32:18 PM
Subject: Re: [Users] a problem met in running the sample code "Binary, inspiraling neutron stars forming a hypermassive neutron star" on Jimmy's own work station

Hi Roland,
Thanks for your patience. However, when I execute the command in Cactus with "-Roe" added:
"./simfactory/bin/sim create-submit bns_merger --procs 52 --num-threads 26 --parfile /home/bai/ET/Cactus/par/nsnstohmns.par --Roe --walltime 24:0:0"
it returns: "sim.py: error: no such option: -R"

Instead, I built the ET and ran the simulation via the commands:
--8<--
simfactory/bin/sim build  --procs 52 --num-threads 26 --thornlist thornlists/nsnstohmns.th 
./simfactory/bin/sim create-submit bns_merger_4 --procs 52 --num-threads 26 --parfile /home/bai/ET/Cactus/par/nsnstohmns.par -Roe --walltime 24:0:0
--8<--

When I look at the file "mp_Psi4_l2_m2_r300.00" that I'm interested in (for clarity I have uploaded this file), every record appears twice. I wonder whether
this shows that the simulation was started 2 times; I guess this is the mismatched-MPI-stacks case, and I would like to avoid it.
I also notice a large number of level-1 errors in the err file (it is too large, so I grepped 1000 lines of it for uploading, for clarity), and I wonder
why they occur. Are they also a consequence of the mismatched MPI stacks?
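
To confirm the duplication, one can count repeated time stamps (a minimal sketch, assuming the file is whitespace-separated with the time in the first column and "#" comment lines):

```python
from collections import Counter

def duplicated_times(path):
    """Return time values (first column) that occur more than once in a
    multipole output file such as mp_Psi4_l2_m2_r300.00.

    Assumes whitespace-separated columns and "#" comment lines; a run
    that was accidentally started twice writes every record twice."""
    counts = Counter()
    with open(path) as f:
        for line in f:
            fields = line.split()
            if not fields or fields[0].startswith("#"):
                continue
            counts[fields[0]] += 1
    return sorted((t for t, n in counts.items() if n > 1), key=float)
```

If this returns every time stamp in the file, each record really is written twice.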
Yours sincerely,
Jimmy



----- Original Message -----
From: "Roland Haas" <rhaas at illinois.edu>
To: "白济民" <beki-cat at sjtu.edu.cn>
Cc: "users" <users at einsteintoolkit.org>, "1614603292" <1614603292 at qq.com>
Date: Monday, May 24, 2021 10:29:24 PM
Subject: Re: [Users] a problem met in running the sample code "Binary, inspiraling neutron stars forming a hypermassive neutron star" on Jimmy's own work station

Hello Jimmy,

OK. If you are already giving simfactory options that should
result in multiple MPI ranks (e.g. --procs 26 --num-threads 13), then you
are most likely facing an issue where the MPI stack used to compile the
code is not the same as the one used to run it. This should,
however, have resulted in a different error (namely Carpet reporting
that CACTUS_NUM_PROCS is inconsistent with the number
of MPI ranks), which is why I suggested the issue might be the
simfactory command line used. I explain how to check this at the end
of the email.

Can you provide the exact (not simplified or otherwise
modified) simfactory command line you used? Otherwise this is very hard
to diagnose remotely.

Note that the ini files just provide defaults; e.g. the one you
provided will, since you set num-threads to 26, use a single MPI rank
until you ask for more procs/cores than 26. I.e., this command:

./simfactory/bin/sim submit --procs 26 --parfiles ...

will use 1 MPI rank. Instead you must use a command line like the one I
provided as an example before:

./simfactory/bin/sim submit --procs 26 --num-threads 13 ...

that explicitly asks for procs and num-threads such that more than 1
MPI rank is created.
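
To make the arithmetic explicit, here is a rough model only (simfactory's actual option handling also covers --cores, rounding and per-machine limits):

```python
def mpi_ranks(procs, num_threads):
    """MPI ranks resulting from --procs (total number of threads)
    and --num-threads (threads per MPI rank).

    Rough model only; simfactory itself also handles --cores,
    rounding and per-machine limits."""
    return procs // num_threads

# --procs 26 --num-threads 26  ->  1 rank (triggers the TAT/Slab error)
# --procs 26 --num-threads 13  ->  2 ranks
# --procs 52 --num-threads 26  ->  2 ranks
```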

Mismatched MPI stacks tend to manifest themselves in that, instead of
N MPI ranks, Carpet reports just 1 MPI rank but the simulation is
started N times.

To check whether this is the case you would add the "-Roe" option to
the Cactus command line which causes it to write output from each MPI
rank to a file CCTK_ProcN.out where N is the MPI rank.

You should run this, then check and provide the complete (please
do not abridge them) output files.

Carpet reports the total number of MPI ranks that it uses in there.
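
Once the run has finished you can tally those files, e.g. with this sketch (it assumes the CCTK_ProcN.out files sit in the simulation output directory you pass in):

```python
import os
import re

def ranks_with_output(directory="."):
    """List the MPI rank numbers N for which a CCTK_ProcN.out file
    exists in the given simulation output directory."""
    pattern = re.compile(r"^CCTK_Proc(\d+)\.out$")
    return sorted(
        int(m.group(1))
        for name in os.listdir(directory)
        if (m := pattern.match(name))
    )
```

With a working MPI setup and 2 ranks this gives [0, 1]; with mismatched MPI stacks you typically only ever see rank 0.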

Yours,
Roland

> Hi Roland,
> Thanks for your advice; I understand that I need more than 1 MPI rank to run the simulation. I managed to change the related parameters in my mdb/machines .ini file as follows:
> --8<--
> # Source tree management
> sourcebasedir   = /home/bai/ET
> optionlist      = generic.cfg
> submitscript    = generic.sub
> runscript       = generic.run
> make            = make -j@MAKEJOBS@
> basedir         = /home/bai/simulations
> ppn             = 52
> max-num-threads = 26
> num-threads     = 26
> nodes           = 1
> submit          = exec nohup @SCRIPTFILE@ < /dev/null > @RUNDIR@/@SIMULATION_NAME@.out 2> @RUNDIR@/@SIMULATION_NAME@.err & echo $!
> getstatus       = ps @JOB_ID@
> --8<--
> so that I can use the "./simfactory/bin/sim setup-silent" command to run simfactory using the machine's default settings.
> 
> However, when I run the simulation, it aborts and the same level 0 warning occurs together with the following notice:
> --8<--
> WARNING level 0 from host dell-Precision-7920-Tower process 0
>   while executing schedule bin BoundaryConditions, routine RotatingSymmetry180::Rot180_ApplyBC
>   in thorn RotatingSymmetry180, file /home/bai/ET/Cactus/configs/sim/build/RotatingSymmetry180/rotatingsymmetry180.c:492:
>   -> TAT/Slab can only be used if there is a single local component per MPI process  
> cactus_sim: /home/bai/ET/Cactus/configs/sim/build/Carpet/helpers.cc:275: int Carpet::Abort(const cGH*, int): Assertion `0' failed.
> Rank 0 with PID 74149 received signal 6
> Writing backtrace to nsnstohmns/backtrace.0.txt
> -----------------------------------------------------------------------------
> It seems that [at least] one of the processes that was started with
> mpirun did not invoke MPI_INIT before quitting (it is possible that
> more than one process did not invoke MPI_INIT -- mpirun was only
> notified of the first one, which was on node n0).
> 
> mpirun can *only* be used with MPI programs (i.e., programs that
> invoke MPI_INIT and MPI_FINALIZE).  You can use the "lamexec" program
> to run non-MPI programs over the lambooted nodes.
> -----------------------------------------------------------------------------
> --8<--
> For clarity, I have uploaded the machine.ini file.
> Yours sincerely,
> Jimmy
> 
> ----- Original Message -----
> From: "Roland Haas" <rhaas at illinois.edu>
> To: "白济民" <beki-cat at sjtu.edu.cn>
> Cc: "users" <users at einsteintoolkit.org>, "1614603292" <1614603292 at qq.com>
> Date: Friday, May 21, 2021 10:02:06 PM
> Subject: Re: [Users] a problem met in running the sample code "Binary, inspiraling neutron stars forming a hypermassive neutron star" on Jimmy's own work station
> 
> Hello Jimmy,
> 
> the error is the level 0 warning at the end of the err file:
> 
> --8<--
> WARNING level 0 from host dell-Precision-7920-Tower process 0
>   while executing schedule bin BoundaryConditions, routine RotatingSymmetry180::Rot180_ApplyBC
>   in thorn RotatingSymmetry180, file /home/bai/ET/Cactus/configs/sim/build/RotatingSymmetry180/rotatingsymmetry180.c:492:
>   -> TAT/Slab can only be used if there is a single local component per MPI process  
> cactus_sim: /home/bai/ET/Cactus/configs/sim/build/Carpet/helpers.cc:275: int Carpet::Abort(const cGH*, int): Assertion `0' 
> --8<--
> 
> namely "TAT/Slab can only be used if there is a single local component
> per MPI process". 
> 
> To avoid this you will have to use more than 1 MPI rank (the technical
> description is a bit complicated).
> 
> When using simulation factory you must ensure that the values for
> --procs / --cores (total number of threads created) and --num-threads
> (number of threads per MPI rank) are such that there are at least 2 MPI
> ranks.
> 
> Eg:
> 
> ./simfactory/bin/sim submit --cores 12 --num-threads 6 ...
> 
> or when using mpirun directly the equivalent would be:
> 
> export OMP_NUM_THREADS=6
> mpirun -n 2 ...
> 
> Yours,
> Roland
> 
> > Hello,
> >     I met a problem when running the sample code "Binary, inspiraling neutron stars forming a hypermassive neutron star" from ET's gallery on my own work station, and I would appreciate your help.
> >     It aborts unexpectedly after running for a few minutes. The end of the output error file reads as follows:
> > 
> >     cactus_sim: /home/bai/ET/Cactus/configs/sim/build/Carpet/helpers.cc:275: int Carpet::Abort(const cGH*, int): Assertion `0' failed.
> >     Rank 0 with PID 73447 received signal 6
> >     Writing backtrace to nsnstohmns/backtrace.0.txt
> >     Aborted (core dumped)
> > 
> >     I also uploaded the entire error file for reference.
> > 
> >     I built the ET using 64 processors with the following command:
> >     simfactory/bin/sim build -j64 --thornlist thornlists/nsnstohmns.th
> >     
> >     and I ran the simulation using 20 processors with the following command:
> >     ./simfactory/bin/sim create-submit bns_merger /home/bai/ET/Cactus/par/nsnstohmns.par 20 24:0:0
> >     
> > Yours sincerely:
> > Jimmy
> >           
> 


-- 
My email is as private as my paper mail. I therefore support encrypting
and signing email messages. Get my PGP key from http://pgp.mit.edu .

