[Users] MPI killing issue

Roland Haas rhaas at illinois.edu
Wed Dec 23 07:30:31 CST 2020


Hello Karima,

I am not a black hole evolution expert, but I can give this a try.

Looking at the AHFinderDirect output in your *.out file (and the grid
structure output):

--8<--
INFO (AHFinderDirect): AH 1/3: r=0.25 at (3.000000,-0.000000,0.000000)
INFO (AHFinderDirect): AH 1/3: area=9.555893546e+36 m_irreducible=4.360142907e+17
INFO (AHFinderDirect): AH 2/3: r=0.25 at (-3.000000,-0.000000,-0.000000)
INFO (AHFinderDirect): AH 2/3: area=9.555893546e+36 m_irreducible=4.360142907e+17
--8<--

your black holes are located exactly on grid points. Since there is a
singularity that goes as 1/r at the center of each black hole, this means
that your data contains very large numbers, which can develop into
NaNs that PunctureTracker then picks up. This can be
compiler- and optimization-level-specific (in particular, the Intel and
GNU compilers may produce different results).

To fix this, I would suggest moving your black holes away from the exact
grid points, either by slightly increasing or decreasing their separation
or by picking a grid spacing that avoids putting a grid point there.

Your parfile seems to use a smaller domain (only out to 60) and a coarser
resolution (3 instead of 2) on the coarsest grid, which can increase the
likelihood of this happening. E.g. using a coarsest grid spacing of 3.03
(and a grid coordinate size of 20*3.03 = 60.6) may help, since it
should avoid bringing any grid point very close to the center of a
black hole.
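As a quick sanity check, you can compute how far a puncture location sits
from the nearest coarse-grid point for a candidate spacing. This is just a
hypothetical helper (not part of the Toolkit), and it assumes the simplest
case of grid points at integer multiples of the spacing, x_i = i*dx:

```python
# Hypothetical helper: distance from a puncture coordinate to the
# nearest coarse-grid point, assuming grid points at x_i = i * dx.
# (The real Carpet grid layout can be offset; this is a simplification.)

def distance_to_nearest_grid_point(x, dx):
    """Distance from coordinate x to the nearest grid point i*dx."""
    r = x % dx
    return min(r, dx - r)

# Puncture at x = 3.0, as in the AHFinderDirect output above:
# with spacing 3 the puncture lands exactly on a grid point,
# with spacing 3.03 it is offset by about 0.03.
print(distance_to_nearest_grid_point(3.0, 3.0))
print(distance_to_nearest_grid_point(3.0, 3.03))
```

A distance of (near) zero for your chosen spacing is the warning sign
that a grid point will sit at the 1/r singularity.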

Yours,
Roland

> Thank you, that removed the error, but I now encounter another error in
> PunctureTracker. I am unable to resolve it, perhaps due to a lack of
> knowledge on my part. Your support in getting around it would be greatly
> appreciated.
> 
> Thank you
> 
> Best regards,
> Karima S
> 
> 
> On Wed, Dec 23, 2020 at 3:50 AM Roland Haas <rhaas at illinois.edu> wrote:
> 
> > Hello Karima,
> >
> > looking at the error message and the parfile (BBHLowRes.par) that you
> > included in your email then there are these lines in the parfile:
> >
> > --8<--
> > Contents successfully written to
> > /home/karima/Cactus/repos/simfactory2/etc/defs.local.ini
> > %%bash
> > ./simfactory/bin/sim build -j2 --thornlist ../einsteintoolkit.th
> > %%bash
> > ./simfactory/bin/sim build -j2 --thornlist ../einsteintoolkit.th
> > Using configuration: sim
> > Updated thorn list for configuration sim
> > Building sim
> > Cactus - version: 4.9.0
> > Building configuration sim
> > Reconfiguring thorns
> > Reading ThornList...
> > Parsing configuration files...
> >    ADMAnalysis
> >
> >
> >
> > --8<--
> >
> > which clearly should not be there. They look to me like screen output
> > that was produced when copying and pasting the cell content from the
> > tutorial (which in itself is not wrong).
> >
> > You will have to remove those lines and check that your parfile
> > actually agrees with the one from Cactus/par/arXiv-1111.3344 that you
> > would like to use.
> >
> > Yours,
> > Roland
> >
> > > Hello there,
> > > I am running a simple binary black hole simulation from the parameter
> > > file kept in Cactus/par/arXiv-1111.3344, where I simulated at high
> > > resolution. I assume a lower-resolution simulation is less costly than
> > > a higher-resolution one, but it keeps giving me errors in the
> > > low-resolution run (the kernel dies for the medium resolution). I tried
> > > almost all the ways of adjusting the par file to match my local
> > > machine's capacity, but failed. The error is below, with the par file,
> > > *.err, and *.out files attached.
> > > Note: There is no storage issue at all.
> > >
> > > Any help would be highly appreciated.
> > >
> > >
> > >
> > > Warning: Too many threads per process specified: specified
> > > num-threads=2 (ppn-used is 2)
> > > Warning: Total number of threads and number of threads per process are
> > > inconsistent: procs=1, num-threads=2 (procs*num-smt must be an integer
> > > multiple of num-threads)
> > > Warning: Total number of threads and number of cores per node are
> > > inconsistent: procs=1, ppn-used=2 (procs must be an integer multiple
> > > of ppn-used)
> > > + set -e
> > > + cd /home/karima/simulations/bbhL/output-0000-active
> > > + echo Checking:
> > > + pwd
> > > + hostname
> > > + date
> > > + echo Environment:
> > > + export CACTUS_NUM_PROCS=1
> > > + export CACTUS_NUM_THREADS=2
> > > + export GMON_OUT_PREFIX=gmon.out
> > > + export OMP_NUM_THREADS=2
> > > + env
> > > + sort
> > > + echo Starting:
> > > + date +%s
> > > + export CACTUS_STARTTIME=1608653732
> > > + [ 1 = 1 ]
> > > + [ 0 -eq 0 ]
> > > + /home/karima/simulations/bbhL/SIMFACTORY/exe/cactus_sim -L 3
> > > /home/karima/simulations/bbhL/output-0000/BBHLowRes.par
> > > WARNING level 0 from host karima-Latitude-E5470 process 0
> > >   in thorn cactus, file BBHLowRes.par:1:
> > >   -> ERROR IN PARAMETER FILE:Parse Error
> > > Expected one of the following characters: ':'
> > > CarpetMask::excluded_surface_factor[1] = 1.0
> > > CarpetMask::excluded_surface       [2] = 2
> > > CarpetMask::excluded_surface_factor[2] = 1.0
> > > Contents successfully written to
> > > /home/karima/Cactus/repos/simfactory2/etc/defs.local.ini
> > >          ^
> > >
> > >
> > > --------------------------------------------------------------------------
> > > MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
> > > with errorcode 1.
> > >
> > > NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> > > You may or may not see output from other processes, depending on
> > > exactly when Open MPI kills them.
> > >
> > >
> > >
> > > Thanks in advance
> > >
> > > best,
> > >
> > > Karima S
> >
> >
> > --
> > My email is as private as my paper mail. I therefore support encrypting
> > and signing email messages. Get my PGP key from http://pgp.mit.edu .
> >


-- 
My email is as private as my paper mail. I therefore support encrypting
and signing email messages. Get my PGP key from http://pgp.mit.edu .