[Users] Issue running the default qc0-mclachlan.par

Gomard-Henshaw, Chad cgomard at gatech.edu
Sat Oct 6 09:15:42 CDT 2018


Roland,


Removing OpenMPI worked for me, and I was able to run qc0 to completion. For anyone else who runs into this issue: I am running Ubuntu through WSL (Windows Subsystem for Linux).


Thank you so much for your help!


--

Chad Henshaw
Georgia Institute of Technology - Physics


________________________________
From: Roland Haas <rhaas at illinois.edu>
Sent: Friday, October 5, 2018 11:57:43 AM
To: Gomard-Henshaw, Chad
Cc: ian.hinder at aei.mpg.de; Einstein Toolkit Users
Subject: Re: [Users] Issue running the default qc0-mclachlan.par

Hello Chad,

thank you for the files.

I assume that you have already tried compiling from scratch (i.e.
running "rm -rf configs/sim" before "simfactory/bin/sim build"). That
makes sure the issue is not a conflict between an MPI stack installed
by your package manager and the copy of OpenMPI that Cactus compiles
itself from the source code in arrangements/ExternalLibraries/MPI when
it does not detect an installed MPI stack. If that did not help, you
can check which MPI stacks are installed using your system's package
manager.

Basically, the commands are:

Ubuntu/Debian/Mint:

dpkg --list | grep -i mpi | grep -iv compil

Centos/Fedora/RedHat/OpenSUSE:

rpm -qa | grep -i mpi | grep -iv compil

These commands show all installed packages whose names contain "mpi"
but not "compil" (ignoring case). If among them you find more than one of:

* OpenMPI
* MPICH
* MVAPICH
* Intel MPI (impi)

then you have multiple MPI stacks installed and should uninstall all but
one (using apt-get, yum, zypper, or whatever your package manager is
called).
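The check above can also be scripted. The following is a minimal bash sketch, not part of simfactory or Cactus: the function name count_mpi_stacks and the name patterns it searches for are assumptions chosen for illustration. It reads the filtered package list on standard input and reports which of the four MPI families appear in it. Package names differ between distributions, so a match is only a hint to inspect the list by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count distinct MPI stacks in a package list.
# Input: package listing on stdin, e.g. the output of
#   dpkg --list | grep -i mpi | grep -iv compil
# The substrings below are ASSUMED package-name patterns; real names
# vary by distribution (openmpi-bin, libmpich-dev, mvapich2, ...).
count_mpi_stacks() {
  local input stack found=0
  input=$(cat)
  for stack in openmpi mpich mvapich impi; do
    # -i: ignore case, -q: only the exit status matters
    if grep -qi -- "$stack" <<< "$input"; then
      echo "found: $stack"
      found=$((found + 1))
    fi
  done
  echo "stacks: $found"
  # Exit status 1 signals that more than one stack was found
  [ "$found" -le 1 ]
}
```

After defining the function (for example by sourcing the file), it can be fed the Debian-style listing directly: `dpkg --list | grep -i mpi | grep -iv compil | count_mpi_stacks`; on RPM-based systems replace the first command with `rpm -qa`.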

Yours,
Roland

> Hi Roland,
>
> My apologies regarding the screenshots. I've tried running the simulation using the submit command as you suggested (I'm running on a regular workstation). Here is the complete text:
>
>
> ./simfactory/bin/sim submit qc00-submit1 \
> >   --parfile=par/qc0-mclachlan.par --procs=2 --num-threads=1 --ppn-used=2
> Warning: simulation "qc00-submit1" does not exist or is not readable
> Parameter file: /home/henshaw/Cactus/par/qc0-mclachlan.par
> Skeleton Created
> Job directory: "/home/henshaw/simulations/qc00-submit1"
> Executable: "/home/henshaw/Cactus/exe/cactus_sim"
> Option list: "/home/henshaw/simulations/qc00-submit1/SIMFACTORY/cfg/OptionList"
> Submit script: "/home/henshaw/simulations/qc00-submit1/SIMFACTORY/run/SubmitScript"
> Run script: "/home/henshaw/simulations/qc00-submit1/SIMFACTORY/run/RunScript"
> Parameter file: "/home/henshaw/simulations/qc00-submit1/SIMFACTORY/par/qc0-mclachlan.par"
> Assigned restart id: 0
> Executing submit command: exec nohup /home/henshaw/simulations/qc00-submit1/output-0000/SIMFACTORY/SubmitScript < /dev/null > /dev/null 2> /dev/null & echo $!
> Submit finished, job id is 39
>
>
>
> Attached are the .out, .err, and backtrace files; it looks like I'm getting the same error as before. I don't quite understand your comment about multiple MPI stacks being installed: how would I know if this is the case?
>
>
> Thanks for your help.
>
>
> --
>
> Chad Henshaw
> Georgia Institute of Technology - Physics
>
>
> ________________________________
> From: Roland Haas <rhaas at illinois.edu>
> Sent: Tuesday, October 2, 2018 8:27:23 PM
> To: Gomard-Henshaw, Chad
> Cc: ian.hinder at aei.mpg.de; Einstein Toolkit Users
> Subject: Re: [Users] Issue running the default qc0-mclachlan.par
>
> Hello Chad,
>
> please do not provide the error messages only as a screenshot, since
> this shows only the last couple of lines and makes the messages very
> hard to read. Instead, please copy and paste the actual text into the
> email.
>
> Ideally, also attach the *.out and *.err files from the output-XXXX
> directory (if they exist) to the email.
>
> The error you are observing can be caused by a faulty MPI stack, in
> particular if the code is compiled with one MPI library but run with
> another.
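One way to look for such a mismatch directly is to compare the MPI library the executable is linked against with the mpirun found first on the PATH. The bash sketch below is hypothetical (show_mpi_setup is an illustrative name) and assumes the default executable location exe/cactus_sim, as shown in the submit output, and that the RunScript launches whatever mpirun comes first on the PATH; check your RunScript to confirm.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the MPI pieces used at build and run time.
# $1: Cactus executable (ASSUMED default: exe/cactus_sim)
show_mpi_setup() {
  local exe=${1:-exe/cactus_sim} mpirun
  echo "MPI libraries linked into $exe:"
  # ldd lists the shared libraries the dynamic loader will use
  ldd "$exe" 2>/dev/null | grep -i mpi || echo "  (none found)"
  echo "mpirun on PATH:"
  # readlink -f follows symlinks, e.g. Debian's /etc/alternatives links
  if mpirun=$(command -v mpirun); then
    readlink -f "$mpirun"
  else
    echo "  (none found)"
  fi
}
```

Run from the Cactus directory as `show_mpi_setup`. If the linked library points to one implementation while mpirun resolves to another (on Debian-based systems, for example, mpirun.mpich versus mpirun.openmpi), the build and the launcher most likely disagree.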
>
> You can often detect such a mismatch by looking at the *.err and
> *.out files and checking whether the output is duplicated. In your
> case, since you used create-run, no such files were created, so that
> is not quite possible.
>
> Looking at the output, there are for example two "Writing backtrace
> to qc0-mclachlan/backtrace.0.txt" lines (I hope I typed this path
> correctly), while there should be only one, since the text is output
> only once in ./repos/carpet/CarpetLib/src/backtrace.cc.
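The duplicate check described above can be done with grep. Below is a minimal bash sketch; check_duplicated is a hypothetical name, and the log file is assumed to be one of the *.out files in the output-XXXX directory. Both duplicated lines name backtrace.0.txt, which is what one would expect if each process believed it was rank 0 of its own single-process run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: flag a message that appears more than once in a
# log file although the code prints it only once per run.
# $1: fixed text to look for, $2: log file (e.g. an *.out file)
check_duplicated() {
  local n
  if [ ! -r "$2" ]; then
    echo "cannot read $2" >&2
    return 2
  fi
  # -c: count matching lines, -F: treat the pattern as plain text;
  # grep exits with 1 when nothing matches, but still prints 0
  n=$(grep -cF -- "$1" "$2") || true
  if [ "$n" -gt 1 ]; then
    echo "duplicated ($n times): $1"
    return 1
  fi
  echo "ok ($n times): $1"
}
```

For example, `check_duplicated "Writing backtrace to" simulations/qc00-submit1/output-0000/qc00-submit1.out` (the file name here is illustrative) prints a "duplicated" line and returns 1 when the message occurs more than once.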
>
> I would try starting the simulation with the more commonly used
> "submit" command and then checking the .out and .err files, i.e.:
>
> ./simfactory/bin/sim submit qc00-submit1 \
>   --parfile=par/qc0-mclachlan.par --procs=2 --num-threads=1 --ppn-used=2
>
> I would also check whether more than one MPI stack has been
> installed (e.g. MPICH and OpenMPI). Is this machine your
> laptop/workstation, or is it a cluster?
>
> Yours,
> Roland
>
> > Hi Roland,
> >
> >
> > Thanks for the reply. I completely reinstalled and recompiled the ETK and tried to run qc0 using this command:
> >
> >
> > ./simfactory/bin/sim create-run qc0 \
> >   --parfile=par/qc0-mclachlan.par --procs=2 --num-threads=1 --ppn-used=2
> >
> >
> > The program runs only for a moment before I get a new error (attached). I'm not sure how to interpret this; could you take a look?
> >
> >
> > Much appreciated!
> >
> > --
> >
> > Chad Henshaw
> > Georgia Institute of Technology - Physics
> >
> >
> > ________________________________
> > From: Roland Haas <rhaas at illinois.edu>
> > Sent: Tuesday, October 2, 2018 8:20:02 AM
> > To: Gomard-Henshaw, Chad
> > Cc: ian.hinder at aei.mpg.de; Einstein Toolkit Users
> > Subject: Re: [Users] Issue running the default qc0-mclachlan.par
> >
> > Hello Chad,
> >
> > you can read that information from, e.g., the RunScript, which you
> > can find in simulations/XXX/output-0000/SIMFACTORY/RunScript, or (the
> > default value, at least) from:
> >
> > simfactory/bin/sim print-mdb-entry $(simfactory/bin/sim whoami | awk '{print $NF}') | grep threads
> > max-num-threads = 12
> > num-threads     = 6
> >
> > You can also force a specific layout by using:
> >
> >
> > ./simfactory/bin/sim create-run static_tov \
> >   --parfile=par/static_tov_small_short.par --procs=2 --num-threads=1 \
> >   --ppn-used=2 --walltime=8:0:0
> >
> > which uses 2 threads in total (procs=2) with 1 thread per MPI rank (num-threads=1), and makes simfactory believe that 2 cores are present (ppn-used=2). This results in 2 MPI ranks with 1 thread each.
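The arithmetic behind these options can be written out as a small bash sketch: the number of MPI ranks is procs divided by num-threads, and Ian's note in this thread says the code needs at least two MPI processes. The function name mpi_ranks is hypothetical, and rejecting a procs value that is not a multiple of num-threads is a simplifying assumption of this sketch, not a documented simfactory rule.

```shell
#!/usr/bin/env bash
# Hypothetical helper: MPI ranks implied by simfactory-style options.
#   $1: --procs        total number of threads (cores) requested
#   $2: --num-threads  OpenMP threads per MPI rank
mpi_ranks() {
  local procs=$1 threads=$2 ranks
  if [ "$threads" -lt 1 ]; then
    echo "num-threads must be at least 1" >&2
    return 1
  fi
  # ASSUMPTION of this sketch: procs must be a multiple of num-threads
  if [ $((procs % threads)) -ne 0 ]; then
    echo "procs ($procs) is not a multiple of num-threads ($threads)" >&2
    return 1
  fi
  ranks=$((procs / threads))
  # The qc0 setup needs at least two MPI processes
  if [ "$ranks" -lt 2 ]; then
    echo "only $ranks MPI rank(s); at least 2 are needed" >&2
    return 1
  fi
  echo "$ranks"
}
```

With the example default num-threads = 6 printed by print-mdb-entry above, `mpi_ranks 12 6` gives 2, whereas `mpi_ranks 2 6` is rejected; `mpi_ranks 2 1` gives the 2 ranks described for the static_tov command.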
> >
> > Yours,
> > Roland
> >
> > > Hi Ian,
> > >
> > >
> > > Thanks for replying. This didn't work for me, but I understand your logic. Is there a way to determine how many threads per process my simfactory is configured for? That way I should be able to specify the exact number of procs, right?
> > >
> > >
> > > Thanks!
> > >
> > >
> > > --
> > >
> > > Chad Henshaw
> > > Georgia Institute of Technology - Physics
> > >
> > >
> > > ________________________________
> > > From: ian.hinder at aei.mpg.de <ian.hinder at aei.mpg.de>
> > > Sent: Monday, October 1, 2018 5:06:24 PM
> > > To: Gomard-Henshaw, Chad
> > > Cc: Einstein Toolkit Users
> > > Subject: Re: [Users] Issue running the default qc0-mclachlan.par
> > >
> > >
> > >
> > > On 28 Sep 2018, at 18:48, Gomard-Henshaw, Chad <cgomard at gatech.edu> wrote:
> > >
> > > Hello,
> > >
> > > When running the default qc0 simulation, I get an error (see attached). This was run using the following command in the Windows Subsystem for Linux (WSL):
> > >
> > > ./simfactory/bin/sim create-run qc05 \
> > >   --parfile=par/qc0-mclachlan.par
> > >
> > >
> > > The simulation runs for about an hour before aborting; I get partial output files, but they contain only two data points. Can you please advise on how to address this issue?
> > >
> > > Hi,
> > >
> > > We should have a FAQ... You need to run on at least two processes due to internal limitations in the code, so add
> > >
> > >  --procs 2
> > >
> > > to your create-run command line.
> > >
> > > [I don't know exactly how your machine is configured in simfactory; if it is configured to use more than one thread per process, then you need to use enough "--procs" (which really means "threads") that at least two MPI processes are used.]
> > >
> > > --
> > > Ian Hinder
> > > https://ianhinder.net
> > >
> >
> >
> >
> > --
> > My email is as private as my paper mail. I therefore support encrypting
> > and signing email messages. Get my PGP key from http://pgp.mit.edu .
>
>
>
> --
> My email is as private as my paper mail. I therefore support encrypting
> and signing email messages. Get my PGP key from http://keys.gnupg.net.


