[Users] Issue running the default qc0-mclachlan.par

Gomard-Henshaw, Chad cgomard at gatech.edu
Wed Oct 3 16:40:45 CDT 2018


Hi Roland,

My apologies regarding the screenshots. I've tried running the simulation using the submit command as you suggested (I'm running on a regular workstation). Here is the complete text:


./simfactory/bin/sim submit qc00-submit1 \
>   --parfile=par/qc0-mclachlan.par --procs=2 --num-threads=1 --ppn-used=2
Warning: simulation "qc00-submit1" does not exist or is not readable
Parameter file: /home/henshaw/Cactus/par/qc0-mclachlan.par
Skeleton Created
Job directory: "/home/henshaw/simulations/qc00-submit1"
Executable: "/home/henshaw/Cactus/exe/cactus_sim"
Option list: "/home/henshaw/simulations/qc00-submit1/SIMFACTORY/cfg/OptionList"
Submit script: "/home/henshaw/simulations/qc00-submit1/SIMFACTORY/run/SubmitScript"
Run script: "/home/henshaw/simulations/qc00-submit1/SIMFACTORY/run/RunScript"
Parameter file: "/home/henshaw/simulations/qc00-submit1/SIMFACTORY/par/qc0-mclachlan.par"
Assigned restart id: 0
Executing submit command: exec nohup /home/henshaw/simulations/qc00-submit1/output-0000/SIMFACTORY/SubmitScript < /dev/null > /dev/null 2> /dev/null & echo $!
Submit finished, job id is 39
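
On a workstation there is no batch queue. The submit command shown above starts SubmitScript in the background with nohup and prints `$!`, so the "job id" 39 is simply a process ID. Below is a minimal sketch for checking whether that process is still alive and for locating its output. The function names are hypothetical. The layout simulations/<name>/output-0000/<name>.out and .err is an assumption based on the paths printed above and on the attachment names.

```shell
#!/usr/bin/env bash
# Sketch: helpers for a simfactory run on a machine without a queueing system.
# Hypothetical names; these are not part of simfactory.

# Succeeds while a process with this PID exists. "kill -0" sends no signal;
# it only checks that the PID exists and may be signalled by this user.
# Caveat: the PID may belong to a wrapper script, and PIDs are eventually reused.
job_is_running() {
  kill -0 "$1" 2>/dev/null
}

# Print the expected .out and .err paths for a simulation and restart number.
# Assumed layout: ~/simulations/<name>/output-<restart>/<name>.{out,err}
job_log_paths() {
  local name="$1" restart="${2:-0000}"
  local dir="$HOME/simulations/$name/output-$restart"
  printf '%s\n' "$dir/$name.out" "$dir/$name.err"
}
```

For example, `job_is_running 39 && echo running` reports whether the process is alive. `tail -n 50 $(job_log_paths qc00-submit1)` then shows the end of both files.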



Attached are the .out, .err, and backtrace files; it looks like I'm getting the same error as before. I don't quite understand your comment about multiple MPI stacks being installed. How would I know if this is the case?


Thanks for your help.


--

Chad Henshaw
Georgia Institute of Technology - Physics


________________________________
From: Roland Haas <rhaas at illinois.edu>
Sent: Tuesday, October 2, 2018 8:27:23 PM
To: Gomard-Henshaw, Chad
Cc: ian.hinder at aei.mpg.de; Einstein Toolkit Users
Subject: Re: [Users] Issue running the default qc0-mclachlan.par

Hello Chad,

please do not just provide the error messages as a screenshot, since
this shows only the last couple of lines and makes it very hard to read
the error messages. Instead, please copy and paste the actual text into
the email.

Ideally, also provide the *.out and *.err files in the output-XXXX
directory (if they exist) as attachments to the email.

The error you are observing can be caused by a faulty MPI stack, in
particular if the code is compiled with one MPI library but run with
another.

You can often determine that by looking at the *.err and *.out files
and checking whether the output is duplicated. In your case, since you
used create-run, no such files were created, so that is not possible.

Looking at the output, there are, for example, two "Writing backtrace to
qc0-mclachlan/backtrace.0.txt" (hope I typed in this path correctly)
lines, while there should be only one, since the text is printed only
once in ./repos/carpet/CarpetLib/src/backtrace.cc.
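
The duplication check described above can be run on the .out and .err files by counting how many lines announce a rank-0 backtrace. The sketch assumes the message text matches the one quoted above ("Writing backtrace to .../backtrace.0.txt"), and the function names are made up for illustration. More than one such line suggests that several processes each believe they are MPI rank 0. That is what typically happens when the launcher and the MPI library the executable was built against come from different MPI installations.

```shell
#!/usr/bin/env bash
# Sketch: count lines announcing a rank-0 backtrace in Cactus output files.
# Assumption: the message looks like the one quoted in the thread,
#   "Writing backtrace to <dir>/backtrace.0.txt"
# Hypothetical helper names; pass any number of .out/.err files.
count_rank0_backtraces() {
  # grep -h drops file-name prefixes so lines from all files are pooled;
  # "|| true" keeps the function from failing when nothing matches.
  { grep -h -- 'Writing backtrace to .*backtrace\.0\.txt' "$@" 2>/dev/null || true; } \
    | wc -l | tr -d ' '
}

# Print a short verdict; returns non-zero when the line appears more than once.
check_rank0_duplicates() {
  local n
  n=$(count_rank0_backtraces "$@")
  if [ "$n" -gt 1 ]; then
    echo "$n rank-0 backtrace lines: several processes may think they are rank 0"
    return 1
  fi
  echo "$n rank-0 backtrace line(s)"
}
```

For example: `check_rank0_duplicates ~/simulations/qc00-submit1/output-0000/qc00-submit1.{out,err}`.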

I would try using the more commonly used "submit" command to start the
simulation, then check the .out and .err files, i.e.:

./simfactory/bin/sim submit qc00-submit1 \
  --parfile=par/qc0-mclachlan.par --procs=2 --num-threads=1 --ppn-used=2
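
The arithmetic behind these flags, as described elsewhere in this thread, works as follows. --procs counts cores (threads) in total, and --num-threads is the number of threads per MPI rank. A small helper can check it. The function names are hypothetical. The sketch only reproduces the division procs / num-threads and the requirement of at least two MPI processes for this parameter file; it does not query simfactory.

```shell
#!/usr/bin/env bash
# Sketch only: reproduces the arithmetic, does not call simfactory.
# Number of MPI ranks = total cores requested (--procs) / threads per rank (--num-threads).
mpi_ranks() {
  local procs="$1" threads="$2"
  # Check threads first so the modulo below never divides by zero.
  if [ "$threads" -le 0 ] || [ $((procs % threads)) -ne 0 ]; then
    echo "cannot split procs=$procs into ranks of num-threads=$threads" >&2
    return 1
  fi
  echo $((procs / threads))
}

# qc0-mclachlan.par needs at least two MPI processes (per the advice in this thread).
enough_ranks_for_qc0() {
  local n
  n=$(mpi_ranks "$1" "$2") || return 1
  [ "$n" -ge 2 ]
}
```

Consider a default of num-threads = 6, as in the example print-mdb-entry output quoted further down. By this arithmetic, --procs=2 alone cannot be split into 6-thread ranks. --procs=12 would give 2 ranks of 6 threads each, while --procs=2 --num-threads=1 gives 2 ranks of 1 thread each.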

I would also check whether more than one MPI stack has been
installed (e.g. MPICH and Open MPI). Is this machine your
laptop/workstation, or is it a cluster?
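
One way to look for more than one MPI installation on a workstation has three steps:

1. List every mpirun/mpiexec on PATH.
2. Ask each one for its version.
3. Compare that with the MPI library the Cactus executable was linked against (ldd).

The helper names are hypothetical. The default executable path is taken from the "Executable:" line of the submit output above; adjust it for other machines. Two different launchers would point towards the kind of mismatch described here. So would a launcher whose version banner names a different implementation than the library shown by ldd. On Debian/Ubuntu systems (including WSL), `update-alternatives --display mpirun` may also show which implementation /usr/bin/mpirun currently points to.

```shell
#!/usr/bin/env bash
# Sketch: look for more than one MPI installation. Helper names are hypothetical.

# Print every mpirun/mpiexec found on PATH, without duplicates.
# "type -ap" prints all matching executables on PATH, not just the first.
list_mpi_launchers() {
  local cmd
  for cmd in mpirun mpiexec; do
    type -ap "$cmd" 2>/dev/null || true
  done | sort -u
}

# Print each launcher's version banner and the MPI libraries the executable links.
# Assumption: default path from the "Executable:" line of the submit output.
report_mpi_setup() {
  local exe="${1:-$HOME/Cactus/exe/cactus_sim}" launcher
  list_mpi_launchers | while IFS= read -r launcher; do
    echo "== $launcher"
    # </dev/null stops the launcher from consuming the loop's input.
    "$launcher" --version </dev/null 2>&1 | head -n 3
  done
  echo "== MPI libraries linked into $exe"
  ldd "$exe" 2>/dev/null | grep -i mpi || echo "(no MPI library found, or $exe is missing)"
}
```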

Yours,
Roland

> Hi Roland,
>
>
> Thanks for the reply. I completely re-installed and compiled ETK and tried to run qc0 using this command:
>
>
> ./simfactory/bin/sim create-run qc0 \
>   --parfile=par/qc0-mclachlan.par --procs=2 --num-threads=1 --ppn-used=2
>
>
> The program runs only for a moment before I get a new error (attached). I'm not sure how to interpret this; could you take a look?
>
>
> Much appreciated!
>
> --
>
> Chad Henshaw
> Georgia Institute of Technology - Physics
>
>
> ________________________________
> From: Roland Haas <rhaas at illinois.edu>
> Sent: Tuesday, October 2, 2018 8:20:02 AM
> To: Gomard-Henshaw, Chad
> Cc: ian.hinder at aei.mpg.de; Einstein Toolkit Users
> Subject: Re: [Users] Issue running the default qc0-mclachlan.par
>
> Hello Chad,
>
> that information can be read from, e.g., the RunScript, which you can find
> in simulations/XXX/output-0000/SIMFACTORY/RunScript, or (the default
> value, anyway) from
>
> simfactory/bin/sim print-mdb-entry $(simfactory/bin/sim whoami | awk '{print $NF}') | grep threads
> max-num-threads = 12
> num-threads     = 6
>
> You can also force the issue by using:
>
>
> ./simfactory/bin/sim create-run static_tov --parfile=par/static_tov_small_short.par --procs=2 --num-threads=1 --ppn-used=2 --walltime=8:0:0
>
> which uses 2 threads in total (procs=2) with 1 thread per MPI rank (num-threads=1) and makes simfactory believe that there are 2 cores present (ppn-used=2). This results in 2 MPI ranks with 1 thread each.
>
> Yours,
> Roland
>
> > Hi Ian,
> >
> >
> > Thanks for replying. This didn't work for me, but I understand your logic. Is there a way that I can determine how many threads per process my simfactory is configured for? That way I should be able to specify the exact number of procs, right?
> >
> >
> > Thanks!
> >
> >
> > --
> >
> > Chad Henshaw
> > Georgia Institute of Technology - Physics
> >
> >
> > ________________________________
> > From: ian.hinder at aei.mpg.de <ian.hinder at aei.mpg.de>
> > Sent: Monday, October 1, 2018 5:06:24 PM
> > To: Gomard-Henshaw, Chad
> > Cc: Einstein Toolkit Users
> > Subject: Re: [Users] Issue running the default qc0-mclachlan.par
> >
> >
> >
> > On 28 Sep 2018, at 18:48, Gomard-Henshaw, Chad <cgomard at gatech.edu> wrote:
> >
> > Hello,
> >
> > When running the default qc0 simulation, I get an error (see attached). This was run using the following command in the Windows Subsystem for Linux:
> >
> > ./simfactory/bin/sim create-run qc05 \
> >   --parfile=par/qc0-mclachlan.par
> >
> >
> > The simulation runs for about an hour before aborting; I get partial output files but only with two data points. Can you please advise on how to address this issue?
> >
> > Hi,
> >
> > We should have a FAQ...  You need to run on at least two processes, due to internal limitations in the code. So add
> >
> >  --procs 2
> >
> > to your create-run command line.
> >
> > [I don't know exactly how your machine is configured in simfactory; if it is configured to use more than one thread per process, then you need to use enough "--procs" (which really means "threads") that at least two MPI processes are used.]
> >
> > --
> > Ian Hinder
> > https://ianhinder.net
> >
>
>
>
> --
> My email is as private as my paper mail. I therefore support encrypting
> and signing email messages. Get my PGP key from http://pgp.mit.edu .



--
My email is as private as my paper mail. I therefore support encrypting
and signing email messages. Get my PGP key from http://keys.gnupg.net.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: qc00-submit1.err
Type: application/octet-stream
Size: 1823 bytes
Desc: qc00-submit1.err
Url : http://lists.einsteintoolkit.org/pipermail/users/attachments/20181003/c583f931/attachment-0002.obj 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: qc00-submit1.out
Type: application/octet-stream
Size: 197675 bytes
Desc: qc00-submit1.out
Url : http://lists.einsteintoolkit.org/pipermail/users/attachments/20181003/c583f931/attachment-0003.obj 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: backtrace.0.txt
Url: http://lists.einsteintoolkit.org/pipermail/users/attachments/20181003/c583f931/attachment-0001.txt 

