[Users] Cactus Core Usage on HPC Cluster
allgwy001 at myuct.ac.za
Mon Feb 6 15:33:40 CST 2017
Hi Ian and Erik,
Setting export OMP_NUM_THREADS=1 did the trick! I'm now up and running.
Thank you very much for helping me out!
On Sun, Feb 5, 2017 at 9:24 PM, Ian Hinder <ian.hinder at aei.mpg.de> wrote:
> On 5 Feb 2017, at 18:09, Gwyneth Allwright <allgwy001 at myuct.ac.za> wrote:
> Hi Ian and Erik,
> Thank you very much for all the advice and pointers so far!
> I didn't compile the ET myself; it was done by an HPC engineer. He is
> unfamiliar with Cactus and started off not using a config file, so he had
> to troubleshoot his way through the compilation process. We are both
> scratching our heads about what the issue with mpirun could be.
> I suspect he didn't set MPI_DIR, so I'm going to suggest that he fix
> that and see whether recompiling takes care of things.
> The scheduler automatically terminates jobs that run on too many
> processors. For my simulation, this appears to happen as soon as
> TwoPunctures starts generating the initial data. I then get error messages
> of the form: "Job terminated as it used more cores (17.6) than requested
> (4)." (I switched from requesting 3 processors to requesting 4.) The number
> of cores it tries to use appears to differ from run to run.
> The parameter file uses Carpet. It generates the following output (when I
> request 4 processors):
> INFO (Carpet): MPI is enabled
> INFO (Carpet): Carpet is running on 4 processes
> INFO (Carpet): This is process 0
> INFO (Carpet): OpenMP is enabled
> INFO (Carpet): This process contains 16 threads, this is thread 0
> INFO (Carpet): There are 64 threads in total
> INFO (Carpet): There are 16 threads per process
> It looks like mpirun has started the 4 processes that you asked for, and
> each of those processes has started 16 threads. The ET uses OpenMP threads
> by default. You need to set the environment variable OMP_NUM_THREADS to
> the number of threads you want per process. If you just want 4 MPI
> processes, each with one thread, then you can try putting
> export OMP_NUM_THREADS=1
> before your mpirun command. On Linux, if OMP_NUM_THREADS is not set, the
> number of OpenMP threads defaults to the number of "hardware threads" in
> the system (which will likely be the number of cores multiplied by 2, if
> hyperthreading is enabled). So a single process that supports OpenMP will
> use all the cores available. If you want
> to have more than one MPI process using OpenMP on the same node, you will
> have to restrict the number of threads per process.
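As a concrete sketch of the fix Ian describes (the executable and parameter-file names below are placeholders, not the actual ones from this thread), the relevant job-script fragment might look like:

```shell
# Hypothetical job-script fragment: pin each MPI rank to a single OpenMP
# thread so that 4 ranks use exactly the 4 requested cores.
export OMP_NUM_THREADS=1      # one OpenMP thread per MPI process
NPROCS=4                      # MPI ranks requested from the scheduler

# Total cores consumed = ranks x threads per rank:
echo "cores needed: $(( NPROCS * OMP_NUM_THREADS ))"   # prints "cores needed: 4"

# The launch line would then be something like:
#   mpirun -np $NPROCS ./cactus_sim simulation.par
```

Without the export, each of the 4 ranks would start one thread per hardware thread (16 here), giving the 64 total threads that Carpet reported and tripping the scheduler's core limit.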
> Carpet has a couple of environment variables which it uses to cross-check
> that you have the number of MPI processes and threads that you were
> expecting. To help with debugging, you can set
> export CACTUS_NUM_THREADS=1
> export CACTUS_NUM_PROCS=4
> if you want 4 processes with one thread each. This won't affect the
> number of threads or processes, but it will allow Carpet to check that what
> you intended matches reality. In this case, it should abort with an error
> (or in older versions of Carpet, output a warning), since while you have 4
> processes, each one has 16 threads, not 1.
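The cross-check Ian describes can be sketched as follows, with the values chosen for the 4-process, 1-thread case in this thread:

```shell
# Sketch of the Carpet cross-check environment for 4 MPI processes with
# one OpenMP thread each.
export OMP_NUM_THREADS=1      # what each process will actually use
export CACTUS_NUM_PROCS=4     # what you intend: 4 MPI processes ...
export CACTUS_NUM_THREADS=1   # ... with 1 thread per process

# If the job then starts 16 threads per process anyway, Carpet aborts
# with an error (or warns, in older versions) instead of silently
# oversubscribing the node.
```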
> Mpirun gives me the following information for the node allocation:
> slots=4, max_slots=0, slots_inuse=0, state=UP.
> The tree view of the processes looks like this:
> PID TTY STAT TIME COMMAND
> 19503 ? S 0:00 sshd: allgwy001 at pts/7
> 19504 pts/7 Ss 0:00 \_ -bash
> 6047 pts/7 R+ 0:00 \_ ps -u allgwy001 f
> This is not showing the Cactus or mpirun process at all; something is
> wrong. Was Cactus running when you typed this? Were you logged in to the
> node that it was running on?
> Adding "cat $PBS_NODEFILE" to my PBS script didn't seem to produce
> anything, although I could be doing something stupid. I'm very new to the
> That's odd.
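One possible reason "cat $PBS_NODEFILE" appears to print nothing is that the variable is only defined inside a running PBS job (its output also lands in the job's stdout file, not the terminal). A defensive version of the check, as a sketch, might be:

```shell
# $PBS_NODEFILE is set by PBS only inside a running job; it names a file
# listing the allocated nodes, one line per slot. On a login node it is
# unset, so "cat $PBS_NODEFILE" produces nothing useful there.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    NODE_COUNT=$(wc -l < "$PBS_NODEFILE")
    echo "allocated slots: $NODE_COUNT"
    cat "$PBS_NODEFILE"
else
    NODE_COUNT=0
    echo "PBS_NODEFILE is not set or unreadable (not inside a PBS job?)"
fi
```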
> Ian Hinder
> Disclaimer - University of Cape Town This e-mail is subject to UCT
> policies and e-mail disclaimer published on our website at
> http://www.uct.ac.za/about/policies/emaildisclaimer/ or obtainable from +27
> 21 650 9111. If this e-mail is not related to the
> business of UCT, it is sent by the sender in an individual capacity. Please
> report security incidents or abuse via csirt at uct.ac.za