[Users] Cactus Core Usage on HPC Cluster
allgwy001 at myuct.ac.za
Mon Feb 6 15:33:40 CST 2017
Hi Ian and Erik,
Setting export OMP_NUM_THREADS=1 did the trick! I'm now up and running.
Thank you very much for helping me out!
On Sun, Feb 5, 2017 at 9:24 PM, Ian Hinder <ian.hinder at aei.mpg.de> wrote:
> On 5 Feb 2017, at 18:09, Gwyneth Allwright <allgwy001 at myuct.ac.za> wrote:
> Hi Ian and Erik,
> Thank you very much for all the advice and pointers so far!
> I didn't compile the ET myself; it was done by an HPC engineer. He is
> unfamiliar with Cactus and started off not using a config file, so he had
> to troubleshoot his way through the compilation process. We are both
> scratching our heads about what the issue with mpirun could be.
> I suspect he didn't set MPI_DIR, so I'm going to suggest that he fix
> that and see whether recompiling takes care of things.
> The scheduler automatically terminates jobs that run on too many
> processors. For my simulation, this appears to happen as soon as
> TwoPunctures starts generating the initial data. I then get error messages
> of the form: "Job terminated as it used more cores (17.6) than requested
> (4)." (I switched from requesting 3 processors to requesting 4.) The number
> of cores it tries to use appears to differ from run to run.
> The parameter file uses Carpet. It generates the following output (when I
> request 4 processors):
> INFO (Carpet): MPI is enabled
> INFO (Carpet): Carpet is running on 4 processes
> INFO (Carpet): This is process 0
> INFO (Carpet): OpenMP is enabled
> INFO (Carpet): This process contains 16 threads, this is thread 0
> INFO (Carpet): There are 64 threads in total
> INFO (Carpet): There are 16 threads per process
> It looks like mpirun has started the 4 processes that you asked for, and
> each of those processes has started 16 threads. The ET uses OpenMP threads
> by default. You need to set the environment variable OMP_NUM_THREADS to
> the number of threads you want per process. If you just want 4 MPI
> processes, each with one thread, then you can try putting
> export OMP_NUM_THREADS=1
> before your mpirun command. On Linux, if OMP_NUM_THREADS is not set, the
> number of OpenMP threads defaults to the number of "hardware threads" in
> the system (which will likely be the number of cores multiplied by 2, if
> hyperthreading is enabled). So a single process that supports OpenMP will
> use all the cores available. If you want
> to have more than one MPI process using OpenMP on the same node, you will
> have to restrict the number of threads per process.
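As a concrete sketch of the fix Ian describes (the executable and parameter-file names below are placeholders, not the actual ones from this thread), the relevant job-script fragment might look like:

```shell
# Hypothetical job-script fragment: pin each MPI rank to a single OpenMP
# thread so that 4 ranks use exactly the 4 requested cores.
export OMP_NUM_THREADS=1      # one OpenMP thread per MPI process
NPROCS=4                      # MPI ranks requested from the scheduler

# Total cores consumed = ranks x threads per rank:
echo "cores needed: $(( NPROCS * OMP_NUM_THREADS ))"   # prints "cores needed: 4"

# The launch line would then be something like:
#   mpirun -np $NPROCS ./cactus_sim simulation.par
```

Without the export, each of the 4 ranks would start one thread per hardware thread (16 here), giving the 64 total threads that Carpet reported and tripping the scheduler's core limit.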
> Carpet has a couple of environment variables which it uses to cross-check
> that you have the number of MPI processes and threads that you were
> expecting. To help with debugging, you can set
> export CACTUS_NUM_THREADS=1
> export CACTUS_NUM_PROCS=4
> if you want 4 processes with one thread each. This won't affect the
> number of threads or processes, but it will allow Carpet to check that what
> you intended matches reality. In this case, it should abort with an error
> (or in older versions of Carpet, output a warning), since while you have 4
> processes, each one has 16 threads, not 1.
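The cross-check Ian describes can be sketched as follows, with the values chosen for the 4-process, 1-thread case in this thread:

```shell
# Sketch of the Carpet cross-check environment for 4 MPI processes with
# one OpenMP thread each.
export OMP_NUM_THREADS=1      # what each process will actually use
export CACTUS_NUM_PROCS=4     # what you intend: 4 MPI processes ...
export CACTUS_NUM_THREADS=1   # ... with 1 thread per process

# If the job then starts 16 threads per process anyway, Carpet aborts
# with an error (or warns, in older versions) instead of silently
# oversubscribing the node.
```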
> Mpirun gives me the following information for the node allocation:
> slots=4, max_slots=0, slots_inuse=0, state=UP.
> The tree view of the processes looks like this:
> PID TTY STAT TIME COMMAND
> 19503 ? S 0:00 sshd: allgwy001 at pts/7
> 19504 pts/7 Ss 0:00 \_ -bash
> 6047 pts/7 R+ 0:00 \_ ps -u allgwy001 f
> This is not showing the Cactus or mpirun process at all; something is
> wrong. Was Cactus running when you typed this? Were you logged in to the
> node that it was running on?
> Adding "cat $PBS_NODEFILE" to my PBS script didn't seem to produce
> anything, although I could be doing something stupid. I'm very new to the
> That's odd.
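One possible reason "cat $PBS_NODEFILE" appears to print nothing is that the variable is only defined inside a running PBS job (its output also lands in the job's stdout file, not the terminal). A defensive version of the check, as a sketch, might be:

```shell
# $PBS_NODEFILE is set by PBS only inside a running job; it names a file
# listing the allocated nodes, one line per slot. On a login node it is
# unset, so "cat $PBS_NODEFILE" produces nothing useful there.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    NODE_COUNT=$(wc -l < "$PBS_NODEFILE")
    echo "allocated slots: $NODE_COUNT"
    cat "$PBS_NODEFILE"
else
    NODE_COUNT=0
    echo "PBS_NODEFILE is not set or unreadable (not inside a PBS job?)"
fi
```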
> Ian Hinder
> Disclaimer - University of Cape Town This e-mail is subject to UCT
> policies and e-mail disclaimer published on our website at
> http://www.uct.ac.za/about/policies/emaildisclaimer/ or obtainable from +27
> 21 650 9111. If this e-mail is not related to the
> business of UCT, it is sent by the sender in an individual capacity. Please
> report security incidents or abuse via csirt at uct.ac.za