[Users] Cactus Core Usage on HPC Cluster
schnetter at cct.lsu.edu
Sun Feb 5 11:35:45 CST 2017
Apart from needing to see the option list etc., it might be that
OpenMP is using the additional "processors". Depending on how the
cluster has been set up, OpenMP might not have been anticipated, and if
so, OpenMP needs to be disabled either when building or when running.
At run time, you can do this by setting the environment variable
OMP_NUM_THREADS=1. Exactly how to pass this variable to the MPI
processes depends on which MPI implementation you are using.
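A minimal sketch for Open MPI (the thread mentions openmpi-1.10.1): set the variable in the job script and forward it to every rank with mpirun's -x option. The helper name omp_safe_mpirun_cmd and the executable and parameter-file arguments are hypothetical placeholders. The function only prints the command so it can be checked before use. Other MPI implementations forward environment variables with different flags.

```shell
#!/bin/sh
# Hypothetical helper: print an Open MPI command line that limits each
# MPI rank to a single OpenMP thread. Nothing is executed here.
omp_safe_mpirun_cmd() {
    nprocs=$1    # number of MPI ranks to start
    exe=$2       # path to the Cactus executable (placeholder)
    parfile=$3   # path to the parameter file (placeholder)
    # "-x VAR" is Open MPI's option for exporting an environment
    # variable from the launching shell to every rank.
    echo "OMP_NUM_THREADS=1 mpirun -x OMP_NUM_THREADS -np $nprocs $exe $parfile"
}
```

For example, omp_safe_mpirun_cmd 4 ./cactus_sim test.par prints a command that starts 4 ranks with one thread each.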
On Sun, Feb 5, 2017 at 12:09 PM, Gwyneth Allwright
<allgwy001 at myuct.ac.za> wrote:
> Hi Ian and Erik,
> Thank you very much for all the advice and pointers so far!
> I didn't compile the ET myself; it was done by an HPC engineer. He is
> unfamiliar with Cactus and started off not using a config file, so he had to
> troubleshoot his way through the compilation process. We are both scratching
> our heads about what the issue with mpirun could be.
> I suspect he didn't set MPI_DIR, so I'm going to suggest that he fix that
> and see whether recompiling takes care of things.
> The scheduler automatically terminates jobs that run on too many processors.
> For my simulation, this appears to happen as soon as TwoPunctures starts
> generating the initial data. I then get error messages of the form: "Job
> terminated as it used more cores (17.6) than requested (4)." (I switched
> from requesting 3 processors to requesting 4.) The number of cores it tries
> to use appears to differ from run to run.
> The parameter file uses Carpet. It generates the following output (when I
> request 4 processors):
> INFO (Carpet): MPI is enabled
> INFO (Carpet): Carpet is running on 4 processes
> INFO (Carpet): This is process 0
> INFO (Carpet): OpenMP is enabled
> INFO (Carpet): This process contains 16 threads, this is thread 0
> INFO (Carpet): There are 64 threads in total
> INFO (Carpet): There are 16 threads per process
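The log above makes the mismatch explicit: 4 processes x 16 threads = 64 threads competing for the 4 cores requested. A minimal sketch of the arithmetic for choosing OMP_NUM_THREADS so that ranks x threads stays within the requested cores. The function names are hypothetical, and it assumes the core count is known (e.g. the ppn value from the PBS script).

```shell
#!/bin/sh
# Hypothetical helper: the largest OpenMP thread count per MPI rank
# with ranks * threads <= cores, but never less than 1.
threads_per_rank() {
    cores=$1
    ranks=$2
    t=$(( cores / ranks ))   # integer division rounds down
    if [ "$t" -lt 1 ]; then
        t=1                  # more ranks than cores: still one thread each
    fi
    echo "$t"
}

# Hypothetical helper: total threads launched for a given layout.
total_threads() {
    echo $(( $1 * $2 ))      # ranks * threads per rank
}
```

With ppn=4 and 4 ranks, threads_per_rank 4 4 gives 1; total_threads 4 16 reproduces the 64 threads Carpet reported.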
> mpirun reports the following node allocation: slots=4,
> max_slots=0, slots_inuse=0, state=UP.
> The tree view of the processes looks like this:
>   PID TTY      STAT   TIME COMMAND
> 19503 ?        S      0:00 sshd: allgwy001@pts/7
> 19504 pts/7    Ss     0:00  \_ -bash
>  6047 pts/7    R+     0:00      \_ ps -u allgwy001 f
> Adding "cat $PBS_NODEFILE" to my PBS script didn't seem to produce anything,
> although I could be doing something stupid. I'm very new to the syntax!
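In a PBS job, the job's standard output normally goes to an output file (typically named <jobname>.o<jobid>) rather than to the terminal, so the result of "cat $PBS_NODEFILE" may end up there. A minimal sketch of a check to put in the job script. The helper name is hypothetical; it only reads the node file PBS provides and prints the total slot count and one line per host.

```shell
#!/bin/sh
# Hypothetical helper: report which hosts PBS assigned and how many
# slots (lines) each host has in the node file.
summarize_nodefile() {
    nodefile=${1:-$PBS_NODEFILE}
    if [ -z "$nodefile" ]; then
        echo "PBS_NODEFILE is not set (not inside a PBS job?)" >&2
        return 1
    fi
    if [ ! -s "$nodefile" ]; then
        echo "node file '$nodefile' is missing or empty" >&2
        return 1
    fi
    # Arithmetic expansion strips any padding that wc adds.
    echo "total slots: $(( $(wc -l < "$nodefile") ))"
    # The node file has one line per slot; count repeats per host.
    sort "$nodefile" | uniq -c
}
```

With nodes=1:ppn=3, one would expect "total slots: 3" and a single host listed three times.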
> Erik: documentation for the cluster I'm trying to run on (called HEX) is
> available under the Documentation tab at hpc.uct.ac.za, but it's very basic.
> Thanks again for all your help! I'll let you know if we make any progress.
> On Sun, Feb 5, 2017 at 10:55 AM, Ian Hinder <ian.hinder at aei.mpg.de> wrote:
>> On 4 Feb 2017, at 20:13, Gwyneth Allwright <allgwy001 at myuct.ac.za> wrote:
>>> Hi All,
>>> I'm trying to get the Einstein Toolkit installed on an HPC cluster running
>>> SLES. The trouble is that Cactus tries to use all the available processors
>>> even when I specify a smaller number (by setting ppn in my PBS script).
>>> As a test, I ran the compiled executable with a parameter file that
>>> required about 5 GB of RAM. In my PBS script, I set nodes=1 and ppn=3, and
>>> then ran using openmpi-1.10.1:
>>> mpirun -hostfile $PBS_NODEFILE <ET exe> <parameter file>
>>> This resulted in the simulation running on all 24 available processors,
>>> even though I'd only requested 3. Since PBS and MPI are integrated, I was
>>> told that using -np with mpirun wouldn't help.
>>> Does anyone know how to address this issue?
>> Hi Gwyneth,
>> A few questions:
>> Can you check which nodes are appearing in $PBS_NODEFILE?
>> When you say it's running on all 24 available processors, what do you
>> mean? Do you mean that "top" shows 24 processes, or that the CPU is at
>> 100%? Could it be that you do in fact have three processes, but due to
>> OpenMP, each process actually has 12 threads?
>> Can you do
>> ps -u $USER f
>> on the node while Cactus is running? This will show you a tree view of
>> the processes that are running.
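To tell "three processes with many threads" apart from "many processes", the thread count of each process can be read on Linux from the "Threads:" field of /proc/<pid>/status (ps -o pid,nlwp,args gives the same information). A minimal sketch, assuming a Linux node; the helper name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (Linux only): print the number of threads of a
# process, read from the "Threads:" field of /proc/<pid>/status.
thread_count() {
    awk '/^Threads:/ { print $2 }' "/proc/$1/status"
}
```

Running thread_count on each Cactus PID shown by ps would show whether each of the 3 ranks carries many OpenMP threads.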
>> Is the parameter file using PUGH or Carpet? The standard output from the
>> first process should show you the number of processes that Cactus thinks it
>> is running on.
>> When you compiled the ET, did you specify the location of the MPI library
>> with MPI_DIR? If Cactus failed to find the library, it will have built its
>> own copy of OpenMPI and linked against that. If you then use an mpirun
>> from a different MPI installation, things will not work correctly.
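A minimal sketch of a consistency check between the mpirun on PATH and the MPI installation passed to the build. It assumes the installation keeps mpirun at $MPI_DIR/bin/mpirun (common for Open MPI, but an assumption here) and does not resolve symlinks. The helper name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: check that the mpirun found on PATH lives under
# the MPI installation given by MPI_DIR (the prefix used for the build).
mpirun_matches_mpi_dir() {
    mpi_dir=${1%/}   # drop one trailing slash, if any
    found=$2         # e.g. "$(command -v mpirun)"
    expected="$mpi_dir/bin/mpirun"
    if [ "$found" = "$expected" ]; then
        echo "ok: $found"
        return 0
    fi
    echo "mismatch: PATH gives '$found', MPI_DIR expects '$expected'" >&2
    return 1
}
```

Usage: mpirun_matches_mpi_dir "$MPI_DIR" "$(command -v mpirun)". As a complement, "ldd <ET exe> | grep -i mpi" shows which MPI library the executable actually loads.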
>> Maybe you can increase the verbosity of mpirun, to check how many
>> processes it is trying to start.
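For Open MPI, a sketch of extra diagnostics: --display-allocation shows the slots mpirun received from the resource manager, --display-map shows where each rank will be placed, and --report-bindings shows which cores each rank is bound to. The helper name is hypothetical and it only prints the command line.

```shell
#!/bin/sh
# Hypothetical helper: print an Open MPI command line with diagnostic
# options prepended to the caller's own mpirun arguments.
verbose_mpirun_cmd() {
    echo "mpirun --display-allocation --display-map --report-bindings $*"
}
```

For example, verbose_mpirun_cmd -np 3 <ET exe> <parameter file> yields a command whose output shows how many ranks mpirun actually starts.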
>> Ian Hinder
Erik Schnetter <schnetter at cct.lsu.edu>