[Users] Cactus Core Usage on HPC Cluster
ian.hinder at aei.mpg.de
Sun Feb 5 02:55:28 CST 2017
On 4 Feb 2017, at 20:13, Gwyneth Allwright <allgwy001 at myuct.ac.za> wrote:
> Hi All,
> I'm trying to get the Einstein Toolkit installed on an HPC cluster running SLES. The trouble is that Cactus tries to use all the available processors even when I specify a smaller number (by setting ppn in my PBS script).
> As a test, I tried running the compiled executable with a parameter file that required about 5 GB of RAM. In my PBS script, I set nodes=1 and ppn=3, and then ran using openmpi-1.10.1:
> mpirun -hostfile $PBS_NODEFILE <ET exe> <parameter file>
> This resulted in the simulation running on all 24 available processors, even though I'd only requested 3. Since PBS and MPI are integrated, I was told that using -np with mpirun wouldn't help.
> Does anyone know how to address this issue?
A few questions:
Can you check which nodes are appearing in $PBS_NODEFILE?
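A quick way to check is to count how many times each hostname appears in the node file. PBS node files typically list a host once per allocated slot, so with nodes=1 and ppn=3 one would expect a single hostname listed three times. This is a minimal sketch; the function name `count_slots` is only illustrative.

```shell
# Hypothetical helper: print how many slots PBS assigned on each host.
# PBS node files usually repeat a hostname once per allocated slot,
# so "nodes=1:ppn=3" should give one host with a count of 3.
count_slots() {
  sort "${1:-$PBS_NODEFILE}" | uniq -c
}

# Only run it automatically when we are inside a PBS job.
if [ -n "${PBS_NODEFILE:-}" ]; then
  count_slots
fi
```

If the output shows more than three slots, the problem lies with the job request rather than with Cactus or mpirun.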
When you say it's running on all 24 available processors, what do you mean? Do you mean that "top" shows 24 processes, or that the CPU is at 100%? Could it be that you do in fact have three processes, but due to OpenMP, each process actually has 12 threads?
Can you do
ps -u $USER f
on the node while Cactus is running? This will show you a tree view of the processes that are running.
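Alongside the process tree, counting threads rather than processes tests the OpenMP hypothesis directly: three processes with several threads each can still keep all 24 cores busy. The sketch below reads the `Threads:` field of `/proc/<pid>/status`, which is Linux-specific. The function names are hypothetical helpers, not part of Cactus, and note that `/proc/<pid>/comm` holds at most 15 characters of the command name.

```shell
# Hypothetical helper: print the thread count of one process, taken from
# the "Threads:" line of /proc/<pid>/status (Linux only).
threads_of() {
  awk '/^Threads:/ {print $2}' "/proc/$1/status" 2>/dev/null
}

# Hypothetical helper: for each process owned by the current user whose
# command name equals $1, print "PID THREADS", then the total thread count.
threads_by_name() {
  local dir pid name n total
  total=0
  for dir in /proc/[0-9]*; do
    pid=${dir#/proc/}
    # Skip processes owned by other users.
    [ -O "$dir" ] || continue
    # Skip processes that exited while we were scanning.
    name=$(cat "$dir/comm" 2>/dev/null) || continue
    [ "$name" = "$1" ] || continue
    n=$(threads_of "$pid") || continue
    [ -n "$n" ] || continue
    echo "$pid $n"
    total=$((total + n))
  done
  echo "total $total"
}
```

For example, `threads_by_name cactus_sim` (substitute the executable name that ps actually shows). If the threads turn out to come from OpenMP, the standard `OMP_NUM_THREADS` environment variable sets how many threads each process starts.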
Is the parameter file using PUGH or Carpet? The standard output from the first process should show you the number of processes that Cactus thinks it is running on.
When you compiled the ET, did you specify the location of the MPI library with MPI_DIR? If Cactus could not find the library on its own, it will have built its own copy of OpenMPI and linked against that. If you then launch the executable with an mpirun from a different MPI installation, things will not work correctly.
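One way to rule out a mismatch is to confirm that the mpirun found on your PATH lives under the same prefix that was given as MPI_DIR. This is a sketch under the assumption that the installation keeps mpirun in `$MPI_DIR/bin`, as a standard Open MPI install does; the function names are hypothetical.

```shell
# Hypothetical helper: canonicalise a path with GNU readlink -f so that
# symlinks do not cause false mismatches; fall back to the raw path.
resolve_path() {
  readlink -f "$1" 2>/dev/null || printf '%s\n' "$1"
}

# Hypothetical helper: compare the mpirun on PATH with $1/bin/mpirun,
# where $1 is the prefix that was passed as MPI_DIR when configuring Cactus.
check_mpirun() {
  local mpi_dir found expected
  mpi_dir=$1
  found=$(command -v mpirun) || { echo "mpirun not found on PATH"; return 1; }
  found=$(resolve_path "$found")
  expected=$(resolve_path "$mpi_dir/bin/mpirun")
  if [ "$found" = "$expected" ]; then
    echo "OK: $found"
  else
    echo "MISMATCH: PATH gives $found, but MPI_DIR implies $expected"
    return 1
  fi
}
```

For example, `check_mpirun /path/to/openmpi-1.10.1` with whatever prefix you actually used. Running `ldd` on the Cactus executable and looking for `libmpi` shows which MPI library it was really linked against.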
You could also increase the verbosity of mpirun to check how many processes it is trying to start.