<div dir="ltr">Hi Ian and Erik,<div><br></div><div>Thank you very much for all the advice and pointers so far! </div><div><br></div><div>I didn't compile the ET myself; it was done by an HPC engineer. He is unfamiliar with Cactus and started off not using a config file, so he had to troubleshoot his way through the compilation process. We are both scratching our heads about what the issue with mpirun could be.</div><div><br></div><div>I suspect he didn't set MPI_DIR, so I'm going to suggest that he fixes that and see if recompiling takes care of things.<br></div><div><br></div><div>The scheduler automatically terminates jobs that run on too many processors. For my simulation, this appears to happen as soon as TwoPunctures starts generating the initial data. I then get error messages of the form: "Job terminated as it used more cores (17.6) than requested (4)." (I switched from requesting 3 processors to requesting 4.) The number of cores it tries to use appears to differ from run to run.<br></div><br>The parameter file uses Carpet. It generates the following output (when I request 4 processors):<br><br>INFO (Carpet): MPI is enabled<br>INFO (Carpet): Carpet is running on 4 processes<br>INFO (Carpet): This is process 0<br>INFO (Carpet): OpenMP is enabled<br>INFO (Carpet): This process contains 16 threads, this is thread 0<br>INFO (Carpet): There are 64 threads in total<br>INFO (Carpet): There are 16 threads per process<div><br></div><div>Mpirun gives me the following information for the node allocation: slots=4, max_slots=0, slots_inuse=0, state=UP.</div><div><br></div><div>The tree view of the processes looks like this:</div><div><br></div><div>PID TTY STAT TIME COMMAND<br></div><div>19503 ? S 0:00 sshd: allgwy001@pts/7 <br></div><div>19504 pts/7 Ss 0:00 \_ -bash</div><div> 6047 pts/7 R+ 0:00 \_ ps -u allgwy001 f</div><div><div><br></div><div>Adding "cat $PBS_NODEFILE" to my PBS script didn't seem to produce anything, although I could be doing something stupid. I'm very new to the syntax!<br></div><div><br></div><div>Erik: documentation for the cluster I'm trying to run on (called HEX) is available under the Documentation tab at <a href="http://hpc.uct.ac.za" target="_blank">hpc.uct.ac.za</a>, but it's very basic.<br></div><div><br></div><div>Thanks again for all your help! I'll let you know if we make any progress.</div><div><br></div><div>Gwyneth</div><div><br></div><div><br></div><div class="gmail_extra"><br><div class="gmail_quote">On Sun, Feb 5, 2017 at 10:55 AM, Ian Hinder <span dir="ltr"><<a href="mailto:ian.hinder@aei.mpg.de" target="_blank">ian.hinder@aei.mpg.de</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div style="word-wrap:break-word">

On 4 Feb 2017, at 20:13, Gwyneth Allwright <allgwy001@myuct.ac.za> wrote:
<div dir="ltr">Hi All,<br>
<br>
I'm trying to get the Einstein Toolkit installed on an HPC cluster running SLES. The trouble is that Cactus tries to use all the available processors even when I specify a smaller number (by setting ppn in my PBS script).<br>
<br>
As a test, I tried running the compilation with a parameter file that required about 5 GB of RAM. In my PBS script, I set nodes=1 and ppn=3, and then ran using openmpi-1.10.1:<br>
<br>
mpirun -hostfile $PBS_NODEFILE <ET exe> <parameter file><br>
<br>
This resulted in the simulation running on all 24 available processors, even though I'd only requested 3. Since PBS and MPI are integrated, I was told that using -np with mpirun wouldn't help.
<br>
<br>
Does anyone know how to address this issue?<br>
</div>

Hi Gwyneth,

A few questions:

Can you check which nodes are appearing in $PBS_NODEFILE?

When you say it's running on all 24 available processors, what do you mean? Do you mean that "top" shows 24 processes, or that the CPU is at 100%? Could it be that you do in fact have three processes, but due to OpenMP, each process actually has 12 threads?
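
One way to check that, if it helps, is the NLWP column from ps, which shows the number of threads in each process (this is a generic procps option, nothing ET-specific):

ps -u $USER -o pid,nlwp,comm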

Can you do

ps -u $USER f

on the node while Cactus is running? This will show you a tree view of the processes that are running.

Is the parameter file using PUGH or Carpet? The standard output from the first process should show you the number of processes that Cactus thinks it is running on.

When you compiled the ET, did you specify the location of the MPI library with MPI_DIR? If Cactus failed to find the library automatically, it will have built its own version of OpenMPI and linked against that. If you then use an mpirun from a different MPI installation, things will not work correctly.
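
MPI_DIR is set in the option list used to configure Cactus; the path below is only an illustration, and the actual OpenMPI location on your cluster will be different:

# In the Cactus option list (.cfg file) used for the build; example path only.
MPI_DIR = /opt/openmpi/1.10.1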

Maybe you can increase the verbosity of mpirun to check how many processes it is trying to start.
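
With OpenMPI, options such as --display-allocation, --display-map and --report-bindings should show the allocation mpirun thinks it has and how it lays out the processes, for example:

mpirun --display-allocation --display-map --report-bindings -hostfile $PBS_NODEFILE <ET exe> <parameter file>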

--
Ian Hinder
http://members.aei.mpg.de/ianhin