<!DOCTYPE html><html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<div class="markdown-here-wrapper" data-md-url="" style="" markdown-here-wrapper-content-modified="true">
<p style="margin: 0px 0px 1.2em !important;">Dear Toolkit
Community,</p>
<p style="margin: 0px 0px 1.2em !important;"><br>
</p>
<p style="margin: 0px 0px 1.2em !important;">I’m struggling to
make use of all of the available threads in the toolkit when
running on a machine that has hyper-threading enabled.</p>
<p style="margin: 0px 0px 1.2em !important;">On my local machine,
which does not have hyper-threading, if I invoke the toolkit’s
binary using <code style="font-family: Consolas, Inconsolata, Courier, monospace;margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap; font-weight: 550; background-color: rgba(119, 119, 119, 0.3); border-radius: 3px; display: inline;">OMP_NUM_THREADS=2 mpirun -np 4 -- exe/base -p par/parfile.par</code>,
it outputs<br>
</p>
<pre style="font-family: Consolas, Inconsolata, Courier, monospace;font-size: 1em; line-height: 1.2em;margin: 1.2em 0px;"><code style="font-family: Consolas, Inconsolata, Courier, monospace;margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap; font-weight: 550; background-color: rgba(119, 119, 119, 0.3); border-radius: 3px; display: inline;white-space: pre; overflow: auto; border-radius: 3px; border: 1px solid rgb(204, 204, 204); padding: 0.5em 0.7em; display: block;">INFO (Carpet): MPI is enabled
INFO (Carpet): Carpet is running on 4 processes
INFO (Carpet): This is process 0
INFO (Carpet): OpenMP is enabled
INFO (Carpet): This process contains 2 threads, this is thread 0
INFO (Carpet): There are 8 threads in total
INFO (Carpet): There are 2 threads per process
</code></pre>
<p style="margin: 0px 0px 1.2em !important;">This creates 4
processes with 2 threads each and uses all 8 of the available
threads in my CPU, as expected.</p>
<p style="margin: 0px 0px 1.2em !important;">I am now free to
change the number of processes and threads as I see fit, in
order to look for the configuration that maximizes the physical
time per hour.</p>
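<p style="margin: 0px 0px 1.2em !important;">As a rough sketch of
that search space (the variable names are mine and purely
illustrative, not toolkit settings), the process/thread splits that
fill the 8 threads of my local machine can be listed with a small
loop. It only prints the commands and does not run them:</p>
<pre style="font-family: Consolas, Inconsolata, Courier, monospace;font-size: 1em; line-height: 1.2em;margin: 1.2em 0px;"><code style="font-family: Consolas, Inconsolata, Courier, monospace;margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap; font-weight: 550; background-color: rgba(119, 119, 119, 0.3); border-radius: 3px; display: inline;white-space: pre; overflow: auto; border-radius: 3px; border: 1px solid rgb(204, 204, 204); padding: 0.5em 0.7em; display: block;">#!/usr/bin/env bash
# Illustrative only: list every (MPI processes, OpenMP threads) split
# whose product equals the number of threads available on the machine.
total_threads=8   # threads available on my local machine
configs=""        # collects the splits as "processes x threads"
for np in $(seq 1 "$total_threads"); do
  # Keep only process counts that divide the thread count evenly.
  if [ $((total_threads % np)) -eq 0 ]; then
    nt=$((total_threads / np))
    configs="$configs ${np}x${nt}"
    echo "OMP_NUM_THREADS=$nt mpirun -np $np -- exe/base -p par/parfile.par"
  fi
done
</code></pre>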
<p style="margin: 0px 0px 1.2em !important;"><br>
</p>
<p style="margin: 0px 0px 1.2em !important;">However, most of my
computations are performed on Marenostrum5, where each machine
has 2 sockets, each with 56 physical cores and hyper-threading
enabled, for a total of 112 physical cores or 224 hardware threads
per machine. For some reason, the toolkit does not use all of the
available threads there.</p>
<p style="margin: 0px 0px 1.2em !important;">To replicate the
scenario above, I use the following Slurm submission script</p>
<pre style="font-family: Consolas, Inconsolata, Courier, monospace;font-size: 1em; line-height: 1.2em;margin: 1.2em 0px;"><code style="font-family: Consolas, Inconsolata, Courier, monospace;margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap; font-weight: 550; background-color: rgba(119, 119, 119, 0.3); border-radius: 3px; display: inline;white-space: pre; overflow: auto; border-radius: 3px; border: 1px solid rgb(204, 204, 204); padding: 0.5em 0.7em; display: block;">#!/usr/bin/env bash
#SBATCH -N 1
#SBATCH -n 4
#SBATCH -c 1
#SBATCH -t 30
export OMP_NUM_THREADS=2
srun --cpu-bind=none exe/base par/parfile.par
</code></pre>
<p style="margin: 0px 0px 1.2em !important;">where I ask for a
single machine, 4 tasks (to me, task = a process) per machine
and 1 CPU per task, which due to hyper-threading should provide
2 threads per task.</p>
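<p style="margin: 0px 0px 1.2em !important;">To make my expectation
explicit, here is the arithmetic behind that request as a small
sketch. The value of 2 hardware threads per Slurm CPU is my own
assumption about how the scheduler counts CPUs on this machine, not
something I have confirmed:</p>
<pre style="font-family: Consolas, Inconsolata, Courier, monospace;font-size: 1em; line-height: 1.2em;margin: 1.2em 0px;"><code style="font-family: Consolas, Inconsolata, Courier, monospace;margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap; font-weight: 550; background-color: rgba(119, 119, 119, 0.3); border-radius: 3px; display: inline;white-space: pre; overflow: auto; border-radius: 3px; border: 1px solid rgb(204, 204, 204); padding: 0.5em 0.7em; display: block;">#!/usr/bin/env bash
# Illustrative arithmetic only, mirroring the #SBATCH request above.
tasks=4              # from "#SBATCH -n 4"
cpus_per_task=1      # from "#SBATCH -c 1"
threads_per_core=2   # ASSUMPTION: one Slurm CPU is one core with 2 hardware threads
threads_per_task=$((cpus_per_task * threads_per_core))
total_threads=$((tasks * threads_per_task))
echo "expected threads per task: $threads_per_task"
echo "expected threads in total: $total_threads"
</code></pre>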
<p style="margin: 0px 0px 1.2em !important;">The output of the
toolkit is</p>
<pre style="font-family: Consolas, Inconsolata, Courier, monospace;font-size: 1em; line-height: 1.2em;margin: 1.2em 0px;"><code style="font-family: Consolas, Inconsolata, Courier, monospace;margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap; font-weight: 550; background-color: rgba(119, 119, 119, 0.3); border-radius: 3px; display: inline;white-space: pre; overflow: auto; border-radius: 3px; border: 1px solid rgb(204, 204, 204); padding: 0.5em 0.7em; display: block;">INFO (Carpet): MPI is enabled
INFO (Carpet): Carpet is running on 4 processes
INFO (Carpet): This is process 0
INFO (Carpet): OpenMP is enabled
INFO (Carpet): This process contains 1 threads, this is thread 0
INFO (Carpet): There are 4 threads in total
INFO (Carpet): There are 1 threads per process
INFO (Carpet): This process runs on host gs22r3b16, pid=1514092
INFO (Carpet): This process runs on 8 cores: 54-55, 97, 105, 166-167, 209,217
INFO (Carpet): Thread 0 runs on 8 cores: 54-55, 97, 105, 166-167, 209, 217
</code></pre>
<p style="margin: 0px 0px 1.2em !important;">From the output above
you can see that I have been provided with 8 cores, even though
I requested 4 CPUs in total, which means that the toolkit
can see the additional hardware threads coming from hyper-threading.</p>
<p style="margin: 0px 0px 1.2em !important;">It also shows that it
ignored my request for 2 threads per process, which I set via
the environment variable <code style="font-family: Consolas, Inconsolata, Courier, monospace;margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap; font-weight: 550; background-color: rgba(119, 119, 119, 0.3); border-radius: 3px; display: inline;">OMP_NUM_THREADS</code>.
<br>
</p>
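<p style="margin: 0px 0px 1.2em !important;">To narrow this down,
one check I can think of is to print what each task actually sees
instead of launching the toolkit. This is only a sketch: <code style="font-family: Consolas, Inconsolata, Courier, monospace;margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap; font-weight: 550; background-color: rgba(119, 119, 119, 0.3); border-radius: 3px; display: inline;">report_task_env</code>
is a hypothetical helper name of mine, and I am assuming that the
Slurm variables <code style="font-family: Consolas, Inconsolata, Courier, monospace;margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap; font-weight: 550; background-color: rgba(119, 119, 119, 0.3); border-radius: 3px; display: inline;">SLURM_PROCID</code>
and <code style="font-family: Consolas, Inconsolata, Courier, monospace;margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap; font-weight: 550; background-color: rgba(119, 119, 119, 0.3); border-radius: 3px; display: inline;">SLURM_CPUS_PER_TASK</code>
are exported to the tasks on this machine.</p>
<pre style="font-family: Consolas, Inconsolata, Courier, monospace;font-size: 1em; line-height: 1.2em;margin: 1.2em 0px;"><code style="font-family: Consolas, Inconsolata, Courier, monospace;margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap; font-weight: 550; background-color: rgba(119, 119, 119, 0.3); border-radius: 3px; display: inline;white-space: pre; overflow: auto; border-radius: 3px; border: 1px solid rgb(204, 204, 204); padding: 0.5em 0.7em; display: block;">#!/usr/bin/env bash
# Hypothetical helper: print the threading-related environment seen by one task.
# Variables that are not set are reported as "unset" instead of an empty string.
report_task_env() {
  echo "task=${SLURM_PROCID:-unset} OMP_NUM_THREADS=${OMP_NUM_THREADS:-unset} cpus_per_task=${SLURM_CPUS_PER_TASK:-unset}"
}
report_task_env
</code></pre>
<p style="margin: 0px 0px 1.2em !important;">Saved as a script and
launched with the same srun line in place of the toolkit binary, it
should print one line per task, which would show whether my exported
value reaches the tasks at all.</p>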
<p style="margin: 0px 0px 1.2em !important;">If I force <code style="font-family: Consolas, Inconsolata, Courier, monospace;margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap; font-weight: 550; background-color: rgba(119, 119, 119, 0.3); border-radius: 3px; display: inline;">CACTUS_NUM_THREADS=2</code>,
it crashes with the error</p>
<pre style="font-family: Consolas, Inconsolata, Courier, monospace;font-size: 1em; line-height: 1.2em;margin: 1.2em 0px;"><code style="font-family: Consolas, Inconsolata, Courier, monospace;margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap; font-weight: 550; background-color: rgba(119, 119, 119, 0.3); border-radius: 3px; display: inline;white-space: pre; overflow: auto; border-radius: 3px; border: 1px solid rgb(204, 204, 204); padding: 0.5em 0.7em; display: block;">INFO (Carpet): MPI is enabled
INFO (Carpet): Carpet is running on 4 processes
INFO (Carpet): This is process 0
INFO (Carpet): OpenMP is enabled
INFO (Carpet): This process contains 1 threads, this is thread 0
WARNING level 0 from host gs06r3b13 process 1
in thorn Carpet, file /gpfs/home/uapt/uapt015213/projects/cactus/arrangements/Carpet/Carpet/src/SetupGH.cc:187:
-> The environment variable CACTUS_NUM_THREADS is set to 2, but there are 1 threads on this process. This may indicate a severe problem with the OpenMP startup mechanism.
</code></pre>
<p style="margin: 0px 0px 1.2em !important;">which leads me to
believe that it is MPI that is refusing to initialize more
threads, and not the toolkit itself.</p>
<p style="margin: 0px 0px 1.2em !important;"><br>
</p>
<p style="margin: 0px 0px 1.2em !important;">My questions are:</p>
<ol style="margin: 1.2em 0px;padding-left: 2em;">
<li style="margin: 0.5em 0px;">
<p style="margin: 0px 0px 1.2em !important;margin: 0.5em 0px !important;">Is
there a performance gain from making use of hyper-threading,
given that the toolkit is memory bound and the hardware
threads of a core share the same cache?</p>
</li>
<li style="margin: 0.5em 0px;">
<p style="margin: 0px 0px 1.2em !important;margin: 0.5em 0px !important;">If
yes, how can I adapt my submission scripts to tell Cactus to
make use of hyper-threading?</p>
</li>
</ol>
<p style="margin: 0px 0px 1.2em !important;"><br>
</p>
<p style="margin: 0px 0px 1.2em !important;">Thank you in advance,</p>
<p style="margin: 0px 0px 1.2em !important;">Best regards,</p>
<p style="margin: 0px 0px 1.2em !important;">José Ferreira</p>
<p style="margin: 0px 0px 1.2em !important;"><br>
</p>
</div>
</body>
</html>