<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p><br>
</p>
<div class="moz-cite-prefix">On 10/31/2024 12:03 PM, José Ferreira
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:1c0862b8-59de-4054-b2f8-a4422b658367@ua.pt">
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<div class="markdown-here-wrapper" data-md-url="" style=""
markdown-here-wrapper-content-modified="true">
<p style="margin: 0px 0px 1.2em !important;">Dear Toolkit
Community,</p>
<p style="margin: 0px 0px 1.2em !important;"><br>
</p>
<p style="margin: 0px 0px 1.2em !important;">I’m struggling to
make use of all of the available threads in the toolkit when
running on a machine that has hyper-threading enabled.</p>
<p style="margin: 0px 0px 1.2em !important;">On my local
machine, which does not have hyper-threading, if I invoke the
toolkit’s binary using <code
style="font-family: Consolas, Inconsolata, Courier, monospace;margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap; font-weight: 550; background-color: rgba(119, 119, 119, 0.3); border-radius: 3px; display: inline;">OMP_NUM_THREADS=2 mpirun -np 4 -- exe/base -p par/parfile.par</code>,
it outputs<br>
</p>
<pre
style="font-family: Consolas, Inconsolata, Courier, monospace;font-size: 1em; line-height: 1.2em;margin: 1.2em 0px;"><code
style="font-family: Consolas, Inconsolata, Courier, monospace;margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap; font-weight: 550; background-color: rgba(119, 119, 119, 0.3); border-radius: 3px; display: inline;white-space: pre; overflow: auto; border-radius: 3px; border: 1px solid rgb(204, 204, 204); padding: 0.5em 0.7em; display: block;">INFO (Carpet): MPI is enabled
INFO (Carpet): Carpet is running on 4 processes
INFO (Carpet): This is process 0
INFO (Carpet): OpenMP is enabled
INFO (Carpet): This process contains 2 threads, this is thread 0
INFO (Carpet): There are 8 threads in total
INFO (Carpet): There are 2 threads per process
</code></pre>
<p style="margin: 0px 0px 1.2em !important;">This creates 4
processes with 2 threads each and uses all 8 of the available
threads in my CPU, as expected.</p>
<p style="margin: 0px 0px 1.2em !important;">I am now free to
change the number of processes and threads as I see fit, in
order to look for the configuration that maximizes the
physical time per hour.</p>
<p style="margin: 0px 0px 1.2em !important;"><br>
</p>
<p style="margin: 0px 0px 1.2em !important;">However, most of my
computations are performed on Marenostrum5, where each machine
has 2 sockets, each with 56 physical cores with
hyper-threading enabled, totaling 112 physical cores or 224
threads per machine. For some reason, the toolkit does not use
all of the available threads.</p>
<p style="margin: 0px 0px 1.2em !important;">To replicate the
scenario above, I use the following Slurm submission script</p>
<pre
style="font-family: Consolas, Inconsolata, Courier, monospace;font-size: 1em; line-height: 1.2em;margin: 1.2em 0px;"><code
style="font-family: Consolas, Inconsolata, Courier, monospace;margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap; font-weight: 550; background-color: rgba(119, 119, 119, 0.3); border-radius: 3px; display: inline;white-space: pre; overflow: auto; border-radius: 3px; border: 1px solid rgb(204, 204, 204); padding: 0.5em 0.7em; display: block;">#!/usr/bin/env bash
#SBATCH -N 1
#SBATCH -n 4
#SBATCH -c 1
#SBATCH -t 30
export OMP_NUM_THREADS=2
srun --cpu-bind=none exe/base par/parfile.par
</code></pre>
<p style="margin: 0px 0px 1.2em !important;">where I ask for a
single machine, 4 tasks (to me, task = a process) per machine
and 1 CPU per task, which due to hyper-threading should
provide 2 threads per task.</p>
<p style="margin: 0px 0px 1.2em !important;">The output of the
toolkit is</p>
<pre
style="font-family: Consolas, Inconsolata, Courier, monospace;font-size: 1em; line-height: 1.2em;margin: 1.2em 0px;"><code
style="font-family: Consolas, Inconsolata, Courier, monospace;margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap; font-weight: 550; background-color: rgba(119, 119, 119, 0.3); border-radius: 3px; display: inline;white-space: pre; overflow: auto; border-radius: 3px; border: 1px solid rgb(204, 204, 204); padding: 0.5em 0.7em; display: block;">INFO (Carpet): MPI is enabled
INFO (Carpet): Carpet is running on 4 processes
INFO (Carpet): This is process 0
INFO (Carpet): OpenMP is enabled
INFO (Carpet): This process contains 1 threads, this is thread 0
INFO (Carpet): There are 4 threads in total
INFO (Carpet): There are 1 threads per process
INFO (Carpet): This process runs on host gs22r3b16, pid=1514092
INFO (Carpet): This process runs on 8 cores: 54-55, 97, 105, 166-167, 209,217
INFO (Carpet): Thread 0 runs on 8 cores: 54-55, 97, 105, 166-167, 209, 217
</code></pre>
<p style="margin: 0px 0px 1.2em !important;">From the output
above you can see that I have been provided with 8 cores, even
though I have requested 4 CPUs in total, which means that the
toolkit can see the available threads coming from
hyper-threading. </p>
<p style="margin: 0px 0px 1.2em !important;">It also shows that
it ignored my request for 2 threads per process, which I set
via the environment variable <code
style="font-family: Consolas, Inconsolata, Courier, monospace;margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap; font-weight: 550; background-color: rgba(119, 119, 119, 0.3); border-radius: 3px; display: inline;">OMP_NUM_THREADS</code>.
<br>
</p>
<p style="margin: 0px 0px 1.2em !important;">If I force <code
style="font-family: Consolas, Inconsolata, Courier, monospace;margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap; font-weight: 550; background-color: rgba(119, 119, 119, 0.3); border-radius: 3px; display: inline;">CACTUS_NUM_THREADS=2</code>,
it crashes with the error</p>
<pre
style="font-family: Consolas, Inconsolata, Courier, monospace;font-size: 1em; line-height: 1.2em;margin: 1.2em 0px;"><code
style="font-family: Consolas, Inconsolata, Courier, monospace;margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap; font-weight: 550; background-color: rgba(119, 119, 119, 0.3); border-radius: 3px; display: inline;white-space: pre; overflow: auto; border-radius: 3px; border: 1px solid rgb(204, 204, 204); padding: 0.5em 0.7em; display: block;">INFO (Carpet): MPI is enabled
INFO (Carpet): Carpet is running on 4 processes
INFO (Carpet): This is process 0
INFO (Carpet): OpenMP is enabled
INFO (Carpet): This process contains 1 threads, this is thread 0
WARNING level 0 from host gs06r3b13 process 1
in thorn Carpet, file /gpfs/home/uapt/uapt015213/projects/cactus/arrangements/Carpet/Carpet/src/SetupGH.cc:187:
-> The environment variable CACTUS_NUM_THREADS is set to 2, but there are 1 threads on this process. This may indicate a severe problem with the OpenMP startup mechanism.
</code></pre>
<p style="margin: 0px 0px 1.2em !important;">which leads me to
believe that it is MPI that is refusing to initialize more
threads, and not the toolkit itself.</p>
</div>
</blockquote>
I believe you are correct about this.<br>
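<p>One way to narrow this down might be to launch a small diagnostic
in place of the toolkit and see what each task actually inherits. The
following is only a sketch: the function name
<code>report_cpu_env</code> is made up for illustration, and it assumes
a Linux node where <code>nproc</code> and
<code>/proc/self/status</code> are available.</p>

```shell
#!/usr/bin/env bash
# Print what this process inherited from the launcher: the OpenMP thread
# request and the logical CPUs it is allowed to run on.
# report_cpu_env is a hypothetical helper name, not part of Cactus or Slurm.
report_cpu_env() {
    echo "host=$(uname -n)"
    echo "OMP_NUM_THREADS=${OMP_NUM_THREADS:-unset}"
    echo "SLURM_CPUS_PER_TASK=${SLURM_CPUS_PER_TASK:-unset}"
    # nproc honours the CPU affinity mask of the calling process
    echo "usable_cpus=$(nproc)"
    # On Linux, show the affinity mask as a list of logical CPU ids
    if [ -r /proc/self/status ]; then
        grep Cpus_allowed_list /proc/self/status || true
    fi
}

report_cpu_env
```

<p>Saved as, say, <code>report_cpu_env.sh</code> (a hypothetical file
name) and started with
<code>srun --cpu-bind=none bash report_cpu_env.sh</code>, this would
print one report per task. If <code>OMP_NUM_THREADS</code> already
shows up as 1 or unset there, the value is being changed before Cactus
ever starts.</p>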
<blockquote type="cite"
cite="mid:1c0862b8-59de-4054-b2f8-a4422b658367@ua.pt">
<div class="markdown-here-wrapper" data-md-url="" style=""
markdown-here-wrapper-content-modified="true">
<p style="margin: 0px 0px 1.2em !important;"><br>
</p>
<p style="margin: 0px 0px 1.2em !important;">My questions are:</p>
<ol style="margin: 1.2em 0px;padding-left: 2em;">
<li style="margin: 0.5em 0px;">
<p
style="margin: 0px 0px 1.2em !important;margin: 0.5em 0px !important;">Is
there a performance gain by making use of hyper-threading
knowing that the toolkit is memory bound and the different
threads share the same cache?</p>
</li>
</ol>
</div>
</blockquote>
I'm inclined to doubt it.<br>
<blockquote type="cite"
cite="mid:1c0862b8-59de-4054-b2f8-a4422b658367@ua.pt">
<div class="markdown-here-wrapper" data-md-url="" style=""
markdown-here-wrapper-content-modified="true">
<ol style="margin: 1.2em 0px;padding-left: 2em;">
<li style="margin: 0.5em 0px;"> <br>
</li>
<li style="margin: 0.5em 0px;">
<p
style="margin: 0px 0px 1.2em !important;margin: 0.5em 0px !important;">If
yes, how can I adapt my submission scripts to tell Cactus
to make use of hyper-threading?</p>
</li>
</ol>
</div>
</blockquote>
<p>Maybe ask your sysadmins if there is some magic formula to give
to Slurm? <br>
</p>
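<p>An untested sketch of what such a request might look like, as a
starting point for that conversation: it asks for 2 CPUs per task
instead of 1 and derives the OpenMP thread count from the allocation.
Whether Slurm counts a "CPU" as a physical core or as a hardware thread
depends on how the site has configured it, so the <code>-c 2</code>
below is an assumption about MareNostrum5, not something confirmed in
this thread.</p>

```shell
#!/usr/bin/env bash
#SBATCH -N 1
#SBATCH -n 4
#SBATCH -c 2     # assumption: 2 CPUs per task maps to 2 hardware threads
#SBATCH -t 30

# Take the OpenMP thread count from the allocation rather than hard-coding it.
# Falls back to 2 when the script is run outside of Slurm.
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-2}"

# Recent Slurm releases do not pass -c from sbatch on to srun automatically;
# SRUN_CPUS_PER_TASK repeats the request for the job step.
export SRUN_CPUS_PER_TASK="${OMP_NUM_THREADS}"

echo "Requesting ${OMP_NUM_THREADS} OpenMP threads per MPI process"

# Only launch when srun is actually available (i.e. on the cluster).
if command -v srun >/dev/null 2>&1; then
    srun --cpu-bind=none exe/base par/parfile.par
fi
```

<p>If Carpet then still reports 1 thread per process, the limit is
probably being imposed by the site's MPI or Slurm setup rather than by
the script.</p>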
<p>--Steve<br>
</p>
<blockquote type="cite"
cite="mid:1c0862b8-59de-4054-b2f8-a4422b658367@ua.pt">
<div class="markdown-here-wrapper" data-md-url="" style=""
markdown-here-wrapper-content-modified="true">
<p style="margin: 0px 0px 1.2em !important;"><br>
</p>
<p style="margin: 0px 0px 1.2em !important;">Thank you in
advance,</p>
<p style="margin: 0px 0px 1.2em !important;">Best regards,</p>
<p style="margin: 0px 0px 1.2em !important;">José Ferreira</p>
<p style="margin: 0px 0px 1.2em !important;"><br>
</p>
</div>
<br>
<fieldset class="moz-mime-attachment-header"></fieldset>
<pre wrap="" class="moz-quote-pre">_______________________________________________
Users mailing list
<a class="moz-txt-link-abbreviated" href="mailto:Users@einsteintoolkit.org">Users@einsteintoolkit.org</a>
<a class="moz-txt-link-freetext" href="http://lists.einsteintoolkit.org/mailman/listinfo/users">http://lists.einsteintoolkit.org/mailman/listinfo/users</a>
</pre>
</blockquote>
</body>
</html>