Thank you, Peter, Samuel, Steve, and Erik for the suggestions and comments. We will look into more options for optimizing the performance.

Regards,
Spandan Sarma

On Wed, 21 Dec 2022, 15:25 Peter Diener, <diener@cct.lsu.edu> wrote:

Dear Spandan,

You say your simulation performed 2840 timesteps in half an hour on 32
procs, which is 5680 timesteps per hour. Running for a full day you got
132105 timesteps, i.e. 5504 timesteps per hour. So you're right, there is a
small difference in speed. However, remember that the grid structure
changes as the neutron stars move across the grid, so some variation in
speed is to be expected. I think the small difference you observed is
within the natural range of variation.

Cheers,

Peter


On Thu, 15 Dec 2022, Spandan Sarma 19306 wrote:
> Dear Erik and Steven,
>
> Thank you so much for the suggestions. We changed the runscript to add
> -x OMP_NUM_THREADS to the mpirun command line, and that solved the issue
> of the total number of threads being 144. It is now 32 (equal to the
> number of procs).
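>
> For reference, the relevant part of the updated runscript now looks
> roughly like this (a sketch; the hostfile, executable, and parameter-file
> variables are placeholders, not the exact names in our script):
>
>     # set the per-rank thread count, then export it to all ranks via -x
>     export OMP_NUM_THREADS=1
>     mpirun -np 32 --hostfile ${HOSTFILE} -x OMP_NUM_THREADS \
>         ${CACTUS_EXE} ${PARFILE}
>
> With Open MPI, -x exports the named environment variable from the
> launching shell to every spawned rank, so the remote node sees the same
> thread count as the local one.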
>
> Also, the number of iterations has increased to 132105 for 32 procs
> (24 hr walltime), compared to just 240 before. Although this is a huge
> increase, we expected a bit more. For a shorter walltime (30 min) we
> got 2840, 2140, and 1216 iterations on 32, 16, and 8 procs,
> respectively. Are there any further changes we can make to improve on
> this?
>
> The new runscript and the output file (as a drive link) are attached
> below (no changes were made to the machine file, option list, and
> submit script from before).
>
> p32_omp.out
>
> On Fri, Dec 9, 2022 at 8:13 PM Steven R. Brandt <sbrandt@cct.lsu.edu>
> wrote:
>
> It's not too late to do a check, though, to see if all other nodes have
> the same OMP_NUM_THREADS value. Maybe that's the warning? It sounds like
> it should be an error.
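>
> (Such a check is easy to run by hand; a minimal sketch, assuming an
> Open MPI mpirun and a hostfile named "hosts":
>
>     # print each rank's host and thread setting; mismatches stand out
>     mpirun -np 32 --hostfile hosts \
>         sh -c 'echo "$(hostname): OMP_NUM_THREADS=${OMP_NUM_THREADS:-unset}"'
>
> Any node that prints a different value, or "unset", is the one spawning
> extra threads.)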
>
> --Steve
>
> On 12/8/2022 5:23 PM, Erik Schnetter wrote:
> > Steve
> >
> > Code that runs as part of the Cactus executable runs too late for
> > this. At that time, OpenMP has already been initialized.
> >
> > There is the environment variable "CACTUS_NUM_THREADS", which is
> > checked at run time, but only if it is set (for backward
> > compatibility). Most people do not bother setting it, leaving this
> > error undetected. There is a warning output, but these warnings are
> > generally ignored.
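> >
> > (Setting it costs one extra line in the runscript; a sketch, assuming
> > the Open MPI -x forwarding discussed below:
> >
> >     export OMP_NUM_THREADS=1
> >     # tell Cactus how many threads each rank is supposed to have
> >     export CACTUS_NUM_THREADS=${OMP_NUM_THREADS}
> >     mpirun -np 32 -x OMP_NUM_THREADS -x CACTUS_NUM_THREADS ...
> >
> > With CACTUS_NUM_THREADS set, a rank whose actual OpenMP thread count
> > differs can fail with an error instead of the warning being the only
> > trace.)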
> >
> > -erik
> >
> > On Thu, Dec 8, 2022 at 3:48 PM Steven R. Brandt
> > <sbrandt@cct.lsu.edu> wrote:
> >> We could probably add some startup code in which MPI broadcasts the
> >> OMP_NUM_THREADS setting to all the other processes and either checks
> >> the value of the environment variable or calls omp_set_num_threads()
> >> or some such.
> >>
> >> --Steve
> >>
> >> On 12/8/2022 9:03 AM, Erik Schnetter wrote:
> >>> Spandan
> >>>
> >>> The problem is likely that MPI does not automatically forward your
> >>> OpenMP setting to the other nodes. You are setting the environment
> >>> variable OMP_NUM_THREADS in the run script, and it is likely
> >>> necessary to forward this environment variable to the other
> >>> processes as well. Your MPI documentation will tell you how to do
> >>> this. This is likely an additional option you need to pass when
> >>> calling "mpirun".
> >>>
> >>> -erik
> >>>
> >>> On Thu, Dec 8, 2022 at 2:50 AM Spandan Sarma 19306
> >>> <spandan19@iiserb.ac.in> wrote:
> >>>> Hello,
> >>>>
> >>>> This mail is in continuation of the ticket, “Issue with compiling
> >>>> ET on cluster”, by Shamim.
> >>>>
> >>>> Following Roland's suggestion, we found that using the
> >>>> --prefix <openmpi-directory> option along with a hostfile worked
> >>>> successfully for running a multiple-node simulation on our HPC.
> >>>>
> >>>> Now we find that the BNSM gallery simulation evolves for only 240
> >>>> iterations on 2 nodes (16+16 procs, 24 hr walltime), which is very
> >>>> slow compared to the simulation on 1 node (16 procs, 24 hr
> >>>> walltime), which evolved for 120988 iterations. The
> >>>> parallelization works well within 1 node: we got 120988, 67756,
> >>>> and 40008 iterations on 16, 8, and 4 procs (24 hr walltime),
> >>>> respectively. We are unable to understand what causes this issue
> >>>> when Open MPI is given 2 nodes (16+16 procs).
> >>>>
> >>>> In the output files we found the following, which may be an
> >>>> indication of the issue:
> >>>>
> >>>> INFO (Carpet): MPI is enabled
> >>>> INFO (Carpet): Carpet is running on 32 processes
> >>>> INFO (Carpet): This is process 0
> >>>> INFO (Carpet): OpenMP is enabled
> >>>> INFO (Carpet): This process contains 1 threads, this is thread 0
> >>>> INFO (Carpet): There are 144 threads in total
> >>>> INFO (Carpet): There are 4.5 threads per process
> >>>> INFO (Carpet): This process runs on host n129, pid=20823
> >>>> INFO (Carpet): This process runs on 1 core: 0
> >>>> INFO (Carpet): Thread 0 runs on 1 core: 0
> >>>> INFO (Carpet): This simulation is running in 3 dimensions
> >>>> INFO (Carpet): Boundary specification for map 0:
> >>>>    nboundaryzones: [[3,3,3],[3,3,3]]
> >>>>    is_internal   : [[0,0,0],[0,0,0]]
> >>>>    is_staggered  : [[0,0,0],[0,0,0]]
> >>>>    shiftout      : [[1,0,1],[0,0,0]]
> >>>>
> >>>> WARNING level 1 from host n131 process 21
> >>>>   in thorn Carpet, file
> >>>>   /home2/mallick/ET9/Cactus/arrangements/Carpet/Carpet/src/SetupGH.cc:426:
> >>>>   -> The number of threads for this process is larger its number
> >>>>      of cores. This may indicate a performance problem.
> >>>>
> >>>> This is something that we couldn't understand, as we asked for
> >>>> only 32 procs with num-threads set to 1. The command that we used
> >>>> to submit our job was:
> >>>>
> >>>>     ./simfactory/bin/sim create-submit p32_mpin_npn \
> >>>>         --procs=32 --ppn=16 --num-threads=1 --ppn-used=16 \
> >>>>         --num-smt=1 --parfile=par/nsnstohmns1.par \
> >>>>         --walltime=24:10:00
> >>>>
> >>>> I have attached the out file, runscript, submit script, option
> >>>> list, and machine file for reference. Thanks in advance for the
> >>>> help.
> >>>>
> >>>> Sincerely,
> >>>>
> >>>> --
> >>>> Spandan Sarma
> >>>> BS-MS '19
> >>>> Department of Physics (4th Year),
> >>>> IISER Bhopal
>
> --
> Spandan Sarma
> BS-MS '19
> Department of Physics (4th Year),
> IISER Bhopal