[Users] Issue with Multiple Node Simulation on cluster

Samuel Tootle tootle at itp.uni-frankfurt.de
Mon Dec 19 12:01:50 CST 2022


Dear Spandan,

A minor addition to Erik's statement: if you haven't already, I would recommend outputting Carpet timing statistics. Unless your 24 hr run is a bare calculation (no output, no checkpoints, no regridding, etc.), a 30 min test will give you a false sense of the expected evolution time per hour.
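
For example, something like the following in the parameter file should print periodic timing information (a minimal sketch; the parameter names and output frequency should be checked against your versions of Carpet and TimerReport and adjusted to your run):

  # Periodic timer output (illustrative values)
  Carpet::output_timers_every             = 512
  TimerReport::out_every                  = 512
  TimerReport::output_all_timers_together = yes
  TimerReport::n_top_timers               = 40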

Cheers,
Samuel

From: Erik Schnetter <schnetter at gmail.com>
To: Spandan Sarma 19306 <spandan19 at iiserb.ac.in>
CC: users at einsteintoolkit.org
Date: Dec 19, 2022 18:33:07
Subject: Re: [Users] Issue with Multiple Node Simulation on cluster

> Spandan
> 
> It is quite possible that different build options, different MPI
> options, or different parameter settings would improve the performance
> of your calculation. Performance optimisation is a difficult topic,
> and it's impossible to say anything in general. A good starting point
> would be to run your simulation on a different system, to run a
> different parameter file with a known setup on your system, and then
> to compare.
> 
> -erik
> 
> 
> 
> 
> On Thu, Dec 15, 2022 at 4:19 AM Spandan Sarma 19306
> <spandan19 at iiserb.ac.in> wrote:
>> 
>> Dear Erik and Steven,
>> 
>> Thank you so much for the suggestions. We changed the runscript to add -x OMP_NUM_THREADS to the command line, and that solved the issue of the total number of threads being 144. It is now 32 (equal to the number of procs).
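>> 
>> For reference, the relevant part of the runscript now looks roughly like this (a minimal sketch assuming OpenMPI's mpirun and the usual SimFactory placeholders; the actual variables in our runscript may differ slightly):
>> 
>>   export OMP_NUM_THREADS=@NUM_THREADS@
>>   # -x exports the variable to every MPI rank, not just the first node
>>   mpirun -np @NUM_PROCS@ -x OMP_NUM_THREADS @EXECUTABLE@ -L 3 @PARFILE@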
>> 
>> Also, the number of iterations completed in a 24 hr walltime has increased to 132105 on 32 procs, compared to just 240 before. Although this is a huge increase, we expected somewhat more. For a shorter walltime (30 min) we obtained 2840, 2140, and 1216 iterations on 32, 16, and 8 procs, respectively. Are there any further changes we can make to improve on this?
>> 
>> The new runscript and the output file for 32 procs are attached below (no changes were made to the machine file, option list, or submit script from before).
>> 
>> On Fri, Dec 9, 2022 at 8:13 PM Steven R. Brandt <sbrandt at cct.lsu.edu> wrote:
>>> 
>>> It's not too late to do a check, though, to see if all other nodes have
>>> the same OMP_NUM_THREADS value. Maybe that's the warning? It sounds like
>>> it should be an error.
>>> 
>>> --Steve
>>> 
>>> On 12/8/2022 5:23 PM, Erik Schnetter wrote:
>>>> Steve
>>>> 
>>>> Code that runs as part of the Cactus executable is running too late
>>>> for this. At that time, OpenMP has already been initialized.
>>>> 
>>>> There is an environment variable, "CACTUS_NUM_THREADS", which is
>>>> checked at run time, but only if it is set (for backward
>>>> compatibility). Most people do not bother setting it, leaving this
>>>> error undetected. A warning is output, but warnings are generally
>>>> ignored.
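>>>>
>>>> (For example, setting both variables together in the runscript, e.g.
>>>>
>>>>   export OMP_NUM_THREADS=4
>>>>   export CACTUS_NUM_THREADS=4
>>>>
>>>> makes the run-time consistency check active; the value 4 here is just
>>>> illustrative.)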
>>>> 
>>>> -erik
>>>> 
>>>> On Thu, Dec 8, 2022 at 3:48 PM Steven R. Brandt <sbrandt at cct.lsu.edu> wrote:
>>>>> We could probably add some startup code in which MPI broadcasts the
>>>>> OMP_NUM_THREADS setting to all the other processes and either checks the
>>>>> value of the environment variable or calls omp_set_num_threads() or some
>>>>> such.
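>>>>>
>>>>> Roughly along these lines (just a sketch of the idea, not actual
>>>>> Cactus code; the names are made up):
>>>>>
>>>>>   #include <mpi.h>
>>>>>   #include <omp.h>
>>>>>   #include <stdio.h>
>>>>>   #include <stdlib.h>
>>>>>
>>>>>   /* Hypothetical startup check: rank 0 broadcasts its thread count
>>>>>      and every other rank compares its own value against it,
>>>>>      adjusting (or aborting) on a mismatch. */
>>>>>   static void check_num_threads(MPI_Comm comm)
>>>>>   {
>>>>>     int rank;
>>>>>     MPI_Comm_rank(comm, &rank);
>>>>>
>>>>>     const char *env = getenv("OMP_NUM_THREADS");
>>>>>     int nthreads = env ? atoi(env) : omp_get_max_threads();
>>>>>
>>>>>     int root_nthreads = nthreads;
>>>>>     MPI_Bcast(&root_nthreads, 1, MPI_INT, 0, comm);
>>>>>
>>>>>     if (nthreads != root_nthreads) {
>>>>>       fprintf(stderr,
>>>>>               "rank %d: OMP_NUM_THREADS=%d differs from rank 0 (%d)\n",
>>>>>               rank, nthreads, root_nthreads);
>>>>>       omp_set_num_threads(root_nthreads);
>>>>>     }
>>>>>   }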
>>>>> 
>>>>> --Steve
>>>>> 
>>>>> On 12/8/2022 9:03 AM, Erik Schnetter wrote:
>>>> 
>>>> 
>> 
>> 
>> 
>> --
>> Spandan Sarma
>> BS-MS' 19
>> Department of Physics (4th Year),
>> IISER Bhopal
> 
> 
> 
> -- 
> Erik Schnetter <schnetter at gmail.com>
> http://www.perimeterinstitute.ca/personal/eschnetter/
> _______________________________________________
> Users mailing list
> Users at einsteintoolkit.org
> http://lists.einsteintoolkit.org/mailman/listinfo/users

