[Users] Generalising to OpenMP

Erik Schnetter schnetter at cct.lsu.edu
Thu Mar 29 10:41:23 CDT 2018


Chris

I'm sure that omp_get_num_threads is working fine. It is supposed to
return the number of currently (!) active threads. Outside a parallel
region, it returns 1. I'm surprised that Intel returns something else.
You can call omp_get_max_threads to find out how many threads can be
used inside a parallel region. Of course, a parallel region might use
fewer threads, e.g. if a "num_threads" clause is used.

You can write

int num_threads;
#pragma omp parallel
{
#pragma omp master
  num_threads = omp_get_num_threads();  /* size of the current team */
}
/* the implicit barrier at the end of the parallel region makes the
   value visible here */

to call omp_get_num_threads in the proper way.
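
For completeness, here is a minimal, self-contained illustration of the
difference between the two calls (the team size printed at the end
assumes OMP_NUM_THREADS is set to more than 1):

#include <stdio.h>
#include <omp.h>

int main(void) {
  /* outside a parallel region only one thread is active */
  printf("num_threads outside: %d\n", omp_get_num_threads()); /* always 1 */
  printf("max_threads:         %d\n", omp_get_max_threads()); /* e.g. 4 */

  int num_threads;
#pragma omp parallel
  {
#pragma omp master
    num_threads = omp_get_num_threads();  /* actual team size */
  }
  printf("num_threads inside:  %d\n", num_threads);
  return 0;
}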

-erik




On Thu, Mar 29, 2018 at 8:05 AM, Chris Stevens <c.stevens at ru.ac.za> wrote:
> Hi Erik,
>
> thanks for your reply.
>
> I did not know that OpenMP can change the number of threads during
> execution! I will definitely put in a check for this.
>
> I also came across something rather odd: omp_get_num_threads() does not work
> when using gcc (always returns 1), while Intel seems fine. I found a little
> code snippet online that seems to work fine:
>
> int n = 0;
> #pragma omp parallel reduction(+:n)
> n += 1;  /* each thread contributes 1, so n ends up as the team size */
> return n;
>
> Thanks for the info,
>
> Chris
>
>
> On 03/28/2018 03:06 PM, Erik Schnetter wrote:
>
> Chris
>
> The standard approach is to have one scratch array per OpenMP thread.
> This looks like
>
> #pragma omp parallel
> {
>   ... initialise array ...
> #pragma omp for
>   for (...) {
>     ... loop ...
>   }
> }
>
> I'm repeating what you wrote because the inner "omp for" must not (!)
> have a "parallel" directive: it is already located in a parallel
> region. Depending on how you clean up the scratch array, you could
> also look at the "nowait" clause for the "omp for" block.
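>
> For example, a minimal sketch (make_scratch, use_scratch and
> free_scratch are placeholder names):
>
> #pragma omp parallel
> {
>   foo *scratch = make_scratch();  /* per-thread set-up */
> #pragma omp for nowait            /* no barrier after the loop... */
>   for (int i = 0; i < n; i++) {
>     use_scratch(scratch, i);
>   }
>   free_scratch(scratch);  /* ...so each thread cleans up as soon as
>                              its share of iterations is done */
> }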
>
> If it is expensive to set up the scratch array, then you can cache
> this. Remember that the number of OpenMP threads can change during
> execution. Even if you do not want to handle this case, you should
> still detect it and abort. I typically use a C++ vector for the
> thread-specific data structures.
>
> vector<foo> scratch;
>
> And the loop that uses this:
>
> #pragma omp parallel
> {
> #pragma omp single
>   if ((int)scratch.size() < omp_get_num_threads()) {
>     ... call initialization routine ...
>   }
> #pragma omp for
>   for (...) {
>     ... loop ...
>   }
> }
>
> The "omp single" region is executed by exactly one thread, and all
> other threads will wait until this thread has finished, ensuring the
> scratch data structure is correctly set up.
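>
> Put together, a minimal sketch of the whole pattern (foo stands for
> your scratch type, and the loop body is a placeholder):
>
> #include <vector>
> #include <omp.h>
> using std::vector;
>
> struct foo { /* per-thread scratch storage */ };
>
> vector<foo> scratch;  /* one entry per thread, cached across calls */
>
> void blah(int n) {
> #pragma omp parallel
>   {
> #pragma omp single
>     if ((int)scratch.size() < omp_get_num_threads())
>       scratch.resize(omp_get_num_threads());  /* runs once; others wait */
>     foo &my_scratch = scratch[omp_get_thread_num()];
> #pragma omp for
>     for (int i = 0; i < n; i++) {
>       /* computation using my_scratch only */
>     }
>   }
> }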
>
> -erik
>
>
>
>
>
> On Wed, Mar 28, 2018 at 4:24 AM, Chris Stevens <c.stevens at ru.ac.za> wrote:
>
> Hi everyone,
>
> I am new to OpenMP programming, and have quickly come across an (I think)
> interesting task that I want to parallelise with OpenMP but that does not
> seem straightforward to generalise. I would like to hear your feedback on
> the potential solutions and whether there is a better way to do it:
>
> Task: During an initialisation phase, create a scratch array, say of
> CCTK_REAL, that is shared with other code through a header file. This
> array is defined only once. The scratch array is then used as temporary
> storage in a function that does some computation within a for loop, i.e.
>
> function blah()
> {
>     for (...)
>     {
>         computation involving scratch array
>     }
> }
>
> This works fine when OpenMP is turned off. However, when you add
> #pragma omp parallel for to the for loop above, this single scratch
> array is used by multiple threads at once, causing issues.
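>
> For instance, with a single shared scratch array, a loop like this has
> a data race (fill_scratch and reduce_scratch are placeholder names):
>
> #pragma omp parallel for
> for (int i = 0; i < n; i++) {
>   fill_scratch(scratch, i);          /* thread A writes scratch... */
>   out[i] = reduce_scratch(scratch);  /* ...while thread B may be
>                                         overwriting it concurrently */
> }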
>
> Solution 1: Each thread has its own scratch space. I am not sure whether
> it is within the OpenMP paradigm to create scratch spaces for each thread
> within the initialisation file and share them through a header. This
> doesn't seem to work, and I suspect it shouldn't be possible?
>
> Solution 2: Create one serial scratch space during initialisation that has
> size dependent on omp_get_num_threads(). One would then have to change the
> computation involving the scratch array to be dependent on
> omp_get_thread_num() so as to use its allocated part of the scratch space.
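>
> A minimal sketch of what I mean (SCRATCH_LEN is a placeholder for the
> per-thread size; I use omp_get_max_threads here since
> omp_get_num_threads returns 1 outside a parallel region):
>
> /* set-up, done once */
> CCTK_REAL *scratch =
>   malloc(omp_get_max_threads() * SCRATCH_LEN * sizeof *scratch);
>
> /* in the computation, each thread uses its own slice */
> #pragma omp parallel for
> for (int i = 0; i < n; i++) {
>   CCTK_REAL *my_scratch = scratch + omp_get_thread_num() * SCRATCH_LEN;
>   /* computation using my_scratch */
> }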
>
> Solution 3: One could do something like
>
> #pragma omp parallel
> {
>     create scratch space
>
>     #pragma omp parallel for
>     for (...)
>     {
>         computation involving thread specific scratch space
>     }
> }
>
> In summary: Solution 1 doesn't seem to work and probably doesn't fit into
> the OpenMP paradigm. Solution 2 would work but doesn't seem very nice.
> Solution 3 is nicer but inefficient, as I don't want to create scratch
> spaces all the time.
>
> Is there perhaps another way that fits better with the OpenMP paradigm?
>
> Many thanks!
>
> Chris
>
> --
> Dr Chris Stevens
>
> Claude Leon Postdoctoral Fellow
>
> Department of Mathematics
>
> Rhodes University
>
> Room 5
>
> Ph: +27 46 603 8932
>
> Web: www.chrisdoesmaths.com



-- 
Erik Schnetter <schnetter at cct.lsu.edu>
http://www.perimeterinstitute.ca/personal/eschnetter/

