[Users] Generalising to OpenMP

Erik Schnetter schnetter at cct.lsu.edu
Wed Mar 28 08:06:15 CDT 2018


Chris,

The standard approach is to have one scratch array per OpenMP thread.
This looks like:

#pragma omp parallel
{
  ... initialise array ...
#pragma omp for
  for (...) {
    ... loop ...
  }
}

I'm repeating what you wrote because the inner "omp for" must not (!)
carry a "parallel" directive: it is already located inside a parallel
region. Depending on how you clean up the scratch array, you could
also look at the "nowait" clause on the "omp for" construct.
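
Spelled out, a minimal version of this pattern could look as follows;
here I use double in place of CCTK_REAL, and the scratch size and
loop bound are just placeholders:

#include <omp.h>
#include <vector>

void blah(int n) {
#pragma omp parallel
  {
    // Each thread allocates and initialises its own private scratch.
    std::vector<double> scratch(1000);  // placeholder size

    // No "parallel" here -- we are already inside a parallel region.
    // "nowait" drops the implicit barrier at the end of the loop;
    // that is safe here since each thread only touches its own scratch.
#pragma omp for nowait
    for (int i = 0; i < n; ++i) {
      scratch[i % 1000] = i;  // stand-in for the real computation
    }
  }  // each thread's scratch is destroyed here
}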

If it is expensive to set up the scratch array, then you can cache
it. Remember that the number of OpenMP threads can change during
execution. Even if you do not want to handle this case, you should
still detect it and abort. I typically use a C++ std::vector for the
thread-specific data structures.

std::vector<foo> scratch;

And the loop that uses this:

#pragma omp parallel
{
#pragma omp single
  if (scratch.size() < (size_t)omp_get_num_threads()) {
    ... call initialization routine ...
  }
#pragma omp for
  for (...) {
    ... loop ...
  }
}

The "omp single" region is executed by exactly one thread, and all
other threads will wait until this thread has finished, ensuring the
scratch data structure is correctly set up.
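
Putting the pieces together, a complete sketch could look like this,
again with double in place of CCTK_REAL and a made-up scratch size:

#include <omp.h>
#include <vector>

struct foo {
  std::vector<double> buf;  // per-thread scratch storage
};

static std::vector<foo> scratch;

void blah(int n) {
#pragma omp parallel
  {
#pragma omp single
    if (scratch.size() < (size_t)omp_get_num_threads()) {
      // The thread count grew (or this is the first call):
      // (re)initialise one scratch entry per thread.
      scratch.resize(omp_get_num_threads());
      for (foo &s : scratch)
        s.buf.assign(1000, 0.0);  // placeholder initialisation
    }
    // Implicit barrier at the end of "single": all threads wait
    // here until the scratch vector is fully set up.
#pragma omp for
    for (int i = 0; i < n; ++i) {
      foo &my = scratch[omp_get_thread_num()];
      my.buf[i % 1000] += i;  // stand-in for the real computation
    }
  }
}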

-erik





On Wed, Mar 28, 2018 at 4:24 AM, Chris Stevens <c.stevens at ru.ac.za> wrote:
> Hi everyone,
>
> I am new to OpenMP programming, and have quickly come across an (I think)
> interesting task that I want to parallelise with OpenMP, but that does not
> seem straightforward to generalise. I would like to hear your feedback on
> the potential solutions and whether there is a better way to do it:
>
> Task: Create a scratch array, say of CCTK_REAL, during an initialisation
> phase; it is shared with other code through a header file and is defined
> only once. This scratch array is then used as temporary storage in a
> function that does some computation within a for loop, i.e.
>
> function blah ()
> {
>     for (....)
>     {
>         computation involving scratch array
>     }
> }
>
> This works fine when OpenMP is turned off. However, when you add #pragma
> omp parallel for to the for loop above, one sees that this one scratch
> array is used by multiple threads at once, causing issues.
>
> Solution 1: Each thread has its own scratch space. I am not sure if it is
> within the OpenMP paradigm to create scratch spaces for each thread within
> the initialisation file and share them through a header. This doesn't seem
> to work, and I suspect it isn't possible?
>
> Solution 2: Create one serial scratch space during initialisation whose
> size depends on omp_get_num_threads(). One would then have to change the
> computation involving the scratch array to index by omp_get_thread_num(),
> so that each thread uses its own part of the scratch space.
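>
> A rough sketch of what I have in mind (CHUNK, init and blah are just
> placeholder names):
>
> #include "cctk.h"
> #include <omp.h>
> #include <vector>
>
> #define CHUNK 1000  // per-thread scratch size
>
> static std::vector<CCTK_REAL> scratch;
>
> void init() {
>   // omp_get_max_threads(), since omp_get_num_threads() returns 1
>   // outside of a parallel region.
>   scratch.resize((size_t)omp_get_max_threads() * CHUNK);
> }
>
> void blah(int n) {
> #pragma omp parallel for
>   for (int i = 0; i < n; ++i) {
>     CCTK_REAL *my = &scratch[omp_get_thread_num() * CHUNK];
>     my[i % CHUNK] = i;  // stand-in for the real computation
>   }
> }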
>
> Solution 3: One could do something like
>
> #pragma omp parallel
> {
>     create scratch space
>
>     #pragma omp parallel for
>     for
>     {
>         computation involving thread specific scratch space
>     }
> }
>
> In summary: Solution 1 doesn't seem to work and probably doesn't fit into
> the OpenMP paradigm. Solution 2 would work, but doesn't seem very nice.
> Solution 3 is nicer, but inefficient, as I don't want to be creating
> scratch spaces all the time.
>
> Is there perhaps another way that fits better with the OpenMP paradigm?
>
> Many thanks!
>
> Chris
>
> --
> Dr Chris Stevens
>
> Claude Leon Postdoctoral Fellow
>
> Department of Mathematics
>
> Rhodes University
>
> Room 5
>
> Ph: +27 46 603 8932
>
> Web: www.chrisdoesmaths.com
>
>



-- 
Erik Schnetter <schnetter at cct.lsu.edu>
http://www.perimeterinstitute.ca/personal/eschnetter/

