[Users] OpenMP is making it slower?

Wed May 18 17:33:49 CDT 2011

OK, I'm still having problems.
I defaulted everything to private and explicitly declared my shared variables. Now the outer "k" loop never gets incremented.
Even if I run with only one thread, "k" always equals 1.

The snippet of code follows. Here m_ex and m_ib are Fortran parameters, so the compiler hard-codes them as numbers.
If I compile without -fopenmp it works fine, but add -fopenmp and "setenv OMP_NUM_THREADS 1" and it won't increment.

Any new ideas?  Thanks in advance.
-Scott

!$OMP PARALLEL DO DEFAULT(PRIVATE) SHARED(mask,ibonly,ax,ay,az,
!$OMP&        agxx,agxy,agxz,agyy,agyz,agzz,
!$OMP&        aKxx,aKxy,aKxz,aKyy,aKyz,aKzz,nx,ny,nz) 
!$OMP& SCHEDULE(STATIC,chunk)          
         do k = 1, nz 
            write(msg,*)'      setbkgrnd:   ' //
     &              'nz = ',nz,',   % done = ',int(k*1.0d2/nz),'  '
            call writemessage(msg)

            do j = 1, ny
               do i = 1, nx

                  if (mask(i,j,k) .ne. m_ex .and.
     &                (ibonly .eq. 0 .or. 
     &                 mask(i,j,k) .eq. m_ib)) then

                   x = ax(i)
                   y = ay(j)
                   z = az(k)
                   include 'gd.inc'
                   include 'kd.inc'

                     agxx(i,j,k) = gxx
                     agxy(i,j,k) = gxy
                     agxz(i,j,k) = gxz
                     agyy(i,j,k) = gyy
                     agyz(i,j,k) = gyz
                     agzz(i,j,k) = gzz

                     aKxx(i,j,k) = Kxx
                     aKxy(i,j,k) = Kxy
                     aKxz(i,j,k) = Kxz
                     aKyy(i,j,k) = Kyy
                     aKyz(i,j,k) = Kyz
                     aKzz(i,j,k) = Kzz
                 else if (mask(i,j,k) .eq. m_ex) then
c                   Excised points
                     agxx(i,j,k) = exval
                     agxy(i,j,k) = exval
                     agxz(i,j,k) = exval
                     agyy(i,j,k) = exval
                     agyz(i,j,k) = exval
                     agzz(i,j,k) = exval

                     aKxx(i,j,k) = exval
                     aKxy(i,j,k) = exval
                     aKxz(i,j,k) = exval
                     aKyy(i,j,k) = exval
                     aKyz(i,j,k) = exval
                     aKzz(i,j,k) = exval
                 endif

               enddo
            enddo
         enddo

There is no explicit !$OMP END PARALLEL DO statement because it's optional.
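
For reference, here is a minimal, self-contained sketch of one thing to check with DEFAULT(PRIVATE). In the snippet above, exval is read inside the loop but does not appear in the SHARED list, so it becomes private and is undefined inside the region. The program below is only an illustration: the names a and default_private_sketch and the size nz = 8 are made up, it is written in free form rather than the fixed form used above, and it is not claimed to explain why "k" stays at 1.

```fortran
program default_private_sketch
  implicit none
  integer, parameter :: nz = 8        ! hypothetical grid size (a named constant)
  double precision   :: a(nz)         ! stands in for agxx, agxy, ...
  double precision   :: exval         ! set before the loop, only read inside it
  integer            :: k

  exval = -1.0d0
  a = 0.0d0

  ! With DEFAULT(PRIVATE), every variable referenced in the loop must be
  ! accounted for. exval is only read, so SHARED (or FIRSTPRIVATE) keeps
  ! the value assigned above; leaving it out of the list would make it
  ! PRIVATE and therefore undefined inside the region.
  !$omp parallel do default(private) shared(a, exval)
  do k = 1, nz
     a(k) = exval
  end do
  !$omp end parallel do

  ! Simple self-check of the result.
  if (all(a == exval)) then
     print *, 'ok: every element received exval'
  else
     print *, 'mismatch: exval was not visible in the loop'
  end if
end program default_private_sketch
```

This should build with or without -fopenmp (for example with gfortran, assuming the file is saved with a free-form extension such as .f90), which makes it easy to compare the two cases. The loop index k needs no clause because the iteration variable of a parallel do is private automatically, and nz is a named constant, so it needs no data-sharing attribute either.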

--
Scott H. Hawley, Ph.D. 	 		Asst. Prof. of Physics                            
Chemistry & Physics Dept       		Office: Hitch 100D             
Belmont University                 	Tel:  +1-615-460-6206
Nashville, TN 37212 USA           	Fax: +1-615-460-5458
PGP Key at http://sks-keyservers.net

On May 18, 2011, at 4:37 PM, Scott Hawley wrote:

> Erik, Frank, Peter: Thanks guys.  I will pursue your suggestions.  
> 
> 
> 
> 
> 
> On May 18, 2011, at 12:14 AM, Peter Diener wrote:
> 
>> Hi Scott,
>> 
>> 
>> On Tue, 17 May 2011, Frank Loeffler wrote:
>> 
>>> Hi,
>>> 
>>> On Tue, May 17, 2011 at 03:02:00PM -0700, Scott Hawley wrote:
>>>> Do these all need to be declared as private?
>>> 
>>> If the temporary variables are declared only inside the loop they are
>>> automatically thread-local. Oh wait, that is Fortran. Well - in that
>>> case you should either declare all private, or (maybe easier) put the
>>> include files into separate functions, declare the temporary variables
>>> only there and call the functions from within the loop, in which case
>>> they also don't have to be specified for openmp (as long as they are not
>>> static).
>>> 
>>>> I certainly don't want the various processors overwriting each other's
>>>> work, which might be what they're doing. Maybe they're even
>>>> generating NaNs, which would slow things down a bit!
>> 
>> Alternatively you may use the DEFAULT(PRIVATE) clause, so that you only 
>> have to specify the shared variables. However, in that case you have to 
>> make sure to really declare all the shared variables as shared, since 
>> otherwise all processors will have to allocate storage and if they are 3d 
>> variables this will slow down the code and increase memory consumption. 
>> Also private variables have undefined values on entry to the parallel 
>> region so not declaring all shared variables properly can also adversely
>> affect the result. So be careful.
>> 
>>> You should see that in the results though. It might make sense to first
>>> make sure that the results with different numbers of threads are the
>>> same (depending on the problem you might actually get bit-by-bit
>>> identical results), and work on optimization later. I agree that your
>>> slow-down actually points towards some bug.
>>> 
>>> Frank
>> 
>> Cheers,
>> 
>>  Peter
>> 
> 