[ET Trac] #2626: Multipole is not OpenMP parallelized
Gabriele Bozzola
trac-noreply at einsteintoolkit.org
Thu Aug 4 15:07:37 CDT 2022
#2626: Multipole is not OpenMP parallelized
Reporter: Gabriele Bozzola
Status: new
Milestone:
Version:
Type: enhancement
Priority: trivial
Component:
Comment (by Gabriele Bozzola):
I was looking at the inner function `Multipole_Integrate` (i.e., at fixed radius, l, and m). I think the loops there (which are over theta and phi) can be parallelized without problems. However, you are probably right that the simplest thing to do would be to add pragmas over l and m. The call site is:
```
for (int l=0; l <= lmax; l++)
{
  for (int m=-l; m <= l; m++)
  {
    // Integrate sYlm (real + i imag) over the sphere at radius r
    Multipole_Integrate(array_size, ntheta,
                        reY[si][l][m+l], imY[si][l][m+l],
                        real, imag, th, ph,
                        &modes(v, i, l, m, 0), &modes(v, i, l, m, 1));
  } // loop over m
} // loop over l
```
`Multipole_Integrate` does not modify any of its arguments except the last two, so the inputs can be shared. `modes` is a custom class that contains all the output data, and `&modes(v, i, l, m, 0)` is just a pointer to a `CCTK_REAL`. Different iterations get different pointers, so the iterations are already independent without any changes. `l` and `m` are loop variables and hence private, so to parallelize the entire thorn we probably just need to add
`#pragma omp for collapse(2)`
(Plus, we’d have to remove an inner pragma in one of the integration methods.)
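For concreteness, here is a minimal sketch of what I have in mind. My assumptions: the loop is not already inside a parallel region, so I use `parallel for` rather than a bare `for`; also note that `collapse(2)` over a triangular nest like this one, where the `m` bounds depend on `l`, requires OpenMP 5.0, so with older compilers one could parallelize only the outer `l` loop.
```
// Sketch only: every iteration writes to a distinct &modes(v, i, l, m, ...)
// location, so no synchronization is needed. collapse(2) over this
// non-rectangular nest needs OpenMP >= 5.0; with older compilers, drop
// "collapse(2)" and parallelize the l loop alone.
#pragma omp parallel for collapse(2)
for (int l=0; l <= lmax; l++)
{
  for (int m=-l; m <= l; m++)
  {
    // Integrate sYlm (real + i imag) over the sphere at radius r
    Multipole_Integrate(array_size, ntheta,
                        reY[si][l][m+l], imY[si][l][m+l],
                        real, imag, th, ph,
                        &modes(v, i, l, m, 0), &modes(v, i, l, m, 1));
  } // loop over m
} // loop over l
```
If only the outer `l` loop is parallelized, adding `schedule(dynamic)` would help balance the work, since the number of `m` iterations grows with `l`.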
> You may actually see an extra speedup, if you are using it, by switching from ASCII output to HDF5 output.
Yes, I am already doing that (which, incidentally, significantly speeds up kuibit as well).
--
Ticket URL: https://bitbucket.org/einsteintoolkit/tickets/issues/2626/multipole-is-not-openmp-parallelized