[Users] ET on KNL.
David Radice
dradice at astro.princeton.edu
Wed Mar 1 15:10:47 CST 2017
Hi Ian, Erik, Eloisa,
> I attach a very brief report of some results I obtained in 2015 after attending a KNC workshop.
>> Conclusions: By using 244 threads, with the domain split into tiles of size 8 × 4 × 4 points, and OpenMP threads assigned one per tile as they become available, the MIC was able to outperform the single CPU by a factor of 1.5. The same tiling strategy was used on the CPU, as it has been found to give good performance there in the past. Since we have not yet optimised the code for the MIC architecture, we believe that further speed improvements will be possible, and that solving the Einstein equations on the MIC architecture should be feasible.
>>
> Eloisa, are you using LoopControl? There are tiling parameters which can also help with performance on these devices.
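For illustration, here is a minimal plain-C/OpenMP sketch of the strategy described in the quoted report. The grid is cut into 8 × 4 × 4 tiles, and each thread picks up the next free tile as soon as it is done with its current one, which is modelled here with schedule(dynamic, 1). The function name tiled_increment, the i-fastest (Cactus/Fortran-order) array layout, and the trivial loop body are assumptions made for the example. This is not the code from the report and not how LoopControl implements tiling.

```c
#include <assert.h>
#include <stddef.h>

/* Tile sizes matching the 8 x 4 x 4 tiles quoted above. */
enum { TILE_I = 8, TILE_J = 4, TILE_K = 4 };

static int min_int(int a, int b) { return a < b ? a : b; }

/* Hypothetical kernel: add 1 to every point of an ni x nj x nk array
   stored with i fastest. The three outer loops enumerate tiles; with
   schedule(dynamic, 1) a thread takes one tile at a time as it becomes
   free. The array layout is not changed: tiling only reorders the
   iterations so that each tile's working set stays in cache. */
static void tiled_increment(double *u, int ni, int nj, int nk)
{
    assert(ni >= 0 && nj >= 0 && nk >= 0);
#pragma omp parallel for collapse(3) schedule(dynamic, 1)
    for (int tk = 0; tk < nk; tk += TILE_K)
        for (int tj = 0; tj < nj; tj += TILE_J)
            for (int ti = 0; ti < ni; ti += TILE_I)
                /* Points inside one tile, clipped at the grid boundary. */
                for (int k = tk; k < min_int(tk + TILE_K, nk); ++k)
                    for (int j = tj; j < min_int(tj + TILE_J, nj); ++j)
                        for (int i = ti; i < min_int(ti + TILE_I, ni); ++i)
                            u[i + (size_t)ni * (j + (size_t)nj * k)] += 1.0;
}
```

Every grid point is still visited exactly once, including when the grid size is not a multiple of the tile size; only the order of the visits differs from a plain triple loop.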
How does tiling work with LoopControl? Is it documented somewhere? I naively thought that the point of tiling was to store chunks of data contiguously in memory...
BTW, at the moment I am using this macro for all of my loop needs:
#define UTILS_LOOP3(NAME,I,SI,EI,J,SJ,EJ,K,SK,EK) \
  _Pragma("omp for collapse(3)") \
  for(int I = SI; I < EI; ++I) \
    for(int J = SJ; J < EJ; ++J) \
      for(int K = SK; K < EK; ++K)
How would I convert it to something equivalent using LoopControl?
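For reference, a minimal self-contained sketch of how UTILS_LOOP3 is meant to be used. Because the macro expands to an "omp for" construct, it only shares work among threads when it is reached from inside an "omp parallel" region. The helper fill_interior, the ghost-zone width ng, and the i-fastest layout are hypothetical; the macro is repeated so that the block compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

/* The macro as posted above (NAME is not used). */
#define UTILS_LOOP3(NAME,I,SI,EI,J,SJ,EJ,K,SK,EK) \
  _Pragma("omp for collapse(3)") \
  for(int I = SI; I < EI; ++I) \
    for(int J = SJ; J < EJ; ++J) \
      for(int K = SK; K < EK; ++K)

/* Hypothetical example: set the interior of an n0 x n1 x n2 grid function
   (i fastest) to 1, leaving a ghost zone of width ng untouched. The
   explicit "omp parallel" creates the team that "omp for" splits the
   collapsed iteration space across. */
static void fill_interior(double *u, int n0, int n1, int n2, int ng)
{
  assert(2 * ng <= n0 && 2 * ng <= n1 && 2 * ng <= n2);
#pragma omp parallel
  {
    /* k is passed first and i last, so i is the innermost loop. */
    UTILS_LOOP3(fill, k, ng, n2 - ng, j, ng, n1 - ng, i, ng, n0 - ng) {
      u[i + (size_t)n0 * (j + (size_t)n1 * k)] = 1.0;
    }
  }
}
```

Passing k as the first index and i as the last makes the innermost loop run over the contiguous index; with the arguments in the opposite order the innermost loop would stride through memory.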
Thanks,
David
PS. Since Eloisa was able to compile bbox.cc with the Intel 17.0.0 compiler using -no-vec, I made a patch that disables vectorization inside bbox.cc via pragmas (so that the file no longer has to be compiled manually):
https://bitbucket.org/eschnett/carpet/pull-requests/16/carpetlib-fix-compilation-with-intel-1700/diff