[Commits] [svn:einsteintoolkit] www/about/releases/ (Rev. 856)
schnetter at cct.lsu.edu
Mon May 28 13:41:19 CDT 2012
User: eschnett
Date: 2012/05/28 01:41 PM
Added:
/about/releases/
opencl.html
Log:
Describe CUDA/OpenCL in the release
File Changes:
Directory: /about/releases/
===========================
File [added]: opencl.html
Delta lines: +45 -0
===================================================================
--- about/releases/opencl.html (rev 0)
+++ about/releases/opencl.html 2012-05-28 18:41:18 UTC (rev 856)
@@ -0,0 +1,45 @@
+<h3>Accelerator Support</h3>
+
+<p>This release of the Einstein Toolkit adds support for GPUs and
+ other accelerators. This support comprises three levels of
+ abstraction, ranging from merely building and running both CUDA and
+ OpenCL code, to automated code generation targeting GPUs instead of
+ CPUs. As with any other programming paradigm (such as MPI or
+ OpenMP), the performance benefits depend on the particular
+ algorithms used and optimisations that are applied. In addition, the
+ Simulation Factory greatly aids portability to a wide range of
+ computing systems.</p>
+
+ <!-- This additional text only for the details release notes -->
+
+<p>At the lowest level, Cactus now supports compiling, building, and
+ running with either CUDA or OpenCL. CUDA is supported as a new
+ language in addition to C, C++, and Fortran; OpenCL is supported as
+ an external library, and builds and executes compute kernels via
+ run-time calls. Details are described in the user's guide (for CUDA)
+ and in thorn <tt>ExternalLibraries/OpenCL</tt> (for OpenCL).</p>
+
+<p>Many accelerator platforms today distinguish between host memory and
+ device memory, and require explicit copy or map operations to
+ transfer data. An intermediate level of abstraction aids
+ transferring grid variables between host and device, using schedule
+ declarations to keep track of which data are needed where, and
+ minimising expensive data transfers. For OpenCL, there is a compact
+ API to build and execute compute kernels at run time. Details are
+ described in thorns <tt>CactusUtils/Accelerator</tt>
+ and <tt>CactusUtils/OpenCLRunTime</tt> (with example parameter
+ file).</p>
+
+<p>Finally, the code generation
+ system <a href="http://kranccode.org/"><i>Kranc</i></a> has been
+ extended to be able to produce either C++ or OpenCL code, based on
+ the infrastructure described above. This allows writing GPU code in
+ a very high-level manner. Note, however, that the efficiency of the
+ generated code depends on many factors, including, e.g., the
+ finite-differencing stencil radius and the number
+ of operations in the generated compute kernels. Non-trivial kernels
+ typically require system-dependent tuning to execute efficiently, as
+ GPUs and other accelerators generally show a rather unforgiving
+ performance behaviour. The thorns <tt>McLachlan/ML_WaveToy</tt> and
+ <tt>McLachlan/ML_WaveToy_CL</tt> are examples, generated from the
+ same Kranc script, showing the generated C++ and OpenCL code.</p>