[Commits] [svn:einsteintoolkit] www/about/releases/ (Rev. 856)
schnetter at cct.lsu.edu
Mon May 28 13:41:19 CDT 2012
User: eschnett
Date: 2012/05/28 01:41 PM
Added:
/about/releases/
opencl.html
Log:
Describe CUDA/OpenCL in the release
File Changes:
Directory: /about/releases/
===========================
File [added]: opencl.html
Delta lines: +45 -0
===================================================================
--- about/releases/opencl.html (rev 0)
+++ about/releases/opencl.html 2012-05-28 18:41:18 UTC (rev 856)
@@ -0,0 +1,45 @@
+<h3>Accelerator Support</h3>
+
+<p>This release of the Einstein Toolkit adds support for GPUs and
+ other accelerators. This support comprises three levels of
+ abstraction, ranging from merely building and running both CUDA and
+ OpenCL code, to automated code generation targeting GPUs instead of
+ CPUs. As with any other programming paradigm (such as MPI or
+ OpenMP), the performance benefits depend on the particular
+ algorithms used and optimisations that are applied. In addition, the
+ Simulation Factory greatly aids portability to a wide range of
+ computing systems.</p>
+
+ <!-- This additional text only for the details release notes -->
+
+<p>At the lowest level, Cactus now supports compiling, building, and
+ running with either CUDA or OpenCL. CUDA is supported as a new
+ language in addition to C, C++, and Fortran; OpenCL is supported as
+ an external library, and builds and executes compute kernels via
+ run-time calls. Details are described in the user's guide (for CUDA)
+ and in thorn <tt>ExternalLibraries/OpenCL</tt> (for OpenCL).</p>
+
+<p>Many accelerator platforms today distinguish between host memory and
+ device memory, and require explicit copy or map operations to
+ transfer data. An intermediate level of abstraction aids
+ transferring grid variables between host and device, using schedule
+ declarations to keep track of which data are needed where, and
+ minimising expensive data transfers. For OpenCL, there is a compact
+ API to build and execute compute kernels at run time. Details are
+ described in thorns <tt>CactusUtils/Accelerator</tt>
+ and <tt>CactusUtils/OpenCLRunTime</tt> (with example parameter
+ file).</p>
+
+<p>Finally, the code generation
+ system <a href="http://kranccode.org/"><i>Kranc</i></a> has been
+ extended to be able to produce either C++ or OpenCL code, based on
+ the infrastructure described above. This allows writing GPU code in
+ a very high-level manner. Note, however, that the efficiency of the
+ generated code depends on many factors, including, e.g., the
+ finite-differencing stencil radius and the number
+ of operations in the generated compute kernels. Non-trivial kernels
+ typically require system-dependent tuning to execute efficiently, as
+ GPUs and other accelerators generally show a rather unforgiving
+ performance behaviour. The thorns <tt>McLachlan/ML_WaveToy</tt> and
+ <tt>McLachlan/ML_WaveToy_CL</tt> are examples, generated from the
+ same Kranc script, showing the generated C++ and OpenCL code.</p>