Speaker: László András Hernádi (Inst. 193-02 CG)

Programmers are accustomed to writing code that is executed on CPUs, which optimize
single-threaded performance at runtime. Since 2006, GPGPUs have emerged as an
alternative: their large number of computing units, combined with fast context-switching,
makes them well-suited for highly parallelizable algorithms. The present work focuses on
frameworks capable of running scientific applications on GPGPUs. It considers various
frameworks, ranging from pragma-based approaches like OpenMP down to low-level
languages like PTX, gives an overview of the available technologies, and provides code
samples as well as benchmarks. To support CPU programmers in choosing a suitable
framework for a given problem, this thesis considers aspects of readability, required
code changes, and performance.
For owners of recent Nvidia graphics cards we recommend CUDA because of the
available IDE and documentation. Developers of small C/C++ or Fortran projects
should consider OpenMP or OpenACC, which need only a few directives to advise the
compiler to offload work to the GPGPU. Java developers, on the other hand, stay
flexible with aparapi, which generates OpenCL code from special Java classes and runs
it on the GPGPU. These frameworks can achieve significant speed-ups over sequential
code simply by telling the compiler which regions should be parallelized and what to
compute on the GPGPU.
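
To illustrate the directive-based approach, a dense matrix-matrix multiplication in C
can be offloaded with a single OpenMP target construct. The following is a minimal
sketch under assumed names and sizes, not code from the thesis; built without offload
support, the pragma is ignored and the loops simply run on the CPU:

    #define N 512

    /* Assumed example (not from the thesis): C = A * B on the GPGPU.
     * One OpenMP directive marks the loop nest for offloading; the map
     * clauses describe which arrays to copy to and from the device. */
    void matmul(const float *A, const float *B, float *C)
    {
        #pragma omp target teams distribute parallel for collapse(2) \
                map(to: A[0:N*N], B[0:N*N]) map(from: C[0:N*N])
        for (int i = 0; i < N; ++i)
            for (int j = 0; j < N; ++j) {
                float sum = 0.0f;
                for (int k = 0; k < N; ++k)
                    sum += A[i * N + k] * B[k * N + j];
                C[i * N + j] = sum;
            }
    }

Compiled with, for example, gcc -fopenmp, the same source runs on the CPU or, with an
offloading-enabled toolchain, on the GPGPU.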
Further, the present work points out the advantages and disadvantages of each framework
as well as common framework-related pitfalls, and gives recommendations for choosing a
framework.
To illustrate the frameworks, kernels such as dense matrix-matrix multiplication and
sparse matrix-vector multiplication are shown.
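As a concrete instance of the second kernel, a sparse matrix-vector multiplication in
the widely used CSR (compressed sparse row) format might look as follows; this is a
minimal sketch with illustrative names, not code from the thesis:

    /* Assumed example: y = A * x with A stored in CSR format.
     * row_ptr[i] .. row_ptr[i+1] delimit the nonzeros of row i;
     * col_idx and val hold their column indices and values. */
    void spmv_csr(int n_rows, const int *row_ptr, const int *col_idx,
                  const float *val, const float *x, float *y)
    {
        /* Rows are independent, so on a GPGPU each row can be
         * assigned to its own thread. */
        for (int i = 0; i < n_rows; ++i) {
            float sum = 0.0f;
            for (int j = row_ptr[i]; j < row_ptr[i + 1]; ++j)
                sum += val[j] * x[col_idx[j]];
            y[i] = sum;
        }
    }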
Finally, various benchmarks show the effects of these optimizations on performance.

Details

Duration: 20 + 10
Supervisor: Johannes Unterguggenberger