
Languages and Abstractions for High-Performance Scientific Computing (CS 598APK) Fall 2018

What Where
Time/place Wed/Fri 2:00pm-3:15pm 1109 Siebel / Catalog
Class URL
Class recordings Echo 360
Piazza Discuss »
Calendar View »




Paper Presentations

Paper presentations: Times/Topics/Slides (Times spreadsheet for posterity)

Why You Should Take this Class

Visualization of an array access pattern

GPU kernel source code

Software for large-scale problems is stretched between three key requirements: high-performance, typically parallel implementation; asymptotically optimal algorithms; and often highly technical application domains. This tension contributes considerably to making HPC software difficult to write and hard to maintain. If you are faced with this problem, this class can help you find and develop possible solution approaches.

Abstractions, tools, and languages can help restore separation of concerns and ease the creation and maintenance of such software. Proven approaches to this problem include domain-specific mini-languages ('DSLs'), code generation, and 'active' libraries.
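To give a flavor of what code generation means here, the following is a minimal sketch, not material from the class itself. It assumes a hypothetical helper, generate_poly_evaluator, that writes Python source for a polynomial with its coefficients baked in as literals (in Horner form) and then compiles that source into a callable.

```python
def generate_poly_evaluator(coeffs):
    """Return (source, function) for a polynomial with fixed coefficients.

    Hypothetical helper for illustration. `coeffs` is a non-empty sequence
    ordered from the highest-degree coefficient down to the constant term.
    """
    # Build a Horner-form expression with the coefficients as literals.
    expr = repr(float(coeffs[0]))
    for c in coeffs[1:]:
        expr = f"({expr}) * x + {float(c)!r}"

    source = f"def poly(x):\n    return {expr}\n"

    # Compile the generated source and pull the function out of its namespace.
    namespace = {}
    exec(source, namespace)
    return source, namespace["poly"]


# Specialize for 2*x**2 - 1 and inspect what was generated.
source, poly = generate_poly_evaluator([2, 0, -1])
print(source)
print(poly(3.0))
```

The point of the sketch is the separation of concerns: the user states the polynomial once, and the generator decides how the evaluation is written out.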

This class begins with a quick but thorough examination of the problem setting: What machines are we realistically confronted with now and in the foreseeable future? What are the determinants of performance? How can we measure and understand performance?

From the hardware level, we will then move towards a view of abstractions: concepts and simplifications of complex hardware realities that are sufficiently thin to allow users and developers to reason about expected performance while achieving a substantial simplification of the programming task.

We will discuss, design, and evaluate a number of such systems, with the goal of putting you in a position to

As we progress, we will examine a number of increasingly high-level program representations, ranging from instruction sets to compiler IRs, through CUDA/OpenCL/'SIMT' models, to polyhedral and other higher-level representations. Along the way, you will design toy program representations and transformations for limited-scale tasks and examine your achievable and practically achieved performance.
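As a rough illustration of what a toy program representation and a transformation on it might look like, here is a minimal sketch. The node classes (Const, Var, BinOp) and the functions fold_constants and evaluate are hypothetical names chosen for this example, not part of any class material. The transformation shown is constant folding on a small expression tree.

```python
from dataclasses import dataclass


# A tiny expression tree: the "program representation" (hypothetical design).
@dataclass(frozen=True)
class Const:
    value: float


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class BinOp:
    op: str        # "+" or "*"
    left: object
    right: object


def apply_op(op, a, b):
    if op == "+":
        return a + b
    if op == "*":
        return a * b
    raise ValueError(f"unsupported operator: {op!r}")


def fold_constants(expr):
    """Transformation: replace constant-only subtrees with their value."""
    if isinstance(expr, BinOp):
        left = fold_constants(expr.left)
        right = fold_constants(expr.right)
        if isinstance(left, Const) and isinstance(right, Const):
            return Const(apply_op(expr.op, left.value, right.value))
        return BinOp(expr.op, left, right)
    return expr


def evaluate(expr, env):
    """Reference semantics, used to check that a transformation is valid."""
    if isinstance(expr, Const):
        return expr.value
    if isinstance(expr, Var):
        return env[expr.name]
    return apply_op(expr.op, evaluate(expr.left, env), evaluate(expr.right, env))


# x * (2 + 3) folds to x * 5; both trees evaluate to the same value.
tree = BinOp("*", Var("x"), BinOp("+", Const(2.0), Const(3.0)))
folded = fold_constants(tree)
print(folded)
print(evaluate(tree, {"x": 4.0}), evaluate(folded, {"x": 4.0}))
```

Keeping a small reference evaluator next to the transformation makes it easy to check, on examples, that the transformed program means the same thing as the original.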

We will pay careful attention to semantics and correctness in the context of program representation and ultimate code generation, but we will prefer the definition of specialized or simplified semantics over extensive compiler analyses that might help prove the validity of transformations.

To complement the many excellent distributed-memory offerings in the department (e.g. CS 484, CS 554), this class focuses more on 'on-node'/shared-memory performance.

Prerequisites / What You Should Already Know

While this class is being offered in a CS department, it is deliberately open to graduate students in the engineering disciplines who do extensive computational work and face this range of problems every day.

Suggested Papers for Student Presentations

See this list for an idea of the focus of this class. These papers can also serve as the basis for mid-semester paper presentations.


Andreas Kloeckner




Office: 4318 Siebel

Course Outline

I will insert links to class material, books, and papers into this tree as time goes on.



These scribbled PDFs are an unedited reflection of what we wrote during class. They need to be viewed in the context of the class discussion that led to them. See the lecture videos for that.

If you would like actual, self-contained class notes, look in the outline above.

These scribbles serve as a record of our class discussion, to be used perhaps in the following ways:

  • as a way to cross-check your own notes
  • to look up a formula that you know was shown in a certain class
  • to remind yourself of what exactly was covered on a given day

By continuing to read them, you acknowledge that these files are provided as supplementary material on an as-is basis.


If you have submitted your SSH key, an account has been created for you on a family of machines managed by members of the scientific computing area. See the linked page for more information on access and usage.

Virtual Machine Image

While you are free to install Python and NumPy on your own computer to do homework, the only supported way to do so is to use the supplied virtual machine image.

Download Virtual Machine »

Grading Policies

View grading policies »