Languages and Abstractions for High-Performance Scientific Computing (CS 598APK) Fall 2018
What | Where |
---|---|
Time/place | Wed/Fri 2:00pm-3:15pm 1109 Siebel / Catalog |
Class URL | https://bit.ly/hpcabstr-f18 |
Class recordings | Echo 360 |
Piazza | Discuss » |
Calendar | View » |
Assignments
Current
- Poster Session: In class on Wednesday Dec 12 - CS Poster Printing Requests
- Project Report and Materials Submission Deadline: Tuesday Dec 18 11pm
- Poster Template (optional)
- Presentation Response Dec 7 (Due: Fri Dec 14)
- Presentation Response Dec 5 (Due: Wed Dec 12)
- Presentation Response Nov 30 (Due: Fri Dec 7)
- Paper Presentation: Slides Submission
Past
- Presentation Response Nov 28 (Due: Wed Dec 5)
- Presentation Response Nov 16 (Due: Fri Nov 30)
- HW4 (Due: Wed Nov 28)
- Presentation Response Nov 9 (Due: Fri Nov 16)
- Presentation Response Nov 2 (Due: Fri Nov 9)
- Consent for HW3 code release (Due: Mon Nov 5)
- Presentation Response Oct 26 (Due: Fri Nov 2)
- Presentation Response Oct 19 (Due: Fri Oct 26)
- HW3 (Due: Wed Oct 31)
- Project Proposal (First draft due: Wed Oct 10)
- Presentation Response Oct 12 (Due: Fri Oct 19)
- HW2 (Due: Fri Sep 28) (Consent for code release)
- Presentation Response Oct 5 (Due: Fri Oct 12)
- HW1 (Due: Fri Sep 14)
- Presentation Response Sep 21 (Due: Fri Sep 28)
- SSH Public Key Submission (Due: Fri Sep 7)
- Talk topic selection (Due: Fri Sep 7)
Paper Presentations
Paper presentations: Times/Topics/Slides (Times spreadsheet for posterity)
Why You Should Take this Class
Software for large-scale problems is stretched between three key requirements: a high-performance, typically parallel, implementation; asymptotically optimal algorithms; and often highly technical application domains. This tension contributes considerably to making HPC software difficult to write and hard to maintain. If you are faced with this problem, this class can help you find and develop possible solution approaches.
Abstractions, tools, and languages can help restore separation of concerns and ease the creation and maintenance of such software. Proven approaches to this problem include domain-specific mini-languages ('DSLs'), code generation, and 'active' libraries.
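As a rough, hypothetical illustration of the embedded-DSL-plus-code-generation idea (my own minimal sketch, not any particular system used in class), the following Python snippet builds an expression tree symbolically and emits a C expression from it:

```python
# Toy sketch of an embedded mini-language: expression nodes plus a
# trivial C code generator. Illustrative only; the tools discussed in
# class are considerably more capable.

class Expr:
    def __add__(self, other): return BinOp("+", self, other)
    def __mul__(self, other): return BinOp("*", self, other)

class Variable(Expr):
    def __init__(self, name): self.name = name
    def emit_c(self): return self.name

class Constant(Expr):
    def __init__(self, value): self.value = value
    def emit_c(self): return repr(self.value)

class BinOp(Expr):
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right
    def emit_c(self):
        return f"({self.left.emit_c()} {self.op} {self.right.emit_c()})"

# Build a*x + 1.0 symbolically, then generate C text for it
# (pretend this lands in the body of a generated loop).
a, x = Variable("a"), Variable("x")
expr = a * x + Constant(1.0)
print("out[i] = " + expr.emit_c() + ";")   # -> out[i] = ((a * x) + 1.0);
```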
This class begins with a quick but thorough examination of the problem setting: What machines are we realistically confronted with, now and in the foreseeable future? What are the determinants of performance? How can we measure and understand performance?
From the hardware level, we will then move toward a view of abstractions: concepts and simplifications of complex hardware realities that are sufficiently thin to let the user/developer reason about expected performance while substantially simplifying the programming task.
We will discuss, design, and evaluate a number of such systems, with the goal of putting you in a position to
- know the available landscape of tools
- make informed choices among them for a computational task at hand
- design your own abstractions and place them in context with respect to the state of the art.
As we progress, we will examine a number of increasingly high-level program representations, ranging from instruction sets to compiler IRs, through CUDA/OpenCL/'SIMT' models, to polyhedral and other higher-level representations. Along the way, you will design toy program representations and transformations for limited-scale tasks and compare the performance you achieve in practice with what is achievable.
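To give a flavor of what such a toy representation and transformation might look like (a minimal sketch of my own, not one of the class assignments), here is a tiny tuple-based IR together with a constant-folding pass:

```python
# Hypothetical sketch: a tiny program representation (tuples of the form
# ("+"/"*", left, right), or plain numbers and variable names) and one
# transformation on it, constant folding. Real IRs and passes are far
# richer, but the shape -- walk the tree, rewrite some nodes, leave the
# rest alone -- is the same.

def fold_constants(node):
    if isinstance(node, tuple):
        op, left, right = node
        left, right = fold_constants(left), fold_constants(right)
        if isinstance(left, (int, float)) and isinstance(right, (int, float)):
            return left + right if op == "+" else left * right
        return (op, left, right)
    return node  # variable name or literal: nothing to do

# (2 * 3) + x  becomes  6 + x
print(fold_constants(("+", ("*", 2, 3), "x")))   # -> ('+', 6, 'x')
```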
We will pay careful attention to semantics and correctness in the context of program representation and ultimate code generation, but we will prefer the definition of specialized or simplified semantics over extensive compiler analyses that might help prove the validity of transformations.
To complement the many excellent distributed-memory offerings in the department (e.g. CS 484, CS 554), this class focuses more on 'on-node'/shared-memory performance.
Prerequisites / What You Should Already Know
- C
- Python
- Familiarity with parallel programming: you should have measured and wondered about the performance of code you have written; a minimal example of such a measurement follows this list
- Having taken a class on compiler construction (such as CS 426) will be helpful, but is not required.
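A minimal sketch of the kind of measurement referred to above (the array size, repetition count, and NumPy workload are arbitrary illustrative choices):

```python
# Time a simple NumPy operation and turn the result into an effective
# memory bandwidth. Sizes and repetition counts are illustrative only.
from time import perf_counter
import numpy as np

n = 10**7
a = np.random.rand(n)
b = np.random.rand(n)

nruns = 20
start = perf_counter()
for _ in range(nruns):
    c = a + b
elapsed = (perf_counter() - start) / nruns

# 3 arrays of n doubles move through memory per run (read a, read b, write c).
gbytes = 3 * n * 8 / 1e9
print(f"{elapsed*1e3:.2f} ms per run, ~{gbytes/elapsed:.1f} GB/s effective")
```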
While this class is offered in a CS department, it is deliberately open to graduate students across the engineering disciplines who do extensive computational work and face this range of problems every day.
Suggested Papers for Student Presentations
See this list for an idea of the focus of this class. These papers can also serve as the basis for mid-semester paper presentations.
Instructor

Course Outline
I will insert links to class material, books, and papers into this tree as time goes on.
Note: the section headings in this tree are clickable to reveal more detail.
- Introduction
  - Notes
  - About This Class
  - Why Bother with Parallel Computers?
  - Lowest Accessible Abstraction: Assembly
  - Architecture of an Execution Pipeline
  - Architecture of a Memory System
  - Shared-Memory Multiprocessors
  - Demo: Assembly Reading Comprehension
  - Demo: Cache Organization on Your Machine
  - Demo: DGEMM Performance
  - Demo: Lock Contention
  - Demo: More Pipeline Performance Mysteries
  - Demo: NUMA and Bandwidths
  - Demo: Pipeline Performance Mysteries
  - Demo: Talk Time Assignment
  - Demo: Talk Topic Assignment
  - Demo: Threads vs Cache
  - Code: timing.h
- Machine Abstractions
  - C
  - OpenCL/CUDA
  - Convergence, Differences in Machine Mapping
  - Lower-Level Abstractions: SPIR-V, PTX
  - Demo: AVX Playground
  - Demo: Hello GPU
  - Demo: Object Orientation vs Performance
  - Demo: PTX and SASS
  - Demo: Pointer Aliasing
  - Demo: Register Pressure
  - Demo: Ways to SIMD
- Performance: Expectation, Experiment, Observation
  - Forming Expectations of Performance
  - Timing Experiments and Potential Issues
  - Profiling and Observable Quantities
  - Practical Tools: perf, toplev, likwid
  - Demo: Forming Architectural Performance Expectations
  - Demo: Using Performance Counters
- Performance-Oriented Languages and Abstractions
  - Expression Trees
  - Parallel Patterns and Array Languages
  - Demo: 01 Expression Trees
  - Demo: 02 Traversing Trees
  - Demo: 03 Defining Custom Node Types
  - Demo: 04 Common Operations
  - Demo: 05 Reflection in Python
  - Demo: 06 Towards Execution
  - Demo: Expression Templates
  - Polyhedral Representation and Transformation
CAUTION!
These scribbled PDFs are an unedited reflection of what we wrote during class. They need to be viewed in the context of the class discussion that led to them. See the lecture videos for that.
If you would like actual, self-contained class notes, look in the outline above.
These scribbles are posted here as a record of our class discussion, to be used perhaps in the following ways:
- as a way to cross-check your own notes
- to look up a formula that you know was shown in a certain class
- to remind yourself of what exactly was covered on a given day
By continuing to read them, you acknowledge that these files are provided as supplementary material on an as-is basis.
- 2018-10-24-Note-15-27.pdf
- scribbles-2018-08-29.pdf
- scribbles-2018-08-31.pdf
- scribbles-2018-09-05.pdf
- scribbles-2018-09-07.pdf
- scribbles-2018-09-12.pdf
- scribbles-2018-09-14.pdf
- scribbles-2018-09-19.pdf
- scribbles-2018-09-26.pdf
- scribbles-2018-10-03.pdf
- scribbles-2018-10-10.pdf
- scribbles-2018-10-17.pdf
- scribbles-2018-10-24.pdf
- scribbles-2018-10-31.pdf
- scribbles-2018-11-07.pdf
- scribbles-2018-11-14.pdf
Computing
If you have submitted your SSH key, an account has been created for you on a family of machines managed by members of the scientific computing area. See the linked page for more information on access and usage.
Virtual Machine Image
While you are free to install Python and NumPy on your own computer to do the homework, the only supported way of doing so is the supplied virtual machine image.