|Time/place|Wed/Fri 2:00pm-3:15pm 1109 Siebel / Catalog|
|Class recordings|Echo 360|
Presentation Response Dec 7 (Due: Fri Dec 14)
HW3 (Due: Wed Oct 31)
Project Proposal (First draft due: Wed Oct 10)
Presentation Response Oct 5 (Due: Fri Oct 12)
Software for large-scale problems is stretched between three key requirements: high-performance (typically parallel) implementation, asymptotically optimal algorithms, and often highly technical application domains. This tension contributes considerably to making HPC software difficult to write and hard to maintain. If you are faced with this problem, this class can help you find and develop possible solution approaches.
Abstractions, tools, and languages can help restore separation of concerns and ease the creation and maintenance of such software. Proven approaches to this problem include domain-specific mini-languages ('DSLs'), code generation, and 'active' libraries.
This class begins with a quick but thorough examination of the problem setting: What machines are we realistically confronted with now and in the foreseeable future? What are the determinants of performance? How can we measure and understand performance?
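As a small taste of the kind of measurement the class will do more carefully, the sketch below estimates achieved memory bandwidth from a large NumPy array copy. The array size and repetition count are arbitrary choices for illustration, not class-prescribed values.

```python
# Rough estimate of achieved memory bandwidth via a streaming array copy.
# Numbers here are illustrative; adjust sizes for your hardware.
import time
import numpy as np

n = 10**7  # 80 MB of float64 per array
a = np.random.rand(n)
b = np.empty_like(a)

nreps = 5
start = time.perf_counter()
for _ in range(nreps):
    b[:] = a  # each copy streams n*8 bytes read + n*8 bytes written
elapsed = time.perf_counter() - start

gbytes = nreps * 2 * n * 8 / 1e9
print(f"effective bandwidth: {gbytes / elapsed:.1f} GB/s")
```

Comparing the printed figure against your machine's theoretical peak bandwidth is a first step toward the "determinants of performance" question above.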
From the hardware level, we will then move towards a view of abstractions: concepts and simplifications of complex hardware realities that are sufficiently thin to allow the user/developer to reason about expected performance, while substantially simplifying the programming task.
We will discuss, design, and evaluate a number of such systems, with the goal of putting you in a position to design and build such tools yourself.
As we progress, we will examine a number of increasingly high-level program representations, ranging from instruction sets to compiler IRs, through CUDA/OpenCL/'SIMT' models, to polyhedral and other higher-level representations. Along the way, you will design toy program representations and transformations for limited-scale tasks and examine the performance you can achieve, both in principle and in practice.
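To illustrate what a toy program representation and transformation might look like (the names and design here are purely illustrative, not class material), consider a tiny expression IR with a constant-folding pass:

```python
# A minimal expression IR with one transformation: constant folding.
# Illustrative sketch only; a real class project would go further.
from dataclasses import dataclass


@dataclass(frozen=True)
class Const:
    value: float

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Add:
    left: object
    right: object


def fold_constants(expr):
    """Recursively replace Add(Const, Const) subtrees with a single Const."""
    if isinstance(expr, Add):
        l = fold_constants(expr.left)
        r = fold_constants(expr.right)
        if isinstance(l, Const) and isinstance(r, Const):
            return Const(l.value + r.value)
        return Add(l, r)
    return expr


# (x + (2 + 3)) folds to (x + 5)
e = Add(Var("x"), Add(Const(2), Const(3)))
print(fold_constants(e))
```

Even this small example raises the semantics questions discussed below: the fold is only valid if the arithmetic of the representation matches that of the target.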
We will pay careful attention to semantics and correctness in the context of program representation and ultimate code generation, but we will prefer the definition of specialized or simplified semantics over extensive compiler analyses that might help prove the validity of transformations.
To complement the many excellent distributed-memory offerings in the department (e.g. CS 484, CS 554), this class focuses more on 'on-node'/shared-memory performance.
While this class is being offered in a CS department, it is deliberately open to graduate students in the engineering disciplines who do extensive computational work and face this range of problems every day.
See this list for an idea of the focus of this class. These papers can also serve as the basis for mid-semester paper presentations.
I will insert links to class material, books, and papers into this tree as time goes on.
Note: the section headings in this tree are clickable to reveal more detail.
These scribbled PDFs are an unedited reflection of what we wrote during class. They need to be viewed in the context of the class discussion that led to them. See the lecture videos for that.
If you would like actual, self-contained class notes, look in the outline above.
These scribbles are provided here as a record of our class discussion.
By continuing to read them, you acknowledge that these files are provided as supplementary material on an as-is basis.
If you have submitted your SSH key, an account has been created for you on a family of machines managed by members of the scientific computing area. See the linked page for more information on access and usage.
While you are free to install Python and NumPy on your own computer to do homework, the only supported way to do so is to use the supplied virtual machine image.
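Whichever route you choose, a quick way to confirm that your environment is usable (this check is a suggestion, not a course requirement):

```shell
# Verify that Python and NumPy are installed and importable.
python3 -c "import numpy; print(numpy.__version__)"
```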