Exam 3 Study Guide/Practice Aid

Content

The exam covers content from eigenvalue sensitivity through "Optimization: Newton's method (1D)". Roughly, in terms of subsections, that is:

  • Eigenvalues: sensitivity (Bauer-Fike)
  • Eigenvalues: power iteration and variants
  • Eigenvalues: Rayleigh quotient
  • Eigenvalues: Schur form
  • Eigenvalues: QR algorithm and Hessenberg reduction
  • Eigenvalues: Krylov space methods and Arnoldi iteration
  • Computing the SVD
  • Nonlinear equations: setup, existence, sensitivity, multiplicity
  • Rates of convergence
  • Stopping criteria
  • Bisection method
  • Fixed-point iteration (1D)
  • Newton's method (1D)
  • Secant method
  • Newton's method ($n$D) and Broyden's method
  • Optimization: setup, convexity, optimality conditions, sensitivity
  • Golden section search
  • Optimization: Newton's method (1D)

Some questions to ask yourself

Eigenvalue Sensitivity

  • What does the Bauer-Fike theorem say about how much an eigenvalue can move when $A$ is perturbed to $A + E$? $$ |\mu - \lambda_k| \le \text{cond}(X) \|E\| $$ where $X^{-1}AX = D$ and $\lambda_k$ is the closest eigenvalue of $A$ to $\mu$.
  • Why is the Bauer-Fike bound tight for matrices with ill-conditioned eigenvector matrices?
  • What happens to the bound when $A$ is symmetric? (Eigenvectors are orthogonal, so $\text{cond}_2(X)=1$.)
  • Why may the bound overestimate sensitivity for an individual eigenvalue?
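
A quick numerical check of the bound (the example matrix, seed, and variable names are mine, not from the notes): for a diagonalizable $A$, every eigenvalue of $A + E$ should lie within $\text{cond}(X)\,\|E\|_2$ of some eigenvalue of $A$.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 100.0],
              [0.0, 2.0]])           # skewed eigenvectors -> large cond(X)
lam, X = np.linalg.eig(A)            # columns of X are eigenvectors
E = 1e-6 * rng.standard_normal((2, 2))
mu = np.linalg.eigvals(A + E)

# Bauer-Fike: each perturbed eigenvalue is within cond(X)*||E||_2
# of the nearest eigenvalue of A.
bound = np.linalg.cond(X) * np.linalg.norm(E, 2)
moves = [min(abs(m - lam)) for m in mu]
```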

Power Iteration

  • What is power iteration? Given $A$ and $\boldsymbol x_0$, can I write out the first few iterates?
  • To what eigenvector does power iteration converge, and with what convergence factor? $$ \text{factor} = \left|\frac{\lambda_2}{\lambda_1}\right| $$
  • What can go wrong with power iteration? (Equal-magnitude eigenvalues, complex eigenvalues, starting vector orthogonal to the dominant eigenvector.)
  • How does a shift ($A - \sigma I$) change which eigenvalue power iteration converges to?
  • How does inversion ($A^{-1}$) change power iteration? What eigenvalue does it converge to?
  • How does shift-invert ($(A - \sigma I)^{-1}$) guide convergence? To which eigenvalue does it converge?
  • Can I compute the convergence factor for each variant above?
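
As a concrete sketch (function name and test matrix are mine, not from the notes), the basic iteration is just repeated multiply-and-normalize:

```python
import numpy as np

def power_iteration(A, x0, iters=200):
    """Multiply by A and renormalize; converges to the dominant eigenvector
    with convergence factor |lambda_2 / lambda_1| per step."""
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        x = A @ x
        x = x / np.linalg.norm(x)
    return x, x @ A @ x              # Rayleigh quotient estimates lambda_1

# Hypothetical example: eigenvalues 1, 2, 5, so the factor is 2/5
A = np.diag([1.0, 2.0, 5.0])
v, lam = power_iteration(A, np.array([1.0, 1.0, 1.0]))
```

The shift, invert, and shift-invert variants change only the multiply step, e.g. replacing `A @ x` by a solve with $A - \sigma I$ for shift-invert.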

Rayleigh Quotient

  • What is the Rayleigh quotient $\frac{\boldsymbol x^T A \boldsymbol x}{\boldsymbol x^T \boldsymbol x}$?
  • What value does it take when $\boldsymbol x$ is an eigenvector?
  • What is Rayleigh quotient iteration? What is its convergence rate?
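
A minimal sketch of Rayleigh quotient iteration (example matrix is mine; the guard avoids solving with an exactly-converged, hence singular, shift):

```python
import numpy as np

def rayleigh_quotient_iteration(A, x0, iters=10, tol=1e-12):
    """Shift-invert power iteration using the Rayleigh quotient as the shift;
    converges cubically for symmetric A."""
    x = x0 / np.linalg.norm(x0)
    sigma = x @ A @ x
    for _ in range(iters):
        if np.linalg.norm(A @ x - sigma * x) < tol:
            break                    # converged: A - sigma*I is (near) singular
        y = np.linalg.solve(A - sigma * np.eye(len(x)), x)
        x = y / np.linalg.norm(y)
        sigma = x @ A @ x
    return sigma, x

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])           # eigenvalues (5 ± sqrt(5))/2
sigma, x = rayleigh_quotient_iteration(A, np.array([1.0, 0.0]))
```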

Schur Form

  • What is the Schur form $A = QUQ^T$? What are the properties of $Q$ and $U$?
  • Why does the Schur form always exist?
  • Where do the eigenvalues appear in the Schur form?
  • How can eigenvectors be recovered from the Schur form (back-substitution on $U - \lambda I$)?
  • What happens for matrices with complex eigenvalues (real Schur form with $2\times 2$ blocks)?
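
The back-substitution idea can be sketched for an upper triangular $U$ with distinct diagonal entries (matrix and names are mine):

```python
import numpy as np

def triangular_eigenvector(U, k):
    """Eigenvector of upper triangular U for eigenvalue U[k, k]: set v[k] = 1
    and back-substitute the rows of (U - lambda*I) v = 0 above row k.
    Assumes distinct diagonal entries."""
    lam = U[k, k]
    v = np.zeros(U.shape[0])
    v[k] = 1.0
    for i in range(k - 1, -1, -1):
        # row i:  (U[i,i] - lam) * v[i] + U[i, i+1:] @ v[i+1:] = 0
        v[i] = -U[i, i + 1:] @ v[i + 1:] / (U[i, i] - lam)
    return v

U = np.array([[1.0, 2.0, 3.0],
              [0.0, 4.0, 5.0],
              [0.0, 0.0, 6.0]])
v = triangular_eigenvector(U, 2)     # eigenvector for eigenvalue 6
```

Given a full Schur form $A = QUQ^T$, the corresponding eigenvector of $A$ is then $Q v$.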

QR Algorithm

  • What is orthogonal iteration? What is the QR iteration? How are they related?
  • Why do the iterates $\bar X_k$ all have the same eigenvalues as $A$?
  • How does the QR iteration converge to the Schur form?
  • How can a shift be incorporated into the QR iteration, and why does this accelerate convergence?
  • What is upper Hessenberg form? How is it obtained (via Householder similarity transforms)?
  • Why does reducing to Hessenberg form first make the QR iteration cheaper ($O(n^2)$ per step instead of $O(n^3)$)?
  • Why does the QR iteration preserve Hessenberg form?
  • What does the overall procedure look like for symmetric matrices (reduce to tridiagonal form, then QR iteration drives it to diagonal)?
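
The unshifted iteration is short enough to write out (example matrix is mine; practical implementations add Hessenberg reduction and shifts):

```python
import numpy as np

def qr_iteration(A, iters=100):
    """Factor A_k = Q_k R_k, then form A_{k+1} = R_k Q_k = Q_k^T A_k Q_k.
    Each step is a similarity transform, so every iterate has the same
    eigenvalues as A."""
    Ak = A.copy()
    for _ in range(iters):
        Q, R = np.linalg.qr(Ak)
        Ak = R @ Q
    return Ak

# Symmetric example: the iterates approach a diagonal matrix of eigenvalues.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
T = qr_iteration(A)
```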

Krylov Space Methods and Arnoldi Iteration

  • What is the Krylov space $\mathcal K_k = \text{span}\{\boldsymbol x_0, A\boldsymbol x_0, \ldots, A^{k-1}\boldsymbol x_0\}$?
  • Why is the Krylov matrix $K_k$ ill-conditioned, and how does orthogonalization fix it?
  • What is Arnoldi iteration? How does it relate to Gram-Schmidt applied to the Krylov sequence?
  • What factorization does Arnoldi produce: $AQ_k = Q_{k+1}H$? What is the shape of $H$?
  • What is Lanczos iteration? When does it apply instead of Arnoldi?
  • What are Ritz values? How are they used as approximate eigenvalues of $A$?

Computing the SVD

  • How can the SVD be reduced to an eigenvalue problem (via $A^T A$)?
  • What is the "non-kiddy" (bidiagonalization) approach, and why is it preferred?
  • Can I use the SVD of $A$ to compute the SVD of $A^T A$?
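
The reduction to an eigenvalue problem can be checked directly (example matrix is mine). This is the route that squares the condition number, which is why the bidiagonalization approach is preferred in practice:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# Eigendecomposition of A^T A gives V and the squared singular values.
evals, V = np.linalg.eigh(A.T @ A)
order = np.argsort(evals)[::-1]          # sort descending
sigma = np.sqrt(evals[order])
V = V[:, order]
U = (A @ V) / sigma                      # u_i = A v_i / sigma_i
```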

Nonlinear Equations: Setup and Theory

  • What is the goal of nonlinear equation solving: find $\boldsymbol x$ such that $\boldsymbol f(\boldsymbol x) = \boldsymbol 0$?
  • How can existence of a root be established? (Intermediate value theorem, contraction mapping theorem, inverse function theorem.)
  • What is the contraction mapping theorem? What is a fixed point?
  • What is the conditioning/sensitivity of root finding? Why must we use absolute condition numbers?
  • What is a root of multiplicity $m$? How does multiplicity affect conditioning?

Rates of Convergence

  • What does it mean for a method to converge with rate $r$? $$ \lim_{k\to\infty} \frac{\|\boldsymbol e_{k+1}\|} {\|\boldsymbol e_k\|^r} = C,\quad 0 < C < \infty $$
  • What is linear convergence ($r=1$)? How many digits does it gain per step?
  • What is quadratic convergence ($r=2$)? Why does $C$ "not matter much" once the error is small?
  • What is superlinear convergence?
  • What is the convergence rate of power iteration? Of Rayleigh quotient iteration? Of bisection? Of Newton's method? Of the secant method?
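
The rate $r$ can be estimated numerically from an error sequence by fitting $\log \|\boldsymbol e_{k+1}\|$ against $\log \|\boldsymbol e_k\|$ (this diagnostic and the Newton example are mine, not from the notes):

```python
import numpy as np

def estimate_rate(errors):
    """Fit log e_{k+1} ~ r * log e_k + log C; the slope estimates the rate r."""
    return np.polyfit(np.log(errors[:-1]), np.log(errors[1:]), 1)[0]

# Newton iteration for sqrt(2): the error roughly squares each step.
x, errs = 3.0, []
for _ in range(5):
    errs.append(abs(x - np.sqrt(2.0)))
    x = 0.5 * (x + 2.0 / x)

rate = estimate_rate(errs)               # close to 2 (quadratic)
```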

Stopping Criteria

  • Comment on the reliability of each of the following:
    1. $|f(x)| < \varepsilon$ (small residual)
    2. $\|x_{k+1} - x_k\| < \varepsilon$ (absolute step)
    3. $\|x_{k+1} - x_k\| / \|x_k\| < \varepsilon$ (relative step)
  • What can go wrong with each criterion?

Bisection Method

  • What is the bisection method? What assumption is needed?
  • What is its convergence rate and constant?
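
A sketch (function name is mine); the only assumption is a sign change on $[a, b]$:

```python
def bisect(f, a, b, tol=1e-10):
    """Halve the bracketing interval each step: linear convergence with
    constant 1/2. Requires f(a) and f(b) to have opposite signs."""
    fa = f(a)
    while b - a > tol:
        m = 0.5 * (a + b)
        fm = f(m)
        if fa * fm <= 0:
            b = m                    # sign change in [a, m]
        else:
            a, fa = m, fm            # sign change in [m, b]
    return 0.5 * (a + b)

root = bisect(lambda x: x**2 - 2.0, 0.0, 2.0)
```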

Fixed-Point Iteration

  • What is fixed-point iteration: $x_{k+1} = g(x_k)$?
  • Under what condition on $g'(x^*)$ does it converge locally?
  • What is the convergence rate in that case?
  • What happens if $g'(x^*) = 0$? What convergence rate results?
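
A tiny demonstration (the example map is mine): $g(x) = \cos x$ has $|g'(x^*)| < 1$ at its fixed point, so the iteration converges linearly:

```python
import math

def fixed_point(g, x0, iters=100):
    """Iterate x_{k+1} = g(x_k); converges locally when |g'(x*)| < 1,
    linearly with factor |g'(x*)|."""
    x = x0
    for _ in range(iters):
        x = g(x)
    return x

x_star = fixed_point(math.cos, 1.0)  # fixed point of cos is ~0.739
```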

Newton's Method (1D)

  • Derive Newton's method from the linear Taylor approximation of $f$ at $x_k$: $$ x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)} $$
  • Show that $g'(x^*) = 0$ for Newton's iteration function $g(x) = x - f(x)/f'(x)$, implying quadratic convergence at simple roots.
  • What are the drawbacks of Newton's method? (Local convergence only; derivative must be known.)
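
A direct transcription of the update (names and example are mine):

```python
def newton(f, fprime, x0, iters=20):
    """x_{k+1} = x_k - f(x_k) / f'(x_k): quadratic convergence at simple
    roots, but only from a good enough starting point."""
    x = x0
    for _ in range(iters):
        x = x - f(x) / fprime(x)
    return x

# Solving x^2 - 2 = 0 from x0 = 1
root = newton(lambda x: x**2 - 2.0, lambda x: 2.0 * x, 1.0)
```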

Secant Method

  • How does the secant method approximate $f'(x_k)$? $$ x_{k+1} = x_k - \frac{f(x_k)(x_k - x_{k-1})}{f(x_k) - f(x_{k-1})} $$
  • What is the convergence rate of the secant method ($r \approx 1.618 = (1+\sqrt 5)/2$)?
  • What are its drawbacks compared to Newton's method?
  • Why is it called a quasi-Newton method?
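
The same update with the derivative replaced by a difference quotient (names are mine; the guard avoids dividing by a flat secant line):

```python
def secant(f, x0, x1, iters=20):
    """Newton's update with f'(x_k) replaced by the slope through the last
    two iterates; converges with rate (1 + sqrt(5))/2 ~ 1.618."""
    for _ in range(iters):
        fx0, fx1 = f(x0), f(x1)
        if fx1 == fx0:
            break                    # flat (or fully converged) secant
        x0, x1 = x1, x1 - fx1 * (x1 - x0) / (fx1 - fx0)
    return x1

root = secant(lambda x: x**2 - 2.0, 1.0, 2.0)
```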

Newton's Method ($n$D)

  • What does Newton's method look like in $n$ dimensions? $$ \boldsymbol x_{k+1} = \boldsymbol x_k - J_{\boldsymbol f}(\boldsymbol x_k)^{-1} \boldsymbol f(\boldsymbol x_k) $$
  • Why is it expensive per iteration?
  • What is Broyden's method? Why can't we simply generalize the secant formula to $n$ dimensions directly?
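
In practice one solves a linear system with the Jacobian rather than forming $J^{-1}$ (the example system is mine, not from the notes):

```python
import numpy as np

def newton_nd(f, J, x0, iters=20):
    """Solve J(x_k) s = -f(x_k) for the Newton step, then x_{k+1} = x_k + s.
    Each iteration costs one Jacobian evaluation and one O(n^3) solve."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        s = np.linalg.solve(J(x), -f(x))
        x = x + s
    return x

# Hypothetical system: x^2 + y^2 = 1 and y = x, with a root at (1, 1)/sqrt(2)
f = lambda v: np.array([v[0]**2 + v[1]**2 - 1.0, v[1] - v[0]])
J = lambda v: np.array([[2.0 * v[0], 2.0 * v[1]],
                        [-1.0, 1.0]])
root = newton_nd(f, J, [1.0, 0.0])
```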

Optimization: Setup and Theory

  • What is the optimization problem: minimize $f(\boldsymbol x)$ subject to equality and inequality constraints?
  • What is the difference between constrained and unconstrained optimization? Between linear and nonlinear programming?
  • What does it mean for $f$ to be convex? Strictly convex?
  • If $f$ is convex, what can be said about local minima? If strictly convex?
  • What are the necessary and sufficient conditions for $x^*$ to be a local minimum?
    • 1D: $f'(x^*) = 0$ (necessary); $f''(x^*) > 0$ (sufficient)
    • $n$D: $\nabla f(\boldsymbol x^*) = \boldsymbol 0$ (necessary); $H_f(\boldsymbol x^*)$ positive definite (sufficient)
  • What is the Hessian $H_f$? Is it symmetric?
  • How can positive definiteness of the Hessian be tested computationally?
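
One standard computational test (a sketch; the example matrices are mine): attempt a Cholesky factorization, which succeeds exactly when a symmetric matrix is positive definite:

```python
import numpy as np

def is_positive_definite(H):
    """A symmetric H is positive definite iff Cholesky succeeds."""
    try:
        np.linalg.cholesky(H)
        return True
    except np.linalg.LinAlgError:
        return False

H_good = np.array([[2.0, 1.0], [1.0, 2.0]])   # eigenvalues 1 and 3
H_bad = np.array([[1.0, 2.0], [2.0, 1.0]])    # eigenvalues 3 and -1
```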

Sensitivity of Optimization

  • Why can we expect only half as many digits in $x^*$ as in $f(x^*)$ when minimizing by function values alone?
  • In $n$D, how does the conditioning of $H_f(\boldsymbol x^*)$ relate to the sensitivity of $\boldsymbol x^*$?

Golden Section Search

  • What is unimodality? Why is it needed for golden section search?
  • How does golden section search bracket the minimum?
  • What is the golden ratio $\tau = (\sqrt{5}-1)/2$, and why does it arise?
  • What is the convergence rate of golden section search?
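
A sketch (names and test function are mine); note how each step reuses one of the two interior points, so only one new function evaluation is needed:

```python
import math

def golden_section(f, a, b, tol=1e-8):
    """Shrink a bracket around the minimum of a unimodal f by the factor
    tau = (sqrt(5) - 1)/2 per step, reusing one interior point each time."""
    tau = (math.sqrt(5.0) - 1.0) / 2.0
    x1, x2 = a + (1.0 - tau) * (b - a), a + tau * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 > f2:                  # minimum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + tau * (b - a)
            f2 = f(x2)
        else:                        # minimum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = a + (1.0 - tau) * (b - a)
            f1 = f(x1)
    return 0.5 * (a + b)

xmin = golden_section(lambda x: (x - 1.5)**2, 0.0, 4.0)
```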

Optimization: Newton's Method (1D)

  • Derive Newton's method for optimization from the quadratic Taylor approximation: $$ x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)} $$
  • Why is this identical to Newton's method applied to $f'$?
  • What is its convergence rate?
  • What are its drawbacks? (Local convergence; requires $f''$.)
  • How can it be combined with a bracketing strategy for robustness?
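
A sketch of the update (names and example objective are mine), which is literally Newton's root-finder applied to $f'$:

```python
def newton_minimize(fprime, fsecond, x0, iters=20):
    """x_{k+1} = x_k - f'(x_k) / f''(x_k): Newton's method on f' = 0."""
    x = x0
    for _ in range(iters):
        x = x - fprime(x) / fsecond(x)
    return x

# Minimize f(x) = x^4 - 2x^2 from x0 = 1.5 (local minimum at x = 1)
xmin = newton_minimize(lambda x: 4.0 * x**3 - 4.0 * x,
                       lambda x: 12.0 * x**2 - 4.0, 1.5)
```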