Exam 3 Study Guide/Practice Aid

Content

The exam covers content from eigenvalue sensitivity through "Optimization: Newton's method (1D)". Roughly, in terms of subsections, that is:

  • Eigenvalues: sensitivity (Bauer-Fike)
  • Eigenvalues: power iteration and variants
  • Eigenvalues: Rayleigh quotient
  • Eigenvalues: Schur form
  • Eigenvalues: QR algorithm and Hessenberg reduction
  • Eigenvalues: Krylov space methods and Arnoldi iteration
  • Computing the SVD
  • Nonlinear equations: setup, existence, sensitivity, multiplicity
  • Rates of convergence
  • Stopping criteria
  • Bisection method
  • Fixed-point iteration (1D)
  • Newton's method (1D)
  • Secant method
  • Newton's method ($n$D) and Broyden's method
  • Optimization: setup, convexity, optimality conditions, sensitivity
  • Golden section search
  • Optimization: Newton's method (1D)

Some questions to ask yourself

Eigenvalue Sensitivity

  • What does the Bauer-Fike theorem say about how much an eigenvalue can move when $A$ is perturbed to $A + E$? $$ |\mu - \lambda_k| \le \text{cond}(X) \|E\| $$ where $X^{-1}AX = D$ and $\lambda_k$ is the closest eigenvalue of $A$ to $\mu$.
  • Why is the Bauer-Fike bound tight for matrices with ill-conditioned eigenvector matrices?
  • What happens to the bound when $A$ is symmetric? (Eigenvectors are orthogonal, so $\text{cond}_2(X)=1$.)
  • Why may the bound overestimate sensitivity for an individual eigenvalue?
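
A quick numerical check of the bound (the example matrix, seed, and variable names are mine, not from the notes): for a diagonalizable $A$, every eigenvalue of $A + E$ should lie within $\text{cond}(X)\,\|E\|_2$ of some eigenvalue of $A$.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 100.0],
              [0.0, 2.0]])           # skewed eigenvectors -> large cond(X)
lam, X = np.linalg.eig(A)            # columns of X are eigenvectors
E = 1e-6 * rng.standard_normal((2, 2))
mu = np.linalg.eigvals(A + E)

# Bauer-Fike: each perturbed eigenvalue is within cond(X)*||E||_2
# of the nearest eigenvalue of A.
bound = np.linalg.cond(X) * np.linalg.norm(E, 2)
moves = [min(abs(m - lam)) for m in mu]
```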

Power Iteration

  • What is power iteration? Given $A$ and $\boldsymbol x_0$, can I write out the first few iterates?
  • To what eigenvector does power iteration converge, and with what convergence factor? $$ \text{factor} = \left|\frac{\lambda_2}{\lambda_1}\right| $$
  • What can go wrong with power iteration? (Equal-magnitude eigenvalues, complex eigenvalues, starting vector orthogonal to the dominant eigenvector.)
  • How does a shift ($A - \sigma I$) change which eigenvalue power iteration converges to?
  • How does inversion ($A^{-1}$) change power iteration? What eigenvalue does it converge to?
  • How does shift-invert ($(A - \sigma I)^{-1}$) guide convergence? To which eigenvalue does it converge?
  • Can I compute the convergence factor for each variant above?
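
As a concrete sketch (function name and test matrix are mine, not from the notes), the basic iteration is just repeated multiply-and-normalize:

```python
import numpy as np

def power_iteration(A, x0, iters=200):
    """Multiply by A and renormalize; converges to the dominant eigenvector
    with convergence factor |lambda_2 / lambda_1| per step."""
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        x = A @ x
        x = x / np.linalg.norm(x)
    return x, x @ A @ x              # Rayleigh quotient estimates lambda_1

# Hypothetical example: eigenvalues 1, 2, 5, so the factor is 2/5
A = np.diag([1.0, 2.0, 5.0])
v, lam = power_iteration(A, np.array([1.0, 1.0, 1.0]))
```

The shift, invert, and shift-invert variants change only the multiply step, e.g. replacing `A @ x` by a solve with $A - \sigma I$ for shift-invert.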

Rayleigh Quotient

  • What is the Rayleigh quotient $\frac{\boldsymbol x^T A \boldsymbol x}{\boldsymbol x^T \boldsymbol x}$?
  • What value does it take when $\boldsymbol x$ is an eigenvector?
  • What is Rayleigh quotient iteration? What is its convergence rate?
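
A minimal sketch of Rayleigh quotient iteration (example matrix is mine; the guard avoids solving with an exactly-converged, hence singular, shift):

```python
import numpy as np

def rayleigh_quotient_iteration(A, x0, iters=10, tol=1e-12):
    """Shift-invert power iteration using the Rayleigh quotient as the shift;
    converges cubically for symmetric A."""
    x = x0 / np.linalg.norm(x0)
    sigma = x @ A @ x
    for _ in range(iters):
        if np.linalg.norm(A @ x - sigma * x) < tol:
            break                    # converged: A - sigma*I is (near) singular
        y = np.linalg.solve(A - sigma * np.eye(len(x)), x)
        x = y / np.linalg.norm(y)
        sigma = x @ A @ x
    return sigma, x

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])           # eigenvalues (5 ± sqrt(5))/2
sigma, x = rayleigh_quotient_iteration(A, np.array([1.0, 0.0]))
```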

Schur Form

  • What is the Schur form $A = QUQ^T$? What are the properties of $Q$ and $U$?
  • Why does the Schur form always exist?
  • Where do the eigenvalues appear in the Schur form?
  • How can eigenvectors be recovered from the Schur form (back-substitution on $U - \lambda I$)?
  • What happens for matrices with complex eigenvalues (real Schur form with $2\times 2$ blocks)?
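
The back-substitution idea can be sketched for an upper triangular $U$ with distinct diagonal entries (matrix and names are mine):

```python
import numpy as np

def triangular_eigenvector(U, k):
    """Eigenvector of upper triangular U for eigenvalue U[k, k]: set v[k] = 1
    and back-substitute the rows of (U - lambda*I) v = 0 above row k.
    Assumes distinct diagonal entries."""
    lam = U[k, k]
    v = np.zeros(U.shape[0])
    v[k] = 1.0
    for i in range(k - 1, -1, -1):
        # row i:  (U[i,i] - lam) * v[i] + U[i, i+1:] @ v[i+1:] = 0
        v[i] = -U[i, i + 1:] @ v[i + 1:] / (U[i, i] - lam)
    return v

U = np.array([[1.0, 2.0, 3.0],
              [0.0, 4.0, 5.0],
              [0.0, 0.0, 6.0]])
v = triangular_eigenvector(U, 2)     # eigenvector for eigenvalue 6
```

Given a full Schur form $A = QUQ^T$, the corresponding eigenvector of $A$ is then $Q v$.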

QR Algorithm

  • What is orthogonal iteration? What is the QR iteration? How are they related?
  • Why do the iterates $\bar X_k$ all have the same eigenvalues as $A$?
  • How does the QR iteration converge to the Schur form?
  • How can a shift be incorporated into the QR iteration, and why does this accelerate convergence?
  • What is upper Hessenberg form? How is it obtained (via Householder similarity transforms)?
  • Why does reducing to Hessenberg form first make the QR iteration cheaper ($O(n^2)$ per step instead of $O(n^3)$)?
  • Why does the QR iteration preserve Hessenberg form?
  • What does the overall procedure look like for symmetric matrices (reduce to tridiagonal form, then QR iteration drives it to diagonal)?
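
The unshifted iteration is short enough to write out (example matrix is mine; practical implementations add Hessenberg reduction and shifts):

```python
import numpy as np

def qr_iteration(A, iters=100):
    """Factor A_k = Q_k R_k, then form A_{k+1} = R_k Q_k = Q_k^T A_k Q_k.
    Each step is a similarity transform, so every iterate has the same
    eigenvalues as A."""
    Ak = A.copy()
    for _ in range(iters):
        Q, R = np.linalg.qr(Ak)
        Ak = R @ Q
    return Ak

# Symmetric example: the iterates approach a diagonal matrix of eigenvalues.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
T = qr_iteration(A)
```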

Krylov Space Methods and Arnoldi Iteration

  • What is the Krylov space $\mathcal K_k = \text{span}\{\boldsymbol x_0, A\boldsymbol x_0, \ldots, A^{k-1}\boldsymbol x_0\}$?
  • Why is the Krylov matrix $K_k$ ill-conditioned, and how does orthogonalization fix it?
  • What is Arnoldi iteration? How does it relate to Gram-Schmidt applied to the Krylov sequence?
  • What factorization does Arnoldi produce: $AQ_k = Q_{k+1}H$? What is the shape of $H$?
  • What is Lanczos iteration? When does it apply instead of Arnoldi?
  • What are Ritz values? How are they used as approximate eigenvalues of $A$?

Computing the SVD

  • How can the SVD be reduced to an eigenvalue problem (via $A^T A$)?
  • What is the "non-kiddy" (bidiagonalization) approach, and why is it preferred?
  • Can I use the SVD of $A$ to compute the SVD of $A^T A$?
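
The reduction to an eigenvalue problem can be checked directly (example matrix is mine). This is the route that squares the condition number, which is why the bidiagonalization approach is preferred in practice:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# Eigendecomposition of A^T A gives V and the squared singular values.
evals, V = np.linalg.eigh(A.T @ A)
order = np.argsort(evals)[::-1]          # sort descending
sigma = np.sqrt(evals[order])
V = V[:, order]
U = (A @ V) / sigma                      # u_i = A v_i / sigma_i
```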

Nonlinear Equations: Setup and Theory

  • What is the goal of nonlinear equation solving: find $\boldsymbol x$ such that $\boldsymbol f(\boldsymbol x) = \boldsymbol 0$?
  • How can existence of a root be established? (Intermediate value theorem, contraction mapping theorem, inverse function theorem.)
  • What is the contraction mapping theorem? What is a fixed point?
  • What is the conditioning/sensitivity of root finding? Why must we use absolute condition numbers?
  • What is a root of multiplicity $m$? How does multiplicity affect conditioning?

Rates of Convergence

  • What does it mean for a method to converge with rate $r$? $$ \lim_{k\to\infty} \frac{\|\boldsymbol e_{k+1}\|} {\|\boldsymbol e_k\|^r} = C,\quad 0 < C < \infty $$
  • What is linear convergence ($r=1$)? How many digits does it gain per step?
  • What is quadratic convergence ($r=2$)? Why does $C$ "not matter much" once the error is small?
  • What is superlinear convergence?
  • What is the convergence rate of power iteration? Of Rayleigh quotient iteration? Of bisection? Of Newton's method? Of the secant method?
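
The rate $r$ can be estimated numerically from an error sequence by fitting $\log \|\boldsymbol e_{k+1}\|$ against $\log \|\boldsymbol e_k\|$ (this diagnostic and the Newton example are mine, not from the notes):

```python
import numpy as np

def estimate_rate(errors):
    """Fit log e_{k+1} ~ r * log e_k + log C; the slope estimates the rate r."""
    return np.polyfit(np.log(errors[:-1]), np.log(errors[1:]), 1)[0]

# Newton iteration for sqrt(2): the error roughly squares each step.
x, errs = 3.0, []
for _ in range(5):
    errs.append(abs(x - np.sqrt(2.0)))
    x = 0.5 * (x + 2.0 / x)

rate = estimate_rate(errs)               # close to 2 (quadratic)
```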

Stopping Criteria

  • Comment on the reliability of each of the following:
    1. $|f(x)| < \varepsilon$ (small residual)
    2. $\|x_{k+1} - x_k\| < \varepsilon$ (absolute step)
    3. $\|x_{k+1} - x_k\| / \|x_k\| < \varepsilon$ (relative step)
  • What can go wrong with each criterion?

Bisection Method

  • What is the bisection method? What assumption is needed?
  • What is its convergence rate and constant?
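
A sketch (function name is mine); the only assumption is a sign change on $[a, b]$:

```python
def bisect(f, a, b, tol=1e-10):
    """Halve the bracketing interval each step: linear convergence with
    constant 1/2. Requires f(a) and f(b) to have opposite signs."""
    fa = f(a)
    while b - a > tol:
        m = 0.5 * (a + b)
        fm = f(m)
        if fa * fm <= 0:
            b = m                    # sign change in [a, m]
        else:
            a, fa = m, fm            # sign change in [m, b]
    return 0.5 * (a + b)

root = bisect(lambda x: x**2 - 2.0, 0.0, 2.0)
```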

Fixed-Point Iteration

  • What is fixed-point iteration: $x_{k+1} = g(x_k)$?
  • Under what condition on $g'(x^*)$ does it converge locally?
  • What is the convergence rate in that case?
  • What happens if $g'(x^*) = 0$? What convergence rate results?
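
A tiny demonstration (the example map is mine): $g(x) = \cos x$ has $|g'(x^*)| < 1$ at its fixed point, so the iteration converges linearly:

```python
import math

def fixed_point(g, x0, iters=100):
    """Iterate x_{k+1} = g(x_k); converges locally when |g'(x*)| < 1,
    linearly with factor |g'(x*)|."""
    x = x0
    for _ in range(iters):
        x = g(x)
    return x

x_star = fixed_point(math.cos, 1.0)  # fixed point of cos is ~0.739
```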

Newton's Method (1D)

  • Derive Newton's method from the linear Taylor approximation of $f$ at $x_k$: $$ x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)} $$
  • Show that $g'(x^*) = 0$ for Newton's iteration function $g(x) = x - f(x)/f'(x)$, implying quadratic convergence at simple roots.
  • What are the drawbacks of Newton's method? (Local convergence only; derivative must be known.)
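
A direct transcription of the update (names and example are mine):

```python
def newton(f, fprime, x0, iters=20):
    """x_{k+1} = x_k - f(x_k) / f'(x_k): quadratic convergence at simple
    roots, but only from a good enough starting point."""
    x = x0
    for _ in range(iters):
        x = x - f(x) / fprime(x)
    return x

# Solving x^2 - 2 = 0 from x0 = 1
root = newton(lambda x: x**2 - 2.0, lambda x: 2.0 * x, 1.0)
```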

Secant Method

  • How does the secant method approximate $f'(x_k)$? $$ x_{k+1} = x_k - \frac{f(x_k)(x_k - x_{k-1})}{f(x_k) - f(x_{k-1})} $$
  • What is the convergence rate of the secant method ($r \approx 1.618 = (1+\sqrt 5)/2$)?
  • What are its drawbacks compared to Newton's method?
  • Why is it called a quasi-Newton method?
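
The same update with the derivative replaced by a difference quotient (names are mine; the guard avoids dividing by a flat secant line):

```python
def secant(f, x0, x1, iters=20):
    """Newton's update with f'(x_k) replaced by the slope through the last
    two iterates; converges with rate (1 + sqrt(5))/2 ~ 1.618."""
    for _ in range(iters):
        fx0, fx1 = f(x0), f(x1)
        if fx1 == fx0:
            break                    # flat (or fully converged) secant
        x0, x1 = x1, x1 - fx1 * (x1 - x0) / (fx1 - fx0)
    return x1

root = secant(lambda x: x**2 - 2.0, 1.0, 2.0)
```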

Newton's Method ($n$D)

  • What does Newton's method look like in $n$ dimensions? $$ \boldsymbol x_{k+1} = \boldsymbol x_k - J_{\boldsymbol f}(\boldsymbol x_k)^{-1} \boldsymbol f(\boldsymbol x_k) $$
  • Why is it expensive per iteration?
  • What is Broyden's method? Why can't we simply generalize the secant formula to $n$ dimensions directly?
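
In practice one solves a linear system with the Jacobian rather than forming $J^{-1}$ (the example system is mine, not from the notes):

```python
import numpy as np

def newton_nd(f, J, x0, iters=20):
    """Solve J(x_k) s = -f(x_k) for the Newton step, then x_{k+1} = x_k + s.
    Each iteration costs one Jacobian evaluation and one O(n^3) solve."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        s = np.linalg.solve(J(x), -f(x))
        x = x + s
    return x

# Hypothetical system: x^2 + y^2 = 1 and y = x, with a root at (1, 1)/sqrt(2)
f = lambda v: np.array([v[0]**2 + v[1]**2 - 1.0, v[1] - v[0]])
J = lambda v: np.array([[2.0 * v[0], 2.0 * v[1]],
                        [-1.0, 1.0]])
root = newton_nd(f, J, [1.0, 0.0])
```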

Optimization: Setup and Theory

  • What is the optimization problem: minimize $f(\boldsymbol x)$ subject to equality and inequality constraints?
  • What is the difference between constrained and unconstrained optimization? Between linear and nonlinear programming?
  • What does it mean for $f$ to be convex? Strictly convex?
  • If $f$ is convex, what can be said about local minima? If strictly convex?
  • What are the necessary and sufficient conditions for $x^*$ to be a local minimum?
    • 1D: $f'(x^*) = 0$ (necessary); $f''(x^*) > 0$ (sufficient)
    • $n$D: $\nabla f(\boldsymbol x^*) = \boldsymbol 0$ (necessary); $H_f(\boldsymbol x^*)$ positive definite (sufficient)
  • What is the Hessian $H_f$? Is it symmetric?
  • How can positive definiteness of the Hessian be tested computationally?
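
One standard computational test (a sketch; the example matrices are mine): attempt a Cholesky factorization, which succeeds exactly when a symmetric matrix is positive definite:

```python
import numpy as np

def is_positive_definite(H):
    """A symmetric H is positive definite iff Cholesky succeeds."""
    try:
        np.linalg.cholesky(H)
        return True
    except np.linalg.LinAlgError:
        return False

H_good = np.array([[2.0, 1.0], [1.0, 2.0]])   # eigenvalues 1 and 3
H_bad = np.array([[1.0, 2.0], [2.0, 1.0]])    # eigenvalues 3 and -1
```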

Sensitivity of Optimization

  • Why can we expect only half as many digits in $x^*$ as in $f(x^*)$ when minimizing by function values alone?
  • In $n$D, how does the conditioning of $H_f(\boldsymbol x^*)$ relate to the sensitivity of $\boldsymbol x^*$?

Golden Section Search

  • What is unimodality? Why is it needed for golden section search?
  • How does golden section search bracket the minimum?
  • What is the golden ratio $\tau = (\sqrt{5}-1)/2$, and why does it arise?
  • What is the convergence rate of golden section search?
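
A sketch (names and test function are mine); note how each step reuses one of the two interior points, so only one new function evaluation is needed:

```python
import math

def golden_section(f, a, b, tol=1e-8):
    """Shrink a bracket around the minimum of a unimodal f by the factor
    tau = (sqrt(5) - 1)/2 per step, reusing one interior point each time."""
    tau = (math.sqrt(5.0) - 1.0) / 2.0
    x1, x2 = a + (1.0 - tau) * (b - a), a + tau * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 > f2:                  # minimum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + tau * (b - a)
            f2 = f(x2)
        else:                        # minimum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = a + (1.0 - tau) * (b - a)
            f1 = f(x1)
    return 0.5 * (a + b)

xmin = golden_section(lambda x: (x - 1.5)**2, 0.0, 4.0)
```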

Optimization: Newton's Method (1D)

  • Derive Newton's method for optimization from the quadratic Taylor approximation: $$ x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)} $$
  • Why is this identical to Newton's method applied to $f'$?
  • What is its convergence rate?
  • What are its drawbacks? (Local convergence; requires $f''$.)
  • How can it be combined with a bracketing strategy for robustness?
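
A sketch of the update (names and example objective are mine), which is literally Newton's root-finder applied to $f'$:

```python
def newton_minimize(fprime, fsecond, x0, iters=20):
    """x_{k+1} = x_k - f'(x_k) / f''(x_k): Newton's method on f' = 0."""
    x = x0
    for _ in range(iters):
        x = x - fprime(x) / fsecond(x)
    return x

# Minimize f(x) = x^4 - 2x^2 from x0 = 1.5 (local minimum at x = 1)
xmin = newton_minimize(lambda x: 4.0 * x**3 - 4.0 * x,
                       lambda x: 12.0 * x**2 - 4.0, 1.5)
```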