Exam 4 Study Guide/Practice Aid
Content
The exam covers content from gradient descent through differentiation matrices. Roughly, in terms of subsections, that is:
- Optimization
  - gradient descent and steepest descent
  - Newton's method ($n$D)
  - quasi-Newton methods (BFGS)
  - nonlinear least squares (Gauss-Newton, Levenberg-Marquardt)
  - constrained optimization, KKT conditions
- Interpolation
  - setup, Vandermonde matrices
  - Lagrange, Newton, and orthogonal polynomial bases
  - Chebyshev polynomials and nodes
  - error estimates
  - piecewise interpolation and splines
- Quadrature
  - interpolatory quadrature, Newton-Cotes
  - accuracy and stability
  - Gaussian quadrature
  - composite quadrature
- Numerical differentiation
  - accuracy
  - differentiation matrices
Some questions to ask yourself
Gradient Descent / Steepest Descent
- What is the steepest descent direction for a function $f$? $$ \boldsymbol{s}_k = -\nabla f(\boldsymbol{x}_k) $$
- What is the role of the line search? What method can be used?
- Write out one full step of gradient descent.
- What is the convergence rate of steepest descent? (Linear convergence.)
- For the quadratic model $f(\boldsymbol{x}) = \frac{1}{2}\boldsymbol{x}^T A \boldsymbol{x} + \boldsymbol{c}^T \boldsymbol{x}$ with $A$ SPD, what is the convergence factor? $$ \|\boldsymbol{e}_{k+1}\|_A \le \frac{\kappa(A)-1}{\kappa(A)+1}\|\boldsymbol{e}_k\|_A $$
- Why does ill-conditioning slow down gradient descent?
- What are the (weak) Wolfe conditions, and why are they used instead of an exact line search? (Here $\phi(\alpha) = f(\boldsymbol{x}_k + \alpha\boldsymbol{s}_k)$ and $0 < c_1 < c_2 < 1$; a line-search sketch follows this list.)
  - Sufficient decrease: $\phi(\alpha) \le \phi(0) + c_1\alpha\phi'(0)$
  - Curvature condition: $\phi'(\alpha) \ge c_2\phi'(0)$
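
A minimal sketch of gradient descent with a backtracking line search that enforces the sufficient-decrease condition (the curvature condition is left out for brevity). The test quadratic and the constants `c1`, `tol` are illustrative choices, not from the course notes:

```python
import numpy as np

def gradient_descent(f, grad, x0, c1=1e-4, tol=1e-8, maxit=500):
    """Steepest descent; backtrack until the sufficient-decrease
    condition phi(alpha) <= phi(0) + c1*alpha*phi'(0) holds."""
    x = np.asarray(x0, dtype=float)
    for _ in range(maxit):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        s = -g                                    # steepest descent direction
        alpha, phi0, dphi0 = 1.0, f(x), -(g @ g)  # phi'(0) = grad(x) . s
        while f(x + alpha * s) > phi0 + c1 * alpha * dphi0:
            alpha *= 0.5                          # backtrack
        x = x + alpha * s
    return x

# Illustrative quadratic f(x) = 1/2 x^T A x + c^T x with A SPD
A = np.array([[3.0, 1.0], [1.0, 2.0]])
c = np.array([-1.0, 0.5])
f = lambda x: 0.5 * (x @ A @ x) + c @ x
grad = lambda x: A @ x + c
print(gradient_descent(f, grad, np.zeros(2)))     # ~ -solve(A, c)
```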
Heavy Ball and Conjugate Gradient
- What is the heavy ball (momentum) method (sketched after this list)? $$ \boldsymbol{x}_{k+1} = \boldsymbol{x}_k - \alpha_k\nabla f(\boldsymbol{x}_k) + \beta_k(\boldsymbol{x}_k - \boldsymbol{x}_{k-1}) $$
- For the quadratic model with optimal $\alpha,\beta$, what convergence factor does heavy ball achieve? $$ \|\boldsymbol{e}_{k+1}\|_A = \frac{\sqrt{\kappa(A)}-1}{\sqrt{\kappa(A)}+1}\|\boldsymbol{e}_k\|_A $$
- What is stochastic gradient descent (SGD)? When is it used?
- What is the conjugate gradient method for optimization? How does it relate to Lanczos iteration and the Krylov space?
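
A minimal sketch of the heavy-ball update on the same small SPD quadratic; the fixed `alpha` and `beta` below are illustrative constants, not the optimal values from the convergence analysis:

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])    # SPD
c = np.array([-1.0, 0.5])
grad = lambda x: A @ x + c

alpha, beta = 0.3, 0.5                     # illustrative fixed parameters
x_prev = np.zeros(2)
x = np.zeros(2)
for _ in range(200):
    # x_{k+1} = x_k - alpha*grad(x_k) + beta*(x_k - x_{k-1})
    x, x_prev = x - alpha * grad(x) + beta * (x - x_prev), x
print(x, -np.linalg.solve(A, c))           # should agree
```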
Newton's Method ($n$D) for Optimization
- Derive Newton's method for optimization in $n$ dimensions from the quadratic Taylor approximation (a sketch follows this list): $$ H_f(\boldsymbol{x}_k)\boldsymbol{s}_k = -\nabla f(\boldsymbol{x}_k), \qquad \boldsymbol{x}_{k+1} = \boldsymbol{x}_k + \boldsymbol{s}_k $$
- What are the drawbacks? (Only local convergence; requires second derivatives; the Newton step may not be a descent direction when $H_f$ is indefinite or nearly singular.)
- What is the convergence rate of Newton's method for optimization? (Quadratic, near a strict local minimum.)
- What is the Nelder-Mead method?
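
A minimal sketch of the Newton iteration for optimization, assuming callables `grad` and `hess` for $\nabla f$ and $H_f$; the test function is an illustrative choice:

```python
import numpy as np

def newton_opt(grad, hess, x0, tol=1e-12, maxit=50):
    """Newton's method for optimization: solve H_f(x_k) s_k = -grad f(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(maxit):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        s = np.linalg.solve(hess(x), -g)   # Newton step
        x = x + s
    return x

# Illustrative: f(x) = x1^2 + x2^2 + exp(x1), a smooth convex function
grad = lambda x: np.array([2 * x[0] + np.exp(x[0]), 2 * x[1]])
hess = lambda x: np.array([[2 + np.exp(x[0]), 0.0], [0.0, 2.0]])
print(newton_opt(grad, hess, np.array([1.0, 1.0])))  # x1 solves 2x + e^x = 0
```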
Quasi-Newton Methods (BFGS)
- What is the secant condition for a Hessian approximation $B_{k+1}$? $$ B_{k+1}\boldsymbol{s}_k = \boldsymbol{y}_k,\quad \boldsymbol{s}_k = \boldsymbol{x}_{k+1}-\boldsymbol{x}_k,\quad \boldsymbol{y}_k = \nabla f(\boldsymbol{x}_{k+1})-\nabla f(\boldsymbol{x}_k) $$
- What is the BFGS update formula? $$ B_{k+1} = B_k - \frac{B_k\boldsymbol{s}_k\boldsymbol{s}_k^T B_k}{\boldsymbol{s}_k^T B_k\boldsymbol{s}_k} + \frac{\boldsymbol{y}_k\boldsymbol{y}_k^T}{\boldsymbol{y}_k^T\boldsymbol{s}_k} $$ (Demonstrated in the sketch after this list.)
- Why does BFGS retain positive definiteness of $B_{k+1}$ (given the Wolfe curvature condition)?
- What is the convergence behavior of BFGS in practice? (Superlinear.)
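
A minimal sketch of one BFGS update of the Hessian approximation and a check of the secant condition; all names and the test data are illustrative:

```python
import numpy as np

def bfgs_update(B, s, y):
    """One BFGS update; keeps B symmetric positive definite when
    y @ s > 0 (guaranteed by the Wolfe curvature condition)."""
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)

rng = np.random.default_rng(0)
B = np.eye(3)
s = rng.standard_normal(3)
y = np.diag([2.0, 1.0, 3.0]) @ s      # SPD map ensures y @ s > 0
B1 = bfgs_update(B, s, y)
print(np.allclose(B1 @ s, y))         # secant condition B_{k+1} s = y: True
```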
Nonlinear Least Squares
- What is the nonlinear least-squares problem? $$ \min_{\boldsymbol{x}} \varphi(\boldsymbol{x}) = \tfrac{1}{2}\|\boldsymbol{r}(\boldsymbol{x})\|_2^2 $$
- Derive the gradient of $\varphi$: $\nabla\varphi = J_{\boldsymbol{r}}^T\boldsymbol{r}$.
- What is the Gauss-Newton method? How does it approximate the full Hessian? $$ J^T J\,\boldsymbol{s}_k = -J^T\boldsymbol{r}(\boldsymbol{x}_k) $$ Why is this a least-squares problem?
- When does Gauss-Newton converge well, and when does it struggle? (Large residual at the solution $\Rightarrow$ poor convergence.)
- What is Levenberg-Marquardt? $$ (J^T J + \mu_k I)\boldsymbol{s}_k = -J^T\boldsymbol{r}(\boldsymbol{x}_k) $$ What do $\mu_k\to 0$ and $\mu_k\to\infty$ correspond to? (A sketch follows this list.)
- How does the trust-region criterion $\rho_k$ guide the choice of $\mu_k$ in Levenberg-Marquardt?
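
A minimal sketch of Levenberg-Marquardt steps with a fixed damping $\mu$ (no trust-region update of $\mu_k$), assuming callables `r` and `J` for the residual and its Jacobian. The exponential model and data are illustrative:

```python
import numpy as np

def lm_step(r, J, x, mu):
    """Levenberg-Marquardt step: (J^T J + mu I) s = -J^T r(x).
    mu = 0 recovers Gauss-Newton; large mu gives a short gradient-like step."""
    Jx, rx = J(x), r(x)
    return np.linalg.solve(Jx.T @ Jx + mu * np.eye(len(x)), -Jx.T @ rx)

# Illustrative zero-residual fit: r_i(x) = x0 * exp(x1 * t_i) - y_i
t = np.linspace(0, 1, 20)
y = 2.0 * np.exp(-1.5 * t)
r = lambda x: x[0] * np.exp(x[1] * t) - y
J = lambda x: np.column_stack([np.exp(x[1] * t),            # dr/dx0
                               x[0] * t * np.exp(x[1] * t)]) # dr/dx1

x = np.array([1.0, 0.0])
for _ in range(30):
    x = x + lm_step(r, J, x, mu=1e-3)
print(x)   # should approach (2, -1.5)
```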
Constrained Optimization and KKT Conditions
- State the general constrained optimization problem (equality constraints $\boldsymbol{g}=\boldsymbol{0}$, inequality constraints $\boldsymbol{h}\ge\boldsymbol{0}$).
- What is the Lagrangian? $$ \mathcal{L}(\boldsymbol{x},\boldsymbol{\lambda}_g,\boldsymbol{\lambda}_h) = f(\boldsymbol{x}) - \boldsymbol{\lambda}_g^T\boldsymbol{g}(\boldsymbol{x}) - \boldsymbol{\lambda}_h^T\boldsymbol{h}(\boldsymbol{x}) $$
- State the KKT necessary conditions (a worked example follows this list):
  - $\nabla_{\boldsymbol{x}}\mathcal{L} = \boldsymbol{0}$
  - $\boldsymbol{g}(\boldsymbol{x}^*) = \boldsymbol{0}$
  - $\boldsymbol{h}(\boldsymbol{x}^*) \ge \boldsymbol{0}$
  - $\boldsymbol{\lambda}_h^* \ge \boldsymbol{0}$
  - Complementarity: $\boldsymbol{h}(\boldsymbol{x}^*)\cdot\boldsymbol{\lambda}_h^* = 0$
- What does it mean for a constraint to be active at $\boldsymbol{x}^*$?
- Why does the sign of the multiplier matter for inequality constraints?
- What is the constraint qualification, and what can go wrong when it fails? (e.g., two active constraints with parallel gradients $\Rightarrow$ multipliers not unique or KKT fails to be necessary.)
- What is Farkas' Lemma and how does it underpin KKT?
- What computational approach solves the KKT equations? (Apply Newton's method to a subset of the KKT conditions.)
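
As a small worked example of the KKT conditions (an illustrative problem, not from the course notes): minimize $f(\boldsymbol{x}) = x_1^2 + x_2^2$ subject to the single equality constraint $g(\boldsymbol{x}) = x_1 + x_2 - 1 = 0$. Stationarity of the Lagrangian and feasibility give $$ \nabla_{\boldsymbol{x}}\mathcal{L} = \begin{pmatrix} 2x_1 - \lambda \\ 2x_2 - \lambda \end{pmatrix} = \boldsymbol{0}, \qquad x_1 + x_2 = 1, $$ so $x_1 = x_2 = \lambda/2$, giving $\boldsymbol{x}^* = (\tfrac{1}{2}, \tfrac{1}{2})$ and $\lambda^* = 1$.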
Interpolation: Setup and Vandermonde Matrices
- What is the interpolation problem? Given nodes $(x_i)$ and values $(y_i)$, find $f$ with $f(x_i)=y_i$.
- How is the problem made unique? (Pick a basis $\{\varphi_j\}$, set $N_\text{func}=N$, and solve the Vandermonde system $V\boldsymbol{\alpha}=\boldsymbol{y}$; a sketch follows this list.)
- What is the Vandermonde matrix $V_{ij} = \varphi_j(x_i)$?
- What happens to $V$ when a node is repeated? (Two rows become identical $\Rightarrow$ $V$ is singular.)
- What is the Lebesgue constant $\Lambda$, and what does it measure?
- How does $\Lambda$ connect interpolation to best-approximation? $\|f - Pf\|_\infty \le (1+\Lambda)\|f-q\|_\infty$ for any polynomial $q$.
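
A minimal sketch of interpolation via the monomial-basis Vandermonde system (`np.vander` with `increasing=True` gives $V_{ij} = x_i^j$); the nodes and function are illustrative:

```python
import numpy as np

x = np.linspace(-1, 1, 7)            # 7 nodes -> degree-6 interpolant
y = np.cos(np.pi * x)                # values to interpolate
V = np.vander(x, increasing=True)    # V[i, j] = x[i]**j
a = np.linalg.solve(V, y)            # coefficients of sum_j a_j x^j

xx = 0.3
p = sum(aj * xx**j for j, aj in enumerate(a))
print(p, np.cos(np.pi * xx))         # interpolant vs. f at a test point
```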
Interpolation Bases
- What are the Lagrange basis polynomials $\varphi_j$? (They make $V=I$; $\varphi_j(x_i)=\delta_{ij}$.)
- Write down the Lagrange interpolant for nodes $(x_i)$ and values $(y_i)$.
- What is the Newton (triangular) basis? How are its coefficients found? ($O(n^2)$ via forward substitution.)
- Why are monomial bases on equispaced nodes problematic? (Near-linear dependence $\Rightarrow$ ill-conditioned $V$.)
- What are orthogonal polynomials? How are Legendre polynomials constructed? (Gram-Schmidt on monomials under $\langle f,g\rangle=\int_{-1}^1 fg\,dx$.)
- State the three-term recurrence for Legendre polynomials. $$ (k+1)P_{k+1}(x) = (2k+1)\,x\,P_k(x) - k\,P_{k-1}(x) $$
- What are Chebyshev polynomials? Give three equivalent definitions.
- What are the Chebyshev nodes (extrema and roots of $T_k$)? Why are edge-clustered nodes beneficial? (Both families are computed in the sketch after this list.)
- Why is Chebyshev interpolation preferred over equispaced polynomial interpolation?
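
A minimal sketch computing both families of Chebyshev nodes on $[-1,1]$ (formulas as usually stated; indexing conventions vary by text):

```python
import numpy as np

k = 10
# Roots of T_k (Chebyshev points of the first kind), k points
roots = np.cos((2 * np.arange(1, k + 1) - 1) * np.pi / (2 * k))
# Extrema of T_k (Chebyshev points of the second kind), k + 1 points
extrema = np.cos(np.arange(k + 1) * np.pi / k)
print(np.sort(roots))     # clustered toward the endpoints
print(np.sort(extrema))   # includes the endpoints -1 and 1
```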
Interpolation Error
- State the interpolation error formula: $$ f(x) - p_{n-1}(x) = \frac{f^{(n)}(\xi)}{n!}\prod_{i=1}^n(x-x_i) $$
- What is the simplified error bound when $h=x_n-x_1$? $$ \max_x|f(x)-p_{n-1}(x)| \le C M h^n, \quad M=\max|f^{(n)}| $$
- How do Chebyshev nodes minimize $\max_x|\prod(x-x_i)|$? (With Chebyshev roots as nodes the product is the monic polynomial $2^{1-n}T_n$, whose maximum on $[-1,1]$ is $2^{1-n}$, the smallest possible for a monic degree-$n$ polynomial.)
Piecewise Interpolation and Splines
- What is piecewise linear interpolation? What is its error order? ($O(h^2)$)
- What is a natural cubic spline? How many unknowns and equations are there? (A sketch follows this list.)
- What conditions are imposed at interior nodes for a cubic spline? (Continuity of $f$, $f'$, $f''$.)
- What are the "extra" boundary conditions for a natural spline? ($f''=0$ at the endpoints.)
- What are B-splines?
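
A minimal check using SciPy's `CubicSpline` with natural boundary conditions, verifying the $f'' = 0$ endpoint conditions; the nodes and data are illustrative:

```python
import numpy as np
from scipy.interpolate import CubicSpline

x = np.linspace(0, 2 * np.pi, 9)
y = np.sin(x)
spl = CubicSpline(x, y, bc_type='natural')   # f'' = 0 at both endpoints
print(spl(x[0], 2), spl(x[-1], 2))           # second derivatives: ~0, ~0
print(spl(1.0), np.sin(1.0))                 # spline vs. f at a test point
```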
Quadrature: Interpolatory/Newton-Cotes
- Derive interpolatory quadrature from the Lagrange interpolant: $$ \int_a^b f\,dx \approx \sum_i \omega_i f(x_i), \qquad \omega_i = \int_a^b \ell_i(x)\,dx $$
- What is the method of undetermined coefficients for finding quadrature weights? (A sketch follows this list.)
- To what polynomial degree are the midpoint, trapezoidal, and Simpson's rules exact? (Midpoint: 1; Trapezoidal: 1; Simpson's: 3.)
- Why do rules with an odd number of nodes gain an extra degree of exactness? (Cancellation of odd-order error terms by symmetry.)
- What are the drawbacks of Newton-Cotes quadrature at high order? (Possible large negative weights $\Rightarrow$ instability; Runge-like divergence.)
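
A minimal sketch of the method of undetermined coefficients: require $\sum_i \omega_i x_i^j = \int_a^b x^j\,dx$ for $j = 0,\dots,n-1$, which is a transposed Vandermonde system. The nodes (Simpson's rule on $[0,1]$) are an illustrative choice:

```python
import numpy as np

a, b = 0.0, 1.0
x = np.array([0.0, 0.5, 1.0])             # Simpson's nodes on [0, 1]
j = np.arange(len(x))
V = np.vander(x, increasing=True).T       # row j: [x_0^j, x_1^j, x_2^j]
m = (b**(j + 1) - a**(j + 1)) / (j + 1)   # exact moments: integral of x^j
w = np.linalg.solve(V, m)                 # weights matching all moments
print(w)                                  # [1/6, 2/3, 1/6]
```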
Quadrature: Accuracy and Stability
- Derive the accuracy bound for interpolatory quadrature: using the interpolation error, $$ \left|\int_a^b f\,dx - \sum_i\omega_i f(x_i)\right| \le C h^{n+1}\|f^{(n)}\|_\infty $$ (quadrature gains one order over interpolation).
- Why do quadratures with large negative weights have poor stability?
- What is Gaussian quadrature? How many points does it need to integrate degree-$(2n-1)$ polynomials exactly? ($n$ points.)
- How are Gaussian quadrature nodes related to orthogonal polynomials?
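
A minimal check using NumPy's Gauss-Legendre routine: the nodes are the roots of the degree-$n$ Legendre polynomial, and $n$ points integrate polynomials up to degree $2n-1$ exactly on $[-1,1]$. The integrand is an illustrative choice:

```python
import numpy as np

n = 3
x, w = np.polynomial.legendre.leggauss(n)   # roots of P_3 and their weights
f = lambda t: t**5 - t**4 + 1.0             # degree 5 = 2n - 1
exact = 0.0 - 2.0 / 5.0 + 2.0               # integrals of t^5, t^4, 1 on [-1,1]
print(w @ f(x), exact)                      # agree to machine precision
```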
Composite Quadrature
- What is composite quadrature? Why is it preferred over a single high-order rule? (A sketch follows this list.)
- Derive the error bound for composite quadrature using $m$ panels of length $h$: $$ \left|\int_a^b f - \text{composite}\right| \le C\|f^{(n)}\|_\infty h^n(b-a) $$ (composite loses one order vs. non-composite).
- What is adaptivity? Why is it useful?
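
A minimal sketch of the composite trapezoidal rule on $m$ panels, showing the expected $O(h^2)$ decay of the error; the integrand is illustrative:

```python
import numpy as np

def composite_trapezoid(f, a, b, m):
    """Composite trapezoidal rule on m equal panels of width h = (b-a)/m."""
    x = np.linspace(a, b, m + 1)
    h = (b - a) / m
    return h * (0.5 * f(x[0]) + f(x[1:-1]).sum() + 0.5 * f(x[-1]))

f, exact = np.exp, np.e - 1.0
for m in (4, 8, 16):
    err = abs(composite_trapezoid(f, 0.0, 1.0, m) - exact)
    print(m, err)        # error drops ~4x per doubling of m: O(h^2)
```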
Numerical Differentiation and Finite Differences
- Why is numerical differentiation harder than integration? (Ill-conditioned; amplifies noise; subject to cancellation.)
- What is the accuracy order of numerical differentiation with $n$ interpolation points? ($O(h^{n-1})$ — one order lower than interpolation.)
Differentiation Matrices
- How is numerical differentiation cast as a matrix-vector operation? $$ D = V' V^{-1}, \qquad f'(\boldsymbol{x}) \approx D f(\boldsymbol{x}) $$ where $V'_{ij} = \varphi_j'(x_i)$. (A sketch follows this list.)
- Does $D$ depend on the function being differentiated? (No. Depends on the nodes and the interpolation basis.)
- What is the nullspace of $D$? (Constant vectors; rows of $D$ always sum to zero.)
- How do you compute second derivatives with $D$? ($D^2$.)
- How does $D$ change under a shift of the nodes? (It does not: $D_{\boldsymbol{x}+c} = D_{\boldsymbol{x}}$.)
- How does $D$ change under a scaling of the nodes? ($D_{\alpha\boldsymbol{x}} = D_{\boldsymbol{x}}/\alpha$.)
- How do rows of $D$ relate to finite difference formulas? (Row $i$ gives the FD weights for $f'(x_i)$.)
- How is a single small $D$ (e.g. $3\times 3$) applied to a large equispaced grid? (First row for left boundary, interior row for interior points, last row for right boundary.)
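
A minimal sketch building $D = V' V^{-1}$ in the monomial basis and checking it on samples of a smooth function. The nodes are an illustrative choice; for many nodes the monomial Vandermonde becomes ill-conditioned:

```python
import numpy as np

x = np.linspace(-1, 1, 5)
n = len(x)
j = np.arange(n)
V = np.vander(x, increasing=True)    # V[i, j]  = x_i**j
Vp = np.zeros_like(V)
Vp[:, 1:] = V[:, :-1] * j[1:]        # V'[i, j] = j * x_i**(j-1)
D = Vp @ np.linalg.inv(V)            # differentiation matrix

print(D @ np.sin(x))                 # approximates f'(x) = cos(x)
print(np.cos(x))
print(D.sum(axis=1))                 # rows sum to ~0: constants map to 0
```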