Chapter 22
This chapter brings together a great deal of what we have studied so far in this course.
The goal is to be able to classify critical points of functions of any number of variables.
Recall that the Taylor series of a smooth function $f$ in $n$ variables about a point $\mathbf{x}=\mathbf{x}_0$ is given by \begin{align*} f(\mathbf{x})=&\;f(\mathbf{x}_0)+(\nabla f(\mathbf{x}_0))^T(\mathbf{x}-\mathbf{x}_0)+\frac{1}{2}(\mathbf{x}-\mathbf{x}_0)^TH(\mathbf{x}_0)(\mathbf{x}-\mathbf{x}_0)\\ &\;+\langle \text{ higher order terms}\rangle, \end{align*}
where $\mathbf{x}= \left( \begin{array}{c} x_1\\x_2\\\vdots \\ x_n \end{array}\right)\;$ and $\;H(\mathbf{x}_0)=\left( \begin{array}{cccc} \frac{\partial^2 f}{\partial x_1 \partial x_1}(\mathbf{x}_0)&\frac{\partial^2 f}{\partial x_1 \partial x_2}(\mathbf{x}_0)&\dots& \frac{\partial^2 f}{\partial x_1 \partial x_n}(\mathbf{x}_0)\\ \frac{\partial^2 f}{\partial x_2 \partial x_1}(\mathbf{x}_0)&\frac{\partial^2 f}{\partial x_2 \partial x_2}(\mathbf{x}_0)&\dots& \frac{\partial^2 f}{\partial x_2 \partial x_n}(\mathbf{x}_0)\\ \vdots&\vdots&\ddots&\vdots\\ \frac{\partial^2 f}{\partial x_n \partial x_1}(\mathbf{x}_0)&\frac{\partial^2 f}{\partial x_n \partial x_2}(\mathbf{x}_0)&\dots& \frac{\partial^2 f}{\partial x_n \partial x_n}(\mathbf{x}_0) \end{array} \right).$
Note that $H(\mathbf{x}_0)=H(\mathbf{x}_0)^T$, i.e. $H(\mathbf{x}_0)$ is a real symmetric matrix.
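The symmetry of the Hessian can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper `numerical_hessian`, the step size `h`, and the test function `f` are illustrative choices and do not come from the course material.

```python
import numpy as np

def numerical_hessian(f, x0, h=1e-4):
    """Approximate the Hessian of f at x0 using central differences."""
    x0 = np.asarray(x0, dtype=float)
    n = x0.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n)
            ej = np.zeros(n)
            ei[i] = h
            ej[j] = h
            # Central-difference estimate of d^2 f / (dx_i dx_j)
            H[i, j] = (f(x0 + ei + ej) - f(x0 + ei - ej)
                       - f(x0 - ei + ej) + f(x0 - ei - ej)) / (4 * h**2)
    return H

# Illustrative smooth function of two variables (an assumption for this sketch)
def f(x):
    return x[0]**2 * x[1] + np.sin(x[0] * x[1])

H = numerical_hessian(f, [1.0, 2.0])
print(H)
print("symmetric:", np.allclose(H, H.T, atol=1e-6))
```

The finite-difference estimate is only approximate, so the comparison uses a tolerance rather than exact equality.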
In the following, let $f:\mathbb{R}^n\longrightarrow \mathbb{R}$.
Definition 1. A point $\mathbf{x}_0$ is said to be a critical point if $\nabla f(\mathbf{x}_0)=\mathbf{0}$ or $\nabla f(\mathbf{x}_0)$ is undefined.
Definition 2. A critical point $\mathbf{x}_0$ satisfying $ \nabla f(\mathbf{x}_0)=\mathbf{0} $ is a local maximum if there exists some $\epsilon > 0$ such that $$f(\mathbf{x}_0) \ge f(\mathbf{x})$$ for all $\mathbf{x}$ such that $||\mathbf{x}-\mathbf{x}_0||\lt \epsilon.$
Definition 3. A critical point $\mathbf{x}_0$ satisfying $ \nabla f(\mathbf{x}_0)=\mathbf{0} $ is a local minimum if there exists some $\epsilon > 0$ such that $$f(\mathbf{x}_0) \le f(\mathbf{x})$$ for all $\mathbf{x}$ such that $||\mathbf{x}-\mathbf{x}_0||\lt \epsilon.$
Definition 4. A critical point $\mathbf{x}_0$ satisfying $ \nabla f(\mathbf{x}_0)=\mathbf{0} $ is a saddle point if it is neither a local maximum nor a local minimum, i.e. for every $\epsilon > 0$ there exist $\mathbf{x}_1, \mathbf{x}_2$ with $||\mathbf{x}_1-\mathbf{x}_0||<\epsilon,~~||\mathbf{x}_2-\mathbf{x}_0||<\epsilon$ such that $$f(\mathbf{x}_1)\lt f(\mathbf{x}_0)\lt f(\mathbf{x}_2).$$
1. $f(x_1,x_2) = x_1^2+x_2^2.$ Then $ \nabla f = \left( \begin{array}{c} 2x_1\\ 2x_2 \end{array} \right)=\mathbf 0 $ only at $(x_1,x_2)=(0,0).$ Since $f(x_1,x_2)\ge 0 = f(0,0)$ for all $(x_1,x_2)$, the critical point at $(0,0)$ is a minimum.
2. $f(x_1,x_2) = -x_1^2-x_2^2.$ The critical point at $(0,0)$ is a maximum.
3. $f(x_1,x_2) = -x_1^2+x_2^2.$ The critical point at $(0,0)$ is a saddle point.
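The three examples can be compared against Definitions 2 to 4 by sampling points near the critical point. This is only a rough numerical sketch, assuming NumPy; the helper `sample_classify`, the radius `eps`, and the sample count are illustrative assumptions, and a finite sample gives evidence rather than a proof.

```python
import numpy as np

def sample_classify(f, x0, eps=1e-3, n_samples=2000, seed=0):
    """Compare f(x0) with f at random points x satisfying 0 < ||x - x0|| < eps."""
    rng = np.random.default_rng(seed)
    x0 = np.asarray(x0, dtype=float)
    # Random unit directions, scaled to radii strictly between 0 and eps
    directions = rng.normal(size=(n_samples, x0.size))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    radii = eps * rng.uniform(0.1, 1.0, size=(n_samples, 1))
    values = np.array([f(x) for x in x0 + radii * directions])
    f0 = f(x0)
    if np.all(values >= f0):
        return "minimum"   # consistent with Definition 3 on this sample
    if np.all(values <= f0):
        return "maximum"   # consistent with Definition 2 on this sample
    return "saddle"        # values both below and above f(x0), as in Definition 4

examples = {
    "x1^2 + x2^2": lambda x: x[0]**2 + x[1]**2,
    "-x1^2 - x2^2": lambda x: -x[0]**2 - x[1]**2,
    "-x1^2 + x2^2": lambda x: -x[0]**2 + x[1]**2,
}
results = {name: sample_classify(f, [0.0, 0.0]) for name, f in examples.items()}
for name, kind in results.items():
    print(name, "->", kind)
```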
In MATH1052: We used the "Second derivative test" for functions of two variables.
In MATH2001/7000: We consider a variant of this test that generalises easily to higher dimensions.
Let $\mathbf{x}_0$ be a critical point satisfying $\nabla f(\mathbf{x}_0)=\mathbf{0}$. The first-order term then vanishes, so the Taylor series about $\mathbf{x}_0$ is $$f(\mathbf{x})=f(\mathbf{x}_0)+\frac{1}{2}(\mathbf{x}-\mathbf{x}_0)^TH(\mathbf{x}_0) (\mathbf{x}-\mathbf{x}_0)+ \langle \text{ higher order terms }\rangle.$$
Without loss of generality, we take $\mathbf{x}_0=\mathbf{0}$ (i.e., by shifting/translating variables if necessary). We have $$f(\mathbf{x})=f(\mathbf{0})+\frac{1}{2}\mathbf{x}^TH\mathbf{x}+\langle \text{ higher order terms }\rangle.$$ Here $H=H(\mathbf{0}).$ Thus, close to $\mathbf{0}$ (i.e., the critical point), the behaviour of $f$ is determined by this second-order term whenever it dominates the higher-order terms.
Observe that $H$ is real symmetric $\implies~H$ is orthogonally diagonalisable, i.e., there exists an orthogonal matrix $P$ such that $P^THP=D$ with some diagonal matrix $D$.
Let $\left\{ \mathbf e_1, \mathbf e_2, \ldots, \mathbf e_n\right\}$ be an orthonormal set of eigenvectors of $H$ with corresponding eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$. Form the orthogonal matrix \[ P = \big ( \mathbf e_1~|~ \mathbf e_2~|~ \ldots~|~ \mathbf e_n \big). \] Then $P^THP = D,$ with $ D = \left( \begin{array}{cccc} \lambda_1 & 0 &\cdots & 0 \\ 0 & \lambda_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & \lambda_n \\ \end{array} \right), \; \lambda_i\in \mathbb{R}. $
That is, $H = PDP^T.$
It follows that $$ \mathbf{x}^TH\mathbf{x} =(\mathbf{x}^TP)D(P^T\mathbf{x})=\mathbf{y}^T D \mathbf{y}, $$ i.e., the diagonalisation suggests setting $\mathbf{y}=P^T\mathbf{x}$. Note that the critical point is still at $\mathbf{y}=\mathbf{0},$ because $P^T\mathbf{0}=\mathbf{0}.$
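The change of variables $\mathbf{y}=P^T\mathbf{x}$ can be checked numerically. The sketch below assumes NumPy and uses `numpy.linalg.eigh`, which returns the eigenvalues of a symmetric matrix together with a matrix whose columns are orthonormal eigenvectors; the matrix `H` and vector `x` are arbitrary illustrative values.

```python
import numpy as np

# Illustrative real symmetric matrix (an assumption for this sketch)
H = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

eigvals, P = np.linalg.eigh(H)   # columns of P are orthonormal eigenvectors
D = np.diag(eigvals)

x = np.array([0.3, -1.2, 0.7])   # arbitrary test vector
y = P.T @ x                      # new coordinates y = P^T x

q_x = x @ H @ x                  # quadratic form in the original coordinates
q_y = np.sum(eigvals * y**2)     # lambda_1 y_1^2 + ... + lambda_n y_n^2

print("P^T P = I:   ", np.allclose(P.T @ P, np.eye(3)))
print("P^T H P = D: ", np.allclose(P.T @ H @ P, D))
print("x^T H x = y^T D y:", np.isclose(q_x, q_y))
```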
Let $F$ denote the function $f$ expressed in the new coordinates
$ \mathbf{y}= \left( \begin{array}{c} y_1\\y_2\\ \vdots \\ y_n \end{array} \right),\;$ i.e., $\;F(\mathbf{y})=f(\mathbf{x}(\mathbf{y})).$
$\implies$ \begin{align*} F(\mathbf{y})=&\;f(\mathbf{0})+\frac{1}{2} \mathbf{y}^TD\mathbf{y}+\langle \text{ higher order terms }\rangle\\ =&\;f(\mathbf{0})+\frac{1}{2}\left(\lambda_1y_1^2 +\lambda_2y_2^2+\dots +\lambda_ny_n^2\right)+\langle \text{ higher order terms }\rangle. \end{align*}
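As a worked illustration (the function here is chosen for this note and is not taken from the lecture), consider $f(x_1,x_2)=x_1^2+4x_1x_2+x_2^2$, which has a critical point at $\mathbf{0}$ and no higher order terms. Then \[ H=\left( \begin{array}{cc} 2 & 4\\ 4 & 2 \end{array} \right),\qquad \lambda_1=6,\; \mathbf e_1=\frac{1}{\sqrt 2}\left( \begin{array}{c} 1\\ 1 \end{array} \right),\qquad \lambda_2=-2,\; \mathbf e_2=\frac{1}{\sqrt 2}\left( \begin{array}{c} 1\\ -1 \end{array} \right). \] Setting $y_1=\frac{1}{\sqrt 2}(x_1+x_2)$ and $y_2=\frac{1}{\sqrt 2}(x_1-x_2)$ gives \[ F(\mathbf{y})=\frac{1}{2}\left(6y_1^2-2y_2^2\right)=3y_1^2-y_2^2, \] and expanding $\frac{3}{2}(x_1+x_2)^2-\frac{1}{2}(x_1-x_2)^2$ recovers $x_1^2+4x_1x_2+x_2^2$. In the new coordinates the cross term has disappeared: $F$ increases along the $y_1$ axis and decreases along the $y_2$ axis.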
👉 quadratic form $\lambda_1y_1^2 +\dots +\lambda_ny_n^2.$
Case 1: If $\lambda_i\gt 0$ for all $i =1, 2, \ldots , n,$ then the quadratic form is strictly positive in every direction from the critical point. Thus we have a local minimum.
Case 2: If $\lambda_i\lt 0$ for all $i =1, 2, \ldots , n,$ then the quadratic form is strictly negative in every direction from the critical point. We have a local maximum.
Case 3: If some pair $\lambda_i, \lambda_j$ with $i\neq j$ have opposite signs, then the quadratic form is positive in some directions and negative in others. We have a saddle.
Case 4: If all non-zero $\lambda_i$ have the same sign but some $\lambda_i=0,$ then we cannot identify the type of critical point. The test is inconclusive.
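The four cases translate directly into a short routine. This is a minimal sketch, assuming NumPy and a symmetric input matrix; the function name `classify_critical_point` and the tolerance `tol` used to decide when an eigenvalue counts as zero are assumptions made for illustration.

```python
import numpy as np

def classify_critical_point(H, tol=1e-10):
    """Classify a critical point from the eigenvalues of its (symmetric) Hessian H."""
    eigvals = np.linalg.eigvalsh(np.asarray(H, dtype=float))
    has_pos = np.any(eigvals > tol)
    has_neg = np.any(eigvals < -tol)
    has_zero = np.any(np.abs(eigvals) <= tol)
    if has_pos and has_neg:
        return "saddle"            # Case 3: eigenvalues of opposite sign
    if has_zero:
        return "inconclusive"      # Case 4: some eigenvalue is zero
    return "local minimum" if has_pos else "local maximum"   # Cases 1 and 2

print(classify_critical_point([[2, 0], [0, 2]]))    # Hessian of  x1^2 + x2^2
print(classify_critical_point([[-2, 0], [0, -2]]))  # Hessian of -x1^2 - x2^2
print(classify_critical_point([[-2, 0], [0, 2]]))   # Hessian of -x1^2 + x2^2
print(classify_critical_point([[2, 0], [0, 0]]))    # one zero eigenvalue
```

The last call shows why Case 4 is inconclusive: $x_1^2+x_2^4$ and $x_1^2-x_2^4$ both have this Hessian at the origin, yet the first has a minimum there and the second a saddle.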
From MATH1052: Consider the quadratic form $Q = ax^2+bxy+cy^2$ with $a\neq 0.$ Complete the square in $x$: \begin{align*} Q =&\; a \left[ \left(x+\frac{b}{2a}y\right)^2+ \frac{4ac -b^2}{4a^2}y^2 \right]\\ =&\; a \left[ \left(x+\frac{b}{2a}y\right)^2+ \frac{D}{4a^2}y^2 \right]\\ =&\; a \left[ u^2+ \frac{D}{4a^2}v^2 \right]. \end{align*}
Thus we have \[ Q= a \left[ u^2+ \frac{D}{4a^2}v^2 \right],\quad D = 4ac-b^2,\quad u = x+ \frac{b}{2a}y,\quad v = y. \]
Case 1. $a\gt 0, D\gt 0:$ minimum
Case 2. $a\lt 0, D\gt 0:$ maximum
Case 3. $D\lt 0:$ saddle
Case 4. $D = 0:$ inconclusive
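These four cases can be written as a small function. The sketch below assumes $Q = ax^2+bxy+cy^2$, which is what the completed square above expands to; the function name `second_derivative_test` is illustrative. It compares $D$ with zero exactly, which is reasonable for integer coefficients but would need a tolerance for floating-point input.

```python
def second_derivative_test(a, b, c):
    """Classify the origin for Q = a x^2 + b x y + c y^2 using a and D = 4ac - b^2."""
    D = 4 * a * c - b * b
    if D < 0:
        return "saddle"         # Case 3
    if D == 0:
        return "inconclusive"   # Case 4
    # D > 0 forces a != 0, so the sign of a decides between Cases 1 and 2
    return "minimum" if a > 0 else "maximum"

print(second_derivative_test(1, 0, 1))    # Q =  x^2 + y^2
print(second_derivative_test(-1, 0, -1))  # Q = -x^2 - y^2
print(second_derivative_test(-1, 0, 1))   # Q = -x^2 + y^2
print(second_derivative_test(1, 2, 1))    # Q = (x + y)^2, so D = 0
```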
In MATH2001: We analyse the same expression but using eigenvalues. Consider $Q$ written as follows:
\[ Q = \frac{1}{2} \left( \begin{array}{cc} x & y \end{array} \right) \underbrace{\left( \begin{array}{cc} 2a & b\\ b & 2c \end{array} \right)}_{{\Large H}} \left( \begin{array}{c} x \\ y \end{array} \right) \]
The eigenvalues of $H$ satisfy $0 = \text{det}(H-\lambda I)$ $ =\text{det} \left( \begin{array}{cc} 2a - \lambda & b\\ b & 2c - \lambda \end{array} \right) = \lambda^2 - 2(a+c)\lambda + (4ac-b^2). $
$\implies$ $\lambda_{\pm} = (a+c)\pm \sqrt{(a+c)^2-D},\,$ where $ \,D = \text{det}(H) = 4ac-b^2. $
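The closed-form eigenvalues can be compared with a numerical eigenvalue solver. This is a small sketch, assuming NumPy; the coefficients `a`, `b`, `c` are arbitrary illustrative values and `eigenvalues_closed_form` is a name introduced here.

```python
import numpy as np

def eigenvalues_closed_form(a, b, c):
    """Return (lambda_minus, lambda_plus) for H = [[2a, b], [b, 2c]]."""
    D = 4 * a * c - b**2
    # (a + c)^2 - D = (a - c)^2 + b^2 >= 0, so the square root is always real
    root = np.sqrt((a + c)**2 - D)
    return (a + c) - root, (a + c) + root

a, b, c = 1.0, 3.0, 2.0                  # illustrative coefficients
H = np.array([[2 * a, b], [b, 2 * c]])

lam_minus, lam_plus = eigenvalues_closed_form(a, b, c)
numeric = np.linalg.eigvalsh(H)          # eigenvalues in ascending order

print(lam_minus, lam_plus)
print(numeric)
print("product equals D:", np.isclose(lam_minus * lam_plus, np.linalg.det(H)))
```

The product $\lambda_{+}\lambda_{-}$ equals $D=\text{det}(H)$, which is why the sign of $D$ carries the same information in both versions of the test.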
$ Q = \frac{1}{2}\left(\lambda_{+}\zeta_{+}^2 + \lambda_{-}\zeta_{-}^2\right), $ $\;\; \left( \begin{array}{c} \zeta_{+} \\ \zeta_{-} \end{array} \right) = P^T \left( \begin{array}{c} x \\ y \end{array} \right). $
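Case 1. $D\lt 0$ $\implies$ $\lambda_{+}\lambda_{-} = D \lt 0$ $\implies$ $\lambda_{+}, \lambda_{-}$ have opposite sign $\implies$ Saddle.
Case 2. $D= 0$ $\implies$ $\lambda_{+}\lambda_{-} = 0$ $\implies$ $ \lambda_{+}=0$ or $\lambda_{-} = 0$ $\implies$ Inconclusive.
Case 3. $D\gt 0$ $\implies$ $4ac-b^2\gt 0$ $\implies$ $4ac \gt b^2 \ge 0$ $\implies$ $ac \gt 0$.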