Lecture 2: Math Review
Linear Algebra Related
Gradient Vector
For $f:\mathbb{R}^n\to\mathbb{R}$, the gradient is the vector of partial derivatives $\nabla_{\mathbf{x}}f = \left(\frac{\partial f}{\partial x_1},\dots,\frac{\partial f}{\partial x_n}\right)^T$.
Properties
- $\nabla_{\mathbf{x}}(\mathbf{x}^T\mathbf{y}) = \nabla_{\mathbf{x}}(\mathbf{y}^T\mathbf{x}) = \mathbf{y}$
- $\nabla_{\mathbf{x}}(\mathbf{x}^T\mathbf{x}) = 2\mathbf{x}$
- $\nabla_{\mathbf{x}}(\mathbf{x}^T\mathbf{A}\mathbf{y}) = \mathbf{A}\mathbf{y}$
- $\nabla_{\mathbf{x}}(\mathbf{y}^T\mathbf{A}\mathbf{x}) = \mathbf{A}^T\mathbf{y}$
- $\nabla_{\mathbf{x}}(\mathbf{x}^T\mathbf{A}\mathbf{x}) = \mathbf{A}\mathbf{x}+\mathbf{A}^T\mathbf{x}$
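As a quick sanity check of the last identity, here is a minimal NumPy sketch (the matrix and vector are arbitrary illustrative values) comparing the closed-form gradient $(\mathbf{A}+\mathbf{A}^T)\mathbf{x}$ against a finite-difference approximation:

```python
import numpy as np

# Numerical check of  grad_x (x^T A x) = (A + A^T) x
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))              # arbitrary, not necessarily symmetric
x = rng.normal(size=4)

f = lambda v: v @ A @ v                  # scalar function v^T A v
analytic = (A + A.T) @ x                 # closed-form gradient at x

# Central finite differences, one coordinate at a time
eps = 1e-6
numeric = np.zeros_like(x)
for i in range(len(x)):
    e = np.zeros_like(x)
    e[i] = eps
    numeric[i] = (f(x + e) - f(x - e)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-5))  # True
```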
Eigenvalue $\lambda$ & Eigenvector $\mathbf{v}$
For a square matrix $\mathbf{A}$, if $\mathbf{A}\mathbf{v} = \lambda\mathbf{v}$ for some nonzero vector $\mathbf{v}$, then $\lambda$ is called an eigenvalue and $\mathbf{v}$ is called an eigenvector of $\mathbf{A}$.
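A small NumPy sketch checking the definition on an example matrix (the matrix values are made up for illustration):

```python
import numpy as np

# Check A v = lambda v for each eigenpair of a small example matrix
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
eigvals, eigvecs = np.linalg.eig(A)      # columns of eigvecs are the eigenvectors

for lam, v in zip(eigvals, eigvecs.T):
    print(np.allclose(A @ v, lam * v))   # True for every eigenpair
```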
Probability Related
Useful formula
$P(B|A) = \frac{P(A\cap B)}{P(A)}$
$P(A\cap B) = P(A)P(B|A)$
Formula of total probability: if $B_1,\dots,B_n$ partition the sample space, then $P(A) = \sum_{i=1}^n P(A|B_i)P(B_i)$
Bayes Theorem: $P(B_i|A) = \frac{P(A|B_i)P(B_i)}{\sum_{j=1}^n P(A|B_j)P(B_j)}$
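A small worked example of Bayes Theorem (the numbers below are made up for illustration): a test has sensitivity $P(+\mid D)=0.9$, false-positive rate $P(+\mid \bar{D})=0.05$, and the prevalence is $P(D)=0.01$. Then

$$P(D\mid +) = \frac{P(+\mid D)P(D)}{P(+\mid D)P(D) + P(+\mid \bar{D})P(\bar{D})} = \frac{0.9\times 0.01}{0.9\times 0.01 + 0.05\times 0.99} \approx 0.154$$

so even a positive result leaves the probability of disease fairly low when the prior is small.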
PDF (or PMF in the discrete case) and CDF, in both discrete and continuous form.
Moments:
$r$-th moment: $E(X^r)$; $\mu=E(X)$ is the first moment ($r=1$)
$r$-th central moment: $\mu_r = E((X-\mu)^r)$; $\mu_0=1$, $\mu_1=0$, $\mu_2 = \sigma^2$ (the variance)
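For example, expanding the second central moment shows how it relates to the raw moments:

$$\mu_2 = E\big((X-\mu)^2\big) = E(X^2) - 2\mu E(X) + \mu^2 = E(X^2) - \mu^2 = \sigma^2$$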
Optimization Related
An optimization problem contains three parts: the objective function; the variables to optimize; the constraints.
minimize over $\mathbf{x}$: $f_0(\mathbf{x})$ (objective function)
subject to $f_i(\mathbf{x}) \le 0,\ i=1,\dots,m$ (inequality constraints)
$h_i(\mathbf{x}) = 0,\ i=1,\dots,p$ (equality constraints)
Active constraint
A constraint $f_i$ is active at $\mathbf{x}$ if $\mathbf{x}$ lies on the boundary of the region it defines, i.e. $f_i(\mathbf{x})=0$.
Lagrangian
Consider the standard optimization problem without the equality constraints, and let $p^*$ be its optimal value.
The Lagrangian $\mathcal{L}$ is defined as $\mathcal{L}(\mathbf{x}, \mathbf{\lambda}) = f_0(\mathbf{x}) + \sum_{i=1}^m \lambda_i f_i(\mathbf{x})$, where $\lambda_i$ is the Lagrange multiplier associated with constraint $f_i$.
Lagrangian Dual Function
Define the dual function $g$ as the minimum of the Lagrangian over $\mathbf{x}$: $g(\mathbf{\lambda}) = \inf_{\mathbf{x}} \mathcal{L}(\mathbf{x}, \mathbf{\lambda})$
Lower Bound Property
If $\mathbf{\lambda}\ge0$ and $\mathbf{x}$ is primal feasible, then $g(\mathbf{\lambda})\le f_0(\mathbf{x})$: since $\lambda_i \ge 0$ and $f_i(\mathbf{x})\le 0$, we have $g(\mathbf{\lambda}) \le \mathcal{L}(\mathbf{x}, \mathbf{\lambda}) = f_0(\mathbf{x}) + \sum_i \lambda_i f_i(\mathbf{x}) \le f_0(\mathbf{x})$.
Lagrange Dual Problem
Find the best lower bound on $p^*$:
maximize over $\mathbf{\lambda}$: $g(\mathbf{\lambda})$
subject to $\mathbf{\lambda}\ge 0$
We define $d^*$ as the optimal value of this dual problem, and we always have $d^* \le p^*$ (weak duality), since $g(\mathbf{\lambda})\le f_0(\mathbf{x})$ for every primal-feasible $\mathbf{x}$. The difference $p^* - d^*$ is called the *optimal duality gap*.
Important conclusion: for convex problems we usually have strong duality, $p^* = d^*$ (e.g. under Slater's condition).
Optimizing a convex problem via duality (examples can be seen in the slides; a small worked sketch also follows below):
- Find the objective function to optimize and the constraints
- Form the Lagrangian and derive the dual function
- Maximize the dual function; under strong duality this attains the same optimal value as minimizing the primal problem.
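A minimal worked sketch of these steps on a toy problem (chosen here for illustration, not taken from the slides): minimize $x^2$ subject to $x \ge 1$, i.e. $f_0(x)=x^2$ and $f_1(x)=1-x\le 0$.

$$\begin{aligned}
\mathcal{L}(x,\lambda) &= x^2 + \lambda(1-x), \qquad \lambda \ge 0\\
g(\lambda) &= \inf_x \mathcal{L}(x,\lambda) = \lambda - \tfrac{\lambda^2}{4} \qquad (\text{minimizer } x = \lambda/2)\\
\max_{\lambda\ge 0} g(\lambda) &= 1 \ \text{ at } \ \lambda^* = 2, \qquad x^* = \lambda^*/2 = 1, \qquad d^* = p^* = 1
\end{aligned}$$

The problem is convex and strictly feasible (e.g. $x=2$), so strong duality holds and the dual maximum equals the primal minimum.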
Duality in Algorithm
Duality can be used to decide when to stop an iterative algorithm:
maximize $g(\mathbf{\lambda})$ while minimizing $f_0(\mathbf{x})$, and stop once the duality gap $f_0(\mathbf{x}) - g(\mathbf{\lambda})$ falls below some threshold.
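A minimal Python sketch of this stopping rule, reusing the toy problem from the worked example above (dual ascent on $g$, with the primal iterate projected onto the feasible set so the gap is a valid bound; the step size and tolerance are arbitrary choices):

```python
# Toy problem: minimize x^2 subject to x >= 1, i.e. f1(x) = 1 - x <= 0.
# Dual ascent on g(lambda) = lambda - lambda^2 / 4, stopping on the duality gap.

f0 = lambda x: x ** 2                        # primal objective
g = lambda lam: lam - lam ** 2 / 4           # dual function (derived above)
g_grad = lambda lam: 1 - lam / 2             # derivative of the dual function

lam, step, tol = 0.5, 0.5, 1e-6
for it in range(1000):
    x = lam / 2                              # minimizer of the Lagrangian for this lambda
    x_feas = max(x, 1.0)                     # project onto the feasible set x >= 1
    gap = f0(x_feas) - g(lam)                # duality gap: bound on suboptimality
    if gap < tol:
        break
    lam = max(0.0, lam + step * g_grad(lam)) # gradient ascent step, keep lambda >= 0

print(it, lam, x_feas, gap)                  # approaches lambda* = 2, x* = 1, gap ~ 0
```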
Complementary Slackness
Suppose $\mathbf{x}^*$ and $\mathbf{\lambda}^*$ are primal optimal and dual optimal with zero duality gap. Then we have:
$f_0(\mathbf{x}^*) = g(\mathbf{\lambda}^*) = \inf_{\mathbf{x}} \mathcal{L}(\mathbf{x}, \mathbf{\lambda}^*) \le f_0(\mathbf{x}^*) + \sum_i \lambda_i^* f_i(\mathbf{x}^*) \le f_0(\mathbf{x}^*)$
so $\sum_i \lambda_i^* f_i(\mathbf{x}^*) = 0$. Since $\lambda_i^* \ge 0$ and $f_i(\mathbf{x}^*)\le 0$, every term is non-positive, hence each term must vanish: $\lambda_i^* f_i(\mathbf{x}^*) = 0$ for all $i$.
So we know that if $\lambda_i^* > 0$ then the $i$-th constraint is active ($f_i(\mathbf{x}^*)=0$), and if the $i$-th constraint is inactive ($f_i(\mathbf{x}^*)<0$) then $\lambda_i^* = 0$.
KKT Optimality Conditions
Suppose that the $f_i$ are differentiable and $\mathbf{x}^*$, $\mathbf{\lambda}^*$ are primal and dual optimal with zero duality gap.
From complementary slackness, we know that $\lambda_i^* f_i(\mathbf{x}^*) = 0$ for all $i$.
We also know that $\mathbf{x}^*$ minimizes $\mathcal{L}(\mathbf{x}, \mathbf{\lambda}^*)$ over $\mathbf{x}$, so its gradient vanishes there:
$\nabla_{\mathbf{x}}f_0(\mathbf{x}^*)+\sum_i\lambda_i^*\nabla_{\mathbf{x}}f_i(\mathbf{x}^*) = 0$ (stationarity)
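Collecting the pieces, the KKT conditions for this inequality-constrained problem, in their standard statement, are:

$$\begin{aligned}
f_i(\mathbf{x}^*) &\le 0, \quad i=1,\dots,m && \text{(primal feasibility)}\\
\lambda_i^* &\ge 0, \quad i=1,\dots,m && \text{(dual feasibility)}\\
\lambda_i^* f_i(\mathbf{x}^*) &= 0, \quad i=1,\dots,m && \text{(complementary slackness)}\\
\nabla_{\mathbf{x}} f_0(\mathbf{x}^*) + \sum_{i=1}^m \lambda_i^* \nabla_{\mathbf{x}} f_i(\mathbf{x}^*) &= 0 && \text{(stationarity)}
\end{aligned}$$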
Conclusion on KKT