Lecture 2: Math Review

  • Gradient Vector

    Properties

    • $\nabla_{\mathbf{x}}(\mathbf{x}^T\mathbf{y}) = \nabla_{\mathbf{x}}(\mathbf{y}^T\mathbf{x}) = \mathbf{y}$
    • $\nabla_{\mathbf{x}}(\mathbf{x}^T\mathbf{x}) = 2\mathbf{x}$
    • $\nabla_{\mathbf{x}}(\mathbf{x}^T\mathbf{A}\mathbf{y}) = \mathbf{A}\mathbf{y}$
    • $\nabla_{\mathbf{x}}(\mathbf{y}^T\mathbf{A}\mathbf{x}) = \mathbf{A}^T\mathbf{y}$
    • $\nabla_{\mathbf{x}}(\mathbf{x}^T\mathbf{A}\mathbf{x}) = \mathbf{A}\mathbf{x}+\mathbf{A}^T\mathbf{x}$
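
    As a quick sanity check, the last identity can be verified numerically; a minimal NumPy sketch (the finite-difference check here is only for illustration):

```python
import numpy as np

# Check the identity grad_x (x^T A x) = (A + A^T) x against
# central finite differences on a random instance.
rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)

f = lambda v: v @ A @ v                 # f(x) = x^T A x
analytic = A @ x + A.T @ x              # claimed gradient (A + A^T) x

eps = 1e-6
numeric = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(n)])

print(np.allclose(analytic, numeric, atol=1e-4))  # expected: True
```
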
  • Eigenvalue $\lambda$ & Eigenvector $\mathbf{v}$

    For a square matrix $\mathbf{A}$, if $\mathbf{A}\mathbf{v} = \lambda\mathbf{v}$ for some scalar $\lambda$ and nonzero vector $\mathbf{v}$, then $\lambda$ is called an eigenvalue and $\mathbf{v}$ is called an eigenvector of $\mathbf{A}$.
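
    A small worked example ($2\times 2$, chosen here for illustration):

    $$\mathbf{A} = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \qquad \mathbf{A}\begin{pmatrix} 1 \\ 1 \end{pmatrix} = 3\begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad \mathbf{A}\begin{pmatrix} 1 \\ -1 \end{pmatrix} = 1\cdot\begin{pmatrix} 1 \\ -1 \end{pmatrix}$$

    so the eigenvalues are $\lambda_1 = 3$ and $\lambda_2 = 1$, with eigenvectors $(1, 1)^T$ and $(1, -1)^T$.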

  • Useful formula

    • $P(B|A) = \frac{P(A\cap B)}{P(A)}$

    • $P(A\cap B) = P(A)P(B|A)$

    • Formula of total probability: if $A_1, \dots, A_n$ partition the sample space, then $P(B) = \sum_{i=1}^{n} P(A_i)P(B|A_i)$

    • Bayes Theorem: $P(A|B) = \frac{P(B|A)P(A)}{P(B)}$

    • PDF, CDF in discrete and continuous form.

    • Moments:

      $r$-th moment: $E(X^r)$; $\mu = E(X)$ is the first moment

      $r$-th central moment: $\mu_r = E((X-\mu)^r)$, with $\mu_0 = 1$, $\mu_1 = 0$, $\mu_2 = \sigma^2$ (the variance)
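
      A quick example: for $X \sim \mathrm{Bernoulli}(p)$, the first moment is $E(X) = p$ and the second central moment is $\mu_2 = E((X-p)^2) = p(1-p)$.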

  • An optimization problem contains three parts: the objective function, the optimization variable, and the constraints

    • $\underset{\mathbf{x}}{\text{minimize}}\ f_0(\mathbf{x})$ (objective function)

    • subject to $f_i(\mathbf{x}) \le 0,\ i=1,\dots,m$ (inequality constraints)

      $h_i(\mathbf{x}) = 0,\ i=1,\dots,p$ (equality constraints)

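    For instance, a small made-up problem written in this standard form:

    $$\underset{x_1, x_2}{\text{minimize}}\ \ x_1 + x_2 \qquad \text{subject to}\quad x_1^2 + x_2^2 - 1 \le 0, \quad x_1 - x_2 = 0$$

    with $f_0(\mathbf{x}) = x_1 + x_2$, $f_1(\mathbf{x}) = x_1^2 + x_2^2 - 1$, and $h_1(\mathbf{x}) = x_1 - x_2$.
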
  • Active constraint

    A constraint $f_i$ is active at $\mathbf{x}$ if $\mathbf{x}$ is on the boundary of its feasible region, i.e. $f_i(\mathbf{x}) = 0$.

  • Lagrangian

    Consider the standard optimization problem without the equality constraints, and let $p^*$ denote its optimal value.

    Assume we have the Lagrangian $\mathcal{L}$: $\mathcal{L}(\mathbf{x}, \mathbf{\lambda}) = f_0(\mathbf{x}) + \sum_{i=1}^{m}\lambda_i f_i(\mathbf{x})$

  • Lagrangian Dual Function

    Define the Dual Function $g$: $g(\mathbf{\lambda}) = \min_{\mathbf{x}} \mathcal{L}(\mathbf{x}, \mathbf{\lambda}) = \min_{\mathbf{x}}\Big(f_0(\mathbf{x}) + \sum_{i=1}^{m}\lambda_i f_i(\mathbf{x})\Big)$

  • Lower Bound Property

    If $\mathbf{\lambda}\ge0$ and $\mathbf{x}$ is primal feasible, then $g(\mathbf{\lambda})\le f_0(\mathbf{x})$, since $\lambda_i \ge 0$ and $f_i(\mathbf{x})\le 0$ imply $\sum_i \lambda_i f_i(\mathbf{x}) \le 0$; see the chain of inequalities below.
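
    In full, for any primal-feasible $\mathbf{x}$ and $\mathbf{\lambda} \ge 0$:

    $$g(\mathbf{\lambda}) = \min_{\mathbf{z}} \mathcal{L}(\mathbf{z}, \mathbf{\lambda}) \le \mathcal{L}(\mathbf{x}, \mathbf{\lambda}) = f_0(\mathbf{x}) + \sum_i \lambda_i f_i(\mathbf{x}) \le f_0(\mathbf{x})$$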

  • Lagrange Dual Problem

    Find the best lower bound on $p^*$:

    $\underset{\mathbf{\lambda}}{\text{maximize}}\ g(\mathbf{\lambda})$

    subject to $\mathbf{\lambda}\ge 0$

    We define $d^*$ as the optimal value of this dual problem, and we always have $d^* \le p^*$ since $g(\mathbf{\lambda})\le f_0(\mathbf{x})$. The difference $p^* - d^*$ is called the *optimal duality gap*.

    Important conclusion: For convex problems, we (usually) have $p^* = d^*$

  • Optimizing a convex problem (examples can be seen in the slides)

    • Identify the objective function and the constraints
    • Find the dual function
    • Maximize the dual function, which (when $p^* = d^*$) gives the same optimal value as minimizing the primal; a small worked instance follows this list.
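
    A minimal worked instance (a toy one-dimensional problem chosen here for illustration): minimize $x^2$ subject to $1 - x \le 0$.

    $$\mathcal{L}(x, \lambda) = x^2 + \lambda(1 - x), \qquad g(\lambda) = \min_x \mathcal{L}(x, \lambda) = \lambda - \tfrac{\lambda^2}{4} \quad (\text{minimizer } x = \lambda/2)$$

    Maximizing $g(\lambda)$ over $\lambda \ge 0$ gives $\lambda^* = 2$ and $d^* = 1$, which matches the primal optimum $p^* = 1$ at $x^* = 1$, so the duality gap is zero.
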
  • Duality in Algorithm

    Duality can be used as a stopping criterion for iterative algorithms:

    maximize $g(\mathbf{\lambda})$ while minimizing $f_0(\mathbf{x})$, until the duality gap $f_0(\mathbf{x}) - g(\mathbf{\lambda})$ falls below some threshold (a sketch follows).

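    A minimal Python sketch of this stopping rule on the toy problem above (the step sizes and update rules are illustrative assumptions, not a prescribed algorithm):

```python
# Minimize x^2 subject to 1 - x <= 0 (so the feasible set is x >= 1).
# Stop once the duality gap f0(x) - g(lam) certifies near-optimality,
# since g(lam) <= p* <= f0(x) for any feasible x and lam >= 0.
f0 = lambda x: x**2               # primal objective
g = lambda lam: lam - lam**2 / 4  # dual function of this toy problem

x, lam, step, tol = 5.0, 0.0, 0.1, 1e-6
for _ in range(10_000):
    x = max(x - step * 2 * x, 1.0)              # gradient step on f0, projected onto x >= 1
    lam = max(lam + step * (1 - lam / 2), 0.0)  # gradient ascent on g, keeping lam >= 0
    gap = f0(x) - g(lam)                        # duality gap: f0(x) - p* <= gap
    if gap <= tol:                              # stop when the bound is tight enough
        break

print(x, lam, gap)  # expected: x -> 1, lam -> 2, gap -> ~0
```
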
  • Complementary Slackness

    Suppose $\mathbf{x}^*, \mathbf{\lambda}^*$ are primal optimal and dual optimal with zero duality gap. Then we have:

    $$f_0(\mathbf{x}^*) = g(\mathbf{\lambda}^*) = \min_{\mathbf{x}}\Big(f_0(\mathbf{x}) + \sum_i \lambda_i^* f_i(\mathbf{x})\Big) \le f_0(\mathbf{x}^*) + \sum_i \lambda_i^* f_i(\mathbf{x}^*) \le f_0(\mathbf{x}^*)$$

    Since $\lambda_i^* \ge 0$ and $f_i(\mathbf{x}^*)\le0$, every term in the sum is nonpositive, so equality throughout forces $\lambda_i^* f_i(\mathbf{x}^*) = 0$ for all $i$.

    So if $\lambda_i^* > 0$, the $i$-th constraint is active ($f_i(\mathbf{x}^*) = 0$); and if the $i$-th constraint is inactive ($f_i(\mathbf{x}^*) < 0$), then $\lambda_i^* = 0$.

  • KKT Optimality Conditions

    Suppose that the $f_i$ are differentiable and $\mathbf{x}^*, \mathbf{\lambda}^*$ are optimal with zero duality gap.

    From complementary slackness, we know that $\lambda_i^* f_i(\mathbf{x}^*) = 0$ for all $i$.

    We also know that $\mathbf{x}^*$ minimizes $\mathcal{L}(\mathbf{x}, \mathbf{\lambda}^*)$ over $\mathbf{x}$.

    So the gradient must vanish there: $\nabla_{\mathbf{x}}f_0(\mathbf{x}^*)+\sum_i\lambda_i^*\nabla_{\mathbf{x}}f_i(\mathbf{x}^*) = 0$

  • Conclusion on KKT

    If $\mathbf{x}^*, \mathbf{\lambda}^*$ are optimal with zero duality gap (and the $f_i$ are differentiable), they satisfy the KKT conditions:

    • Primal feasibility: $f_i(\mathbf{x}^*) \le 0$
    • Dual feasibility: $\lambda_i^* \ge 0$
    • Complementary slackness: $\lambda_i^* f_i(\mathbf{x}^*) = 0$
    • Stationarity: $\nabla_{\mathbf{x}}f_0(\mathbf{x}^*)+\sum_i\lambda_i^*\nabla_{\mathbf{x}}f_i(\mathbf{x}^*) = 0$

    For convex problems, the KKT conditions are also sufficient for optimality.
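
    As a quick check, these conditions hold at the optimum of the toy problem used above ($\min x^2$ subject to $1 - x \le 0$, with $x^* = 1$, $\lambda^* = 2$):

    • Primal feasibility: $1 - x^* = 0 \le 0$
    • Dual feasibility: $\lambda^* = 2 \ge 0$
    • Complementary slackness: $\lambda^*(1 - x^*) = 2 \cdot 0 = 0$
    • Stationarity: $2x^* + \lambda^* \cdot (-1) = 2 - 2 = 0$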
