Lecture 1: Introduction

(1). Examples of Machine Learning Applications

  • Learning associations
  • Supervised learning
    • Classification
    • Regression
  • Unsupervised learning
  • Reinforcement learning

(2). Illustrative Example: Polynomial Curve Fitting

  • Regression problem:

    • input variable: $x$
    • target variable: $t$
    • goal: find a curve that fits these points
  • Polynomial function for fitting data:

    $$y(x, \textbf{w}) = w_0 + w_1 x + w_2 x^2 + \cdots + w_M x^M = \sum_{j=0}^{M} w_j x^j$$

    The function is nonlinear in $x$ but linear in the coefficients $\textbf{w}$.

  • Error function (loss):

    $$E(\textbf{w}) = \frac{1}{2}\sum_{n=1}^{N}\left\{y(x_n, \textbf{w}) - t_n\right\}^2$$

    Our aim is to minimize this loss function to fit the curve. A high-degree polynomial can overfit, which we can detect by comparing the root-mean-square error $E_{\mathrm{RMS}} = \sqrt{2E(\textbf{w}^*)/N}$ on the training set against a held-out test set: under overfitting the training error keeps falling while the test error grows.

  • Regularization:

    We can rewrite the loss function as:

    $$\tilde{E}(\textbf{w}) = \frac{1}{2}\sum_{n=1}^{N}\left\{y(x_n, \textbf{w}) - t_n\right\}^2 + \frac{\lambda}{2}\lVert\textbf{w}\rVert^2$$

    The penalty term restricts the magnitude of the parameters found by optimizing the loss, which suppresses overfitting.

    As $\lambda$ grows, the test RMS error first goes down; but once $\lambda$ gets too large, the model underfits and the RMS error goes up again.
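
The following is a minimal numpy sketch of this whole pipeline; the sinusoidal target, the noise level, and the degree $M = 9$ are assumptions chosen for illustration, not anything fixed by the notes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumed): noisy samples of sin(2*pi*x), as in the
# classic curve-fitting illustration.
N = 10
x = np.linspace(0.0, 1.0, N)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=N)
x_test = np.linspace(0.0, 1.0, 100)
t_test = np.sin(2 * np.pi * x_test) + rng.normal(scale=0.3, size=100)

def fit_polynomial(x, t, M, lam=0.0):
    """Minimize the regularized loss E~(w); closed form:
    w* = (Phi^T Phi + lam*I)^(-1) Phi^T t, where Phi is the
    design matrix Phi[n, j] = x_n ** j."""
    Phi = np.vander(x, M + 1, increasing=True)
    A = Phi.T @ Phi + lam * np.eye(M + 1)
    return np.linalg.solve(A, Phi.T @ t)

def rms_error(x, t, w):
    """E_RMS = sqrt(2 E(w*) / N), i.e. the root-mean-square error."""
    y = np.vander(x, len(w), increasing=True) @ w
    return np.sqrt(np.mean((y - t) ** 2))

# With M = 9 and lam = 0 the fit overfits: tiny training error,
# large test error.  A moderate lam tames it; a huge lam underfits.
for lam in (0.0, 1e-4, 10.0):
    w = fit_polynomial(x, t, M=9, lam=lam)
    print(f"lambda={lam:g}  train={rms_error(x, t, w):.3f}  "
          f"test={rms_error(x_test, t_test, w):.3f}")
```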

(3). Something about classification

  • Training set $\mathcal{X}$: the set of labeled examples from which the model learns.

  • Class $\mathcal{C}$: the true concept to be learned, i.e. the region of the input space that actually contains the positive examples.

  • Hypothesis Class $\mathcal{H}$: The learning algorithm finds a hypothesis $h\in\mathcal{H}$ to approximate the class $\mathcal{C}$.

  • Empirical Error: $\mathcal{C}(x)$ is usually unknown, so we use the empirical error to test how well $h(x)$ matches $\mathcal{C}(x)$ on the training set: $E(h \mid \mathcal{X}) = \frac{1}{N}\sum_{t=1}^{N}\mathbb{1}\left(h(x^t) \neq r^t\right)$, the fraction of training examples that $h$ mislabels (computed in the sketch after this list).

  • Version Space:

    • Most specific hypothesis $\mathcal{S}$: the tightest rectangle in $\mathcal{H}$ that includes all positive examples and no negative examples (computed in the sketch after this list).
    • Most general hypothesis $\mathcal{G}$: the largest rectangle in $\mathcal{H}$ that includes all positive examples and no negative examples.
    • Version space: the set of hypotheses in $\mathcal{H}$ between $\mathcal{S}$ and $\mathcal{G}$; every one of them is consistent with the training set.
  • Vapnik-Chervonenkis (VC) Dimension:

    $N$ points can be labeled in $2^N$ ways as +/-. The VC dimension of a hypothesis class $\mathcal{H}$ is defined as the maximum number of points that can be shattered by $\mathcal{H}$.

    In other words: if there exists a set of $N$ points such that, for every one of the $2^N$ labelings, we can find an $h \in \mathcal{H}$ that separates these $N$ points into +/- correctly, then the largest such $N$ is the VC dimension of $\mathcal{H}$ (see the shattering check after this list).

    An example: an axis-aligned rectangle can shatter at most four points in the plane, so the VC dimension of the rectangle hypothesis class is 4.

  • Multiple Classes: a K-class classification problem can be decomposed into K two-class problems, one per class against the rest (see the toy sketch below).
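
To make $\mathcal{S}$ and the empirical error concrete, here is a small Python sketch over the axis-aligned-rectangle hypothesis class; the data points are hypothetical.

```python
import numpy as np

def most_specific_hypothesis(positives):
    """S: the tightest axis-aligned rectangle that contains every
    positive example (returned as lower-left and upper-right corners)."""
    P = np.asarray(positives, dtype=float)
    return P.min(axis=0), P.max(axis=0)

def h(x, lo, hi):
    """Rectangle hypothesis: predict 1 iff x falls inside [lo, hi]."""
    x = np.asarray(x, dtype=float)
    return int(np.all(x >= lo) and np.all(x <= hi))

def empirical_error(X, r, lo, hi):
    """Fraction of training examples that the hypothesis mislabels."""
    return np.mean([h(x, lo, hi) != label for x, label in zip(X, r)])

# Hypothetical 2-D training set: r = 1 marks the positive examples.
X = [(1, 1), (2, 3), (3, 2), (0, 5), (5, 0)]
r = [1, 1, 1, 0, 0]
lo, hi = most_specific_hypothesis([x for x, lab in zip(X, r) if lab == 1])
print("S =", lo, hi, " empirical error =", empirical_error(X, r, lo, hi))
```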
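And a brute-force shattering check for the same rectangle class. The four "diamond" points below are one arrangement that rectangles shatter; it is a known result (not proved by this code, which only tests these particular sets) that no five points can be shattered, so the VC dimension is 4.

```python
from itertools import product
import numpy as np

def shatters(points):
    """True iff axis-aligned rectangles realize every +/- labeling.
    For a labeling with at least one positive, the tightest box
    around the positives works iff it excludes every negative."""
    P = np.asarray(points, dtype=float)
    for labels in product([0, 1], repeat=len(P)):
        mask = np.array(labels, dtype=bool)
        pos, neg = P[mask], P[~mask]
        if len(pos) == 0:
            continue  # all-negative labeling: an empty rectangle works
        lo, hi = pos.min(axis=0), pos.max(axis=0)
        if len(neg) and np.all((neg >= lo) & (neg <= hi), axis=1).any():
            return False
    return True

four = [(0, 1), (1, 0), (2, 1), (1, 2)]   # a "diamond" of 4 points
five = four + [(1, 1)]                    # the centre point breaks it
print(shatters(four), shatters(five))     # True False
```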
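Finally, a toy sketch of the one-vs-rest decomposition; the centroid-based "learner" here is only a placeholder for whatever binary classifier is actually used.

```python
import numpy as np

def fit_binary(X, y_bin):
    """Placeholder binary learner: remember the positive centroid."""
    return X[y_bin == 1].mean(axis=0)

def score(centroid, x):
    """Higher score = more confident that x is a positive example."""
    return -np.linalg.norm(x - centroid)

# A K-class problem becomes K binary problems: class k vs. the rest.
X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.8], [0.0, 5.0]])
y = np.array([0, 0, 1, 1, 2])
K = 3

models = [fit_binary(X, (y == k).astype(int)) for k in range(K)]
x_new = np.array([4.9, 5.2])
print("predicted class:", int(np.argmax([score(m, x_new) for m in models])))
```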

(4). Something about regression

  • In regression the target is a continuous value rather than a class label; the model $g(x)$ is fit by minimizing the squared error on the training set, $E(g \mid \mathcal{X}) = \frac{1}{N}\sum_{t=1}^{N}\left(r^t - g(x^t)\right)^2$.

(5). Dimensions of Supervised Learning Algorithm

  • Model: $g(x \mid \theta)$, the family of candidate functions indexed by the parameters $\theta$.

  • Loss function: $E(\theta \mid \mathcal{X}) = \sum_{t} L\left(r^t, g(x^t \mid \theta)\right)$, the total error of the model on the training set.

  • Optimization procedure/algorithm: $\theta^* = \arg\min_{\theta} E(\theta \mid \mathcal{X})$, the method used to find the parameters that minimize the loss (see the sketch below).
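
A compact sketch that spells out the three dimensions for a one-dimensional linear model; the data, learning rate, and iteration count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: r = 2x + 1 plus Gaussian noise.
x = rng.uniform(0.0, 1.0, 50)
r = 2 * x + 1 + rng.normal(scale=0.1, size=50)

# 1. Model: g(x | theta) = theta[0] + theta[1] * x
def g(x, theta):
    return theta[0] + theta[1] * x

# 2. Loss: squared error, averaged over the training set
def E(theta):
    return np.mean((r - g(x, theta)) ** 2)

# 3. Optimization procedure: plain gradient descent on E
theta = np.zeros(2)
lr = 0.5
for _ in range(2000):
    err = g(x, theta) - r
    grad = np.array([2 * err.mean(), 2 * (err * x).mean()])
    theta -= lr * grad

print("theta* ~", theta, " E =", E(theta))   # theta* close to [1, 2]
```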
