Lecture 1: Introduction
(1). Examples of Machine Learning Applications
- Learning associations
- Supervised learning
  - Classification
  - Regression
- Unsupervised learning
- Reinforcement learning
(2). Illustrative Example: Polynomial Curve Fitting
Regression problem:
- input variable: $x$
- target variable: $t$
- goal: to find a curve that fits these points
Polynomial function for fitting data:
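A minimal sketch of the standard form (this assumes Bishop's polynomial curve-fitting setup):
$$y(x, \mathbf{w}) = w_0 + w_1 x + w_2 x^2 + \cdots + w_M x^M = \sum_{j=0}^{M} w_j x^j$$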
The function is nonlinear in $x$ but linear in the coefficients $\textbf{w}$.
Error function (loss):
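Presumably the standard sum-of-squares error, which measures the misfit between the curve $y(x, \mathbf{w})$ and the $N$ training points $(x_n, t_n)$:
$$E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \left\{ y(x_n, \mathbf{w}) - t_n \right\}^2$$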
Our aim is to minimize the loss function so that the curve fits the data. However, a flexible model (e.g., a high-degree polynomial) can overfit, which we can detect by tracking the RMS error.
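Bishop defines the root-mean-square (RMS) error as
$$E_{\mathrm{RMS}} = \sqrt{2 E(\mathbf{w}^{*}) / N},$$
which puts data sets of different sizes $N$ on the same scale. Overfitting shows up as a training RMS error that keeps shrinking while the test RMS error grows.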
Regularization:
We can rewrite the loss function with a penalty term that restricts the magnitude of the parameters found by optimization, which helps get rid of overfitting:
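Assuming the standard quadratic (ridge) regularizer from Bishop's treatment:
$$\widetilde{E}(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \left\{ y(x_n, \mathbf{w}) - t_n \right\}^2 + \frac{\lambda}{2} \lVert \mathbf{w} \rVert^2$$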
As $\lambda$ increases, the RMS error goes down at first, but once $\lambda$ becomes too large the RMS error goes back up, because the model is then over-penalized and underfits.
(3). Something about classification
Training set $\mathcal{X}$: the data from which the model learns.
Class $\mathcal{C}$: the true concept to be learned, i.e., the region of the input space that contains the positive examples of a specific class.
Hypothesis Class $\mathcal{H}$: The learning algorithm finds a hypothesis $h\in\mathcal{H}$ to approximate the class $\mathcal{C}$.
Empirical Error: $\mathcal{C}(x)$ is usually unknown, so we use the empirical error to measure how well $h(x)$ matches $\mathcal{C}(x)$ on the training set.
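In Alpaydin's notation (assuming labels $r^t$ for the training instances $x^t$), the empirical error counts the training points where the hypothesis disagrees with the label:
$$E(h \mid \mathcal{X}) = \sum_{t=1}^{N} \mathbf{1}\left( h(x^t) \neq r^t \right)$$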
Version Space:
- Most specific hypothesis $\mathcal{S}$: the tightest rectangle in $\mathcal{H}$ that includes all positive examples and no negative examples.
- Most general hypothesis $\mathcal{G}$: the largest rectangle in $\mathcal{H}$ that includes all positive examples and no negative examples.
- Version space: the set of hypotheses $h \in \mathcal{H}$ between $\mathcal{S}$ and $\mathcal{G}$, all of which are consistent with the training set (the most specific one is computed in the sketch after this list).
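A minimal sketch of how $\mathcal{S}$ could be computed for the axis-aligned-rectangle hypothesis class (the data and function name here are illustrative, not from the lecture):

```python
import numpy as np

def most_specific_rectangle(pos, neg):
    """Tightest axis-aligned rectangle containing every positive point.

    pos, neg: arrays of shape (n, 2) holding 2-D positive/negative examples.
    Returns (lo, hi), the rectangle's lower-left and upper-right corners,
    or None if that rectangle would swallow a negative example (i.e. the
    data are not consistent with any rectangle hypothesis).
    """
    lo, hi = pos.min(axis=0), pos.max(axis=0)   # bounding box of positives
    inside = np.all((neg >= lo) & (neg <= hi), axis=1)
    return None if inside.any() else (lo, hi)

# Toy usage: positives cluster inside, negatives lie outside.
pos = np.array([[1.0, 1.0], [2.0, 3.0], [3.0, 2.0]])
neg = np.array([[0.0, 0.0], [5.0, 5.0]])
print(most_specific_rectangle(pos, neg))  # (array([1., 1.]), array([3., 3.]))
```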
Vapnik-Chervonenkis (VC) Dimension:
$N$ points can be labeled as +/- in $2^N$ different ways. The VC dimension of a hypothesis class $\mathcal{H}$ is defined as the maximum number of points that can be shattered by $\mathcal{H}$.
In other words: if for every one of the $2^N$ labelings we can find an $h \in \mathcal{H}$ that separates the $N$ points into +/- correctly, the points are shattered by $\mathcal{H}$; the largest such $N$ is the VC dimension of $\mathcal{H}$.
An example: in two dimensions, axis-aligned rectangles can shatter four suitably placed points but not five, so the VC dimension of the rectangle hypothesis class is 4.
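A brute-force way to check shattering, as a minimal sketch (the rectangle class and the point sets below are my own illustration): enumerate all $2^N$ labelings and test whether some axis-aligned rectangle realizes each one.

```python
from itertools import product
import numpy as np

def rectangle_realizes(points, labels):
    """Can some axis-aligned rectangle contain exactly the + points?

    The candidate rectangle is the bounding box of the + points; if any
    - point falls inside it, no rectangle can realize this labeling.
    """
    pos = points[labels]
    if len(pos) == 0:          # empty rectangle handles the all-negative case
        return True
    lo, hi = pos.min(axis=0), pos.max(axis=0)
    neg = points[~labels]
    return not np.all((neg >= lo) & (neg <= hi), axis=1).any()

def shattered(points):
    """True if rectangles realize every +/- labeling of these points."""
    n = len(points)
    return all(
        rectangle_realizes(points, np.array(lab, dtype=bool))
        for lab in product([False, True], repeat=n)
    )

# Four points in a "diamond" are shattered; adding a centre point breaks it.
diamond = np.array([[0, 1], [1, 0], [2, 1], [1, 2]], dtype=float)
print(shattered(diamond))                                  # True
print(shattered(np.vstack([diamond, [[1.0, 1.0]]])))       # False
```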
Multiple Classes: a $K$-class classification problem can be treated as $K$ two-class problems, training one classifier per class against the rest (one-vs-rest), as sketched below.
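A minimal one-vs-rest sketch, assuming scikit-learn's LogisticRegression as the base binary classifier (any binary classifier would do; the function names are mine):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def one_vs_rest_fit(X, y, classes):
    """Train one binary classifier per class: class k vs. the rest."""
    models = {}
    for k in classes:
        yk = np.where(y == k, 1, 0)        # relabel: class k -> 1, rest -> 0
        models[k] = LogisticRegression().fit(X, yk)
    return models

def one_vs_rest_predict(models, X):
    """Predict the class whose binary model scores highest."""
    classes = list(models)
    scores = np.column_stack([models[k].decision_function(X) for k in classes])
    return np.array(classes)[scores.argmax(axis=1)]

# Toy usage with three Gaussian blobs:
X = np.vstack([np.random.randn(20, 2) + c for c in ([0, 0], [4, 0], [0, 4])])
y = np.repeat([0, 1, 2], 20)
models = one_vs_rest_fit(X, y, classes=[0, 1, 2])
print(one_vs_rest_predict(models, X[:5]))
```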
(4). Something about regression
(5). Dimensions of Supervised Learning Algorithm
Model: $g(x \mid \theta)$, the family of functions we fit, parametrized by $\theta$.
Loss function: $E(\theta \mid \mathcal{X}) = \sum_{t} L\left(r^t, g(x^t \mid \theta)\right)$, the total loss of the model on the training set.
Optimization procedure/algorithm: $\theta^{*} = \arg\min_{\theta} E(\theta \mid \mathcal{X})$, the method used to find the parameters that minimize the loss.
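To make the three dimensions concrete, here is a minimal sketch (my own toy example, not from the lecture) that picks a linear model, squared loss, and plain gradient descent:

```python
import numpy as np

# Model: g(x | theta), a linear model with parameters theta = (w, b).
def g(x, theta):
    w, b = theta
    return w * x + b

# Loss function: E(theta | X), sum of squared errors over the training set.
def loss(theta, x, r):
    return np.sum((r - g(x, theta)) ** 2)

# Optimization procedure: gradient descent on E.
def fit(x, r, lr=0.01, steps=1000):
    w, b = 0.0, 0.0
    for _ in range(steps):
        err = g(x, (w, b)) - r          # residuals
        w -= lr * 2 * np.sum(err * x)   # dE/dw
        b -= lr * 2 * np.sum(err)       # dE/db
    return w, b

# Toy data generated from r = 2x + 1 plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 50)
r = 2 * x + 1 + 0.1 * rng.normal(size=50)
print(fit(x, r))  # approximately (2.0, 1.0)
```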