Probabilistic classifiers
Discriminative learning algorithms: algorithms that try to learn $p(y | x)$ directly (such as logistic regression), or that try to learn a mapping directly from the input space $X$ to the labels $\{0, 1\}$ (such as the perceptron algorithm).
Generative learning algorithms: algorithms that instead try to model $p(x | y)$ (and $p(y)$).
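Bayes: after modeling the class priors $p(y)$ and the class-conditional distributions $p(x | y)$, we can use Bayes' rule to derive the posterior distribution on $y$ given $x$:
$$p(y | x) = \frac{p(x | y)\, p(y)}{p(x)}$$
Note: $p(x) = p(x | y = 1)p(y = 1) + p(x | y = 0)p(y = 0)$
In fact, if we are calculating $p(y | x)$ only to make a prediction, we do not need to compute the denominator, because $p(x)$ does not depend on $y$:
$$\arg\max_{y} p(y | x) = \arg\max_{y} \frac{p(x | y)\, p(y)}{p(x)} = \arg\max_{y} p(x | y)\, p(y)$$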
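As a small illustration of why the denominator can be skipped, the sketch below compares the prediction obtained from the full posterior with the one obtained from the numerator alone. The priors and likelihoods are made-up numbers for a single fixed input $x$; they are assumptions for the example, not values from these notes.

```python
import numpy as np

# Hypothetical values for one fixed input x (assumed for illustration only).
prior = np.array([0.7, 0.3])          # p(y = 0), p(y = 1)
likelihood = np.array([0.02, 0.10])   # p(x | y = 0), p(x | y = 1)

joint = likelihood * prior            # numerator of Bayes' rule: p(x | y) p(y)
evidence = joint.sum()                # denominator: p(x)
posterior = joint / evidence          # p(y | x)

# Dividing by the positive constant p(x) does not change which class wins.
pred_from_posterior = int(np.argmax(posterior))
pred_from_joint = int(np.argmax(joint))
```

Both predictions pick the same class; the denominator is only needed when the actual probability values matter.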
1. Gaussian Discriminant Analysis
Context: $p(x | y)$ is distributed according to a multivariate normal distribution.
1.1. The multivariate normal distribution
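The multivariate normal distribution in $n$ dimensions is parameterized by a mean vector $\mu \in \mathbb{R}^{n}$ and a covariance matrix $\Sigma \in \mathbb{R}^{n \times n}$, where $\Sigma \geq 0$ is symmetric and positive semi-definite. Its density, which requires $\Sigma$ to be invertible, is:
$$p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^{T} \Sigma^{-1} (x - \mu)\right)$$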
Covariance: the covariance of a vector-valued random variable $Z$ is defined as $\mathrm{Cov}(Z) = E[(Z - E[Z])(Z - E[Z])^{T}]$.
Note: Equivalently, $\mathrm{Cov}(Z) = E[ZZ^{T}] - (E[Z])(E[Z])^{T}$.
Takeaway: $X \sim \mathcal{N}(\mu, \Sigma) \Rightarrow \mathrm{Cov}(X) = \Sigma$.
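The two covariance formulas and the takeaway can be checked numerically. The following is a minimal sketch using NumPy; the particular `mu`, `Sigma`, seed and sample size are arbitrary choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary example parameters (assumed for illustration only).
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])

# Draw samples X ~ N(mu, Sigma); one sample per row.
X = rng.multivariate_normal(mu, Sigma, size=200_000)

mean = X.mean(axis=0)
centered = X - mean

# Sample version of E[(Z - E[Z])(Z - E[Z])^T].
cov_def = centered.T @ centered / len(X)
# Sample version of E[Z Z^T] - E[Z] E[Z]^T.
cov_alt = X.T @ X / len(X) - np.outer(mean, mean)
```

The two estimates agree up to floating-point error, and both approach `Sigma` as the number of samples grows.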
1.2. The Gaussian Discriminant Analysis model
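Assume the data are generated as follows:
$$y \sim \mathrm{Bernoulli}(\phi)$$
$$x | y = 0 \sim \mathcal{N}(\mu_{0}, \Sigma)$$
$$x | y = 1 \sim \mathcal{N}(\mu_{1}, \Sigma)$$
Best parameters (maximum likelihood estimation): maximizing the log-likelihood of $m$ training examples $(x^{(i)}, y^{(i)})$ gives
$$\phi = \frac{1}{m} \sum_{i=1}^{m} 1\{y^{(i)} = 1\}$$
$$\mu_{0} = \frac{\sum_{i=1}^{m} 1\{y^{(i)} = 0\}\, x^{(i)}}{\sum_{i=1}^{m} 1\{y^{(i)} = 0\}}, \qquad \mu_{1} = \frac{\sum_{i=1}^{m} 1\{y^{(i)} = 1\}\, x^{(i)}}{\sum_{i=1}^{m} 1\{y^{(i)} = 1\}}$$
$$\Sigma = \frac{1}{m} \sum_{i=1}^{m} (x^{(i)} - \mu_{y^{(i)}})(x^{(i)} - \mu_{y^{(i)}})^{T}$$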
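The maximum likelihood estimates for this model have a closed form: the fraction of positive labels, the per-class sample means, and a covariance pooled over both classes. Below is a minimal sketch with NumPy; the function name `fit_gda` and the synthetic data are hypothetical and only serve to illustrate the estimates.

```python
import numpy as np

def fit_gda(X, y):
    """Closed-form maximum likelihood estimates for GDA with a shared covariance.

    X: array of shape (m, n); y: array of shape (m,) with entries in {0, 1}.
    """
    m = len(y)
    phi = np.mean(y == 1)              # fraction of examples with y = 1
    mu0 = X[y == 0].mean(axis=0)       # mean of the class-0 examples
    mu1 = X[y == 1].mean(axis=0)       # mean of the class-1 examples
    # Center each example on the mean of its own class, then pool both classes.
    centered = X - np.where((y == 1)[:, None], mu1, mu0)
    Sigma = centered.T @ centered / m
    return phi, mu0, mu1, Sigma

# Synthetic data drawn from the model itself (all values are hypothetical).
rng = np.random.default_rng(0)
true_phi = 0.4
true_mu0, true_mu1 = np.array([0.0, 0.0]), np.array([2.0, 1.0])
true_Sigma = np.array([[1.0, 0.3],
                       [0.3, 0.5]])

m = 50_000
y = (rng.random(m) < true_phi).astype(int)
noise = rng.multivariate_normal(np.zeros(2), true_Sigma, size=m)
X = np.where((y == 1)[:, None], true_mu1, true_mu0) + noise

phi, mu0, mu1, Sigma = fit_gda(X, y)
```

With this many samples, the recovered `phi`, `mu0`, `mu1` and `Sigma` should be close to the parameters used to generate the data.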
2. Discussion: GDA and logistic regression
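Link to logistic regression: if we view the quantity $p(y = 1 | x; \phi, \mu_{0}, \mu_{1}, \Sigma)$ as a function of $x$, then we find that:
$$p(y = 1 | x; \phi, \mu_{0}, \mu_{1}, \Sigma) = \frac{1}{1 + \exp(-\beta^{T} x)}$$
where $\beta$ is some appropriate function of $\phi, \Sigma, \mu_{0}, \mu_{1}$ (using the convention that $x$ includes a constant component $x_{0} = 1$, so that $\beta$ also carries the intercept).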
Result: $p(x | y)$ is multivariate Gaussian (with shared $\Sigma$) $\Rightarrow p(y | x)$ follows a logistic function.
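This result can be verified numerically. The sketch below computes the posterior once through Bayes' rule with Gaussian densities and once as a sigmoid of a linear function of $x$. Here `w` and `b` are the weight vector and intercept that together play the role of $\beta$; their expressions follow from expanding the log-odds, and all parameter values are hypothetical.

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Density of N(mu, Sigma) evaluated at each row of x."""
    n = len(mu)
    diff = x - mu
    Sigma_inv = np.linalg.inv(Sigma)
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu), one value per row.
    quad = np.einsum("ij,jk,ik->i", diff, Sigma_inv, diff)
    norm = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm

# Hypothetical GDA parameters (assumed for illustration only).
phi = 0.3
mu0, mu1 = np.array([0.0, 0.0]), np.array([1.5, -1.0])
Sigma = np.array([[1.0, 0.2],
                  [0.2, 0.8]])

rng = np.random.default_rng(1)
x = rng.normal(size=(5, 2)) * 2.0     # a few arbitrary query points

# Posterior p(y = 1 | x) from Bayes' rule.
num = gaussian_pdf(x, mu1, Sigma) * phi
den = num + gaussian_pdf(x, mu0, Sigma) * (1 - phi)
posterior_bayes = num / den

# The same posterior written as a sigmoid of a linear function of x.
Sigma_inv = np.linalg.inv(Sigma)
w = Sigma_inv @ (mu1 - mu0)
b = 0.5 * (mu0 @ Sigma_inv @ mu0 - mu1 @ Sigma_inv @ mu1) + np.log(phi / (1 - phi))
posterior_sigmoid = 1.0 / (1.0 + np.exp(-(x @ w + b)))
```

The quadratic terms in $x$ cancel only because both classes share the same $\Sigma$, which is why the log-odds end up linear in $x$.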
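Notes:
- Assumptions:
  - GDA makes stronger modeling assumptions about the data than logistic regression does: a Gaussian $p(x | y)$ implies a logistic $p(y | x)$, but the converse does not hold.
  - When these modeling assumptions are correct, GDA is asymptotically efficient: in the limit of very large training sets, no algorithm is strictly better at estimating $p(y | x)$.
  - In that setting, even for small training set sizes, we would generally expect GDA to do better than logistic regression.
  - Because it makes weaker assumptions, logistic regression is more robust to incorrect modeling assumptions.
- Non-Gaussian data:
  - Logistic regression also works well on Poisson data: if $x | y = 0 \sim \mathrm{Poisson}(\lambda_{0})$ and $x | y = 1 \sim \mathrm{Poisson}(\lambda_{1})$, then $p(y | x)$ is again logistic.
  - GDA does not work well when the data are non-Gaussian.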