Probabilistic classifiers


Discriminative learning algorithms either try to learn $p(y \mid x)$ directly (such as logistic regression) or try to learn a mapping directly from the input space $X$ to the labels $\{0, 1\}$ (such as the perceptron algorithm).
Generative learning algorithms instead try to model $p(x \mid y)$ (and $p(y)$).
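Bayes: After modeling the class prior $p(y)$ and the class-conditional distribution $p(x \mid y)$, we can use Bayes' rule to derive the posterior distribution of $y$ given $x$:

$$p(y \mid x) = \frac{p(x \mid y)\, p(y)}{p(x)}$$

Note: $p(x) = p(x \mid y = 1)\, p(y = 1) + p(x \mid y = 0)\, p(y = 0)$

If we are computing $p(y \mid x)$ only to make a prediction, we do not need the denominator, because $p(x)$ does not depend on $y$:

$$\arg\max_{y} p(y \mid x) = \arg\max_{y} \frac{p(x \mid y)\, p(y)}{p(x)} = \arg\max_{y} p(x \mid y)\, p(y)$$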
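
As a minimal sketch of this point, the snippet below compares the full posterior with a prediction that uses only the numerator $p(x \mid y)\, p(y)$. The toy model (a 1-D input, the priors, and the two Gaussian class-conditionals) is hypothetical and chosen only for illustration.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

# Hypothetical toy model: class prior p(y) and class-conditional p(x | y).
prior = {0: 0.7, 1: 0.3}
likelihood = {
    0: lambda x: gaussian_pdf(x, mean=0.0, std=1.0),
    1: lambda x: gaussian_pdf(x, mean=2.0, std=1.0),
}

def posterior(x):
    """Full Bayes rule: p(y | x) = p(x | y) p(y) / p(x)."""
    joint = {y: likelihood[y](x) * prior[y] for y in prior}
    evidence = sum(joint.values())  # p(x), summed over both classes
    return {y: joint[y] / evidence for y in joint}

def predict(x):
    """Prediction needs only the numerator, since p(x) does not depend on y."""
    return max(prior, key=lambda y: likelihood[y](x) * prior[y])

x = 1.8
post = posterior(x)
print(post, predict(x))
```

Both routes pick the same label; the denominator only rescales the scores so that they sum to one.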

1. Gaussian Discriminant Analysis

Context: GDA assumes that $p(x \mid y)$ is distributed according to a multivariate normal distribution.

1.1. The multivariate normal distribution

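The multivariate normal distribution in $n$ dimensions is parameterized by a mean vector $\mu \in \mathbb{R}^{n}$ and a covariance matrix $\Sigma \in \mathbb{R}^{n \times n}$, where $\Sigma \geq 0$ is symmetric and positive semi-definite. When $\Sigma$ is invertible, its density is:

$$p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^{T} \Sigma^{-1} (x - \mu)\right)$$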
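
The density of $N(\mu, \Sigma)$ can be evaluated numerically with a few lines of NumPy. This is a sketch that assumes $\Sigma$ is positive definite (so that it can be inverted); the function name `mvn_pdf` is a hypothetical helper, not part of any library.

```python
import numpy as np

def mvn_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at x; assumes sigma is positive definite."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    n = mu.shape[0]
    diff = x - mu
    # Solve sigma @ z = diff instead of forming the inverse explicitly.
    quad = diff @ np.linalg.solve(sigma, diff)
    norm_const = np.sqrt((2.0 * np.pi) ** n * np.linalg.det(sigma))
    return float(np.exp(-0.5 * quad) / norm_const)

# Standard normal in 2 dimensions, evaluated at its mean.
print(mvn_pdf([0.0, 0.0], [0.0, 0.0], np.eye(2)))
```

For the standard normal in two dimensions, the value at the mean is $1 / (2\pi)$, which gives a quick sanity check.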

Covariance: The covariance of a vector-valued random variable $Z$ is defined as $Cov(Z) = E[(Z - E[Z])(Z - E[Z])^{T}]$.

Note: Equivalently, $Cov(Z) = E[ZZ^{T}] - (E[Z])(E[Z])^{T}$

Takeaway: $X \sim N(\mu, \Sigma) \Rightarrow Cov(X) = \Sigma$
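
The two covariance formulas and the takeaway can be checked empirically. The sketch below replaces expectations with sample averages; the values of `mu` and `sigma` are hypothetical, and the match with $\Sigma$ is only approximate because of sampling noise.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])

# Draw samples of X ~ N(mu, sigma), one sample per row.
X = rng.multivariate_normal(mu, sigma, size=200_000)

mean = X.mean(axis=0)
centered = X - mean

# Definition: E[(X - E[X])(X - E[X])^T], with E replaced by the sample average.
cov_definition = centered.T @ centered / len(X)

# Equivalent form: E[X X^T] - E[X] E[X]^T.
cov_shortcut = X.T @ X / len(X) - np.outer(mean, mean)

print(np.allclose(cov_definition, cov_shortcut))
print(np.abs(cov_definition - sigma).max())
```

The first line printed compares the two formulas with each other; the second reports the largest entrywise gap between the sample covariance and the `sigma` used to generate the data.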

1.2. The Gaussian Discriminant Analysis model

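Assume the data are generated as follows, with both classes sharing the same covariance matrix $\Sigma$:

$$y \sim \mathrm{Bernoulli}(\phi)$$

$$x \mid y = 0 \sim N(\mu_{0}, \Sigma)$$

$$x \mid y = 1 \sim N(\mu_{1}, \Sigma)$$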

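Best parameters (maximum likelihood estimation): given $m$ training examples $(x^{(i)}, y^{(i)})$, maximizing the log-likelihood

$$\ell(\phi, \mu_{0}, \mu_{1}, \Sigma) = \log \prod_{i=1}^{m} p(x^{(i)} \mid y^{(i)}; \mu_{0}, \mu_{1}, \Sigma)\, p(y^{(i)}; \phi)$$

gives the estimates:

$$\phi = \frac{1}{m} \sum_{i=1}^{m} 1\{y^{(i)} = 1\}$$

$$\mu_{0} = \frac{\sum_{i=1}^{m} 1\{y^{(i)} = 0\}\, x^{(i)}}{\sum_{i=1}^{m} 1\{y^{(i)} = 0\}} \qquad \mu_{1} = \frac{\sum_{i=1}^{m} 1\{y^{(i)} = 1\}\, x^{(i)}}{\sum_{i=1}^{m} 1\{y^{(i)} = 1\}}$$

$$\Sigma = \frac{1}{m} \sum_{i=1}^{m} (x^{(i)} - \mu_{y^{(i)}})(x^{(i)} - \mu_{y^{(i)}})^{T}$$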
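
The maximum likelihood estimates have a closed form: $\phi$ is the fraction of examples with $y = 1$, $\mu_{0}$ and $\mu_{1}$ are the per-class sample means, and $\Sigma$ is the covariance pooled over both classes. A minimal NumPy sketch follows; `fit_gda` and the synthetic data are hypothetical and serve only as an illustration.

```python
import numpy as np

def fit_gda(X, y):
    """Maximum likelihood estimates for GDA with a shared covariance matrix."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y).astype(int)
    m = len(y)
    phi = y.mean()                    # fraction of examples with y = 1
    mu0 = X[y == 0].mean(axis=0)      # mean of the class-0 examples
    mu1 = X[y == 1].mean(axis=0)      # mean of the class-1 examples
    # Center every example on the mean of its own class, then pool.
    centered = X - np.where(y[:, None] == 1, mu1, mu0)
    sigma = centered.T @ centered / m
    return phi, mu0, mu1, sigma

# Hypothetical synthetic data drawn from the GDA model itself.
rng = np.random.default_rng(0)
true_sigma = np.array([[1.0, 0.3], [0.3, 0.5]])
y = (rng.random(5000) < 0.4).astype(int)
means = np.where(y[:, None] == 1, [2.0, 1.0], [0.0, 0.0])
X = means + rng.multivariate_normal([0.0, 0.0], true_sigma, size=5000)

phi, mu0, mu1, sigma = fit_gda(X, y)
print(phi, mu0, mu1, sigma, sep="\n")
```

With enough data, the estimates should land close to the parameters used to generate the sample (here a prior of 0.4, class means `[0, 0]` and `[2, 1]`, and `true_sigma`).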

2. Discussion: GDA and logistic regression

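Link to logistic regression: If we view the quantity $p(y = 1 \mid x; \phi, \mu_{0}, \mu_{1}, \Sigma)$ as a function of $x$, then we find that:

$$p(y = 1 \mid x; \phi, \Sigma, \mu_{0}, \mu_{1}) = \frac{1}{1 + \exp(-\beta^{T} x)}$$

where $\beta$ is some appropriate function of $\phi, \Sigma, \mu_{0}, \mu_{1}$, and $x$ is extended with a constant coordinate $x_{0} = 1$ so that $\beta$ includes the intercept.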

Result: $p(x \mid y)$ is multivariate Gaussian (with shared $\Sigma$) $\Rightarrow p(y \mid x)$ follows a logistic function.
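
This result can be checked numerically. Expanding the log-odds $\log \frac{p(y = 1 \mid x)}{p(y = 0 \mid x)}$ for two Gaussians with a shared $\Sigma$ cancels the quadratic term in $x$ and leaves weights $\Sigma^{-1}(\mu_{1} - \mu_{0})$ and intercept $\frac{1}{2}(\mu_{0}^{T}\Sigma^{-1}\mu_{0} - \mu_{1}^{T}\Sigma^{-1}\mu_{1}) + \log\frac{\phi}{1 - \phi}$. The sketch below compares that logistic form with the posterior computed from Bayes' rule; the function names and parameter values are hypothetical.

```python
import numpy as np

def gda_posterior(x, phi, mu0, mu1, sigma):
    """p(y = 1 | x) computed directly from Bayes' rule."""
    def unnormalized_density(mu):
        diff = x - mu
        # The Gaussian normalizing constant is shared by both classes and cancels.
        return np.exp(-0.5 * diff @ np.linalg.solve(sigma, diff))
    joint1 = unnormalized_density(mu1) * phi
    joint0 = unnormalized_density(mu0) * (1.0 - phi)
    return joint1 / (joint0 + joint1)

def logistic_parameters(phi, mu0, mu1, sigma):
    """Weights and intercept of the equivalent logistic function."""
    sigma_inv = np.linalg.inv(sigma)
    weights = sigma_inv @ (mu1 - mu0)
    intercept = (0.5 * (mu0 @ sigma_inv @ mu0 - mu1 @ sigma_inv @ mu1)
                 + np.log(phi / (1.0 - phi)))
    return weights, intercept

# Hypothetical parameter values, chosen only for illustration.
phi = 0.3
mu0, mu1 = np.array([0.0, 0.0]), np.array([2.0, 1.0])
sigma = np.array([[1.0, 0.3], [0.3, 0.5]])

x = np.array([0.7, -0.2])
weights, intercept = logistic_parameters(phi, mu0, mu1, sigma)
sigmoid = 1.0 / (1.0 + np.exp(-(weights @ x + intercept)))
print(np.isclose(sigmoid, gda_posterior(x, phi, mu0, mu1, sigma)))
```

The cancellation of the quadratic term relies on the shared covariance; with a separate $\Sigma$ per class, the log-odds would be quadratic in $x$ rather than linear.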

Notes:

  1. Assumptions:
    • GDA makes stronger modeling assumptions about the data than does logistic regression.
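    • When these modeling assumptions are correct, GDA is asymptotically efficient: in the limit of very large training sets, no algorithm is strictly better at estimating $p(y \mid x)$.
    • Under the same assumptions, we would generally expect GDA to do better than logistic regression even for small training sets.
    • Because it makes weaker assumptions, logistic regression is more robust to incorrect modeling assumptions.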
  2. Non-Gaussian data:
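    • Logistic regression also works well on Poisson data: if $x \mid y = 0 \sim \mathrm{Poisson}(\lambda_{0})$ and $x \mid y = 1 \sim \mathrm{Poisson}(\lambda_{1})$, then $p(y \mid x)$ is again logistic.
    • GDA is less predictable on non-Gaussian data: fitting Gaussians to such data may or may not work well.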
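
The Poisson case can be illustrated in the same way. Assuming $x \mid y = k \sim \mathrm{Poisson}(\lambda_{k})$ for $k \in \{0, 1\}$ and $p(y = 1) = \phi$, the log-odds are linear in $x$, with weight $\log(\lambda_{1} / \lambda_{0})$ and intercept $\log\frac{\phi}{1 - \phi} - (\lambda_{1} - \lambda_{0})$. The rates and prior below are hypothetical values used only for this sketch.

```python
import math

def poisson_pmf(k, lam):
    """Probability of observing the count k under Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Hypothetical prior and class-conditional rates.
phi, lam0, lam1 = 0.4, 2.0, 5.0

def posterior_bayes(k):
    """p(y = 1 | x = k) from Bayes' rule."""
    joint1 = poisson_pmf(k, lam1) * phi
    joint0 = poisson_pmf(k, lam0) * (1.0 - phi)
    return joint1 / (joint0 + joint1)

def posterior_logistic(k):
    """The same posterior written as a logistic function of k."""
    weight = math.log(lam1 / lam0)
    intercept = math.log(phi / (1.0 - phi)) - (lam1 - lam0)
    return 1.0 / (1.0 + math.exp(-(weight * k + intercept)))

for k in range(6):
    print(k, posterior_bayes(k), posterior_logistic(k))
```

The two columns agree up to floating-point error, which is why logistic regression remains well specified for this kind of data even though the Gaussian assumption of GDA does not hold.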
