Linear Discriminant Analysis


LDA is closely related to PCA: both are linear transformations of the data, but they optimize different criteria.

  • In PCA, the transformation is based on minimizing the mean square error between the original data vectors and the data vectors that can be estimated from the reduced-dimensionality data vectors.

  • In LDA, the transformation is based on maximizing the ratio of “between-class variance” to “within-class variance”, with the goal of reducing the variation within each class while increasing the separation between classes, as illustrated by the code sketch below.
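To make the contrast concrete, here is a minimal scikit-learn sketch (assuming scikit-learn is installed) that projects the Iris data onto two components with each method; PCA ignores the class labels, while LDA uses them:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)  # 4 features, 3 classes

# PCA: unsupervised, minimizes the reconstruction (mean square) error
X_pca = PCA(n_components=2).fit_transform(X)

# LDA: supervised, maximizes the between-class / within-class variance ratio
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

print(X_pca.shape, X_lda.shape)  # (150, 2) (150, 2)
```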

Here is the graphic intuition:

Figure: LDA vs PCA

Goals:

  1. Obtain a scalar $y$ by projecting the samples $x$ onto a line (written out below):
  2. Find $\beta^{*}$ that maximizes the ratio of “between-class variance” to “within-class variance”.
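With $\beta$ denoting the projection vector (the notation used in the rest of this post), this projection reads:

$$y = \beta^{T} x$$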

1. Two-class problem

1.1. State the problem

Class mean vector: the mean vector of each class, together with its projection onto the line.
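Writing $N_{k}$ for the number of samples in class $\omega_{k}$ (notation assumed here), these are:

$$\mu_{k} = \frac{1}{N_{k}} \sum_{x \in \omega_{k}} x \qquad \text{and} \qquad \tilde{\mu}_{k} = \beta^{T} \mu_{k}$$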

Between-class variance: a measure of the separation between the two classes is the distance between the projected means, which lives in $y$-space.
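For two classes, this separation is usually taken as the squared distance between the projected means:

$$(\tilde{\mu}_{1} - \tilde{\mu}_{2})^{2} = \left(\beta^{T}(\mu_{1} - \mu_{2})\right)^{2}$$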

Within-class variance: Variance of the elements of the $k$-th class relative to its mean
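In $y$-space, this is the scatter of the projected samples of class $\omega_{k}$ around their projected mean:

$$\tilde{s}_{k}^{2} = \sum_{y \in \omega_{k}} (y - \tilde{\mu}_{k})^{2}$$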

Objective function: we are looking for a projection where examples from the same class are projected very close to each other and, at the same time, the projected means are as far apart as possible. Naturally:
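This leads to the Fisher criterion, to be maximized over $\beta$:

$$J(\beta) = \frac{(\tilde{\mu}_{1} - \tilde{\mu}_{2})^{2}}{\tilde{s}_{1}^{2} + \tilde{s}_{2}^{2}}$$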

1.2. Transform the problem

Goal: We need to express $J(\beta)$ as a function of $\beta$:

Scatter in feature space $x$: a measure of how the points of a given class are spread around the mean of that class.
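In the usual notation, the scatter of class $\omega_{k}$ is:

$$S_{k} = \sum_{x \in \omega_{k}} (x - \mu_{k})(x - \mu_{k})^{T}$$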

Within-class scatter matrix: Total scatter of all classes
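For the two-class case considered here:

$$S_{W} = S_{1} + S_{2}$$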

Between-class scatter matrix: the scatter of the class means, measuring the distance between them in $x$-space.
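For two classes, it reduces to the outer product of the difference between the class means:

$$S_{B} = (\mu_{1} - \mu_{2})(\mu_{1} - \mu_{2})^{T}$$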

Objective function reformulation: With these notations, we obtain:
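$$J(\beta) = \frac{\beta^{T} S_{B}\,\beta}{\beta^{T} S_{W}\,\beta}$$

which is a generalized Rayleigh quotient in $\beta$.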

1.3. Solve the problem

Fisher’s linear discriminant: By setting the derivative to zero we obtain:
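$$S_{B}\,\beta = \lambda\, S_{W}\,\beta$$

Since $S_{B}\beta$ always points in the direction of $(\mu_{1} - \mu_{2})$, the solution can be written, up to a scale factor, as:

$$\beta^{*} = S_{W}^{-1}(\mu_{1} - \mu_{2})$$

A minimal NumPy sketch of this two-class solution (the function name and variables are illustrative, not taken from any library):

```python
import numpy as np

def fisher_direction(X1, X2):
    """Fisher's discriminant direction beta* = S_W^{-1} (mu_1 - mu_2).

    X1, X2: arrays of shape (n_samples_k, n_features), one per class.
    """
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter S_W = S_1 + S_2
    S_W = (X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)
    # Solve S_W beta = (mu_1 - mu_2) rather than inverting S_W explicitly
    return np.linalg.solve(S_W, mu1 - mu2)
```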

2. Multi-class problem

Goal: seek $(C - 1)$ projections $[y_{1}, y_{2}, \ldots, y_{C-1}]$ by means of $(C - 1)$ projection vectors $\beta_{i}$, arranged as columns into a projection matrix $\Theta = [\beta_{1}\,\beta_{2}\,\ldots\,\beta_{C-1}]$, where:
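That is, each scalar is $y_{i} = \beta_{i}^{T} x$, or in matrix form:

$$\mathbf{y} = \Theta^{T} x$$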

2.1. Objective function

We obtain similarly (proof):
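Writing $\mu$ for the overall mean and $N_{k}$ for the size of class $\omega_{k}$, the scatter matrices generalize to $C$ classes as:

$$S_{W} = \sum_{k=1}^{C} S_{k} \qquad \text{and} \qquad S_{B} = \sum_{k=1}^{C} N_{k}\,(\mu_{k} - \mu)(\mu_{k} - \mu)^{T}$$

and the objective function becomes:

$$J(\Theta) = \frac{\left|\Theta^{T} S_{B}\,\Theta\right|}{\left|\Theta^{T} S_{W}\,\Theta\right|}$$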

Goal: seek the projection matrix $\Theta^{*}$ that maximizes this ratio.

Fisher’s criterion: the ratio is maximized when the projection matrix $\Theta^{*}$ is composed of the eigenvectors of $S_{W}^{-1} S_{B}$, under the assumption that $S_{W}$ is non-singular (i.e., invertible).

LDA and dimensionality reduction: notice that there will be at most $(C-1)$ eigenvectors with non-zero corresponding eigenvalues $\lambda_{i}$, because $S_{B}$ is of rank $(C-1)$ or less. LDA can therefore represent a massive reduction in the dimensionality of the problem.
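As an illustration, here is a minimal NumPy sketch of this multi-class construction (the function name and its defaults are illustrative, not taken from a library):

```python
import numpy as np

def lda_projection(X, y, n_components=None):
    """Project X onto the leading (at most C-1) discriminant directions."""
    classes = np.unique(y)
    C = len(classes)
    n_features = X.shape[1]
    mu = X.mean(axis=0)

    S_W = np.zeros((n_features, n_features))
    S_B = np.zeros((n_features, n_features))
    for k in classes:
        X_k = X[y == k]
        mu_k = X_k.mean(axis=0)
        S_W += (X_k - mu_k).T @ (X_k - mu_k)   # within-class scatter
        d = (mu_k - mu).reshape(-1, 1)
        S_B += len(X_k) * (d @ d.T)            # between-class scatter

    # Eigenvectors of S_W^{-1} S_B; at most C-1 eigenvalues are non-zero
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
    order = np.argsort(eigvals.real)[::-1]
    if n_components is None:
        n_components = C - 1
    Theta = eigvecs[:, order[:n_components]].real
    return X @ Theta
```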

