Logistic Regression

TL;DR: Logistic regression is a simple but often effective algorithm for binary classification.

1. Logistic regression

Its basic principle is to pass a linear predictor $\beta^{T}x$ (the same linear form used in least-squares regression) through a sigmoid function $g$, which squashes its values into $[0, 1]$:
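The formula itself is not reproduced above; the usual convention (with $h_{\beta}$ denoting the hypothesis) is

$$g(z) = \frac{1}{1 + e^{-z}}, \qquad h_{\beta}(x) = g(\beta^{T}x) = \frac{1}{1 + e^{-\beta^{T}x}},$$

where $h_{\beta}(x)$ is read as the model's estimate of $P(y = 1 \mid x)$.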

(Figure: the sigmoid function.)

Note: The derivative $g'(z) = g(z)(1-g(z))$ has the convenient property of being available in closed form as a function of $g(z)$ itself, which makes gradient computations cheap.
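For completeness, this identity follows in one line from the definition of $g$:

$$g'(z) = \frac{e^{-z}}{(1 + e^{-z})^{2}} = \frac{1}{1 + e^{-z}} \cdot \frac{e^{-z}}{1 + e^{-z}} = g(z)\big(1 - g(z)\big).$$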

Probabilistic framework: We endow our classification model with a set of probabilistic assumptions.
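Those assumptions are not written out above; the standard ones for logistic regression model the label as a Bernoulli variable given the features:

$$P(y = 1 \mid x; \beta) = h_{\beta}(x), \qquad P(y = 0 \mid x; \beta) = 1 - h_{\beta}(x),$$

which can be written compactly as $p(y \mid x; \beta) = h_{\beta}(x)^{y}\,\big(1 - h_{\beta}(x)\big)^{1 - y}$ for $y \in \{0, 1\}$.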

By maximizing the log-likelihood for $m$ training examples, we obtain:
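The expression is missing above; assuming $m$ independent training examples $(x^{(i)}, y^{(i)})$, the log-likelihood is

$$\ell(\beta) = \sum_{i=1}^{m} y^{(i)} \log h_{\beta}(x^{(i)}) + \big(1 - y^{(i)}\big) \log\!\big(1 - h_{\beta}(x^{(i)})\big),$$

and differentiating with respect to $\beta_{j}$ for a single example $(x, y)$ gives $\big(y - h_{\beta}(x)\big)\,x_{j}$, which is exactly what the stochastic gradient ascent rule below uses.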

TODO:

  • Add derivation problem + sol.
  • Add vectorization of the computations
  • Specify the terminology used in the formula

Stochastic gradient ascent rule:
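The rule is not shown above; in the usual per-example notation, with learning rate $\alpha$,

$$\beta_{j} := \beta_{j} + \alpha\,\big(y^{(i)} - h_{\beta}(x^{(i)})\big)\,x_{j}^{(i)}.$$

A minimal NumPy sketch of this update (the function names, hyperparameters and synthetic data are my own illustration, not taken from the original):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgd_logistic_regression(X, y, alpha=0.1, epochs=100, seed=0):
    """Stochastic gradient ascent on the logistic log-likelihood."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    beta = np.zeros(n)
    for _ in range(epochs):
        for i in rng.permutation(m):           # visit examples in random order
            h = sigmoid(X[i] @ beta)           # h_beta(x^(i))
            beta += alpha * (y[i] - h) * X[i]  # ascent step on a single example
    return beta

# Tiny usage example on synthetic data with an intercept column.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.hstack([np.ones((200, 1)), rng.normal(size=(200, 2))])
    y = (X @ np.array([-0.5, 2.0, -1.0]) > 0).astype(float)
    print("estimated beta:", sgd_logistic_regression(X, y))
```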

Notes:

  • The perceptron can be seen as logistic regression in which the output is forced to be exactly $0$ or $1$, i.e. the sigmoid is replaced by a hard threshold function.

2. Newton's method

Newton's algorithm: For a function $f : \mathbb{R} \rightarrow \mathbb{R}$, we look for a value $\beta$ such that $f(\beta) = 0$. Starting from an initial guess $\beta$, we iterate:
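The update itself is missing above; the standard Newton step is

$$\beta := \beta - \frac{f(\beta)}{f'(\beta)}.$$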

Key idea: To maximize some function $\ell$, apply Newton's algorithm to its derivative: a maximum is a point where $\ell'(\beta) = 0$, so we search for a zero of $\ell'$:
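Written out (the formula is not shown above), the corresponding update is

$$\beta := \beta - \frac{\ell'(\beta)}{\ell''(\beta)}.$$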

Newton-Raphson method (Multidimensional setting):
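The multidimensional update (again the standard form, since the formula is not reproduced above) is

$$\beta := \beta - H^{-1}\,\nabla_{\beta}\,\ell(\beta),$$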

where $\nabla_{\beta}\,\ell(\beta)$ is the gradient of $\ell$ and $H$ is its Hessian matrix, with entries $H_{ij} = \frac{\partial^{2} \ell(\beta)}{\partial \beta_{i}\,\partial \beta_{j}}$.

Notes:

  • Newton’s method typically enjoys faster convergence than (batch) gradient descent
  • However, each Newton iteration is more expensive than a gradient-descent step, because it requires computing and inverting the Hessian.
  • When Newton's method is applied to maximize the logistic regression log-likelihood function $\ell(\beta)$, the resulting method is also called Fisher scoring; a small code sketch of this is given below.
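A minimal NumPy sketch of this Newton/Fisher-scoring update for logistic regression (my own illustration; the function name and number of iterations are arbitrary choices, not from the original):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_logistic_regression(X, y, n_iter=10):
    """Maximize the logistic log-likelihood with Newton-Raphson steps."""
    m, n = X.shape
    beta = np.zeros(n)
    for _ in range(n_iter):
        h = sigmoid(X @ beta)                    # predicted probabilities h_beta(x)
        grad = X.T @ (y - h)                     # gradient of l(beta)
        W = h * (1.0 - h)                        # per-example weights h(1 - h)
        hessian = -(X * W[:, None]).T @ X        # Hessian of l(beta)
        beta -= np.linalg.solve(hessian, grad)   # beta := beta - H^{-1} grad
    return beta
```

Solving the linear system instead of forming $H^{-1}$ explicitly is the usual practical choice; each iteration still costs roughly $O(n^{3})$ in the number of features, which is the price mentioned in the note above.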
