Logistic Regression
TL;DR: Logistic regression is a simple but often effective algorithm for binary classification.
1. Logistic regression
Its basic principle is to pass the output of a linear predictor $\beta^{T}x$ through a sigmoid function $g$, which squashes the values into $(0, 1)$:

$$
h_{\beta}(x) = g(\beta^{T}x) = \frac{1}{1 + e^{-\beta^{T}x}}
$$
Note: The derivative $g'(z) = g(z)(1-g(z))$ has the convenient property of being expressible in terms of $g(z)$ itself, which makes gradient computations cheap.
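For reference, this identity follows directly from the definition of the sigmoid (a short derivation, not written out in the original note):

$$
g'(z) = \frac{d}{dz}\,\frac{1}{1 + e^{-z}}
= \frac{e^{-z}}{\left(1 + e^{-z}\right)^{2}}
= \frac{1}{1 + e^{-z}} \left( 1 - \frac{1}{1 + e^{-z}} \right)
= g(z)\,\bigl(1 - g(z)\bigr)
$$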
Probabilistic framework: We endow our classification model with probabilistic assumptions: $P(y = 1 \mid x; \beta) = h_{\beta}(x)$ and $P(y = 0 \mid x; \beta) = 1 - h_{\beta}(x)$, written compactly as $p(y \mid x; \beta) = h_{\beta}(x)^{y}\,\bigl(1 - h_{\beta}(x)\bigr)^{1 - y}$.
We estimate $\beta$ by maximizing the log-likelihood of the $m$ training examples:

$$
\ell(\beta) = \sum_{i=1}^{m} y^{(i)} \log h_{\beta}\bigl(x^{(i)}\bigr) + \bigl(1 - y^{(i)}\bigr) \log\bigl(1 - h_{\beta}\bigl(x^{(i)}\bigr)\bigr)
$$
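As a sketch of where the update rule below comes from (this derivation is implied but not written out here), differentiating the single-example log-likelihood and using $g'(z) = g(z)(1 - g(z))$ gives

$$
\frac{\partial \ell(\beta)}{\partial \beta_j}
= \left( \frac{y}{g(\beta^{T}x)} - \frac{1 - y}{1 - g(\beta^{T}x)} \right) g'(\beta^{T}x)\, x_j
= \bigl( y - h_{\beta}(x) \bigr)\, x_j
$$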
TODO:
- Add derivation problem + solution.
- Add vectorization of the computations.
- Clarify the terminology used in the formula.
Stochastic gradient ascent rule:

$$
\beta_j := \beta_j + \alpha \bigl( y^{(i)} - h_{\beta}\bigl(x^{(i)}\bigr) \bigr)\, x_j^{(i)}
$$
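A minimal sketch of this training loop, assuming NumPy and illustrative names (`sga_logistic_regression`, `X`, `y`, `alpha` are not from the original post):

```python
import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^{-z}), squashes values into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def sga_logistic_regression(X, y, alpha=0.1, epochs=100):
    """Stochastic gradient ascent on the log-likelihood (illustrative sketch).

    X : (m, n) design matrix (add a column of ones for an intercept)
    y : (m,) vector of 0/1 labels
    """
    m, n = X.shape
    beta = np.zeros(n)
    for _ in range(epochs):
        for i in np.random.permutation(m):
            # beta_j := beta_j + alpha * (y_i - h_beta(x_i)) * x_ij
            error = y[i] - sigmoid(X[i] @ beta)
            beta += alpha * error * X[i]
    return beta

# Toy usage: labels depend on the sign of the first feature.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    X = np.hstack([np.ones((200, 1)), X])   # intercept column
    y = (X[:, 1] > 0).astype(float)
    beta = sga_logistic_regression(X, y)
    preds = (sigmoid(X @ beta) > 0.5).astype(float)
    print("training accuracy:", (preds == y).mean())
```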
Notes:
- The perceptron is logistic regression where the output of $g$ is "forced" to be exactly $0$ or $1$, i.e. the sigmoid is replaced by the threshold function $g(z) = \mathbf{1}\{z \geq 0\}$.
2. Newton's method
Newton's algorithm: for a function $f : \mathbb{R} \rightarrow \mathbb{R}$, we want to find $\beta$ such that $f(\beta) = 0$. We start at an initial value $\beta$ and iterate:

$$
\beta := \beta - \frac{f(\beta)}{f'(\beta)}
$$
Key idea: if we want to maximize some function $\ell$, we apply Newton's algorithm to its derivative, since the maximum satisfies $\ell'(\beta) = 0$:

$$
\beta := \beta - \frac{\ell'(\beta)}{\ell''(\beta)}
$$
Newton-Raphson method (multidimensional setting):

$$
\beta := \beta - H^{-1} \nabla_{\beta}\, \ell(\beta)
$$
where $\nabla_{\beta}\,\ell(\beta)$ is the gradient and $H$ is the Hessian matrix of second derivatives, $H_{jk} = \frac{\partial^{2} \ell(\beta)}{\partial \beta_j \,\partial \beta_k}$.
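For logistic regression, the gradient is $X^{T}(y - h)$ and the Hessian is $-X^{T} S X$ with $S = \mathrm{diag}\bigl(h_i (1 - h_i)\bigr)$. Below is a minimal sketch of the resulting updates, assuming NumPy and illustrative names (`newton_logistic`, `X`, `y` are not from the original post):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_logistic(X, y, n_iter=10):
    """Newton-Raphson updates beta := beta - H^{-1} grad(l) (illustrative sketch)."""
    m, n = X.shape
    beta = np.zeros(n)
    for _ in range(n_iter):
        h = sigmoid(X @ beta)
        grad = X.T @ (y - h)              # gradient of the log-likelihood
        S = h * (1.0 - h)                 # diagonal of the weight matrix
        H = -(X.T * S) @ X                # Hessian of the log-likelihood
        beta -= np.linalg.solve(H, grad)  # beta := beta - H^{-1} grad
    return beta
```

Solving the linear system with `np.linalg.solve` avoids forming $H^{-1}$ explicitly, which is both cheaper and numerically more stable than inverting the Hessian.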
Notes:
- Newton's method typically enjoys faster convergence than (batch) gradient descent and needs fewer iterations to get close to the optimum.
- Because each iteration requires computing and inverting the Hessian, one step of Newton's method is more expensive than one step of gradient descent.
- When Newton's method is applied to maximize the logistic regression log-likelihood $\ell(\beta)$, the resulting method is also called Fisher scoring.