Naive Bayes
Naive Bayes is a generative learning algorithm for discrete-valued inputs. In particular, it is known to work well on text classification tasks such as spam detection.
1. Model
We represent an email via a feature vector whose length is equal to the number of words in the dictionary. Specifically, if an email contains the $i$-th word of the dictionary, then we set $x_{i} = 1$; otherwise, we set $x_{i} = 0$.
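As a minimal sketch (the vocabulary and helper below are hypothetical, not from the original post), the binary feature vector can be built as follows:

```python
import numpy as np

def email_to_features(email_words, vocabulary):
    # Binary bag-of-words: x[j] = 1 iff the j-th dictionary word appears in the email
    word_to_index = {word: j for j, word in enumerate(vocabulary)}
    x = np.zeros(len(vocabulary), dtype=int)
    for word in email_words:
        if word in word_to_index:
            x[word_to_index[word]] = 1
    return x

# Example: email_to_features(["buy", "now", "buy"], ["buy", "cheap", "now", "meeting"])
# returns array([1, 0, 1, 0])
```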
Naive Bayes assumption: The $x_{i}$’s are conditionally independent given $y$.
Under this assumption, we obtain:

$$p(x_{1}, \dots, x_{n} \mid y) = \prod_{j=1}^{n} p(x_{j} \mid y)$$
Let us set the following parameters:

$$\phi_{j\mid y=1} = p(x_{j} = 1 \mid y = 1), \qquad \phi_{j\mid y=0} = p(x_{j} = 1 \mid y = 0), \qquad \phi_{y} = p(y = 1)$$
Joint likelihood: given a training set $\{(x^{(i)}, y^{(i)});\ i = 1, \dots, m\}$,

$$\mathcal{L}(\phi_{y}, \phi_{j\mid y=0}, \phi_{j\mid y=1}) = \prod_{i=1}^{m} p\left(x^{(i)}, y^{(i)}\right)$$
Maximizing with respect to our $2n+1$ parameters $\phi_{j\mid y=1}$, $\phi_{j\mid y=0}$, $\phi_{y}$ leads to:

$$\phi_{j\mid y=1} = \frac{\sum_{i=1}^{m} 1\{x_{j}^{(i)} = 1 \wedge y^{(i)} = 1\}}{\sum_{i=1}^{m} 1\{y^{(i)} = 1\}}, \qquad \phi_{j\mid y=0} = \frac{\sum_{i=1}^{m} 1\{x_{j}^{(i)} = 1 \wedge y^{(i)} = 0\}}{\sum_{i=1}^{m} 1\{y^{(i)} = 0\}}, \qquad \phi_{y} = \frac{\sum_{i=1}^{m} 1\{y^{(i)} = 1\}}{m}$$
In Python, the class priors can be estimated as follows:
```python
import numpy as np

def get_class_priors(y_train):
    # Count the number of training examples in each class
    unique_labels, counts = np.unique(y_train, return_counts=True)
    # Class priors: fraction of training examples belonging to each class
    classes_prior = {}
    for i, label in enumerate(unique_labels):
        classes_prior[int(label)] = float(counts[i]) / len(y_train)
    return classes_prior
```
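The conditional probabilities $\phi_{j\mid y}$ can be estimated with a similar sketch (a hypothetical helper, assuming `X_train` is an $m \times n$ binary NumPy feature matrix and `y_train` a NumPy array of labels):

```python
import numpy as np

def get_conditional_probs(X_train, y_train):
    # conditional_probs[c][j] = P(x_j = 1 | y = c), estimated by maximum likelihood
    conditional_probs = {}
    for label in np.unique(y_train):
        X_c = X_train[y_train == label]                    # examples of class c
        conditional_probs[int(label)] = X_c.mean(axis=0)   # fraction with x_j = 1
    return conditional_probs
```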
To make a prediction on a new example with features $x$, we compute the posterior probability of each class and pick the most likely one:

$$p(y = 1 \mid x) = \frac{p(x \mid y = 1)\, p(y = 1)}{p(x)} = \frac{\left(\prod_{j=1}^{n} p(x_{j} \mid y = 1)\right) p(y = 1)}{\left(\prod_{j=1}^{n} p(x_{j} \mid y = 1)\right) p(y = 1) + \left(\prod_{j=1}^{n} p(x_{j} \mid y = 0)\right) p(y = 0)}$$
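In code, it is numerically safer to compare log-posteriors. A sketch under the assumption that `classes_prior` and `conditional_probs` come from the helpers above, and that the conditional probabilities are strictly between 0 and 1 (e.g., after the Laplace smoothing discussed below):

```python
import numpy as np

def predict(x, classes_prior, conditional_probs):
    # Choose the class c maximizing log p(y = c) + sum_j log p(x_j | y = c)
    best_label, best_score = None, -np.inf
    for label, prior in classes_prior.items():
        phi = conditional_probs[label]  # p(x_j = 1 | y = label) for each word j
        log_likelihood = np.sum(x * np.log(phi) + (1 - x) * np.log(1 - phi))
        score = np.log(prior) + log_likelihood
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```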
Multi-class: model $p(x_{i} \mid y)$ as a multinomial rather than as a Bernoulli.
Simple hack: if some inputs are continuous-valued, discretize them to use this algorithm, as in the sketch below.
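One possible way to discretize in NumPy (the feature values and bucket boundaries below are made up for illustration):

```python
import numpy as np

living_area = np.array([890, 1350, 2100, 3400])   # continuous feature values (hypothetical)
boundaries = np.array([400, 800, 1200, 1600])     # hand-picked bucket edges (hypothetical)
buckets = np.digitize(living_area, boundaries)    # discrete values in {0, 1, ..., 4}
# buckets == array([2, 3, 4, 4]); the bucket index is then used as a discrete feature
```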
2. Laplace smoothing
TL;DR: The Naive Bayes algorithm works well for many problems, but there is a simple change that makes it work much better, especially for text classification.
Problem: the first time Naive Bayes sees a word that never appeared in its training set, it estimates the probability of seeing it in either class to be zero, so the posterior ends up as $\frac{0}{0}$ and no sensible prediction can be made. Furthermore, it is statistically a bad idea to estimate the probability of some event to be zero just because you haven't seen it before in your finite training set.
Laplace smoothing: change the maximum likelihood estimate. For a multinomial random variable $z$ taking values in $\{1, \dots, k\}$, replace

$$\phi_{j} = \frac{\sum_{i=1}^{m} 1\{z^{(i)} = j\}}{m} \qquad \text{with} \qquad \phi_{j} = \frac{\sum_{i=1}^{m} 1\{z^{(i)} = j\} + 1}{m + k},$$

where $k$ is the number of values $z$ can take (i.e., the number of classes).
Notes:
- Under certain (arguably quite strong) conditions, it can be shown that the Laplace smoothing actually gives the optimal estimator of the $\phi_{j}$’s. (What conditions?)
- Naive Bayes with Laplace smoothing:

$$\phi_{j\mid y=1} = \frac{\sum_{i=1}^{m} 1\{x_{j}^{(i)} = 1 \wedge y^{(i)} = 1\} + 1}{\sum_{i=1}^{m} 1\{y^{(i)} = 1\} + 2}, \qquad \phi_{j\mid y=0} = \frac{\sum_{i=1}^{m} 1\{x_{j}^{(i)} = 1 \wedge y^{(i)} = 0\} + 1}{\sum_{i=1}^{m} 1\{y^{(i)} = 0\} + 2}$$
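A Laplace-smoothed variant of the earlier estimation sketch (same assumptions on `X_train` and `y_train`; the `+ 1` and `+ 2` mirror the formulas above for binary features):

```python
import numpy as np

def get_conditional_probs_smoothed(X_train, y_train):
    # phi_{j|y=c} = (count(x_j = 1 and y = c) + 1) / (count(y = c) + 2)
    conditional_probs = {}
    for label in np.unique(y_train):
        X_c = X_train[y_train == label]   # class-c examples
        counts = X_c.sum(axis=0)          # number of class-c examples with x_j = 1
        conditional_probs[int(label)] = (counts + 1.0) / (X_c.shape[0] + 2.0)
    return conditional_probs
```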