Naive Bayes

Naive Bayes is a generative learning algorithm for discrete-valued inputs. In particular, it is known to work well on text classification tasks such as spam detection.

1. Model

We represent an email via a feature vector whose length is equal to the number of words in the dictionary. Specifically, if an email contains the $i$-th word of the dictionary, then we set $x_{i} = 1$; otherwise, we set $x_{i} = 0$.
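For illustration, here is a minimal sketch of this representation in Python; the `dictionary` and `email` below are made-up toy examples, not part of any real dataset:

import numpy as np

def email_to_feature_vector(email_words, dictionary):
    # Binary bag-of-words: x_i = 1 if the i-th dictionary word appears in the email
    x = np.zeros(len(dictionary), dtype=int)
    present = set(email_words)
    for i, word in enumerate(dictionary):
        if word in present:
            x[i] = 1
    return x

# Hypothetical toy example
dictionary = ["buy", "cheap", "hello", "meeting", "now"]
email = ["hello", "buy", "now", "buy"]
print(email_to_feature_vector(email, dictionary))  # [1 0 1 0 1]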

Naive Bayes assumption: The $x_{i}$’s are conditionally independent given $y$.

Under this assumption, the likelihood of the features factorizes as:

$$p(x_{1}, \dots, x_{n} \mid y) = \prod_{i=1}^{n} p(x_{i} \mid y)$$

Let us set the following parameters:

$$\phi_{i\mid y=1} = p(x_{i}=1 \mid y=1), \qquad \phi_{i\mid y=0} = p(x_{i}=1 \mid y=0), \qquad \phi_{y} = p(y=1)$$

Joint likelihood of a training set $\{(x^{(j)}, y^{(j)})\}_{j=1}^{m}$:

$$\mathcal{L}(\phi_{y}, \phi_{i\mid y=0}, \phi_{i\mid y=1}) = \prod_{j=1}^{m} p(x^{(j)}, y^{(j)})$$

Maximizing with respect to our $2n+1$ parameters $\phi_{i\mid y=1}, \phi_{i\mid y=0}, \phi_{y}$ leads to:

$$\phi_{i\mid y=1} = \frac{\sum_{j=1}^{m} 1\{x_{i}^{(j)}=1 \wedge y^{(j)}=1\}}{\sum_{j=1}^{m} 1\{y^{(j)}=1\}}, \qquad \phi_{i\mid y=0} = \frac{\sum_{j=1}^{m} 1\{x_{i}^{(j)}=1 \wedge y^{(j)}=0\}}{\sum_{j=1}^{m} 1\{y^{(j)}=0\}}, \qquad \phi_{y} = \frac{\sum_{j=1}^{m} 1\{y^{(j)}=1\}}{m}$$

In Python, the class prior $\phi_{y}$ can be estimated as follows:

import numpy as np

def get_class_priors(y_train):
    # Maximum likelihood estimate of p(y): fraction of training examples in each class
    unique_labels, counts = np.unique(y_train, return_counts=True)
    classes_prior = {}
    for i, label in enumerate(unique_labels):
        classes_prior[int(label)] = float(counts[i]) / len(y_train)
    return classes_prior
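The conditional parameters $\phi_{i\mid y}$ can be estimated in the same spirit. The sketch below is one possible implementation, assuming `X_train` is a binary matrix of shape $(m, n)$ built with the representation above and `y_train` is the vector of labels:

import numpy as np

def get_class_likelihoods(X_train, y_train):
    # phi_{i|y=c}: fraction of class-c examples in which word i appears
    likelihoods = {}
    for label in np.unique(y_train):
        X_c = X_train[y_train == label]
        likelihoods[int(label)] = X_c.mean(axis=0)
    return likelihoods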

To make a prediction on a new example with features $x$, we compute the posterior and pick the most likely class:

$$p(y=1 \mid x) = \frac{\left(\prod_{i=1}^{n} p(x_{i} \mid y=1)\right) \phi_{y}}{\left(\prod_{i=1}^{n} p(x_{i} \mid y=1)\right) \phi_{y} + \left(\prod_{i=1}^{n} p(x_{i} \mid y=0)\right) (1-\phi_{y})}$$
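A minimal prediction sketch, reusing the `classes_prior` and `likelihoods` dictionaries defined above; working in log space and adding a small `eps` are implementation choices of this sketch, made to avoid numerical underflow and log(0):

import numpy as np

def predict(x, classes_prior, likelihoods, eps=1e-12):
    # Return the class maximizing log p(y) + sum_i log p(x_i | y)
    best_label, best_score = None, -np.inf
    for label, prior in classes_prior.items():
        phi = likelihoods[label]
        # Bernoulli log-likelihood of the binary feature vector x
        log_lik = np.sum(x * np.log(phi + eps) + (1 - x) * np.log(1 - phi + eps))
        score = np.log(prior) + log_lik
        if score > best_score:
            best_label, best_score = label, score
    return best_label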

Multi-class: Model $p(x_{i} \mid y)$ as multinomial rather than as Bernoulli.

Simple hack for continuous features: discretize their values to use this algorithm.
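For example, a continuous feature can be bucketed with `np.digitize`; the feature and bin edges below are arbitrary illustrations:

import numpy as np

# Hypothetical continuous feature (e.g., email length in characters)
lengths = np.array([120.0, 850.0, 40.0, 3000.0])
bins = [100, 500, 1000]                  # arbitrary bucket boundaries
discrete = np.digitize(lengths, bins)    # bucket index for each value
print(discrete)                          # [1 2 0 3]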

2. Laplace smoothing

TL;DR: The Naive Bayes algorithm works well for many problems, but there is a simple change that makes it work much better, especially for text classification.

Problem: The first time Naive Bayes sees a word that did not appear in the training set, it estimates the probability of seeing that word in either class to be zero, which makes the posterior an undefined $0/0$.

Furthermore it is statistically a bad idea to estimate the probability of some event to be zero just because you haven’t seen it before in your finite training set.

Laplace smoothing: Change the maximum likelihood estimate of a multinomial parameter $\phi_{j} = p(z=j)$ to:

$$\phi_{j} = \frac{1 + \sum_{i=1}^{m} 1\{z^{(i)} = j\}}{k + m}$$

where $k$ is the number of classes.

Notes:

  • Under certain (arguably quite strong) conditions, it can be shown that the Laplace smoothing actually gives the optimal estimator of the $\phi_{j}$’s. (What conditions?)
  • Naive Bayes with Laplace smoothing (see the sketch after this list):

$$\phi_{i\mid y=1} = \frac{1 + \sum_{j=1}^{m} 1\{x_{i}^{(j)}=1 \wedge y^{(j)}=1\}}{2 + \sum_{j=1}^{m} 1\{y^{(j)}=1\}}, \qquad \phi_{i\mid y=0} = \frac{1 + \sum_{j=1}^{m} 1\{x_{i}^{(j)}=1 \wedge y^{(j)}=0\}}{2 + \sum_{j=1}^{m} 1\{y^{(j)}=0\}}$$
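A minimal sketch of these smoothed estimates, under the same assumptions as the earlier snippets (binary `X_train` of shape $(m, n)$, labels `y_train` in $\{0, 1\}$):

import numpy as np

def get_smoothed_likelihoods(X_train, y_train):
    # phi_{i|y=c} = (1 + count of word i in class c) / (2 + number of class-c examples)
    likelihoods = {}
    for label in np.unique(y_train):
        X_c = X_train[y_train == label]
        likelihoods[int(label)] = (1.0 + X_c.sum(axis=0)) / (2.0 + X_c.shape[0])
    return likelihoods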
