Basics of Deep Learning

Activation functions

| | Logistic sigmoid | Hyperbolic tangent | ReLU |
| --- | --- | --- | --- |
| Definition | $g(z) = \frac{1}{1 + \exp(-z)}$ | $g(z) = \tanh(z)$ | $g(z) = \max(0, z)$ |
| Saturation | Saturation | Saturation | No saturation |
| Usefulness | First activation function | First activation function | Works well in practice |
| At 0 | | $\approx \mathrm{id}$ | Non-differentiable |
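A minimal NumPy sketch of these three activations (the function names and sample values are mine, chosen only for illustration):

import numpy as np

def sigmoid(z):
    # Logistic sigmoid: squashes z into (0, 1); saturates for large |z|
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Hyperbolic tangent: squashes z into (-1, 1); roughly the identity near 0
    return np.tanh(z)

def relu(z):
    # Rectified linear unit: no saturation for z > 0, non-differentiable at 0
    return np.maximum(0.0, z)

z = np.linspace(-5, 5, 11)
print(sigmoid(z), tanh(z), relu(z))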

Generalizations of ReLU sometimes perform better (see the Keras sketch after this list):

  • Absolute value rectification: $g(z) = \mid z \mid$ useful for features invariant to polarity change.
  • Leaky ReLU: $g(z) = \text{max}(0, z) + \alpha \text{ min}(0, z)$ with $\alpha \approx 0.001$ fixed
  • Parametric ReLU (PReLU): $g(z) = \text{max}(0, z) + \alpha \text{ min}(0, z)$ where $\alpha$ is learned
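A minimal sketch of Leaky ReLU and PReLU in Keras (assuming Keras 2, where `LeakyReLU` and `PReLU` are standalone layers; the layer sizes and input dimension are arbitrary):

from keras.models import Sequential
from keras.layers import Dense, LeakyReLU, PReLU

model = Sequential()
model.add(Dense(64, input_dim=20))   # no built-in activation
model.add(LeakyReLU(alpha=0.001))    # Leaky ReLU with a fixed, small alpha
model.add(Dense(64))
model.add(PReLU())                   # PReLU: alpha is learned during training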

Loss Function

Negative log-likelihood

Key idea: maximize the likelihood, i.e. minimize the negative log-likelihood:

$$J(\theta) = - \mathbb{E}_{(x, y) \sim p_{data}} \log p_{model}(y \mid x)$$

where:

  • $x$: network input
  • $y$: $x$’s label / expected value for $x$
  • $p_{data}$: the distribution of $(x, y)$ over the training set
  • $p_{model}(y \mid x)$: how we compute the probability of a value $y$ from the network output for $x$

Note: Negative log-likelihood = cross-entropy between the training data and the model distributions.
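A toy NumPy sketch of this quantity (the probabilities and labels are made up for illustration): the empirical negative log-likelihood is the average of $-\log p_{model}(y \mid x)$ over the training pairs, i.e. the cross-entropy between the empirical and model distributions.

import numpy as np

# made-up model probabilities p_model(y = i | x) for 4 samples, 3 classes
p_model = np.array([[0.7, 0.2, 0.1],
                    [0.1, 0.8, 0.1],
                    [0.3, 0.3, 0.4],
                    [0.2, 0.5, 0.3]])
y = np.array([0, 1, 2, 1])  # true labels for the 4 samples

# negative log-likelihood = cross-entropy with the empirical distribution
nll = -np.mean(np.log(p_model[np.arange(len(y)), y]))
print(nll)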

Mean Squared Error

If we choose $p_{model}(y\mid x) = \mathcal{N}(y; f(x, \theta), I)$, the negative log-likelihood becomes:

$$J(\theta) = \frac{1}{2} \mathbb{E}_{(x, y) \sim p_{data}} \left\| y - f(x, \theta) \right\|^{2} + \text{const}$$
Keras: Final layer should be linear:

model.add(Dense(n)) # no activation function
model.compile(loss = 'mean_squared_error', optimizer = ..)
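As a sanity check (a sketch with made-up values, not from the original notes): with identity covariance, the Gaussian negative log-likelihood differs from half the squared error only by a constant that does not depend on $\theta$.

import numpy as np

y = np.array([1.0, -2.0, 0.5])    # made-up target
f_x = np.array([0.8, -1.5, 0.0])  # made-up network output f(x, theta)
d = len(y)

# Gaussian NLL with identity covariance: 0.5 * ||y - f(x)||^2 + 0.5 * d * log(2*pi)
nll = 0.5 * np.sum((y - f_x) ** 2) + 0.5 * d * np.log(2 * np.pi)
half_sq_err = 0.5 * np.sum((y - f_x) ** 2)

print(nll - half_sq_err)          # constant 0.5 * d * log(2*pi), independent of theta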

Categorical cross-entropy

Context (multi-class classification): $y$ can take integer values in $[0, n[$

Goal: Find $p$ with $p_{i} = p_{model}(y = i \mid x)$ such that $p_{i} \in [0, 1]$ and $\sum p_{i} = 1$

Softmax is a soft binarization of "the maximum value returns 1, the other values return 0". Namely, if $z$ is the vector of raw network outputs:

$$p_{i} = \text{softmax}(z)_{i} = \frac{\exp(z_{i})}{\sum_{j} \exp(z_{j})}$$

The loss function thus becomes:

$$J(\theta) = - \sum_{i} \mathbb{1}[y = i] \log p_{i} = - \log p_{y}$$

Intuition:

  1. Squashes a vector of size $n$ between $0$ and $1$.
  2. Normalization $\Rightarrow$ the whole vector sums to $1$.
  3. The outputs of the softmax are the probabilities that the sample belongs to each class (illustrated in the sketch below).
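A minimal NumPy sketch of these three properties (the logits are made up for illustration):

import numpy as np

def softmax(z):
    # subtract the max for numerical stability; does not change the result
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

z = np.array([2.0, 1.0, -1.0])  # made-up network outputs (logits)
p = softmax(z)

print(p)           # each p_i lies in [0, 1]
print(p.sum())     # the vector sums to 1
print(p.argmax())  # the largest logit gets the largest probability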

Keras:

from keras.utils import np_utils
Y_train = np_utils.to_categorical(y_train, nb_classes) # one-hot encode the integer labels
model.add(Dense(n, activation = 'softmax'))
model.compile(loss = 'categorical_crossentropy', optimizer = ..)
model.fit(X_train, Y_train, .. )

Generalization

We want to perform well on new, previously unseen inputs (generalization). Therefore, the quantity we truly want to minimize is the test error. More details about how to handle this can be found here.

