Basics of Deep Learning
Activation functions
| | Logistic sigmoid | Hyperbolic tangent | ReLU |
|---|---|---|---|
| Definition | $g(z) = \frac{1}{1 + \exp(-z)}$ | $g(z) = \tanh(z)$ | $g(z) = \max(0, z)$ |
| Saturation | Saturates | Saturates | No saturation |
| Usefulness | First activation function | First activation function | Works well in practice |
| At 0 | | $\approx \mathrm{id}$ | Non-differentiable |
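As a quick illustration (not from the original notes), a NumPy sketch evaluating the three activations on a few points to show the saturation behaviour summarized in the table:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(sigmoid(z))   # ≈ [0.    0.269 0.5   0.731 1.   ] -> flat (saturated) for large |z|
print(np.tanh(z))   # ≈ [-1.   -0.762 0.    0.762 1.  ] -> flat (saturated) for large |z|
print(relu(z))      # [ 0.  0.  0.  1. 10.] -> keeps growing for z > 0, no saturation
```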
Generalizations of ReLU sometimes perform better than plain ReLU (a short sketch follows this list):
- Absolute value rectification: $g(z) = \mid z \mid$ useful for features invariant to polarity change.
- Leaky ReLU: $g(z) = \text{max}(0, z) + \alpha \text{ min}(0, z)$ with $\alpha \approx 0.001$ fixed
- Parametric ReLU (PReLU): $g(z) = \text{max}(0, z) + \alpha \text{ min}(0, z)$ where $\alpha$ is learned
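A small NumPy sketch of these generalizations; PReLU uses the same expression as Leaky ReLU with $\alpha$ learned during training, so it is only noted in a comment:

```python
import numpy as np

def abs_rectification(z):
    # Absolute value rectification: g(z) = |z|
    return np.abs(z)

def leaky_relu(z, alpha=0.001):
    # Leaky ReLU: alpha is a small fixed constant
    return np.maximum(0.0, z) + alpha * np.minimum(0.0, z)

# PReLU: same formula as leaky_relu, but alpha is a parameter learned by backpropagation

z = np.array([-2.0, -0.5, 0.0, 1.5])
print(abs_rectification(z))   # [2.  0.5 0.  1.5]
print(leaky_relu(z))          # [-0.002  -0.0005  0.  1.5]
```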
Loss Function
Negative log-likelihood
Key idea: maximize the likelihood, i.e. minimize the negative log-likelihood:

$$J(\theta) = - \mathbb{E}_{(x, y) \sim p_{data}} \log p_{model}(y \mid x)$$
where:
- $x$: network input
- $y$: $x$’s label / expected value for $x$
- $p_{data}$: the distribution of $(x, y)$ over the training set
- $p_{model}(y \mid x):$ how we compute the probability for a value $y$ from the network output for $x$
Note: the negative log-likelihood is the cross-entropy between the training data distribution and the model distribution.
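As an illustration, a minimal NumPy sketch of the negative log-likelihood over a small batch, assuming the model already outputs a probability for every possible label (the probabilities below are made up):

```python
import numpy as np

# p_model[i, j] = probability the model assigns to label j for sample i (assumed given)
p_model = np.array([[0.7, 0.2, 0.1],
                    [0.1, 0.8, 0.1]])
y = np.array([0, 1])  # true labels

# Negative log-likelihood averaged over the batch
nll = -np.mean(np.log(p_model[np.arange(len(y)), y]))
print(nll)  # -(log 0.7 + log 0.8) / 2 ≈ 0.290
```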
Mean Squared Error
If we choose $p_{model}(y \mid x) = \mathcal{N}(y; f(x, \theta), I)$, the negative log-likelihood becomes:

$$J(\theta) = \frac{1}{2} \mathbb{E}_{(x, y) \sim p_{data}} \left\| y - f(x, \theta) \right\|^2 + \text{const}$$

so minimizing it is equivalent to minimizing the mean squared error.
Keras: the final layer should be linear:
model.add(Dense(n))  # no activation function, i.e. a linear output
model.compile(loss='mean_squared_error', optimizer='sgd')  # any optimizer can be used; SGD is just an example
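A small NumPy check of this equivalence: with a unit-covariance Gaussian, the negative log-likelihood is half the squared error plus a constant that does not depend on $\theta$ (the numbers below are made up):

```python
import numpy as np

y = np.array([1.0, -0.5, 2.0])     # targets
f_x = np.array([0.8, -0.3, 2.5])   # network outputs f(x, theta)

# Negative log-likelihood of N(y; f(x, theta), I), summed over components
nll = 0.5 * np.sum((y - f_x) ** 2) + 0.5 * len(y) * np.log(2 * np.pi)

# Dropping the constant term leaves half the squared error
print(nll - 0.5 * len(y) * np.log(2 * np.pi))   # 0.165
print(0.5 * np.sum((y - f_x) ** 2))             # 0.165
```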
Categorical cross-entropy
Context (multi-class classification): $y$ can take integer values in $[0, n[$
Goal: Find $p$ with $p_{i} = p_{model}(y = i \mid x)$ such that $p_{i} \in [0, 1]$ and $\sum p_{i} = 1$
Softmax is a soft binarization of "the maximum value returns 1, the other values return 0". Namely:

$$p_i = \text{softmax}(z)_i = \frac{\exp(z_i)}{\sum_{j} \exp(z_j)}$$

where $z$ is the network output for $x$. The loss function thus becomes:

$$L(x, y) = -\log p_{y} = -\log \text{softmax}(z)_{y}$$
Intuition:
- Squashes each entry of a vector of size $n$ between $0$ and $1$.
- Normalization $\Rightarrow$ the whole vector sums to $1$.
- The outputs of the softmax are the probabilities that the sample belongs to each class.
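A NumPy sketch of softmax and the resulting loss for one sample; subtracting the maximum before exponentiating is a common numerical-stability trick and does not change the result:

```python
import numpy as np

def softmax(z):
    # subtract the max for numerical stability; the result is unchanged
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

z = np.array([2.0, 1.0, 0.1])   # network output for one sample
p = softmax(z)
print(p, p.sum())               # ≈ [0.659 0.242 0.099], sums to 1.0

y = 0                           # true class index
loss = -np.log(p[y])            # categorical cross-entropy for this sample
print(loss)                     # ≈ 0.417
```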
Keras:
from keras.utils import np_utils
Y_train = np_utils.to_categorical(y_train, nb_classes)  # one-hot encode the integer labels
model.add(Dense(n, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='sgd')  # any optimizer can be used; SGD is just an example
model.fit(X_train, Y_train)  # plus epochs, batch size, etc. as needed
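For reference, a minimal end-to-end version of the snippet above on random dummy data; the shapes, hidden layer, and the choice of SGD are illustrative assumptions, not part of the original recipe:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import np_utils

nb_classes = 3
X_train = np.random.rand(100, 20)                       # 100 samples, 20 features (dummy data)
y_train = np.random.randint(0, nb_classes, size=100)    # integer labels in [0, nb_classes[
Y_train = np_utils.to_categorical(y_train, nb_classes)  # one-hot encoding

model = Sequential()
model.add(Dense(32, activation='relu', input_dim=20))   # arbitrary hidden layer
model.add(Dense(nb_classes, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
model.fit(X_train, Y_train, epochs=5, batch_size=16)
```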
Generalization
We want to perform well on new, previously unseen inputs (generalization). Therefore, the quantity we truly want to minimize is the test error. More details about how to handle this can be found here.