Recurrent Neural Networks
A major limitation of Vanilla Neural Networks and Convolutional Neural Networks is that their API is rather constrained:
- They accept a fixed-sized vector as input (e.g. an image) and produce a fixed-sized vector as output (e.g. probabilities of different classes).
- These models perform this mapping using a fixed number of computational steps (e.g. the number of layers in the model).
Their internal structure is therefore ill-suited for data that comes in sequences (text, audio…).
The enthusiasm around Recurrent Neural Networks is justified because they allow us to operate over sequences of vectors (in the input, the output, or both). Here is a non-exhaustive taxonomy of the structures an RNN can have:
Note: there are no pre-specified constraints on the sequence lengths, because the recurrent transformation (green) is fixed and can be applied as many times as we like.
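For concreteness, the regimes in this taxonomy can be summarized by the shapes of their inputs and outputs. The sketch below is purely illustrative (the names and example tasks follow the source post; shapes omit the batch dimension):

```python
# T and T' denote sequence lengths, D a fixed vector size.
rnn_regimes = {
    "one-to-one":            {"input": "(D,)",   "output": "(D,)",    "example": "image classification (no recurrence)"},
    "one-to-many":           {"input": "(D,)",   "output": "(T, D)",  "example": "image captioning"},
    "many-to-one":           {"input": "(T, D)", "output": "(D,)",    "example": "sentiment analysis"},
    "many-to-many":          {"input": "(T, D)", "output": "(T', D)", "example": "machine translation"},
    "many-to-many (synced)": {"input": "(T, D)", "output": "(T, D)",  "example": "per-frame video labelling"},
}
```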
This structure is much more powerful than the fixed network setting. RNNs combine the input vector with their state vector using a fixed (but learned) function to produce a new state vector. Intuitively, an RNN can be seen as running a fixed program with certain inputs and some internal variables (i.e. it describes a program). Theory states that RNNs are Turing-complete, in the sense that they can simulate arbitrary programs (given proper weights).
If training vanilla neural nets is optimization over functions, training recurrent nets is optimization over programs.
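Concretely, the recurrence amounts to a single state update per time step. Below is a minimal NumPy sketch in the spirit of the step function from the source post; the class name, sizes, and initialization are illustrative assumptions, not a reference implementation:

```python
import numpy as np

class VanillaRNN:
    """Minimal RNN cell: combines the input x with the state h via a fixed, learned map."""
    def __init__(self, input_size, hidden_size, output_size, seed=0):
        rng = np.random.default_rng(seed)
        # small random weights (illustrative initialization)
        self.W_xh = rng.normal(0, 0.01, (hidden_size, input_size))   # input  -> hidden
        self.W_hh = rng.normal(0, 0.01, (hidden_size, hidden_size))  # hidden -> hidden
        self.W_hy = rng.normal(0, 0.01, (output_size, hidden_size))  # hidden -> output
        self.h = np.zeros(hidden_size)                               # state vector

    def step(self, x):
        # new state = fixed (but learned) function of the old state and the current input
        self.h = np.tanh(self.W_hh @ self.h + self.W_xh @ x)
        # output is read off the current state
        return self.W_hy @ self.h

rnn = VanillaRNN(input_size=8, hidden_size=32, output_size=4)
sequence = [np.random.randn(8) for _ in range(11)]  # any sequence length works
outputs = [rnn.step(x) for x in sequence]           # the same cell is reused at every step
```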
Sequential processing in the absence of sequences. The takeaway is that even if your data does not come in the form of sequences, you can still formulate and train powerful models that learn to process it sequentially. You are learning stateful programs that process your fixed-sized data.
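As a toy illustration of that idea (a hypothetical setup reusing the VanillaRNN sketch above, not the glimpse-based models discussed in the source), a fixed-size input can simply be fed to the recurrent cell for several steps, letting the state evolve before an answer is read out:

```python
def process_fixed_input(rnn, x, num_steps=5):
    """Sequentially process a single fixed-size vector: the RNN 'looks at' the
    same input several times, updating its internal state at each step."""
    y = None
    for _ in range(num_steps):
        y = rnn.step(x)  # the state evolves even though the input never changes
    return y

logits = process_fixed_input(VanillaRNN(input_size=8, hidden_size=32, output_size=4),
                             x=np.random.randn(8))
```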
Sources
The Unreasonable Effectiveness of Recurrent Neural Networks, Andrej Karpathy