First order scattering transform

9 minute de lecture

Mis à jour :

Much like Fourier transform expresses periodic functions as a sum of sinus and cosinus, the Wavelet transform expresses signals as a weighted sum of a special kind of functions, wavelets. Both use some inner product (scalar product and convolution) of an input signal and a given kernel / mask. The difference lies in the kernel the kernel of the transformation.

Once the weights are obtained for a given image, they can be used for any purpose: denoise it, compress it, etc. A wavelet is a mathematical function useful in digital signal processing and image compression. It is suitable to perform frequency analysis of stationary functions. In particular, they make it possible to recover weak signals from noise. What’s even more interesting is that this clean up is done without blurrying kernels or muddling the details. Here are some applications:

  • Medical image analysis Processing of X-ray and magnetic-resonance images
  • Internet communications: Compressing images to a greater extent than is generally possible with other methods (25% of the size with JPEG quality).

Quand je formatte comme ça c’est que c’est pas clair / pas bien formulé Its principle is simple: Analyze an image and convert it into a set of mathematical expressions that can then be decoded by a receiver (see modulation).

It should be noted that there are two types of wavelets transform, serving different applications:

  • Continuous Wavelet Transform time frequency analysis and filtering of time localized frequency components the key
  • Discrete Wavelet Transform denoising and compression of signals and images as it helps represent many naturally occurring signals and images with fewer coefficients this enables a sparser representation.

These two transforms differ based on how they discretize the scale and the translation parameters. We will focus on the latter in this blogpost.

1. Wavelets

A wavelet is a wave-like oscillation with an amplitude that begins at zero, increases, and then decreases back to zero. It can typically be visualized as a “brief oscillation” like one recorded by a seismograph or heart monitor. Generally, wavelets are intentionally crafted to have specific properties that make them useful for signal processing. Using a “reverse, shift, multiply and integrate” technique called convolution, wavelets can be combined with known portions of a damaged signal to extract information from the unknown portions.

1.2. Definition of a wavelet

Swag alert: Any square integrable function $\psi \,\in \,L^{2}({\mathbb {R}})$ can be represented as a certain orthonormal series generated by a wavelet. Specifically, a function is called an orthonormal wavelet if it can be used to define a Hilbert basis, that is a complete orthonormal system, for the Hilbert space $L^{2}\left({\mathbb {R}}\right)$.

From wikipedia “What is a Hilbert basis? The mathematical concept of a Hilbert space, named after David Hilbert, generalizes the notion of Euclidean space. It extends the methods of vector algebra and calculus from the two-dimensional Euclidean plane and three-dimensional space to spaces with any finite or infinite number of dimensions. A Hilbert space is an abstract vector space possessing the structure of an inner product that allows length and angle to be measured. Furthermore, Hilbert spaces are complete: there are enough limits in the space to allow the techniques of calculus to be used.”

The Hilbert space is particularly suited to measure the differences of signals, whether there are audio or images. And much like vectorial spaces, there are equiped with a basis $B = { \psi_{jk}: j, k \in \mathbb{Z} }$. The wavelets $\psi_{jk}$ are functions in the set of real numbers to the set of real numbers, each of which is derived from the mother wavelet $\psi$ using transformation like scaling, translation, and rotation.

SIMPLE CASE! NOT GABOR!

  • Simplest case: translation + scaling
  • Gabor case: rotation + scaling

In the Hilbert configuration, the inner product (i.e. similarity) becomes:

where $\overline{x}$ means the dual and $\delta$ is the kronecker function.

The integral wavelet transform is the integral transform defined as

And the wavelet coefficients $c_$ are then given by

Here, $a\;=\;2^$ is called the binary dilation or dyadic dilation, and $b\;=\;k2^$ is the binary or dyadic position.

TODO:

  • Finish this simpler case
  • Write Gabor equivalent
  • We used a continuous definition of the wavelet. We are in the discrete case. Find the appropriate notations?

1.2. Gabor wavelet

First, they have received considerable attention because the characteristics of certain cells in the visual cortex of some mammals can be approximated by these filters. Furthermore, these filters have been shown to posses optimal localization properties in both spatial and frequency domain and thus are well suited for texture segmentation problems [13,20]. Finally, they have been used in many applications, such as texture segmentation, target detection, fractal dimension management, document analysis, edge detection, retina identification, image coding and image representation [21].

Key takeaway: Gabor filters are orientation-sensitive filters, made of a Gaussian modulated by a sinusoidal.

More precisely, a Gabor filter can be viewed as a sinusoidal plane $h$ of particular frequency and orientation, modulated by a Gaussian envelope $g$ (called the carrier). It can be written as:

The impulse response $h$ is defined by a sinusoidal wave (a plane wave for 2D Gabor filters) multiplied by a Gaussian function. Here are some:

Illustration

As we see in the image below, the filters are equiped with hyperparameters that allow us to have a better control and granularity over the resulting image. The role of this hyperparameters is more thorougly investigated in section 2.2.

The typically travel in packs, one for each direction. A gabor filter set with a given direction gives a strong response for locations of the target images that have structures in this given direction. For instance, if your target image is made of a periodic grating in a diagonal direction, a gabor filter set will give you a strong response only if its direction matches the one of the grating.

Because of the multiplication-convolution property (Convolution theorem), the Fourier transform of a Gabor filter’s impulse response is the convolution of the Fourier transform of the harmonic function (sinusoidal function) and the Fourier transform of the Gaussian function. The filter has a real and an imaginary component representing orthogonal directions.

TODO: Finish equation

In this equation, the hyperparameters are given by:

  • $\lambda$ represents the wavelength of the sinusoidal factor.
  • $\theta$ represents the orientation of the normal to the parallel stripes of a Gabor function.
  • $\psi$ is the phase offset.
  • $\sigma$ is the sigma/standard deviation of the Gaussian envelope.
  • $\gamma$ is the spatial aspect ratio, and specifies the ellipticity of the support of the Gabor function.

4. First order scattering transform

4.1. Avoiding information loss

A scattering first-order transform is defined from:

  1. A mother wavelet $\psi$ based on a Gabor filters (see section 2.2.)
  2. A low-pass filter $\phi$ (which one)

An input signal $x$ is filtered by a collection of dilated band-pass wavelets obtained from $\psi$:

followed by a modulus:

TODO: Define modulus:

Finally, the results is passed through a low-pass filter (dilation of $\phi$):

The chosen wavelets decompose the signal in a basis in which transient structure of a signal is represented more compactly.

TODO: Explain what is a transient features?

The frequency plane needs to be covered by the support of the filters to avoid an information loss. This issue is solved by the action of the Euclidean group via rotation $r_{-\theta}$ and dilation by $j \leq J$:

In this case, each wavelet $\psi_{j, \theta}$ has a bandwidth $1/(2^{j}\sigma_{0})$ and its central frequency is $2^{j}r_{-\theta}\omega_{0}$. If a filter has a compact support in the frequency domain, then due to Nyquist principle, we can reduce the spatial sampling of the resulting convolution. This approximation is done in the case of Gabor filters. As we shall see, this localization in frequency is also fundamental because it permits to obtain a smooth envelope. The parameters $j \leq J$ and $\theta \in \Theta$ are discretized and $\sigma_{0}$ is adjusted such that a wavelet transform preserves all the energy of $\tilde{x}$. The scattering first-order transform $S$ parametrized by $J$ is defined as:

4.2. Encoding

TODO:

  • C’est plus ou moins le script que tu as fait pout MNSIT. Il faudrait avoir une fonction qui le prend pour une seule image et on est bien. Also, faut que le process soit bien documenté.

4.3. Decoding

Following, an input image $x$ is reconstructed from its first-order scattering $S_{x}$ coefficient of scales $J$, via a $l^{2}$-norm minimization:

A gradient descent is used as all the operators are weakly differentiable.

TODO: Là il faut qu’on demande à Oyallon.. J’ai trouvé des papers tout ça mais il faut se plonger dedans vénère.

5. Experiments - SIMO

5.1. Motivation - SIMO + Louis

Il faut clairement creuser la section 2.6 Scattering Networks for Image Processing de la thèse du mec. Ca justifie pas mal nos recherches d’un point de vue théorique + low vision.

TODO: Explain in-depth!
For appropriate wavelets, first order coefficients are equivalent to SIFT coefficients Low04. Indeed, SIFT computes the local sum of image gradient amplitudes among image gradients having nearly the same direction, in a histogram having 8 different direction bins. The DAISY can produce an approximation of the coefficients.

6. Sources

Laisser un commentaire