Statistics basics


1. Convergence of random variables

1.1. What is a random variable?

Random variables are central objects in statistics and probability because they are maps $X: \Omega \to E$, where the domain $\Omega$ is the set of possible outcomes and the codomain $E$ is a measurable space. Usually $X$ is real-valued (i.e. $E=\mathbb{R}$).

Probability / distribution of a RV: Depending on the type of $X$, we can equip it with a probability distribution:

  • Discrete: $P(X = x)$ or $P_{\theta}(X =x)$
  • Continuous: $f(x)$ or $f_{\theta}(x)$

Mean / variance of a RV: $\mathbb{E}[X]$ or $\mathbb{E}_{\theta}[X]$ (resp. $V[X]$ or $V_{\theta}[X]$) stands for the expectation (resp. the variance) of $X$.

What are i.i.d. RV? Independent and Identically Distributed means that $X$ and $Y$ are independent and come from the same distribution. Independence is denoted $X \perp Y$, and a criterion to check whether $X \perp Y$ is: for any bounded measurable functions $h$ and $g$,

$$\mathbb{E}[h(X)g(Y)] = \mathbb{E}[h(X)]\,\mathbb{E}[g(Y)].$$
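The factorization criterion above can be checked numerically. The sketch below (not from the original notes; the choice of test functions $h = \tanh$, $g = \cos$ and of standard normal samples is arbitrary) compares the two sides of the identity by Monte Carlo:

```python
import numpy as np

# Sanity check of the independence criterion E[h(X)g(Y)] = E[h(X)]E[g(Y)]
# on two independent standard normal samples. h = tanh and g = cos are
# arbitrary bounded measurable test functions.
rng = np.random.default_rng(0)
x = rng.normal(size=200_000)
y = rng.normal(size=200_000)

h, g = np.tanh, np.cos
lhs = np.mean(h(x) * g(y))            # Monte Carlo estimate of E[h(X)g(Y)]
rhs = np.mean(h(x)) * np.mean(g(y))   # product of the marginal expectations
```

For independent samples the two estimates agree up to Monte Carlo noise; for dependent ones (e.g. `y = x ** 2`) they would not.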

Additional terminology: PDF, CDF and iff respectively stand for Probability Density Function, Cumulative Distribution Function and "if and only if".

1.2. Convergences in the multivariate case

Let $(x_{n})_{n\in\mathbb{N}}$ be a sequence of RV in $\mathbb{R}^{d}$ and let $x \in \mathbb{R}^{d}$:

Almost sure convergence: This is the type of stochastic convergence that is most similar to pointwise convergence known from elementary real analysis.

This means that the values of $x_{n}$ approach the value of $x$, in the sense (see almost surely) that events for which $x_{n}$ does not converge to $x$ have probability $0$. Using the probability space $(\Omega, \mathcal{F}, P)$ and the concept of the random variable as a function from $\Omega$ to $\mathbb{R}$, this is equivalent to the statement

$$P\left(\omega \in \Omega : \lim_{n \to \infty} x_{n}(\omega) = x(\omega)\right) = 1.$$

Convergence in probability: The probability of an unusual outcome gets smaller as $n$ increases. Mathematically, this translates to:

$$\forall \varepsilon > 0, \quad \lim_{n \to \infty} P\left(\|x_{n} - x\| > \varepsilon\right) = 0.$$
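A quick Monte Carlo illustration (a sketch, not from the original notes; the Uniform(0, 1) example and the value of $\varepsilon$ are arbitrary choices): the empirical mean of $n$ Uniform(0, 1) draws converges in probability to $\mu = 0.5$, so the estimated probability of deviating by more than $\varepsilon$ shrinks with $n$.

```python
import numpy as np

# Estimate P(|xbar_n - mu| > eps) for growing n, where xbar_n is the
# empirical mean of n Uniform(0, 1) samples and mu = 0.5.
rng = np.random.default_rng(1)
eps, mu = 0.1, 0.5
probs = []
for n in (10, 100, 1000):
    xbar = rng.uniform(size=(5_000, n)).mean(axis=1)  # 5000 draws of xbar_n
    probs.append(np.mean(np.abs(xbar - mu) > eps))
```

The estimated probabilities drop quickly towards $0$ as $n$ grows.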

Convergence in $L^{p}$: $x_{n}$ converges to $x$ in $L^{p}$ if the expectation of the $p$-th power of the norm of the difference converges towards $0$:

$$\lim_{n \to \infty} \mathbb{E}\left[\|x_{n} - x\|^{p}\right] = 0.$$
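For $p = 2$ this can be seen directly on the empirical mean of standard normal samples, where $\mathbb{E}[|\hat{x}_{n} - 0|^{2}] = 1/n$ exactly. A minimal sketch (example and sample sizes are my own choices, not from the notes):

```python
import numpy as np

# L^2 convergence of the empirical mean of N(0, 1) samples towards mu = 0:
# E[|xbar_n|^2] = 1/n, so the MSE goes to 0 while n * MSE stays near 1.
rng = np.random.default_rng(2)
mses = []
for n in (10, 100, 1000):
    xbar = rng.normal(size=(5_000, n)).mean(axis=1)
    mses.append(np.mean(xbar ** 2))   # Monte Carlo estimate of E[|xbar_n|^2]
```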

Convergence in distribution: A sequence of random variables converges in distribution to $x$ if for any continuous and bounded function $g$, one has:

$$\lim_{n \to \infty} \mathbb{E}[g(x_{n})] = \mathbb{E}[g(x)].$$
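This definition can be tested on a classical example (my own choice, not from the notes): a standardized Binomial converges in distribution to $N(0, 1)$, so $\mathbb{E}[g(x_{n})]$ and $\mathbb{E}[g(x)]$ should nearly agree for a bounded continuous $g$ such as $\arctan$:

```python
import numpy as np

# Convergence in distribution: E[g(x_n)] -> E[g(x)] for bounded continuous g.
# Here x_n is a standardized Binomial(n, p), whose limit x is N(0, 1),
# and g = arctan is an arbitrary bounded continuous test function.
rng = np.random.default_rng(3)
n, reps, p = 400, 100_000, 0.3
xn = (rng.binomial(n, p, size=reps) - n * p) / np.sqrt(n * p * (1 - p))
lhs = np.mean(np.arctan(xn))                       # estimate of E[g(x_n)]
rhs = np.mean(np.arctan(rng.normal(size=reps)))    # estimate of E[g(x)]
```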

Note: The CV in distribution of a sequence of random vectors is stronger than the CV in distribution of each component separately!

2. Cornerstone results

2.1. Convergence characterization

How to characterise the CV in distribution?

Characteristic function: $\phi_{x}(u) = \mathbb{E}[\exp(iu^{t}x)]$, defined for all $u \in \mathbb{R}^{d}$.

Theorem (Lévy continuity theorem) Let $\phi_{n}(u) = \mathbb{E}[\exp(iu^{t}x_{n})]$ and $\phi(u) = \mathbb{E}[\exp(iu^{t}x)]$ be the characteristic functions of $x_{n}$ and $x$. Then:

$$x_{n} \xrightarrow[]{dist} x \iff \forall u \in \mathbb{R}^{d}, \ \phi_{n}(u) \xrightarrow[n \to \infty]{} \phi(u).$$
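The theorem can be watched in action numerically (a sketch with my own choices of distribution and evaluation point, not from the notes): the empirical characteristic function of standardized sums of Uniform(-1, 1) variables approaches the $N(0,1)$ characteristic function $\phi(u) = e^{-u^{2}/2}$.

```python
import numpy as np

# Empirical characteristic function of standardized sums of Uniform(-1, 1)
# variables vs the N(0, 1) characteristic function exp(-u^2 / 2).
rng = np.random.default_rng(4)
n, reps = 2_000, 20_000
sums = rng.uniform(-1, 1, size=(reps, n)).sum(axis=1)
s = sums / np.sqrt(n / 3)                # Var(Uniform(-1, 1)) = 1/3
u = 1.3                                  # arbitrary evaluation point
phi_n = np.mean(np.exp(1j * u * s))      # empirical characteristic function
phi = np.exp(-u ** 2 / 2)                # limit characteristic function
```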

Proposition (continuous mapping; a.s., P, dist. convergences) If $x_{n} \rightarrow x$ almost surely (resp. in probability, in distribution) and $h$ is a continuous function, then $h(x_{n}) \rightarrow h(x)$ in the same sense.
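A one-trajectory sketch of the proposition (my own example, not from the notes): the running mean of Uniform(0, 1) draws converges to $0.5$ almost surely, so for the continuous function $h = \exp$ the mapped trajectory converges to $e^{0.5}$.

```python
import numpy as np

# Continuous mapping: xbar_n -> 0.5 a.s. along one trajectory of running
# means, so h(xbar_n) -> h(0.5) for the continuous function h = exp.
rng = np.random.default_rng(5)
n = 200_000
draws = rng.uniform(0, 1, size=n)
xbar = draws.cumsum() / np.arange(1, n + 1)   # running means of one trajectory
h_path = np.exp(xbar)                          # h(xbar_n) with h = exp
```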

2.2. SLLN and CLT

Theorem (Strong law of large numbers) Let $(x_{n})$ be a sequence of i.i.d. RV in $\mathbb{R}^{d}$ such that $\mathbb{E}[\|x_{1}\|] < +\infty$, and let $\mu = \mathbb{E}[x_{1}]$ be the expectation of $x_{1}$. Then:

$$\hat{x}_{n} = \frac{1}{n}\sum_{i=1}^{n} x_{i} \xrightarrow[]{a.s.} \mu.$$
Central limit theorem: Let $(x_{n})$ be a sequence of i.i.d. RV in $\mathbb{R}^{d}$ s.t. $\mathbb{E}[\|x_{1}\|^{2}] < \infty$. Let $\mu = \mathbb{E}[x_{1}]$ and $\Sigma = \mathbb{E}[x_{1}x_{1}^{t}] - \mathbb{E}[x_{1}]\mathbb{E}[x_{1}]^{t}$ be the covariance matrix of $x_{1}$. If we let $\hat{x}_{n} = \frac{1}{n}\sum_{i=1}^{n} x_{i}$ be the empirical mean, then we obtain:

$$\sqrt{n}\left(\hat{x}_{n} - \mu\right) \xrightarrow[]{dist} N(0, \Sigma).$$
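A one-dimensional Monte Carlo check of the CLT (a sketch; the Exponential(1) example, for which $\mu = 1$ and $\sigma^{2} = 1$, is my own choice):

```python
import numpy as np

# CLT check for Exponential(1) samples (mu = 1, sigma^2 = 1): the rescaled
# empirical mean sqrt(n) * (xbar_n - mu) should look like N(0, 1).
rng = np.random.default_rng(6)
n, reps = 1_000, 20_000
xbar = rng.exponential(size=(reps, n)).mean(axis=1)
z = np.sqrt(n) * (xbar - 1.0)   # should have mean ~0 and std ~1
```

The empirical mean and standard deviation of `z` match the $N(0, 1)$ limit up to Monte Carlo error.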

2.3. Slutsky theorem

In probability theory, Slutsky’s theorem extends some properties of algebraic operations on convergent sequences of real numbers to sequences of random variables.

Slutsky theorem: Let $(x_{n})_{n\in \mathbb{N}^{*}}$ be a sequence of RV in $\mathbb{R}^{d}$ that converges in distribution to $x$. Let $(y_{n})_{n\in \mathbb{N}^{*}}$ be a sequence of RV in $\mathbb{R}^{m}$ (defined on the same probability space as $(x_{n})_{n\in \mathbb{N}^{*}}$) that converges almost surely (or in P, or in dist.) towards a constant $a$. Then the sequence $(x_{n}, y_{n})_{n\in \mathbb{N}^{*}}$ converges in distribution towards $(x, a)$:

$$(x_{n}, y_{n}) \xrightarrow[]{dist} (x, a).$$

Important applications of Slutsky theorem:

  • Sum: $x_{n} + y_{n} \xrightarrow[]{dist} x + a \text{ if } m=d$
  • Product: $x_{n} \cdot y_{n} \xrightarrow[]{dist} x \cdot a \text{ if } m=1$
  • Division: $x_{n}/y_{n} \xrightarrow[]{dist} x/a \text{ if } m=1 \text{ and } a \neq 0$
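The division case is exactly what makes the t-statistic work: by the CLT, $\sqrt{n}(\hat{x}_{n} - \mu) \xrightarrow[]{dist} N(0, \sigma^{2})$, and by the SLLN the sample standard deviation $s_{n} \rightarrow \sigma$, so Slutsky gives $\sqrt{n}(\hat{x}_{n} - \mu)/s_{n} \xrightarrow[]{dist} N(0, 1)$. A sketch with my own choice of distribution (Uniform(0, 2), so $\mu = 1$):

```python
import numpy as np

# Slutsky in action: the studentized mean sqrt(n) * (xbar_n - mu) / s_n
# converges in distribution to N(0, 1).
rng = np.random.default_rng(7)
n, reps = 500, 20_000
samples = rng.uniform(0, 2, size=(reps, n))     # mu = 1, sigma^2 = 1/3
xbar = samples.mean(axis=1)
s = samples.std(axis=1, ddof=1)                 # s_n -> sigma a.s. (SLLN)
t = np.sqrt(n) * (xbar - 1.0) / s               # t-statistic, ~ N(0, 1)
```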

2.4. Delta method

Delta method: The delta method is a general method for deriving the variance of a function of asymptotically normal random variables with known variance.

Let $(x_{n})_{n\in \mathbb{N}^{\text{*}}}$ a sequence of r.V. in $\mathbb{R}^{d}$ and $\theta$ a deterministic vector of $\mathbb{R}^{d}$. Let $h:\mathbb{R}^{d} \mapsto \mathbb{R}^{m}$ a function that is differentiable (at least) at point $\theta$.

Let us denote by $\frac{\partial h}{\partial \theta^{t}}$ the $m \times d$ matrix such that:

$$\left[\frac{\partial h}{\partial \theta^{t}}\right]_{i,j} = \frac{\partial h_{i}}{\partial \theta_{j}}, \quad 1 \le i \le m, \ 1 \le j \le d.$$

Assumption:

$$\sqrt{n}\left(x_{n} - \theta\right) \xrightarrow[]{dist} x.$$

Result:

$$\sqrt{n}\left(h(x_{n}) - h(\theta)\right) \xrightarrow[]{dist} \frac{\partial h}{\partial \theta^{t}}(\theta)\, x.$$

Note: In the particular case where $x \sim N(0, \Sigma)$, the result becomes:

$$\sqrt{n}\left(h(x_{n}) - h(\theta)\right) \xrightarrow[]{dist} N\left(0, \frac{\partial h}{\partial \theta^{t}}(\theta)\, \Sigma\, \frac{\partial h}{\partial \theta^{t}}(\theta)^{t}\right).$$
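A scalar Monte Carlo check of the delta method (a sketch; $h(t) = t^{2}$ and the Exponential(1) example are my own choices): with $\mu = 1$, $\sigma^{2} = 1$ and $h'(\mu) = 2$, the result predicts $\sqrt{n}(\hat{x}_{n}^{2} - \mu^{2}) \xrightarrow[]{dist} N(0, 4)$.

```python
import numpy as np

# Delta method with h(t) = t^2 on Exponential(1) data: mu = 1, sigma^2 = 1,
# h'(mu) = 2, so sqrt(n) * (xbar_n^2 - 1) should be approximately N(0, 4).
rng = np.random.default_rng(8)
n, reps = 2_000, 20_000
xbar = rng.exponential(size=(reps, n)).mean(axis=1)
z = np.sqrt(n) * (xbar ** 2 - 1.0)   # predicted limit: N(0, 2^2)
```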

3.1. Gamma and Beta distribution

