Statistics basics
1. Convergence of random variables
1.1. What is a random variable?
Random variables are central objects in statistics and probability because they are maps $X: \Omega \to E$, where the domain $\Omega$ is the set of possible outcomes and the codomain $E$ is a measurable space. Usually $X$ is real-valued (i.e. $E=\mathbb{R}$).
Probability / distribution of a RV: Depending on the type of $X$, we can equip it with a probability distribution:
- Discrete: $P(X = x)$ or $P_{\theta}(X =x)$
- Continuous: $f(x)$ or $f_{\theta}(x)$
Mean / variance of a RV: $\mathbb{E}[X]$ or $\mathbb{E}_{\theta}[X]$ (resp. $V[X]$ or $V_{\theta}[X]$) stands for the statistical expectation (resp. the variance)
What are i.i.d. RV? Independent and Identically Distributed means that $X$ and $Y$ come from the same distribution and are independent. We denote independence by $X \perp Y$, and a criterion to check whether $X \perp Y$ is: for any bounded measurable functions $h$ and $g$, $$\mathbb{E}[h(X)g(Y)] = \mathbb{E}[h(X)]\,\mathbb{E}[g(Y)].$$
Additional terminology: PDF, CDF and iff respectively mean Probability Density Function, Cumulative Distribution Function and “if and only if”.
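The independence criterion above can be checked numerically. This is a minimal Monte Carlo sketch: the test functions `h`, `g`, the distributions and the sample size are all illustrative choices, not part of the criterion itself.

```python
# Monte Carlo check of the independence criterion
# E[h(X) g(Y)] = E[h(X)] E[g(Y)] for independent X and Y.
import numpy as np

rng = np.random.default_rng(0)
N = 200_000
x = rng.normal(size=N)           # X ~ N(0, 1)
y = rng.exponential(size=N)      # Y ~ Exp(1), drawn independently of X

h = np.cos                       # bounded measurable test functions
g = np.tanh                      # (arbitrary illustrative choices)

lhs = np.mean(h(x) * g(y))                 # estimate of E[h(X) g(Y)]
rhs = np.mean(h(x)) * np.mean(g(y))        # estimate of E[h(X)] E[g(Y)]
gap = abs(lhs - rhs)                       # small when X ⊥ Y
```

For dependent variables (e.g. `y = x ** 2`), the same `gap` would typically stay bounded away from zero as `N` grows.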
1.2. Convergences in the multivariate case
Let $(x_{n})_{n\in\mathbb{N}}$ be a sequence of r.v. in $\mathbb{R}^{d}$ and $\textbf{x} \in \mathbb{R}^{d}$:
Almost sure convergence: This is the type of stochastic convergence that is most similar to pointwise convergence known from elementary real analysis.
This means that the values of $x_{n}$ approach the value of $\textbf{x}$, in the sense (see almost surely) that events for which $x_{n}$ does not converge to $\textbf{x}$ have probability $0$. Using the probability space $(\Omega, \mathcal{F}, P)$ and the concept of the random variable as a function from $\Omega$ to $\mathbb{R}^{d}$, this is equivalent to the statement $$P\left(\lim_{n\to\infty} x_{n} = \textbf{x}\right) = 1.$$
Convergence in probability: The probability of an unusual outcome becomes smaller as $n$ increases. Mathematically, this translates to: $$\forall \varepsilon > 0, \quad \lim_{n\to\infty} P\left(\| x_{n} - \textbf{x} \| > \varepsilon\right) = 0.$$
Convergence in $L^{p}$: the expectation of the $L^{p}$ norm of the difference converges towards $0$: $$\lim_{n\to\infty} \mathbb{E}\left[\| x_{n} - \textbf{x} \|^{p}\right] = 0.$$
Convergence in distribution: A sequence of random variables converges in distribution if for any continuous and bounded function $g$, one has: $$\lim_{n\to\infty} \mathbb{E}[g(x_{n})] = \mathbb{E}[g(\textbf{x})].$$
Note: The CV in distribution of a sequence of r.v. is stronger than the CV in distribution of each component!
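Convergence in probability can be illustrated by simulation. The sketch below estimates $P(|\hat{x}_{n}| > \varepsilon)$ for the empirical mean of iid Uniform$(-1,1)$ variables (which converges in probability to $0$); the distribution, $\varepsilon$, and sample sizes are illustrative choices.

```python
# Illustration of convergence in probability: the deviation probability
# P(|mean of n iid Uniform(-1,1) variables| > eps) shrinks as n grows.
import numpy as np

rng = np.random.default_rng(1)
eps = 0.1
reps = 5000          # Monte Carlo repetitions per sample size

def prob_deviation(n: int) -> float:
    """Monte Carlo estimate of P(|empirical mean of n uniforms| > eps)."""
    samples = rng.uniform(-1.0, 1.0, size=(reps, n))
    return float(np.mean(np.abs(samples.mean(axis=1)) > eps))

# Deviation probability for increasing sample sizes: should decrease to 0
probs = [prob_deviation(n) for n in (10, 100, 1000)]
```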
2. Cornerstone results
2.1. Convergence characterization
How to characterise the CV in distribution?
Characteristic function: for a r.v. $x$ in $\mathbb{R}^{d}$, its characteristic function is $\phi_{x}(u) = \mathbb{E}[\exp(iu^{t}x)]$ for $u \in \mathbb{R}^{d}$.
Theorem (Lévy continuity theorem) Let $\phi_{n}(u) = \mathbb{E}[\exp(iu^{t}x_{n})]$ and $\phi(u) = \mathbb{E}[\exp(iu^{t}x)]$ be the characteristic functions of $x_{n}$ and $x$. Then: $$x_{n} \xrightarrow[]{dist} x \iff \forall u \in \mathbb{R}^{d}, \ \phi_{n}(u) \xrightarrow[n\to\infty]{} \phi(u).$$
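A numeric sketch of the theorem, under an assumed setup: the empirical characteristic function of a standardized Binomial (which converges in distribution to $N(0,1)$ by the CLT) approaches $\exp(-u^{2}/2)$, the characteristic function of $N(0,1)$.

```python
# Empirical characteristic function of a standardized Binomial mean vs the
# characteristic function of its N(0, 1) limit, evaluated at one point u.
import numpy as np

rng = np.random.default_rng(2)
n, reps = 2000, 100_000
s = rng.binomial(n, 0.5, size=reps)        # S_n: sum of n Bernoulli(1/2)
z = (s - n * 0.5) / np.sqrt(n * 0.25)      # standardized: approx N(0, 1)

u = 1.0
phi_n = np.mean(np.exp(1j * u * z))        # empirical phi_n(u) = E[exp(iuz)]
phi_limit = np.exp(-u ** 2 / 2)            # phi(u) for N(0, 1)
err = abs(phi_n - phi_limit)               # small for large n and reps
```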
Proposition (a.s., P, dist. convergences) If $x_{n} \rightarrow x$ (almost surely, in probability, or in distribution), then $h(x_{n}) \rightarrow h(x)$ in the same sense for any continuous function $h$.
2.2. SLLN and CLT
Theorem (Strong law of large numbers) Let $(x_{n})$ be a sequence of iid r.v. in $\mathbb{R}^{d}$ such that $\mathbb{E}[\| x_{1} \|] < +\infty$. Let $\mu = \mathbb{E}[x_{1}]$ be the expectation of $x_{1}$. Then: $$\hat{x}_{n} = \frac{1}{n}\sum_{i=1}^{n} x_{i} \xrightarrow[]{a.s.} \mu.$$
Central limit theorem: Let $(x_{n})$ be a sequence of iid r.v. in $\mathbb{R}^{d}$ s.t. $\mathbb{E}[\| x_{1} \|^{2}] < \infty$. Let $\mu = \mathbb{E}[x_{1}]$ and $\Sigma = \mathbb{E}[x_{1}x_{1}^{t}] - \mathbb{E}[x_{1}]\mathbb{E}[x_{1}]^{t}$ the covariance matrix of $x_{1}$. If we let $\hat{x}_{n} = \frac{1}{n}\sum_{i=1}^{n} x_{i}$ be the empirical mean, then we obtain: $$\sqrt{n}\left(\hat{x}_{n} - \mu\right) \xrightarrow[]{dist} N(0, \Sigma).$$
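Both theorems can be checked by simulation. This sketch uses iid Exp(1) variables ($\mu = 1$, $\sigma^{2} = 1$); the seed and sample sizes are arbitrary choices.

```python
# SLLN and CLT simulation for iid Exp(1) variables (mu = 1, sigma^2 = 1).
import numpy as np

rng = np.random.default_rng(3)
n, reps = 1000, 10_000
x = rng.exponential(1.0, size=(reps, n))

# SLLN: each row's empirical mean should be close to mu = 1
means = x.mean(axis=1)
slln_gap = abs(float(means.mean()) - 1.0)

# CLT: sqrt(n) * (mean - mu) should look like N(0, sigma^2 = 1)
z = np.sqrt(n) * (means - 1.0)
clt_var_gap = abs(float(z.var()) - 1.0)          # variance should be near 1
cover = float(np.mean(np.abs(z) < 1.96))         # should be near 0.95
```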
2.3. Slutsky theorem
In probability theory, Slutsky’s theorem extends some properties of algebraic operations on convergent sequences of real numbers to sequences of random variables.
Slutsky theorem: Let $(x_{n})_{n\in \mathbb{N}^{*}}$ be a sequence of r.v. in $\mathbb{R}^{d}$ that converges in distribution to $x$. Let $(y_{n})_{n\in \mathbb{N}^{*}}$ be a sequence of r.v. in $\mathbb{R}^{m}$ (defined on the same probability space as $(x_{n})_{n\in \mathbb{N}^{*}}$) that converges almost surely (or in P, or in dist.) towards a constant $a$. Then the sequence $(x_{n},y_{n})_{n\in \mathbb{N}^{*}}$ converges in distribution towards $(x,a)$: $$(x_{n}, y_{n}) \xrightarrow[]{dist} (x, a).$$
Important applications of Slutsky theorem:
- Sum: $x_{n} + y_{n} \xrightarrow[]{dist} x + a \text{ if } m=d$
- Product: $x_{n} \cdot y_{n} \xrightarrow[]{dist} x \cdot a \text{ if } m=1$
- Division: $x_{n}/y_{n} \xrightarrow[]{dist} x/a \text{ if } m=1 \text{ and } a \neq 0$
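A classical use of the division rule is the t-statistic: $\sqrt{n}(\hat{x}_{n} - \mu)$ converges in distribution to $N(0, \sigma^{2})$ while the sample standard deviation converges almost surely to $\sigma$, so their ratio converges to $N(0, 1)$. A simulation sketch (the Uniform distribution and sizes are illustrative choices):

```python
# Slutsky in action: t-statistic sqrt(n)(mean - mu)/s converges to N(0, 1)
# even though sigma is unknown, because s -> sigma almost surely.
import numpy as np

rng = np.random.default_rng(4)
n, reps = 500, 20_000
x = rng.uniform(0.0, 1.0, size=(reps, n))   # mu = 1/2, sigma^2 = 1/12

t = np.sqrt(n) * (x.mean(axis=1) - 0.5) / x.std(axis=1, ddof=1)
t_var_gap = abs(float(t.var()) - 1.0)       # variance should be near 1
```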
2.4. Delta method
Delta method: The delta method is a general method for deriving the variance of a function of asymptotically normal random variables with known variance.
Let $(x_{n})_{n\in \mathbb{N}^{*}}$ be a sequence of r.v. in $\mathbb{R}^{d}$ and $\theta$ a deterministic vector of $\mathbb{R}^{d}$. Let $h:\mathbb{R}^{d} \to \mathbb{R}^{m}$ be a function that is differentiable (at least) at the point $\theta$.
Let us denote by $\frac{\partial h}{\partial \theta^{t}}$ the $m \times d$ matrix such that: $$\left(\frac{\partial h}{\partial \theta^{t}}\right)_{ij} = \frac{\partial h_{i}}{\partial \theta_{j}}.$$
Assumption: $$\sqrt{n}\left(x_{n} - \theta\right) \xrightarrow[]{dist} x.$$
Result: $$\sqrt{n}\left(h(x_{n}) - h(\theta)\right) \xrightarrow[]{dist} \frac{\partial h}{\partial \theta^{t}}(\theta)\, x.$$
Note: There is a particular case: if $x \sim N(0, \Sigma)$ then $$\sqrt{n}\left(h(x_{n}) - h(\theta)\right) \xrightarrow[]{dist} N\left(0, \frac{\partial h}{\partial \theta^{t}}(\theta)\, \Sigma\, \frac{\partial h}{\partial \theta^{t}}(\theta)^{t}\right).$$
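A one-dimensional sketch of this: with the illustrative choice $h(x) = x^{2}$, if $\sqrt{n}(\hat{x}_{n} - \theta) \xrightarrow[]{dist} N(0, \sigma^{2})$, the delta method predicts $\sqrt{n}(\hat{x}_{n}^{2} - \theta^{2}) \xrightarrow[]{dist} N(0, 4\theta^{2}\sigma^{2})$, which a simulation can verify.

```python
# Delta-method check with h(x) = x^2, h'(theta) = 2 theta:
# predicted asymptotic variance of sqrt(n)(h(mean) - h(theta)) is
# h'(theta)^2 * sigma^2 = 4 theta^2 sigma^2.
import numpy as np

rng = np.random.default_rng(5)
theta, sigma2 = 2.0, 1.0
n, reps = 1000, 10_000
x = rng.normal(theta, np.sqrt(sigma2), size=(reps, n))

means = x.mean(axis=1)
z = np.sqrt(n) * (means ** 2 - theta ** 2)   # sqrt(n)(h(mean) - h(theta))

predicted_var = 4 * theta ** 2 * sigma2      # delta-method variance: 16
delta_gap = abs(float(z.var()) / predicted_var - 1.0)   # relative error
```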