The Random Network Model

4 minute de lecture

Mis à jour : November 15, 2018

Network science aims to build models that reproduce the properties of real networks. As most encountered networks are irregular alnd look like they were spun randomly.

Assumption: Creating a random network boils down to placing the links randomly between the nodes.

Definition (Random Network): A random network consists of $N$ nodes where each node pair is connected with probability $p$.

Model construction:

Step 1: Start with N isolated nodes.
Step 1: Select a node pair and generate a random number between $0$ and 1. If the number exceeds $p$, connect the selected node pair with a link, otherwise leave them disconnected.
Step 3: Repeat Step 2 for each of the $N(N-1)/2$ node pairs.

Terminology: random network is called the Erdős-Rényi network.

Number of Links

Let us first note that the probability that a particular realization of a random network has exactly L links is

$p_{L} = \binom{\frac{N(N-1)}{2}}{L}p^{L}(1-p)^{\frac{N(N-1)}{2} - L}$

Expected number of links: $\langle L \rangle = p\frac{N(N-1)}{2} = pL_{max}$

Average degree: $\langle k \rangle = \frac{2\langle L \rangle}{N}$

Degree Distribution

Binomial vs. Poisson

The exact form of the degree distribution of a random network is the binomial distribution. For sparse networks: $N ›› \langle k \rangle$, the binomial is well approximated by a Poisson distribution.

Both formulas describe the same distribution and have the identical properties
Expressed in terms of different parameters:
- Binomial distribution: $p$ and $N$ $p_{k} = \binom{N-1}{k}p^{k}(1-p)^{N - 1 - k}$
- Poisson distribution: $\langle k \rangle$ $p_{k} = e^{-\langle k \rangle}\frac{\langle k \rangle}{k!}$

Poisson distribution is only an approximation to the degree distribution of a random network, thanks to its analytical simplicity, it is the preferred form for $p_{k}$.

Real Networks are Not Poisson

Result: in a large random network the degree of most nodes is in the narrow vicinity of \langle k \rangle

Degree distribution revisited: Using the sterling approximation, we rewrite:

$p_{k}=\frac{e^{-\langle k \rangle}}{\sqrt{2\pi k}}(\frac{e\langle k \rangle}{k})^{k}$

The above predicts that in a random network the chance of observing a hub decreases faster than exponentially.

Dire consequence: random network model underestimates the size and the frequency of the high degree nodes, as well as the number of low degree nodes.

The Evolution of a Random Network

To quantify how a network evolve, we first inspect how the size of the largest connected cluster within the network, $N_{G}$, varies with $\langle k \rangle$.

Property:

Giant component $\Leftrightarrow$ each node has on average more than one link

Four topologically distinct regimes:

Subcritical Regime ($0 ‹ \langle k \rangle ‹ 1$): network consists of numerous tiny components, whose size follows the exponential distribution (comparable sizes).
Critical Point ($\langle k \rangle = 1$): most nodes are located in numerous small components, whose size distribution follows the power law (components of rather different sizes coexist). Small components are mainly trees, while the giant component may contain loops.
Supercritical Regime ($\langle k \rangle › 1$): Numerous isolated components coexist with the giant component, their size distribution following (3.35). These small components are trees, while the giant component contains loops and cycles. The supercritical regime lasts until all nodes are absorbed by the giant component.
Connected Regime: $\langle k \rangle › ln(N)$
- Average degree at which this happens $\langle k \rangle=ln(N)$
- $ln(N) / N → 0$ for large $N \Rightarrow$ Network is relatively sparse
- Network is a complete graph $\Leftrightarrow \langle k \rangle = N - 1$.

Real Networks are Supercritical

The average degree of real networks is well beyond the $\langle k \rangle = 1$ threshold.

Verified with real networks: Real networks are supercritical $\Rightarrow$ Networks are expected to have a giant component.
Not verified in real networks: Giant component should coexist with many disconnected components.

Note:</i> Real Networks can stay connected despite failing the $\langle k \rangle ›› ln(N)$ criteria

Small Worlds

Small world phenomenon states that if you choose any two individuals anywhere on Earth, you will find a path of at most six acquaintances between them.

Meaning: distance between two randomly chosen nodes in a network is short.

What does short (or small) mean, i.e. short compared to what?

Expected number of nodes up to distance $d$ from our starting node: $N(d) \approx 1 + \langle k \rangle + \langle k \rangle^{2} + … + \langle k \rangle^{d} = \frac{\langle k \rangle^{d} - 1}{\langle k \rangle - 1}$

identify the maximum distance, d_{max}:

$\begin{align} N(d_{max}) \approx N \\ \langle k \rangle^{d_{max}} \approx N \\ d_{max} \approx \frac{ln(N)}{ln(\langle k rangle)} \end{align}$

Typically the small world property is defined by:

$\langle d \rangle \approx \frac{ln(N)}{ln(\langle k rangle)}$

Key takeaways:

Average path length or the diameter depends logarithmically on the system size.
The $\frac{1}{ln \rangle k \rangle}$ term implies that the denser the network, the smaller is the distance between the nodes.
In real networks the number of nodes at distance d › ‹d› drops rapidly

How do we explain the existence of these short distances?

Partager sur

Twitter Facebook Google+ LinkedIn

Louis de Vitry

The Random Network Model

Number of Links

Degree Distribution

Real Networks are Not Poisson

The Evolution of a Random Network

Real Networks are Supercritical

Small Worlds

How do we explain the existence of these short distances?

Partager sur

Laisser un commentaire

Vous pourriez aimer

Spark introduction

Hadoop

Link prediction

Link prediction

Spreading through Network