Graph theory

7 minute de lecture

Mis à jour : August 14, 2018

Networks are a central aspect of any systems, and Graph theory - a branch of Mathematics - is fundamental to grasp and represent those networks. From degrees to degree distributions, from paths to distances and learn to distinguish weighted, directed and bipartite networks.

Goal: Introduce a graph-theoretic formalism to map the networks wiring diagram.

Networks and graphs
Key metrics
Paths and distance
1. Shortest Path and distance
2. How to find the shortest path?
Edge topology
Edge topology
Clustering coefficient

1. Networks and graphs

TL; DR: Network is a catalog of nodes/vertices with the links/edges that describe the direct interactions between them.

Note: The number of nodes / edges are key informations about a graph.

Direction in a graph: Depending on wether edges have a direction, we will call directed / undirected. It can be both.

Terminology:

	Graph	Networks
Type	Representation	Real
Elements	Vertex	Nodes
Relations	Edge	Links

Remark: Choice of representation must be done with care regarding the significance of nodes and links.

2. Key metrics: Degree, Average Degree and Degree Distribution

2.1. Directed vs. Undirected

Degree: number of links it has to other nodes

$L = \frac{1}{2}\sum_{i=1}^{N}k_{i}$

Note: We introduce the $\frac{1}{2}$ factor as each link is counted twice.

Average Degree: $\langle k \rangle = \frac{1}{N}\sum_{i=1}^{N}k_{i}$

Incoming / outcoming: In directed networks we distinguish between:

$k_{i}^{in}$ : Incoming degree (number of links that point to node $i$)
$k^{i}_{out}$ : Outgoing degree (number of links from node $i$)
$k^{i} = k_{i}^{in} + k^{i}_{out}$ : Node’s total degree

Definition (Number of links): $L = \sum_{i=1}^{N} k_{i}^{in} = \sum_{i=1}^{N} k_{i}^{out}$

Definition (Average degree): $\langle k^{in} \rangle = \frac{1}{n} k_{i}^{in}$

Definition (Degree Distribution): probability that a randomly selected node in the network has degree $k$. For a network with N nodes:

$p_{k} = N_{k}/N$

Note: Central role in network theory following the discovery of scale-free networks

Definition (Hub): most connected node.

Advice: The degree distribution is often shown on a log-log plot

2.2. Adjacency Matrix

Adjacency matrix: $A_{ij} = \begin{cases} 1 \text{ if } \exists \text{ link from j to i}\\ 0 \text{ otherwise }\\ \end{cases}$

Property:

Undirected networkd $\Rightarrow$ adjacency matrix symmetric.
Undirected network $\langle k_{i} \rangle = \sum_{j=1}^{N} A_{ij} = \sum_{j=1}^{N} A_{ji}$
Directed networks:
- $k_{i}^{in} = \sum_{j=1}^{N} A_{ji}$
- $k_{i}^{out} = \sum_{j=1}^{N} A_{ij}$

Definition: $L_{max}$ is the maximum number of links.

2.3. Real Networks are Sparse

Definition (Network sparse): if $L ‹‹ L_{max}$

Storage: Lighter in memory to store a sparse matrix with a list rather than the full adjacent matrix.

2.4. Weighted Networks

Definition: A network is weighted if $A_{ij} = w_{ij}$

Metcalfe’s law: $\text{Value}(N)\propto N^{2}$

It translates to network externality in economics.

Issues with the law:

Most real networks are sparse. Hence the value of the network increases only linearly with N.
As the links have weights, not all links are of equal value.

2.5. Bipartite Networks

Definition (bipartite graph): network whose nodes can be divided into two disjoint sets $U$ and $V$ such that each link connects a $U$-node to a $V$-node.

Two projections for each bipartite network:

Connects two U- nodes by a link if they are linked to the same V-node.
Connects the V-nodes by a link if they connect to the same U-node

3. Paths and Distances

Physical distance plays a key role in determining the interactions between the components of physical systems.

Definition (Path): is a route that runs along the links of the network. A path’s length represents the number of links the path contains.

3.1. Shortest Path and distance

The shortest path between nodes $i$ and $j$ is the path with the fewest number of links.

Definition: The shortest path is often called the distance $d_{ij}$.

Notes:

Multiple shortest paths of the same length $d$ between a pair of nodes.
The shortest path never contains loops or intersects itself.
Undirected network: $d_{ij} = d_{ji}$
Directed network: $\exists p_{ij} \nRightarrow \exists p_{ji}$

Goal: Find the shortest path.

3.2. How to find the shortest path?

Adjacency matrix find the shortest path and their numbers for points $i$ and $j$.

$d_{ij} = 1$ if a direct link exists
$d_{ij} = 2$ if there is a path of length two. Then $A_{ik} A_{kj} = 1$ and the number of $d_{ij} = 2$ paths is $N_{ij}^{2} = \sum_{k = 1}^{N} A_{ik}A_{kj} = A_{ij}^{2}$

Breadth first search (BFS) BFS starts from a node and labels its neighbors, then the neighbors’ neighbors, until it reaches the target node. The number of “ripples” needed to reach the target provides the distance.

def bfs_paths(graph, start, goal):
    # Start at node i, that we label with “0”.
    queue = [(start, [start])]
    # Repeat until you find the target node $j$ or there are no more nodes in the queue.
    while queue:
        (vertex, path) = queue.pop(0)
        # Find the nodes directly linked to i.
        for next in graph[vertex] - set(path):
            # Find the labeled nodes adjacent to it
            if next == goal:
                yield path + [next]
            else:
                # Label them with n + 1 and put them in the queue.
                queue.append((next, path + [next]))

If $j$ does not have a label, then $d_{ij} = \infty$.

Computational complexity $\mathcal{0}(N + L)$ because each node needs to be entered and removed from the queue at most once, and each link has to be tested only once.

Diameter d_${max}$ The longest shortest path in a graph / distance between the two furthest nodes.

Compute the diameter:

Choose a node to start
Compute all the outgoing distances from that point (but the first one if the network is undirected)
Repeat with another node for all nodes

4. Edge topology

Average Path Length: $\langle d \rangle = \frac{1}{N^{2}}\sum_{j=1}^{N}\sum_{i=1}^{N} d_{i, j}$

Cycle: A path with the same start and end node. In the graph shown on the left we have only one cycle, as shown by the orange line.

Eulerian Path: A path that traverses each link exactly once.

Hamiltonian Path: A path that visits each node exactly once. We show two Hamiltonian paths in orange and in blue.

4.1. Connectedness

Goal: Quantify how well are nodes connected to each other (with paths).

Undirected network: $\text{i and j are } \begin{cases} \text{connected if }d_{ij}<\infty\\ \text{disconnected if }d_{ij}=\infty \end{cases}$

Definition: (Clusters C): are subnetworks so that

$\forall i \in C, \forall j \notin C: d_{ij} = \infty$

How to find clusters?

Small / Moderate graphs $\rightarrow$ Linear Agebra:
- Rearange the adjacency matrix so as to have a block diagonal form.
- The blocks are the clusters.
Large networks $\rightarrow$ BFS algorithm:
- Step 1: Choose node $i$ randomly and perform a BFS. Label all nodes reached this way with $n=1$.
- Step 2: If all nodes are labeled $\Rightarrow$ network connected.
- Step 3: The network consists of several components.
  - Label $n → n + 1$.
  - Choose an unmarked node $j$, label it with $n$.
  - Use BFS on unmarked node $j$ and label them all with $n$.
  - Return to Step 2.

4.2. Clustering Coefficient

TL;DR: The clustering coefficient captures the degree to which the neighbors of a given node link to each other.

Definition (Local clustering coefficient): For a node $i$ with degree $k_{i}$: $C_{i} = \frac{1L_{i}}{k_{i}(k_{i}-1)}$

where $L_{i}$ is the number of links between the neighbors of $i$.

Definition (average clustering coefficient): $\langle C \rangle = \frac{1}{N}C_{i}$

Probabilistic interpretation $\langle C \rangle$ is the probability that two neighbors of a randomly selected node link to each other.

Notes:Clustering coefficient can be generalized to directed and weighted networks as well.

Partager sur

Twitter Facebook Google+ LinkedIn

Louis de Vitry