Graph theory
Mis à jour :
Networks are a central aspect of any systems, and Graph theory - a branch of Mathematics - is fundamental to grasp and represent those networks. From degrees to degree distributions, from paths to distances and learn to distinguish weighted, directed and bipartite networks.
Goal: Introduce a graph-theoretic formalism to map the networks wiring diagram.
1. Networks and graphs
TL; DR: Network is a catalog of nodes/vertices with the links/edges that describe the direct interactions between them.
Note: The number of nodes / edges are key informations about a graph.
Direction in a graph: Depending on wether edges have a direction, we will call directed / undirected. It can be both.
Terminology:
Graph | Networks | |
---|---|---|
Type | Representation | Real |
Elements | Vertex | Nodes |
Relations | Edge | Links |
Remark: Choice of representation must be done with care regarding the significance of nodes and links.
2. Key metrics: Degree, Average Degree and Degree Distribution
2.1. Directed vs. Undirected
Degree: number of links it has to other nodes
Note: We introduce the $\frac{1}{2}$ factor as each link is counted twice.
Average Degree:
Incoming / outcoming: In directed networks we distinguish between:
$k_{i}^{in}$ : Incoming degree (number of links that point to node $i$)
$k^{i}_{out}$ : Outgoing degree (number of links from node $i$)
$k^{i} = k_{i}^{in} + k^{i}_{out}$ : Node’s total degree
Definition (Number of links):
Definition (Average degree):
Definition (Degree Distribution): probability that a randomly selected node in the network has degree $k$. For a network with N nodes:
Note: Central role in network theory following the discovery of scale-free networks
Definition (Hub): most connected node.
Advice: The degree distribution is often shown on a log-log plot
2.2. Adjacency Matrix
Adjacency matrix:
Property:
Undirected networkd $\Rightarrow$ adjacency matrix symmetric.
Undirected network
Directed networks:
- $k_{i}^{in} = \sum_{j=1}^{N} A_{ji}$
- $k_{i}^{out} = \sum_{j=1}^{N} A_{ij}$
Definition: $L_{max}$ is the maximum number of links.
2.3. Real Networks are Sparse
Definition (Network sparse): if $L ‹‹ L_{max}$
Storage: Lighter in memory to store a sparse matrix with a list rather than the full adjacent matrix.
2.4. Weighted Networks
Definition: A network is weighted if
Metcalfe’s law: $\text{Value}(N)\propto N^{2}$
It translates to network externality in economics.
Issues with the law:
- Most real networks are sparse. Hence the value of the network increases only linearly with N.
- As the links have weights, not all links are of equal value.
2.5. Bipartite Networks
Definition (bipartite graph): network whose nodes can be divided into two disjoint sets $U$ and $V$ such that each link connects a $U$-node to a $V$-node.
Two projections for each bipartite network:
- Connects two U- nodes by a link if they are linked to the same V-node.
- Connects the V-nodes by a link if they connect to the same U-node
3. Paths and Distances
Physical distance plays a key role in determining the interactions between the components of physical systems.
Definition (Path): is a route that runs along the links of the network. A path’s length represents the number of links the path contains.
3.1. Shortest Path and distance
The shortest path between nodes $i$ and $j$ is the path with the fewest number of links.
Definition: The shortest path is often called the distance $d_{ij}$.
Notes:
- Multiple shortest paths of the same length $d$ between a pair of nodes.
- The shortest path never contains loops or intersects itself.
- Undirected network: $d_{ij} = d_{ji}$
- Directed network: $\exists p_{ij} \nRightarrow \exists p_{ji}$
Goal: Find the shortest path.
3.2. How to find the shortest path?
Adjacency matrix find the shortest path and their numbers for points $i$ and $j$.
- $d_{ij} = 1$ if a direct link exists
- $d_{ij} = 2$ if there is a path of length two. Then $A_{ik} A_{kj} = 1$ and the number of $d_{ij} = 2$ paths is
Breadth first search (BFS) BFS starts from a node and labels its neighbors, then the neighbors’ neighbors, until it reaches the target node. The number of “ripples” needed to reach the target provides the distance.
def bfs_paths(graph, start, goal):
# Start at node i, that we label with “0”.
queue = [(start, [start])]
# Repeat until you find the target node $j$ or there are no more nodes in the queue.
while queue:
(vertex, path) = queue.pop(0)
# Find the nodes directly linked to i.
for next in graph[vertex] - set(path):
# Find the labeled nodes adjacent to it
if next == goal:
yield path + [next]
else:
# Label them with n + 1 and put them in the queue.
queue.append((next, path + [next]))
If $j$ does not have a label, then $d_{ij} = \infty$.
Computational complexity $\mathcal{0}(N + L)$ because each node needs to be entered and removed from the queue at most once, and each link has to be tested only once.
Diameter d_${max}$ The longest shortest path in a graph / distance between the two furthest nodes.
Compute the diameter:
- Choose a node to start
- Compute all the outgoing distances from that point (but the first one if the network is undirected)
- Repeat with another node for all nodes
4. Edge topology
Average Path Length: $\langle d \rangle = \frac{1}{N^{2}}\sum_{j=1}^{N}\sum_{i=1}^{N} d_{i, j}$
Cycle: A path with the same start and end node. In the graph shown on the left we have only one cycle, as shown by the orange line.
Eulerian Path: A path that traverses each link exactly once.
Hamiltonian Path: A path that visits each node exactly once. We show two Hamiltonian paths in orange and in blue.
4.1. Connectedness
Goal: Quantify how well are nodes connected to each other (with paths).
Undirected network:
Definition: (Clusters C): are subnetworks so that
How to find clusters?
- Small / Moderate graphs $\rightarrow$ Linear Agebra:
- Rearange the adjacency matrix so as to have a block diagonal form.
- The blocks are the clusters.
- Large networks $\rightarrow$ BFS algorithm:
- Step 1: Choose node $i$ randomly and perform a BFS. Label all nodes reached this way with $n=1$.
- Step 2: If all nodes are labeled $\Rightarrow$ network connected.
- Step 3: The network consists of several components.
- Label $n → n + 1$.
- Choose an unmarked node $j$, label it with $n$.
- Use BFS on unmarked node $j$ and label them all with $n$.
- Return to Step 2.
4.2. Clustering Coefficient
TL;DR: The clustering coefficient captures the degree to which the neighbors of a given node link to each other.
Definition (Local clustering coefficient): For a node $i$ with degree $k_{i}$:
where $L_{i}$ is the number of links between the neighbors of $i$.
Definition (average clustering coefficient): $\langle C \rangle = \frac{1}{N}C_{i}$
Probabilistic interpretation $\langle C \rangle$ is the probability that two neighbors of a randomly selected node link to each other.
Notes:Clustering coefficient can be generalized to directed and weighted networks as well.
Laisser un commentaire