Cheatsheet of Latex Code for Graph Neural Network(GNN) Equations
rockingdingo 20220717 #graph neural network #gnn #gcn #gat #graphsageCheatsheet of Latex Code for Graph Neural Network(GNN) models equations
Navigation
In this blog, we will summarize the latex code of equations of Graph Neural Network(GNN) models, which are useful as quick reference for your research. For common notation, we denote G=(V,E) as the graph. V as the set of nodes with size V=N, and E as the set of N_e edges as E = N_e. A is denoted as the adjacency matrix. For each node v, we use h_v and o_v as hidde state and output vector of each node.
 1. Graph Neural Network
 1.1 Graph Laplacian
 1.2 Graph Convolutional Networks(GCN)
 1.3 GraphSage
 1.4 Graph Attention Network(GAT)
1. Graph Neural Network

1.1 Graph Laplacian
Equation
Latex Code
L=I_{N}D^{\frac{1}{2}}AD^{\frac{1}{2}} \\ L=U\Lambda U^{T}
Explanation
Graph Laplacian matrix equals to the Identity matrix minus the matrix multiplication of three parts, the (1/2) power od Degree matrix D, the adjacency matrix A, and (1/2) power od Degree matrix D. U is the eigenvectors of the normalized graph Laplacian L. The graph laplacian come from the graph Fourier transform F. The original signal x is first transformed to domain F(X) and inverse resulted signal is transformed back using the inverse graph Fourier transform F^{1}.

1.2 Graph Convolutional Networks(GCN)
Equation
Latex Code
H^{(l+1)}=\sigma(\tilde{D}^{\frac{1}{2}}\tilde{A}\tilde{D}^{\frac{1}{2}}H^{l}W^{l})\\ \tilde{A}=A+I_{N}\\ \tilde{D}_{ii}=\sum_{j}\tilde{A}_{ij} \\ H^{0}=X \\ \mathcal{L}=\sum_{l \in Y}\sum^{F}_{f=1} Y_{lf} ln Z_{lf}
Explanation
In this formulation, W indicates layerspecific trainable weight matrix. H^{0} is the original inputs feature matrix X as H^{0}=X, with dimension as N * D, and H^{l} indicates the lth layer hidden representation of graph. The model is trained with semisupervised classification labels and the loss function L is defined above. You can check more detailed information in this ICLR paper, Semisupervised classification with graph convolutional networks for more details.

1.3 GraphSage
Equation
Model Structure
Loss Function
Latex Code
h^{0}_{v} \leftarrow x_{v} \\ \textbf{for} k \in \{1,2,...,K\} \text{do}\\ \textbf{for} v \in V \text{do} \\ h^{k}_{N_{v}} \leftarrow \textbf{AGGREGATE}_{k}(h^{k1}_{u}, u \in N(v)); \\ h^{k}_{v} \leftarrow \sigma (W^{k} \textbf{concat}(h^{k1}_{v},h^{k}_{N_{v}})) \\ \textbf{end} \\ h^{k}_{v}=h^{k}_{v}/h^{k}_{v}_{2},\forall v \in V \\ \textbf{end} \\ z_{v} \leftarrow h^{k}_{v}
J_{\textbf{z}_{u}}=\log (\sigma (\textbf{z}_{u}^{T}\textbf{z}_{v}))  Q \mathbb{E}_{v_{n} \sim p_n(v)} \log(\sigma (\textbf{z}_{u}^{T}\textbf{z}_{v_{n}}))
Explanation
AGGREGATE function must operate on a set of unordered neighbour node vectors of each node v. Common choices includes Mean aggregator, Pooling aggregator, LSTM aggregator (random permutation of neighbours). The final loss functions is calculated in a unsupervised settings. Positive neighhour v is the node that cooccurs within fixedlength random walk of each node v. Negative neighhour is sampled from distribution of p_n(v). The final loss function of GraphSage is calculated as J_{\textbf{z}_{u}}, which is similar to NCE noise contrastive loss, where similar items pairs have higher values while unrelated items pairs have lower values. You can check more detailed information in this paper, Inductive Representation Learning on Large Graphs for more details.

1.4 Graph Attention Network(GAT)
Equation
Latex Code
h=\{\vec{h_{1}},\vec{h_{2}},...,\vec{h_{N}}\}, \\ \vec{h_{i}} \in \mathbb{R}^{F} \\ W \in \mathbb{R}^{F \times F^{'}} \\ e_{ij}=a(Wh_{i},Wh_{j}) \\ k \in \mathcal{N}_{i},\text{ neighbourhood nodes}\\ a_{ij}=\text{softmax}_{j}(e_{ij})=\frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}_{i}} \exp(e_{ik})}
Explanation
GAT applies graph attentional layer to model the graph propagation. In each layer, the node i has attention on all the other nodes j. And the attention coefficient is calculated. For the attention calculation, only the set of neighbours nodes N_{i} of each node i contributes to the final softmax attention calculation. You can check more detailed information in this paper, GRAPH ATTENTION NETWORKS for more details.