Cheatsheet of Latex Code for Graph Neural Network(GNN) Equations

rockingdingo 2022-07-17 #graph neural network #gnn #gcn #gat #graphsage

Cheatsheet of Latex Code for Graph Neural Network(GNN) models equations


In this blog, we will summarize the latex code of equations of Graph Neural Network(GNN) models, which are useful as quick reference for your research. For common notation, we denote G=(V,E) as the graph. V as the set of nodes with size |V|=N, and E as the set of N_e edges as |E| = N_e. A is denoted as the adjacency matrix. For each node v, we use h_v and o_v as hidde state and output vector of each node.

1. Graph Neural Network

  • 1.1 Graph Laplacian


    Latex Code
                L=I_{N}-D^{-\frac{1}{2}}AD^{-\frac{1}{2}} \\
                L=U\Lambda U^{T}

    Graph Laplacian matrix equals to the Identity matrix minus the matrix multiplication of three parts, the (-1/2) power od Degree matrix D, the adjacency matrix A, and (-1/2) power od Degree matrix D. U is the eigenvectors of the normalized graph Laplacian L. The graph laplacian come from the graph Fourier transform F. The original signal x is first transformed to domain F(X) and inverse resulted signal is transformed back using the inverse graph Fourier transform F^{-1}.

  • 1.2 Graph Convolutional Networks(GCN)


    Latex Code
                \tilde{D}_{ii}=\sum_{j}\tilde{A}_{ij} \\
                H^{0}=X \\ 
                \mathcal{L}=-\sum_{l \in Y}\sum^{F}_{f=1} Y_{lf} ln Z_{lf}

    In this formulation, W indicates layer-specific trainable weight matrix. H^{0} is the original inputs feature matrix X as H^{0}=X, with dimension as N * D, and H^{l} indicates the l-th layer hidden representation of graph. The model is trained with semi-supervised classification labels and the loss function L is defined above. You can check more detailed information in this ICLR paper, Semi-supervised classification with graph convolutional networks for more details.

  • 1.3 GraphSage


    Model Structure

    Loss Function

    Latex Code
                h^{0}_{v} \leftarrow x_{v} \\
                \textbf{for} k \in \{1,2,...,K\} \text{do}\\
                \textbf{for} v \in V \text{do} \\
                h^{k}_{N_{v}} \leftarrow \textbf{AGGREGATE}_{k}(h^{k-1}_{u}, u \in N(v)); \\
                            h^{k}_{v} \leftarrow \sigma (W^{k} \textbf{concat}(h^{k-1}_{v},h^{k}_{N_{v}})) \\
                \textbf{end} \\
                h^{k}_{v}=h^{k}_{v}/||h^{k}_{v}||_{2},\forall v \in V \\
                \textbf{end} \\
                z_{v} \leftarrow h^{k}_{v}
                J_{\textbf{z}_{u}}=-\log (\sigma (\textbf{z}_{u}^{T}\textbf{z}_{v})) - Q \mathbb{E}_{v_{n} \sim p_n(v)} \log(\sigma (-\textbf{z}_{u}^{T}\textbf{z}_{v_{n}}))

    AGGREGATE function must operate on a set of unordered neighbour node vectors of each node v. Common choices includes Mean aggregator, Pooling aggregator, LSTM aggregator (random permutation of neighbours). The final loss functions is calculated in a unsupervised settings. Positive neighhour v is the node that co-occurs within fixed-length random walk of each node v. Negative neighhour is sampled from distribution of p_n(v). The final loss function of GraphSage is calculated as J_{\textbf{z}_{u}}, which is similar to NCE noise contrastive loss, where similar items pairs have higher values while unrelated items pairs have lower values. You can check more detailed information in this paper, Inductive Representation Learning on Large Graphs for more details.

  • 1.4 Graph Attention Network(GAT)


    Latex Code
                h=\{\vec{h_{1}},\vec{h_{2}},...,\vec{h_{N}}\}, \\
                \vec{h_{i}} \in \mathbb{R}^{F} \\
                W \in \mathbb{R}^{F \times F^{'}} \\
                e_{ij}=a(Wh_{i},Wh_{j}) \\
                k \in \mathcal{N}_{i},\text{ neighbourhood nodes}\\
                a_{ij}=\text{softmax}_{j}(e_{ij})=\frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}_{i}} \exp(e_{ik})}

    GAT applies graph attentional layer to model the graph propagation. In each layer, the node i has attention on all the other nodes j. And the attention coefficient is calculated. For the attention calculation, only the set of neighbours nodes N_{i} of each node i contributes to the final softmax attention calculation. You can check more detailed information in this paper, GRAPH ATTENTION NETWORKS for more details.