GraphSAGE
Tags: #machine learning #graph #GNN
Introduction
GraphSAGE, introduced in the paper Inductive Representation Learning on Large Graphs, is an inductive framework that computes node embeddings by aggregating feature information from each node's local neighborhood, so it can generalize to nodes unseen during training.
Equation
Model Structure
$$h^{0}_{v} \leftarrow x_{v}, \forall v \in V \\ \textbf{for } k \in \{1,2,\dots,K\} \textbf{ do} \\ \quad \textbf{for } v \in V \textbf{ do} \\ \qquad h^{k}_{N(v)} \leftarrow \textbf{AGGREGATE}_{k}(\{h^{k-1}_{u}, \forall u \in N(v)\}); \\ \qquad h^{k}_{v} \leftarrow \sigma (W^{k} \cdot \textbf{CONCAT}(h^{k-1}_{v}, h^{k}_{N(v)})) \\ \quad \textbf{end} \\ \quad h^{k}_{v} \leftarrow h^{k}_{v} / \|h^{k}_{v}\|_{2}, \forall v \in V \\ \textbf{end} \\ z_{v} \leftarrow h^{K}_{v}, \forall v \in V$$
Loss Function
$$J_{\mathcal{G}}(\mathbf{z}_{u}) = -\log (\sigma (\mathbf{z}_{u}^{\top}\mathbf{z}_{v})) - Q \cdot \mathbb{E}_{v_{n} \sim P_{n}(v)} \log (\sigma (-\mathbf{z}_{u}^{\top}\mathbf{z}_{v_{n}}))$$
Latex Code
h^{0}_{v} \leftarrow x_{v}, \forall v \in V \\ \textbf{for } k \in \{1,2,\dots,K\} \textbf{ do} \\ \quad \textbf{for } v \in V \textbf{ do} \\ \qquad h^{k}_{N(v)} \leftarrow \textbf{AGGREGATE}_{k}(\{h^{k-1}_{u}, \forall u \in N(v)\}); \\ \qquad h^{k}_{v} \leftarrow \sigma (W^{k} \cdot \textbf{CONCAT}(h^{k-1}_{v}, h^{k}_{N(v)})) \\ \quad \textbf{end} \\ \quad h^{k}_{v} \leftarrow h^{k}_{v} / \|h^{k}_{v}\|_{2}, \forall v \in V \\ \textbf{end} \\ z_{v} \leftarrow h^{K}_{v}, \forall v \in V
J_{\mathcal{G}}(\mathbf{z}_{u}) = -\log (\sigma (\mathbf{z}_{u}^{\top}\mathbf{z}_{v})) - Q \cdot \mathbb{E}_{v_{n} \sim P_{n}(v)} \log (\sigma (-\mathbf{z}_{u}^{\top}\mathbf{z}_{v_{n}}))
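To make the update rule concrete, here is a minimal NumPy sketch of the forward pass, assuming a mean aggregator and a ReLU nonlinearity. The function name graphsage_forward, the dict-of-lists neighbor format, and the toy dimensions are illustrative choices, not the paper's reference implementation.

```python
import numpy as np

def graphsage_forward(x, neighbors, weights, K):
    """GraphSAGE embedding generation with a mean aggregator.

    x         : (N, d) array of input node features (h_v^0 = x_v).
    neighbors : dict mapping each node id to a non-empty list of neighbor ids.
    weights   : list of K matrices; weights[k] has shape (d_out_k, 2 * d_in_k).
    """
    h = x.astype(float)
    for k in range(K):
        h_next = np.empty((h.shape[0], weights[k].shape[0]))
        for v in range(h.shape[0]):
            # AGGREGATE_k: mean of the neighbors' previous-layer embeddings.
            h_nv = h[neighbors[v]].mean(axis=0)
            # CONCAT the node's own embedding with the aggregate,
            # then apply W^k and a ReLU nonlinearity (one choice of sigma).
            h_next[v] = np.maximum(weights[k] @ np.concatenate([h[v], h_nv]), 0.0)
        # L2-normalize every embedding after each layer.
        h = h_next / (np.linalg.norm(h_next, axis=1, keepdims=True) + 1e-12)
    return h  # z_v = h_v^K

# Toy usage: 4 nodes with 8-dim features, K = 2 layers, hidden size 16.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
neighbors = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
weights = [rng.normal(size=(16, 2 * 8)), rng.normal(size=(8, 2 * 16))]
z = graphsage_forward(x, neighbors, weights, K=2)   # z.shape == (4, 8)
```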
Explanation
The AGGREGATE_k function must operate on the unordered set of neighbor vectors of each node v. Common choices include the mean aggregator, the pooling aggregator, and the LSTM aggregator (applied to a random permutation of the neighbors). The loss is computed in an unsupervised setting: the positive neighbor v is a node that co-occurs with u on a fixed-length random walk, while negative nodes v_n are sampled from a noise distribution P_n(v). The resulting objective J_G(z_u) is similar to the noise-contrastive estimation (NCE) loss: it pushes the embeddings of co-occurring nodes together while pushing the embeddings of unrelated node pairs apart. See the paper Inductive Representation Learning on Large Graphs for more details.
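As a sketch of how this objective is evaluated, the expectation over P_n(v) can be approximated by a Monte Carlo mean over a batch of sampled negatives. The function unsupervised_loss below is an illustrative name, and the default Q is an arbitrary choice.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def unsupervised_loss(z_u, z_v, z_neg, Q=5.0):
    """J_G(z_u) for one positive pair (u, v) plus sampled negatives.

    z_u   : (d,) embedding of node u.
    z_v   : (d,) embedding of a positive neighbor v (co-occurs with u
            on a fixed-length random walk).
    z_neg : (n, d) embeddings of negatives drawn from P_n(v); the mean
            over rows is a Monte Carlo estimate of the expectation.
    Q     : weight on the negative-sampling term.
    """
    pos = -np.log(sigmoid(z_u @ z_v))                    # pull u and v together
    neg = -Q * np.mean(np.log(sigmoid(-(z_neg @ z_u))))  # push negatives away
    return pos + neg

# Example with the embeddings z from the forward-pass sketch above:
# loss = unsupervised_loss(z[0], z[1], z[[2, 3]])
```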