Cheatsheet of LaTeX Code for Kernel Methods and Gaussian Processes
Navigation
In this blog, we summarize the LaTeX code for the most popular kernel methods and Gaussian Process models, including the Support Vector Machine (SVM), the Gaussian Process (GP), and Deep Kernel Learning (DKL).
- 1. Kernel Methods
- 1.1 Support Vector Machine (SVM)
- 1.2 Gaussian Process (GP)
- 1.3 Deep Kernel Learning (DKL)
1. Kernel Methods
1.1 Support Vector Machine (SVM)
Equation
Find the optimal hyperplane
Dual problem via Lagrangian relaxation
Latex Code
\max_{w,b} \frac{2}{||w||} \\
s.t.\ y_{i}(w^{T}x_{i} + b) \geq 1, \quad i=1,2,...,m \\
L(w,b,\alpha)=\frac{1}{2}||w||^{2} + \sum^{m}_{i=1}\alpha_{i}(1-y_{i}(w^{T}x_{i} + b))
Explanation
LaTeX code for the Support Vector Machine (SVM): the margin-maximization objective, its constraints, and the Lagrangian used to derive the dual problem.
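The constrained objective above can be optimized in many ways; as an illustrative sketch (not part of the original cheatsheet), the snippet below minimizes the equivalent soft-margin hinge-loss objective (1/2)||w||² + C·Σᵢ max(0, 1 − yᵢ(wᵀxᵢ + b)) by subgradient descent on an assumed toy 2-D dataset:

```python
import numpy as np

# Toy linearly separable data (assumed for illustration):
# class +1 clustered around (2, 2), class -1 around (-2, -2).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2.0, 0.5, size=(20, 2)),
               rng.normal(-2.0, 0.5, size=(20, 2))])
y = np.hstack([np.ones(20), -np.ones(20)])

# Subgradient descent on (1/2)||w||^2 + C * sum_i max(0, 1 - y_i (w^T x_i + b))
w, b, C, lr = np.zeros(2), 0.0, 1.0, 0.01
for _ in range(2000):
    margins = y * (X @ w + b)
    viol = margins < 1                               # margin violators
    grad_w = w - C * (y[viol, None] * X[viol]).sum(axis=0)
    grad_b = -C * y[viol].sum()
    w -= lr * grad_w
    b -= lr * grad_b

pred = np.sign(X @ w + b)
print((pred == y).mean())   # training accuracy
```

In practice one would solve the dual quadratic program (e.g. with SMO) to obtain the multipliers α directly, which also enables nonlinear kernels.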
1.2 Gaussian Process (GP)
Equation
Joint Gaussian Distribution assumption
Probabilistic framework for GP
Prediction on new unseen data
Latex Code
% Joint Gaussian Distribution assumption
f(X)=[f(x_{1}),f(x_{2}),...,f(x_{N})]^{T} \sim \mathcal{N}(\mu, K_{X,X})
% Probabilistic framework for GP
\log p(y|X) \propto -[y^{T}(K + \sigma^{2}I)^{-1}y+\log|K + \sigma^{2}I|]
% Prediction on new unseen data
f_{*}|X_{*},X,y \sim \mathcal{N}(\mathbb{E}(f_{*}),\text{cov}(f_{*})) \\
\mathbb{E}(f_{*}) = \mu_{X_{*}}+K_{X_{*},X}[K_{X,X}+\sigma^{2}I]^{-1}(y-\mu_{X}) \\
\text{cov}(f_{*})=K_{X_{*},X_{*}}-K_{X_{*},X}[K_{X,X}+\sigma^{2}I]^{-1}K_{X,X_{*}}
Explanation
A Gaussian process assumes that the N function values are not independent but correlated: the collection of function values, represented by the N-dimensional vector f, follows a joint Gaussian distribution with mean vector \mu and covariance (kernel) matrix K_{X,X}. The predictions at the test inputs X_{*} are given by the posterior mean \mathbb{E}(f_{*}) and covariance \text{cov}(f_{*}), respectively. See the Deep Kernel Learning section below for more details.
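The posterior mean and covariance equations above translate almost line-by-line into NumPy. The sketch below (an assumption for illustration, not from the original post) uses a zero prior mean \mu = 0, an RBF kernel, and noisy samples of sin(x) as training data:

```python
import numpy as np

def rbf(A, B, ell=1.0, sf=1.0):
    """Squared-exponential kernel: sf^2 * exp(-(a_i - b_j)^2 / (2 ell^2))."""
    d2 = (A[:, None] - B[None, :]) ** 2
    return sf**2 * np.exp(-d2 / (2 * ell**2))

# Training data: noisy samples of sin(x) (assumed for illustration).
X = np.linspace(0.0, 5.0, 10)
y = np.sin(X) + 0.05 * np.random.default_rng(0).normal(size=10)
Xs = np.array([2.5])                        # test input x_*
sigma2 = 0.05**2                            # noise variance sigma^2

K   = rbf(X, X) + sigma2 * np.eye(len(X))   # K_{X,X} + sigma^2 I
Ks  = rbf(Xs, X)                            # K_{X_*, X}
Kss = rbf(Xs, Xs)                           # K_{X_*, X_*}

# E(f_*)   = K_{X_*,X} [K_{X,X} + sigma^2 I]^{-1} y          (mu = 0)
# cov(f_*) = K_{X_*,X_*} - K_{X_*,X} [K + sigma^2 I]^{-1} K_{X,X_*}
mean = Ks @ np.linalg.solve(K, y)
cov  = Kss - Ks @ np.linalg.solve(K, Ks.T)
print(mean, cov)
```

The posterior mean at x_* = 2.5 lands close to sin(2.5), and the posterior variance stays small because the test point sits inside the training range.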
1.3 Deep Kernel Learning (DKL)
Equation
Latex Code
k(x_{i},x_{j}|\phi)=k(h(x_i,w_k),h(x_j,w_k)|w_k,\phi)
Explanation
The original data instance x_{i} is first mapped to a latent space by a non-linear transformation h(x_{i}, w_{k}), usually a deep neural network with parameters w_{k}, and the resulting features are then passed to a base kernel function k(\cdot,\cdot|\phi). The kernel hyperparameters \phi and the network weights w_{k} are learned jointly.
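The composition k(h(x_i), h(x_j)) can be sketched in a few lines. Below is a minimal illustration (an assumption, not from the original post) where h is a one-hidden-layer network with fixed random weights standing in for learned w_{k}, and the base kernel is an RBF with length-scale ell playing the role of \phi:

```python
import numpy as np

# Random, untrained weights standing in for learned parameters w_k.
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(2, 8)), rng.normal(size=(8, 3))

def h(x):
    """Non-linear feature map h(x, w_k): R^2 -> R^3 (tiny MLP)."""
    return np.tanh(x @ W1) @ W2

def deep_kernel(xi, xj, ell=1.0):
    """k(x_i, x_j | phi) = k_RBF(h(x_i), h(x_j)) with length-scale ell."""
    d2 = np.sum((h(xi) - h(xj)) ** 2)
    return np.exp(-d2 / (2 * ell**2))

x1, x2 = np.array([0.5, -1.0]), np.array([0.4, -0.9])
print(deep_kernel(x1, x1))   # k(x, x) = exp(0) = 1.0
print(deep_kernel(x1, x2))
```

In a full DKL model the network weights are trained by backpropagating through the GP marginal likelihood, so a framework with automatic differentiation (e.g. GPyTorch) is used rather than raw NumPy.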