## Cheatsheet of LaTeX Code for Kernel Methods and Gaussian Processes

rockingdingo 2022-07-11 #kernel #svm #gaussian process #gp #deep kernel learning


In this blog, we summarize the LaTeX code for the most popular kernel methods and Gaussian Process models, including Support Vector Machines (SVM), Gaussian Processes (GP), and Deep Kernel Learning (DKL).

### 1. Kernel Methods

• #### 1.1 Support Vector Machine (SVM)

##### Equation

Find the optimal separating hyperplane (maximum margin)

Lagrangian relaxation for the dual problem

##### LaTeX Code
    \max_{w,b} \frac{2}{||w||} \\
    s.t.\ y_{i}(w^{T}x_{i} + b) \geq 1, i=1,2,...,m \\
    L(w,b,\alpha)=\frac{1}{2}||w||^2 + \sum^{m}_{i=1}\alpha_{i}(1-y_{i}(w^{T}x_{i} + b))
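
The equation list mentions the dual problem, but the snippet above stops at the Lagrangian. For completeness, the standard dual (an addition to the original cheatsheet) follows by setting the derivatives of L(w,b,\alpha) with respect to w and b to zero:

    \max_{\alpha} \sum^{m}_{i=1}\alpha_{i} - \frac{1}{2}\sum^{m}_{i=1}\sum^{m}_{j=1}\alpha_{i}\alpha_{j}y_{i}y_{j}x_{i}^{T}x_{j} \\
    s.t.\ \sum^{m}_{i=1}\alpha_{i}y_{i}=0,\ \alpha_{i} \geq 0, i=1,2,...,m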

##### Explanation

SVM looks for the separating hyperplane w^{T}x + b = 0 that maximizes the margin \frac{2}{||w||} between the two classes, subject to every training example (x_{i}, y_{i}) lying on the correct side with functional margin at least 1. The Lagrangian L(w,b,\alpha) attaches a multiplier \alpha_{i} \geq 0 to each constraint and is the starting point for deriving the dual problem shown above.
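
As a quick sanity check on these formulas, here is a minimal Python sketch (not part of the original cheatsheet; it assumes NumPy and scikit-learn are available, and the toy data is made up for illustration) that fits a linear SVM and reads off w, b, and the margin 2/||w||:

    import numpy as np
    from sklearn.svm import SVC

    # Toy linearly separable data: two shifted Gaussian blobs (illustrative only).
    rng = np.random.RandomState(0)
    X = np.vstack([rng.randn(20, 2) + [2, 2], rng.randn(20, 2) - [2, 2]])
    y = np.array([1] * 20 + [-1] * 20)

    # A large C makes the soft-margin SVM approximate the hard-margin problem above.
    clf = SVC(kernel="linear", C=1e6).fit(X, y)

    w = clf.coef_[0]           # normal vector of the hyperplane w^T x + b = 0
    b = clf.intercept_[0]      # bias term
    print("margin 2/||w|| =", 2.0 / np.linalg.norm(w))

    # Support vectors are the points with active constraints: y_i (w^T x_i + b) ~= 1.
    print("support vectors:\n", clf.support_vectors_)

The support vectors are exactly the training points whose dual multipliers \alpha_{i} are non-zero.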

• #### 1.2 Gaussian Process (GP)

##### Equation

Joint Gaussian Distribution assumption

Log marginal likelihood (probabilistic framework for GP)

Prediction on new unseen data

##### LaTeX Code
    % Joint Gaussian distribution assumption
    f(X)=(f(x_{1}),f(x_{2}),...,f(x_{N}))^{T} \sim \mathcal{N}(\mu, K_{X,X})

    % Log marginal likelihood (probabilistic framework for GP)
    \log p(y|X) \propto -[y^{T}(K_{X,X} + \sigma^{2}I)^{-1}y+\log|K_{X,X} + \sigma^{2}I|]

    % Prediction on new unseen data
    f_{*}|X_{*},X,y \sim \mathcal{N}(\mathbb{E}(f_{*}),\text{cov}(f_{*})) \\
    \mathbb{E}(f_{*}) = \mu_{X_{*}}+K_{X_{*},X}[K_{X,X}+\sigma^{2}I]^{-1}(y-\mu_{X}) \\
    \text{cov}(f_{*})=K_{X_{*},X_{*}}-K_{X_{*},X}[K_{X,X}+\sigma^{2}I]^{-1}K_{X,X_{*}}

##### Explanation

A Gaussian process assumes that the N function values are not independent but correlated: the collection of function values, represented by the N-dimensional vector f, follows a joint Gaussian distribution with mean vector \mu and covariance (kernel) matrix K_{X,X}. The predictions at the n_{*} test points X_{*} are again Gaussian, with mean \mathbb{E}(f_{*}) and covariance \text{cov}(f_{*}) as given above. See the Deep Kernel Learning section below for how a GP can be combined with a neural feature map.
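
To make the prediction equations concrete, here is a minimal NumPy sketch of GP regression (an illustrative addition, not part of the original cheatsheet; the RBF kernel and the zero mean function \mu = \mu_{X_{*}} = 0 are assumptions made to keep the example short):

    import numpy as np

    def rbf_kernel(A, B, lengthscale=1.0):
        # K[i, j] = exp(-||a_i - b_j||^2 / (2 * lengthscale^2))
        sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-0.5 * sq_dists / lengthscale ** 2)

    def gp_predict(X, y, X_star, sigma=0.1):
        K = rbf_kernel(X, X)                                 # K_{X,X}
        K_s = rbf_kernel(X_star, X)                          # K_{X_*,X}
        K_ss = rbf_kernel(X_star, X_star)                    # K_{X_*,X_*}
        K_noise = K + sigma ** 2 * np.eye(len(X))            # K_{X,X} + sigma^2 I
        mean = K_s @ np.linalg.solve(K_noise, y)             # E(f_*), zero prior mean
        cov = K_ss - K_s @ np.linalg.solve(K_noise, K_s.T)   # cov(f_*)
        return mean, cov

    # Toy 1-D regression problem (illustrative only).
    X = np.linspace(0, 5, 20)[:, None]
    y = np.sin(X[:, 0])
    X_star = np.linspace(0, 5, 50)[:, None]
    mean, cov = gp_predict(X, y, X_star)
    print(mean.shape, cov.shape)   # (50,) (50, 50)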

• #### 1.3 Deep Kernel Learning (DKL)

##### LaTeX Code
    k(x_{i},x_{j}|\phi)=k(h(x_{i},w_{k}),h(x_{j},w_{k})|w_{k},\phi)

##### Explanation

The original data instance x_{i} is first mapped to a latent space by a non-linear transformation h(x_{i}, w_{k}), usually a deep neural network with parameters w_{k}, and the transformed inputs are then passed to a base kernel function k(\cdot,\cdot|\phi). The network weights w_{k} and the kernel hyperparameters \phi are learned jointly. See the Deep Kernel Learning paper (Wilson et al., 2016) for more details.
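
A minimal NumPy sketch of this composition (illustrative only: the tiny two-layer feature map h and the RBF base kernel are assumptions, and the weights below are random rather than learned; in practice h(\cdot, w_{k}) is a deep network trained jointly with \phi by maximizing the GP marginal likelihood):

    import numpy as np

    def h(X, W1, W2):
        # Illustrative feature map h(x, w_k): a tiny two-layer MLP.
        return np.tanh(np.tanh(X @ W1) @ W2)

    def rbf(A, B, lengthscale=1.0):
        # Base kernel k(., .|phi) with hyperparameter phi = lengthscale.
        sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-0.5 * sq_dists / lengthscale ** 2)

    def deep_kernel(Xi, Xj, W1, W2, lengthscale=1.0):
        # k(x_i, x_j | phi) = k(h(x_i, w_k), h(x_j, w_k) | w_k, phi)
        return rbf(h(Xi, W1, W2), h(Xj, W1, W2), lengthscale)

    # Random (untrained) weights, purely to shape-check the composition.
    rng = np.random.RandomState(0)
    X = rng.randn(10, 4)
    W1, W2 = rng.randn(4, 8), rng.randn(8, 3)
    K = deep_kernel(X, X, W1, W2)
    print(K.shape)   # (10, 10), symmetric positive semi-definite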