Cheatsheet of LaTeX Code for Kernel Methods and Gaussian Processes
Navigation
In this blog, we summarize the LaTeX code for the most popular kernel methods and Gaussian Process models, including the Support Vector Machine (SVM), the Gaussian Process (GP), and Deep Kernel Learning (DKL).
- 1. Kernel Methods
- 1.1 Support Vector Machine (SVM)
- 1.2 Gaussian Process (GP)
- 1.3 Deep Kernel Learning (DKL)
1. Kernel Methods
1.1 Support Vector Machine (SVM)
Equation
Find the optimal hyperplane
Dual problem via Lagrangian relaxation
Latex Code
\max_{w,b} \frac{2}{||w||} \\
s.t.\ y_{i}(w^{T}x_{i} + b) \geq 1, \quad i=1,2,...,m \\
L(w,b,\alpha)=\frac{1}{2}||w||^{2} + \sum^{m}_{i=1}\alpha_{i}(1-y_{i}(w^{T}x_{i} + b))
Explanation
LaTeX code for the Support Vector Machine (SVM): the margin-maximization objective, its constraints, and the Lagrangian used to derive the dual problem.
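The constrained objective above can be optimized in many ways; as an illustrative sketch (not part of the original cheatsheet), the snippet below minimizes the equivalent soft-margin hinge-loss objective (1/2)||w||² + C·Σᵢ max(0, 1 − yᵢ(wᵀxᵢ + b)) by subgradient descent on an assumed toy 2-D dataset:

```python
import numpy as np

# Toy linearly separable data (assumed for illustration):
# class +1 clustered around (2, 2), class -1 around (-2, -2).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2.0, 0.5, size=(20, 2)),
               rng.normal(-2.0, 0.5, size=(20, 2))])
y = np.hstack([np.ones(20), -np.ones(20)])

# Subgradient descent on (1/2)||w||^2 + C * sum_i max(0, 1 - y_i (w^T x_i + b))
w, b, C, lr = np.zeros(2), 0.0, 1.0, 0.01
for _ in range(2000):
    margins = y * (X @ w + b)
    viol = margins < 1                               # margin violators
    grad_w = w - C * (y[viol, None] * X[viol]).sum(axis=0)
    grad_b = -C * y[viol].sum()
    w -= lr * grad_w
    b -= lr * grad_b

pred = np.sign(X @ w + b)
print((pred == y).mean())   # training accuracy
```

In practice one would solve the dual quadratic program (e.g. with SMO) to obtain the multipliers α directly, which also enables nonlinear kernels.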
1.2 Gaussian Process (GP)
Equation
Joint Gaussian Distribution assumption
Probabilistic framework for GP
Prediction on new unseen data
Latex Code
% Joint Gaussian Distribution assumption
f(X)=[f(x_{1}),f(x_{2}),...,f(x_{N})]^{T} \sim \mathcal{N}(\mu, K_{X,X})
% Probabilistic framework for GP
\log p(y|X) \propto -[y^{T}(K + \sigma^{2}I)^{-1}y+\log|K + \sigma^{2}I|]
% Prediction on new unseen data
f_{*}|X_{*},X,y \sim \mathcal{N}(\mathbb{E}(f_{*}),\text{cov}(f_{*})) \\
\mathbb{E}(f_{*}) = \mu_{X_{*}}+K_{X_{*},X}[K_{X,X}+\sigma^{2}I]^{-1}(y-\mu_{X}) \\
\text{cov}(f_{*})=K_{X_{*},X_{*}}-K_{X_{*},X}[K_{X,X}+\sigma^{2}I]^{-1}K_{X,X_{*}}
Explanation
A Gaussian process assumes that the N function values are not independent but correlated: the collection of function values, represented by the N-dimensional vector f, follows a joint Gaussian distribution with mean vector \mu and covariance (kernel) matrix K_{X,X}. The predictions at the test inputs X_{*} are given by the posterior mean \mathbb{E}(f_{*}) and covariance \text{cov}(f_{*}), respectively. See the Deep Kernel Learning section below for more details.
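The posterior mean and covariance equations above translate almost line-by-line into NumPy. The sketch below (an assumption for illustration, not from the original post) uses a zero prior mean \mu = 0, an RBF kernel, and noisy samples of sin(x) as training data:

```python
import numpy as np

def rbf(A, B, ell=1.0, sf=1.0):
    """Squared-exponential kernel: sf^2 * exp(-(a_i - b_j)^2 / (2 ell^2))."""
    d2 = (A[:, None] - B[None, :]) ** 2
    return sf**2 * np.exp(-d2 / (2 * ell**2))

# Training data: noisy samples of sin(x) (assumed for illustration).
X = np.linspace(0.0, 5.0, 10)
y = np.sin(X) + 0.05 * np.random.default_rng(0).normal(size=10)
Xs = np.array([2.5])                        # test input x_*
sigma2 = 0.05**2                            # noise variance sigma^2

K   = rbf(X, X) + sigma2 * np.eye(len(X))   # K_{X,X} + sigma^2 I
Ks  = rbf(Xs, X)                            # K_{X_*, X}
Kss = rbf(Xs, Xs)                           # K_{X_*, X_*}

# E(f_*)   = K_{X_*,X} [K_{X,X} + sigma^2 I]^{-1} y          (mu = 0)
# cov(f_*) = K_{X_*,X_*} - K_{X_*,X} [K + sigma^2 I]^{-1} K_{X,X_*}
mean = Ks @ np.linalg.solve(K, y)
cov  = Kss - Ks @ np.linalg.solve(K, Ks.T)
print(mean, cov)
```

The posterior mean at x_* = 2.5 lands close to sin(2.5), and the posterior variance stays small because the test point sits inside the training range.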
1.3 Deep Kernel Learning (DKL)
Equation
Latex Code
k(x_{i},x_{j}|\phi)=k(h(x_i,w_k),h(x_j,w_k)|w_k,\phi)
Explanation
The original data instance x_{i} is first mapped to a latent space by a non-linear transformation h(x_{i}, w_{k}), usually a deep neural network with parameters w_{k}, and the resulting features are then passed to a base kernel function k(\cdot,\cdot|\phi). The kernel hyperparameters \phi and the network weights w_{k} are learned jointly.
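The composition k(h(x_i), h(x_j)) can be sketched in a few lines. Below is a minimal illustration (an assumption, not from the original post) where h is a one-hidden-layer network with fixed random weights standing in for learned w_{k}, and the base kernel is an RBF with length-scale ell playing the role of \phi:

```python
import numpy as np

# Random, untrained weights standing in for learned parameters w_k.
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(2, 8)), rng.normal(size=(8, 3))

def h(x):
    """Non-linear feature map h(x, w_k): R^2 -> R^3 (tiny MLP)."""
    return np.tanh(x @ W1) @ W2

def deep_kernel(xi, xj, ell=1.0):
    """k(x_i, x_j | phi) = k_RBF(h(x_i), h(x_j)) with length-scale ell."""
    d2 = np.sum((h(xi) - h(xj)) ** 2)
    return np.exp(-d2 / (2 * ell**2))

x1, x2 = np.array([0.5, -1.0]), np.array([0.4, -0.9])
print(deep_kernel(x1, x1))   # k(x, x) = exp(0) = 1.0
print(deep_kernel(x1, x2))
```

In a full DKL model the network weights are trained by backpropagating through the GP marginal likelihood, so a framework with automatic differentiation (e.g. GPyTorch) is used rather than raw NumPy.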