Cheatsheet of Latex Code for Kernel Methods and Gaussian Process

rockingdingo 2022-07-11 #kernel #svm #gaussian process #gp #deep kernel learning


In this blog post, we summarize the LaTeX code for the most popular kernel methods and Gaussian process models, including Support Vector Machine (SVM), Gaussian Process (GP), and Deep Kernel Learning (DKL).

1. Kernel Methods

  • 1.1 Support Vector Machine (SVM)

    Equation

    Find the optimal hyperplane

    Dual problem Lagrangian Relaxation

    Latex Code
                \max_{w,b} \frac{2}{||w||} \\
                s.t.\ y_{i}(w^{T}x_{i} + b) \geq 1, i=1,2,...,m  \\ 
                L(w,b,\alpha)=\frac{1}{2}||w||^2 + \sum^{m}_{i=1}\alpha_{i}(1-y_{i}(w^{T}x_{i} + b))
            
    Explanation

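    LaTeX code for the hard-margin Support Vector Machine (SVM). The first two lines maximize the margin 2/||w|| subject to every training sample (x_{i}, y_{i}) being classified correctly with functional margin at least 1. The third line is the Lagrangian of the equivalent problem \min_{w,b} \frac{1}{2}||w||^2, with one multiplier \alpha_{i} \geq 0 per constraint.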
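
    The dual problem named in the Equation list can be sketched from the Lagrangian L(w,b,\alpha) above. This is the standard textbook derivation rather than something stated on this page: set the derivatives with respect to w and b to zero, then substitute back.

```latex
% Stationarity of L(w,b,\alpha) with respect to w and b
\frac{\partial L}{\partial w}=0 \Rightarrow w=\sum^{m}_{i=1}\alpha_{i}y_{i}x_{i}, \quad
\frac{\partial L}{\partial b}=0 \Rightarrow \sum^{m}_{i=1}\alpha_{i}y_{i}=0 \\
% Dual problem obtained by substituting both conditions back into L
\max_{\alpha} \sum^{m}_{i=1}\alpha_{i} - \frac{1}{2}\sum^{m}_{i=1}\sum^{m}_{j=1}\alpha_{i}\alpha_{j}y_{i}y_{j}x_{i}^{T}x_{j} \\
s.t.\ \sum^{m}_{i=1}\alpha_{i}y_{i}=0,\ \alpha_{i} \geq 0,\ i=1,2,...,m
```

    The data enter the dual only through the inner products x_{i}^{T}x_{j}, which is why they can be replaced by a kernel function k(x_{i},x_{j}) to obtain a kernel SVM.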

  • 1.2 Gaussian Process (GP)

    Equation

    Joint Gaussian Distribution assumption

    Probabilistic framework for GP

    Prediction on new unseen data

    Latex Code
                // Joint Gaussian Distribution assumption
                f(X)=[f(x_{1}),f(x_{2}),...,f(x_{N})]^{T} \sim \mathcal{N}(\mu, K_{X,X})
    
                // Probabilistic framework for GP
                \log p(y|X) \propto -[y^{T}(K + \sigma^{2}I)^{-1}y+\log|K + \sigma^{2}I|]
    
                // Prediction on new unseen data
                f_{*}|X_{*},X,y \sim \mathcal{N}(\mathbb{E}(f_{*}),\text{cov}(f_{*})) \\
    
                \mathbb{E}(f_{*}) = \mu_{X_{*}}+K_{X_{*},X}[K_{X,X}+\sigma^{2}I]^{-1}(y-\mu_{X}) \\
    
                \text{cov}(f_{*})=K_{X_{*},X_{*}}-K_{X_{*},X}[K_{X,X}+\sigma^{2}I]^{-1}K_{X,X_{*}}
            
    Explanation

    A Gaussian process assumes that the function values at N inputs are not independent but correlated. The collection of these N function values, represented by the N-dimensional vector f(X), has a joint Gaussian distribution with mean vector \mu and covariance (kernel) matrix K_{X,X}. Observations y are the function values corrupted by Gaussian noise with variance \sigma^{2}. The predictions at the n_{*} test points X_{*} are given by the posterior mean \mathbb{E}(f_{*}) and covariance \text{cov}(f_{*}). See the Deep Kernel Learning link below for more details.
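
    As a rough illustration of the prediction and log-likelihood formulas above, here is a minimal NumPy sketch. It rests on assumptions that are not part of this page: a zero prior mean (\mu = 0), a squared-exponential (RBF) kernel, and toy sine data. The names rbf_kernel, gp_predict, log_marginal_likelihood, noise_var and length_scale are hypothetical helpers chosen for this example.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel matrix between the rows of A and the rows of B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-0.5 * np.maximum(sq_dists, 0.0) / length_scale**2)


def gp_predict(X, y, X_star, noise_var=1e-2, length_scale=1.0):
    """Posterior mean and covariance of f_* under a zero-mean GP prior (assumption)."""
    K = rbf_kernel(X, X, length_scale)               # K_{X,X}
    K_s = rbf_kernel(X_star, X, length_scale)        # K_{X_*,X}
    K_ss = rbf_kernel(X_star, X_star, length_scale)  # K_{X_*,X_*}

    # Cholesky factor of K_{X,X} + sigma^2 I, so no explicit inverse is formed.
    L = np.linalg.cholesky(K + noise_var * np.eye(len(X)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # [K + sigma^2 I]^{-1} y
    V = np.linalg.solve(L, K_s.T)                        # L^{-1} K_{X,X_*}

    mean = K_s @ alpha    # E(f_*) with mu = 0
    cov = K_ss - V.T @ V  # cov(f_*)
    return mean, cov


def log_marginal_likelihood(X, y, noise_var=1e-2, length_scale=1.0):
    """log p(y|X) for the zero-mean GP, including the constant term."""
    K = rbf_kernel(X, X, length_scale)
    L = np.linalg.cholesky(K + noise_var * np.eye(len(X)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    data_fit = y @ alpha                        # y^T (K + sigma^2 I)^{-1} y
    log_det = 2.0 * np.sum(np.log(np.diag(L)))  # log|K + sigma^2 I|
    return -0.5 * (data_fit + log_det + len(X) * np.log(2.0 * np.pi))


# Toy data: 8 noiseless samples of sin(x), one test point inside the data
# range and one far outside it.
X = np.linspace(0.0, 5.0, 8)[:, None]
y = np.sin(X).ravel()
X_star = np.array([[2.5], [100.0]])

mean, cov = gp_predict(X, y, X_star)
lml = log_marginal_likelihood(X, y)
print("posterior mean:", mean)
print("posterior variance:", np.diag(cov))
print("log marginal likelihood:", lml)
```

    Far from the training data the kernel entries K_{X_{*},X} vanish, so the prediction falls back to the prior: mean 0 and variance 1 under these assumptions. The Cholesky factorization is a common choice for numerical stability; a direct matrix inverse would implement the same formulas.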

  • 1.3 Deep Kernel Learning (DKL)

    Equation


    Latex Code
                k(x_{i},x_{j}|\phi)=k(h(x_i,w_k),h(x_j,w_k)|w_k,\phi)
            
    Explanation

    The original data instance x_{i} is first mapped to a latent space by a non-linear transformation h(x_{i}, w_{k}), usually a deep neural network with parameters w_{k}. The mapped points h(x_{i}, w_{k}) and h(x_{j}, w_{k}) are then passed to a base kernel function k(\cdot,\cdot|\phi) with hyperparameters \phi. See the Deep Kernel Learning link below for more details.
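
    The composition of a network and a base kernel can be sketched in a few lines of NumPy. Everything concrete here is an assumption made for illustration: h is a tiny two-layer tanh network with fixed random weights, the base kernel is RBF, and \phi is reduced to a single length scale. In deep kernel learning the weights and \phi would instead be learned jointly, typically by maximizing the GP log marginal likelihood.

```python
import numpy as np


def h(X, w):
    """Feature extractor h(x, w): a small two-layer tanh network (illustrative only)."""
    W1, b1, W2, b2 = w
    return np.tanh(X @ W1 + b1) @ W2 + b2


def rbf(A, B, length_scale):
    """RBF base kernel between the rows of A and the rows of B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-0.5 * np.maximum(sq_dists, 0.0) / length_scale**2)


def deep_kernel(Xa, Xb, w, phi):
    """k(h(x_i, w), h(x_j, w) | phi), with phi used as the RBF length scale."""
    return rbf(h(Xa, w), h(Xb, w), phi)


# Hypothetical setup: 5 inputs in 3 dimensions, mapped to a 2-dimensional
# latent space. The weights are random placeholders, not trained values.
rng = np.random.default_rng(0)
w = (
    rng.normal(size=(3, 8)), np.zeros(8),  # first layer: 3 -> 8
    rng.normal(size=(8, 2)), np.zeros(2),  # second layer: 8 -> 2
)
X = rng.normal(size=(5, 3))

K = deep_kernel(X, X, w, phi=1.0)
print(K.shape)
print(np.round(K, 3))
```

    Because the base kernel is applied to h(x, w) rather than to x, the result is still a valid kernel matrix (symmetric and positive semi-definite) and can be used as K_{X,X} in the GP formulas of section 1.2.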