
Cheatsheet of LaTeX Code for the Most Popular Machine Learning Equations


In this blog, we summarize the LaTeX code for the most popular machine learning equations, including distance measures between data distributions, generative models, and more. Common distance measures include KL-Divergence, JS-Divergence, Wasserstein Distance (Optimal Transport), and Maximum Mean Discrepancy (MMD). We provide the LaTeX code for these machine learning models in the following sections; in the second section we cover the LaTeX code for generative models, including Generative Adversarial Networks (GAN), Variational AutoEncoder (VAE), and Diffusion Models (DDPM).

    Generative Models

  • Generative Adversarial Networks (GAN)

    Latex Code
            \min_{G} \max_{D} V(D,G)=\mathbb{E}_{x \sim p_{data}(x)}[\log D(x)]+\mathbb{E}_{z \sim p_{z}(z)}[\log(1-D(G(z)))]
            
    Explanation

    The LaTeX code for the GAN minimax objective is shown above: the discriminator D is trained to maximize the value function V(D, G), while the generator G is trained to minimize it, as sketched below. See the paper Generative Adversarial Networks for more details.
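
    Below is a minimal NumPy sketch (not from the paper) of how the value function V(D, G) can be estimated from batches of discriminator outputs; the names gan_value, d_real, and d_fake are illustrative placeholders rather than a real implementation.

            import numpy as np

            def gan_value(d_real, d_fake):
                """Monte Carlo estimate of V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))].

                d_real: discriminator probabilities D(x) for a batch of real samples x ~ p_data.
                d_fake: discriminator probabilities D(G(z)) for a batch of generated samples, z ~ p_z.
                """
                eps = 1e-12  # numerical safety for log
                return np.mean(np.log(d_real + eps)) + np.mean(np.log(1.0 - d_fake + eps))

            # Example: a discriminator that scores real samples 0.9 and fakes 0.2 on average.
            print(gan_value(np.full(64, 0.9), np.full(64, 0.2)))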

  • Variational AutoEncoder (VAE)

    Estimating the Log-likelihood and Posterior
    Latex Code
            \log p_{\theta}(x)=\mathbb{E}_{q_{\phi}(z|x)}[\log p_{\theta}(x)] \\
            =\mathbb{E}_{q_{\phi}(z|x)}[\log \frac{p_{\theta}(x,z)}{p_{\theta}(z|x)}] \\
            =\mathbb{E}_{q_{\phi}(z|x)}[\log [\frac{p_{\theta}(x,z)}{q_{\phi}(z|x)} \times \frac{q_{\phi}(z|x)}{p_{\theta}(z|x)}]] \\
            =\mathbb{E}_{q_{\phi}(z|x)}[\log [\frac{p_{\theta}(x,z)}{q_{\phi}(z|x)} ]] +D_{KL}(q_{\phi}(z|x) || p_{\theta}(z|x))\\
            
    Explanation

    The marginal log-likelihood splits into two terms: the evidence lower bound (the expectation term), which can be estimated by sampling from the approximate posterior, and the KL divergence between the approximate posterior and the intractable true posterior. Because the KL divergence is non-negative, the first term is a lower bound on the log-likelihood.

    Evidence Lower Bound
    Latex Code
                \mathbb{L}_{\theta,\phi}(\mathbf{x})=\mathbb{E}_{q_{\phi}(\mathbf{z}|\mathbf{x})}[\log p_{\theta}(\mathbf{x},\mathbf{z})-\log q_{\phi}(\mathbf{z}|\mathbf{x}) ]
            
    Explanation

    The ELBO rewrites the first term of the decomposition above as the expected log joint minus the log approximate posterior. Maximizing it with respect to both the generative parameters and the variational parameters trains the decoder and the encoder jointly.

    Reparameterization trick
    Latex Code
                z = \mu + \sigma \odot \epsilon, \quad \epsilon \sim \mathcal{N}(0, \mathbf{I})
            
    Explanation

    Sampling z directly from the approximate posterior is not differentiable with respect to the encoder parameters; the reparameterization trick instead draws epsilon from a standard Gaussian and expresses z as a deterministic function of the mean, the standard deviation, and epsilon, so gradients can flow through the encoder. The VAE LaTeX code is shown above, and a sketch follows below. See the paper Auto-Encoding Variational Bayes for more details.
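
    Below is a minimal NumPy sketch of the reparameterization trick and a single-sample ELBO, under the common assumptions of a diagonal Gaussian encoder and a unit-variance Gaussian decoder; the function names reparameterize and elbo are illustrative, not from the paper.

            import numpy as np

            rng = np.random.default_rng(0)

            def reparameterize(mu, log_var):
                """z = mu + sigma * eps with eps ~ N(0, I), so z is differentiable in mu and sigma."""
                eps = rng.standard_normal(mu.shape)
                return mu + np.exp(0.5 * log_var) * eps

            def elbo(x, x_recon, mu, log_var):
                """Single-sample ELBO = E_q[log p(x|z)] - KL(q(z|x) || N(0, I)),
                assuming a unit-variance Gaussian decoder and a diagonal Gaussian encoder."""
                recon_log_lik = -0.5 * np.sum((x - x_recon) ** 2)              # log p(x|z) up to an additive constant
                kl = 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)  # closed-form Gaussian KL
                return recon_log_lik - kl

            # Toy usage with a 4-dimensional latent and an 8-dimensional data point.
            mu, log_var = np.zeros(4), np.zeros(4)
            z = reparameterize(mu, log_var)
            print(elbo(np.ones(8), np.ones(8) * 0.9, mu, log_var))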

  • Diffusion Models (DDPM)

    Explanation

    See the paper Denoising Diffusion Probabilistic Models for more details, and the blog post https://lilianweng.github.io/posts/2021-07-11-diffusion-models/ for a thorough derivation.

    1.1 Forward Process
    Latex Code
                q(x_{t}|x_{t-1})=\mathcal{N}(x_{t};\sqrt{1-\beta_{t}}x_{t-1},\beta_{t}I) \\
                q(x_{1:T}|x_{0})=\prod_{t=1}^{T}q(x_{t}|x_{t-1})
            
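    As an illustration, the sketch below draws one forward step x_t ~ q(x_t | x_{t-1}) in NumPy; the linear beta schedule is an assumption made for the example, not prescribed by the equation above.

            import numpy as np

            rng = np.random.default_rng(0)

            T = 1000
            betas = np.linspace(1e-4, 0.02, T)  # assumed linear noise schedule beta_1, ..., beta_T

            def q_step(x_prev, t):
                """One forward step: x_t ~ N(sqrt(1 - beta_t) * x_{t-1}, beta_t * I). t is 1-based."""
                eps = rng.standard_normal(x_prev.shape)
                return np.sqrt(1.0 - betas[t - 1]) * x_prev + np.sqrt(betas[t - 1]) * eps
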
    1.2 Forward Process Reparameterization Trick
    Latex Code
                x_{t}=\sqrt{\alpha_{t}}x_{t-1}+\sqrt{1-\alpha_{t}}\epsilon_{t-1} \\
                =\sqrt{\alpha_{t}\alpha_{t-1}}x_{t-2}+\sqrt{1-\alpha_{t}\alpha_{t-1}}\bar{\epsilon}_{t-2} \\
                =\dots \\
                =\sqrt{\bar{\alpha}_{t}}x_{0}+\sqrt{1-\bar{\alpha}_{t}}\epsilon \\
                \alpha_{t}=1-\beta_{t}, \quad \bar{\alpha}_{t}=\prod_{i=1}^{t}\alpha_{i}
            
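    The closed form above means x_t can be sampled directly from x_0 without iterating through the chain. A minimal NumPy sketch, again assuming a linear beta schedule:

            import numpy as np

            rng = np.random.default_rng(0)

            T = 1000
            betas = np.linspace(1e-4, 0.02, T)   # assumed linear schedule
            alphas = 1.0 - betas                 # alpha_t = 1 - beta_t
            alpha_bars = np.cumprod(alphas)      # alpha_bar_t = prod_{i<=t} alpha_i

            def q_sample(x0, t):
                """Sample x_t ~ N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I) in one shot. t is 1-based."""
                eps = rng.standard_normal(x0.shape)
                return np.sqrt(alpha_bars[t - 1]) * x0 + np.sqrt(1.0 - alpha_bars[t - 1]) * eps
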
    1.3 Reverse Process


    Latex Code
                p_\theta(\mathbf{x}_{0:T}) = p(\mathbf{x}_T) \prod^T_{t=1} p_\theta(\mathbf{x}_{t-1} \vert \mathbf{x}_t) \\
                p_\theta(\mathbf{x}_{t-1} \vert \mathbf{x}_t) = \mathcal{N}(\mathbf{x}_{t-1}; \boldsymbol{\mu}_\theta(\mathbf{x}_t, t), \boldsymbol{\Sigma}_\theta(\mathbf{x}_t, t))
            
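    A minimal sketch of one ancestral sampling step from the learned reverse process; mu_theta and sigma_theta stand in for the learned network outputs and are hypothetical callables introduced for illustration.

            import numpy as np

            rng = np.random.default_rng(0)

            def p_sample(x_t, t, mu_theta, sigma_theta):
                """One reverse step: x_{t-1} ~ N(mu_theta(x_t, t), sigma_theta(x_t, t)^2 * I).

                mu_theta, sigma_theta: placeholder callables for the learned mean and standard deviation.
                """
                noise = rng.standard_normal(x_t.shape) if t > 1 else np.zeros_like(x_t)  # no noise on the last step
                return mu_theta(x_t, t) + sigma_theta(x_t, t) * noise
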
    1.4 Reverse Process Variational Lower Bound


    Latex Code
                \begin{aligned}
                - \log p_\theta(\mathbf{x}_0) 
                &\leq - \log p_\theta(\mathbf{x}_0) + D_\text{KL}(q(\mathbf{x}_{1:T}\vert\mathbf{x}_0) \| p_\theta(\mathbf{x}_{1:T}\vert\mathbf{x}_0) ) \\
                &= -\log p_\theta(\mathbf{x}_0) + \mathbb{E}_{\mathbf{x}_{1:T}\sim q(\mathbf{x}_{1:T} \vert \mathbf{x}_0)} \Big[ \log\frac{q(\mathbf{x}_{1:T}\vert\mathbf{x}_0)}{p_\theta(\mathbf{x}_{0:T}) / p_\theta(\mathbf{x}_0)} \Big] \\
                &= -\log p_\theta(\mathbf{x}_0) + \mathbb{E}_q \Big[ \log\frac{q(\mathbf{x}_{1:T}\vert\mathbf{x}_0)}{p_\theta(\mathbf{x}_{0:T})} + \log p_\theta(\mathbf{x}_0) \Big] \\
                &= \mathbb{E}_q \Big[ \log \frac{q(\mathbf{x}_{1:T}\vert\mathbf{x}_0)}{p_\theta(\mathbf{x}_{0:T})} \Big] \\
                \text{Let }L_\text{VLB} 
                &= \mathbb{E}_{q(\mathbf{x}_{0:T})} \Big[ \log \frac{q(\mathbf{x}_{1:T}\vert\mathbf{x}_0)}{p_\theta(\mathbf{x}_{0:T})} \Big] \geq - \mathbb{E}_{q(\mathbf{x}_0)} \log p_\theta(\mathbf{x}_0)
                \end{aligned}
            
    1.5 Reverse Process Variational Lower Bound Decomposed into Multiple KL-Divergence Terms

    Latex Code
                \begin{aligned}L_\text{VLB} &= \mathbb{E}_{q(\mathbf{x}_{0:T})} \Big[ \log\frac{q(\mathbf{x}_{1:T}\vert\mathbf{x}_0)}{p_\theta(\mathbf{x}_{0:T})} \Big] \\&= \mathbb{E}_q \Big[ \log\frac{\prod_{t=1}^T q(\mathbf{x}_t\vert\mathbf{x}_{t-1})}{ p_\theta(\mathbf{x}_T) \prod_{t=1}^T p_\theta(\mathbf{x}_{t-1} \vert\mathbf{x}_t) } \Big] \\&= \mathbb{E}_q [\underbrace{D_\text{KL}(q(\mathbf{x}_T \vert \mathbf{x}_0) \parallel p_\theta(\mathbf{x}_T))}_{L_T} + \sum_{t=2}^T \underbrace{D_\text{KL}(q(\mathbf{x}_{t-1} \vert \mathbf{x}_t, \mathbf{x}_0) \parallel p_\theta(\mathbf{x}_{t-1} \vert\mathbf{x}_t))}_{L_{t-1}} \underbrace{- \log p_\theta(\mathbf{x}_0 \vert \mathbf{x}_1)}_{L_0} ]\end{aligned}
            
    1.6 Reverse Process Variational Lower Bound Loss Function


    Latex Code
                \begin{aligned}
                L_\text{VLB} &= L_T + L_{T-1} + \dots + L_0 \\
                \text{where } L_T &= D_\text{KL}(q(\mathbf{x}_T \vert \mathbf{x}_0) \parallel p_\theta(\mathbf{x}_T)) \\
                L_t &= D_\text{KL}(q(\mathbf{x}_t \vert \mathbf{x}_{t+1}, \mathbf{x}_0) \parallel p_\theta(\mathbf{x}_t \vert\mathbf{x}_{t+1})) \text{ for }1 \leq t \leq T-1 \\
                L_0 &= - \log p_\theta(\mathbf{x}_0 \vert \mathbf{x}_1)
                \end{aligned}
            
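    Each L_t term compares two Gaussians, so it can be evaluated in closed form. The sketch below is a generic diagonal-Gaussian KL helper; computing the actual means and variances of q(x_{t-1} | x_t, x_0) and p_theta(x_{t-1} | x_t) requires the posterior formulas from the DDPM paper, which are not reproduced here.

            import numpy as np

            def gaussian_kl(mu_q, var_q, mu_p, var_p):
                """KL(N(mu_q, diag(var_q)) || N(mu_p, diag(var_p))), summed over dimensions."""
                return 0.5 * np.sum(np.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)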
