KTO (Kahneman-Tversky Optimization) Equation
Tags: #nlp #llm #AIEquation
$$f(\pi_\theta, \pi_\text{ref}) = \mathbb{E}_{x,y\sim\mathcal{D}}[a_{x,y}\, v(r_\theta(x,y) - \mathbb{E}_{Q}[r_\theta(x, y')])] + C_\mathcal{D}$$

Latex Code

f(\pi_\theta, \pi_\text{ref}) = \mathbb{E}_{x,y\sim\mathcal{D}}[a_{x,y}\, v(r_\theta(x,y) - \mathbb{E}_{Q}[r_\theta(x, y')])] + C_\mathcal{D}
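For concreteness, the KTO paper (Ethayarajh et al., 2024) instantiates this template with a logistic, Kahneman-Tversky-inspired value function. The sketch below is indicative rather than exact (conventions such as the placement of $$\beta$$ and the estimate of the reference point vary across paper versions): $$\lambda_D$$ and $$\lambda_U$$ weight desirable and undesirable outputs, $$\sigma$$ is the logistic function, and $$z_0$$ is a KL-based reference point, with $$r_\theta$$ and $$Q$$ defined under Introduction below.

$$v(x, y) \approx \begin{cases} \lambda_D\, \sigma(\beta(r_\theta(x,y) - z_0)) & \text{if } y \text{ is desirable given } x \\ \lambda_U\, \sigma(\beta(z_0 - r_\theta(x,y))) & \text{if } y \text{ is undesirable given } x \end{cases}$$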
Introduction
$$\theta$$ : trainable parameters of the model $$\pi_\theta: \mathcal{X} \to \mathcal{P}(\mathcal{Y})$$

$$r_\theta(x,y)$$ : implied reward, $$r_\theta(x,y) = l(y) \log [\pi_\theta(y|x) / \pi_\text{ref}(y|x)]$$, where $$l(y)$$ is a normalizing factor (e.g., $$1/|y|$$ for length normalization)
Function $$f$$ : $$f$$ is a human-aware loss for the value function $$v$$ if there exist labels $$a_{x,y} \in \{-1, +1\}$$ and a reference point distribution $$Q(Y'|x)$$ such that $$f$$ takes the form above.
$$v(r_\theta(x,y) - \mathbb{E}_{Q}[r_\theta(x,y')])$$ : the human (subjective) value of the output $$y$$, obtained by applying the value function $$v: \mathbb{R} \to \mathbb{R}$$ to the implied reward of $$(x,y)$$ measured relative to the reference point $$\mathbb{E}_{Q}[r_\theta(x,y')]$$
$$a_{x,y} \in \{-1, +1\}$$ : label indicating whether the output $$y$$ is desirable ($$+1$$) or undesirable ($$-1$$) for the input $$x$$
$$Q(Y'|x)$$ : reference point distribution over alternative outputs $$y'$$ for the input $$x$$
$$\mathcal{D}$$ : feedback data
$$C_\mathcal{D} \in \mathbb{R}$$ : data-specific constant.
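Below is a minimal PyTorch sketch of this loss template. It rests on assumptions not stated above: the value function $$v$$ is taken to be a plain sigmoid, the reference point $$\mathbb{E}_{Q}[r_\theta(x,y')]$$ is approximated by the detached batch mean of rewards, the normalizer is $$l(y) = 1/|y|$$, and $$C_\mathcal{D}$$ is dropped because it carries no gradient. The function names implied_reward and halo_loss are illustrative, not from any library.

```python
import torch

def implied_reward(policy_logps, ref_logps, lengths):
    # r_theta(x, y) = l(y) * log[pi_theta(y|x) / pi_ref(y|x)],
    # with the normalizer l(y) assumed to be 1/|y| (length normalization).
    # policy_logps / ref_logps: summed sequence log-probs, shape (B,).
    return (policy_logps - ref_logps) / lengths

def halo_loss(policy_logps, ref_logps, lengths, labels, beta=1.0):
    # labels: a_{x,y}, +1 for desirable outputs, -1 for undesirable ones.
    r = implied_reward(policy_logps, ref_logps, lengths)
    # Stand-in for the reference point E_Q[r_theta(x, y')]: the detached
    # batch mean of implied rewards (an assumption, not the paper's KL term).
    z_ref = r.mean().detach()
    # Illustrative value function v: a logistic curve, concave in gains.
    value = torch.sigmoid(beta * (r - z_ref))
    # Minimize -a_{x,y} * v(...): push desirable outputs above the
    # reference point and undesirable ones below it.
    return -(labels * value).mean()

# Toy usage with made-up numbers.
policy_logps = torch.tensor([-12.3, -20.1, -8.7])
ref_logps = torch.tensor([-13.0, -18.5, -9.0])
lengths = torch.tensor([10.0, 15.0, 7.0])
labels = torch.tensor([1.0, -1.0, 1.0])
print(halo_loss(policy_logps, ref_logps, lengths, labels))
```

Because $$\sigma(-z) = 1 - \sigma(z)$$, minimizing $$-a_{x,y}\,\sigma(\beta(r - z_0))$$ matches, up to an additive constant, the gradient that KTO's own logistic value function produces when $$\lambda_D = \lambda_U = 1$$.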