Binary Classifier Optimization (BCO)
Tags: #AI #nlp #llm #RLHF
Equation
$$ E_{(x, y_w, y_l) \sim \mathcal{D}} \left[ -\log \sigma \left( r_\theta (x, y_w) - r_\theta(x, y_l) \right) \right] < E_{(x, y_w, y_l) \sim \mathcal{D}} \left[ - \log \sigma (r_\theta(x, y_w)) \right] + E_{(x, y_w, y_l) \sim \mathcal{D}} \left[ - \log \left( 1 - \sigma (r_\theta (x, y_l)) \right) \right] $$

$$ E_{(x, y_w, y_l) \sim \mathcal{D}} \left[ - \log \sigma(r_\theta(x, y_w) - \delta) - \log \sigma \left( - (r_\theta(x, y_l) - \delta) \right) \right] $$

$$ \mathcal{L}_\text{BCO}(\theta) = - E_{(x, y) \sim \mathcal{D}^+} \left[ \log \sigma (r_\theta (x, y) - \delta) \right] - E_{(x, y) \sim \mathcal{D}^-} \left[ \frac{p_\psi (f = 1 \mid x)}{p_\psi (f = 0 \mid x)} \log \sigma \left( - (r_\theta (x, y) - \delta) \right) \right] $$

Latex Code
E_{(x, y_w, y_l) \sim \mathcal{D}} [-\log \sigma \left( r_\theta (x, y_w) - r_\theta(x, y_l) \right) ] < E_{(x, y_w, y_l) \sim \mathcal{D}} [- \log \sigma (r_\theta(x, y_w))] + E_{(x, y_w, y_l) \sim \mathcal{D}} [- \log \left( 1 - \sigma (r_\theta (x, y_l)) \right)]

E_{(x, y_w, y_l) \sim \mathcal{D}} [- \log \sigma(r_\theta(x, y_w) - \delta) - \log \sigma(- (r_\theta(x, y_l) - \delta))]

\mathcal{L}_\text{BCO}(\theta) = - E_{(x, y) \sim \mathcal{D}^+} [\log \sigma (r_\theta (x, y) - \delta)] - E_{(x, y) \sim \mathcal{D}^-} \left[ \frac{p_\psi (f = 1 \mid x)}{p_\psi (f = 0 \mid x)} \log \sigma (- (r_\theta (x, y) - \delta)) \right]
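To make the objectives concrete, below is a minimal PyTorch sketch of the DPO loss, the shifted pairwise binary cross-entropy loss (second equation), and the unpaired BCO loss (third equation). It assumes the reward logits $$ r_\theta $$ and the importance weights $$ p_\psi(f = 1 \mid x) / p_\psi(f = 0 \mid x) $$ have already been computed elsewhere; the function names are illustrative and not taken from the paper's code.

```python
import torch
import torch.nn.functional as F

def dpo_loss(r_chosen, r_rejected):
    """DPO loss on the reward margin: -log sigma(r_w - r_l)."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()

def pairwise_bce_loss(r_chosen, r_rejected, delta=0.0):
    """Shifted pairwise BCE loss (second equation); upper-bounds dpo_loss."""
    # -log sigma(r_w - delta): push chosen rewards above the shift delta
    chosen_term = -F.logsigmoid(r_chosen - delta)
    # -log sigma(-(r_l - delta)): push rejected rewards below the shift delta
    rejected_term = -F.logsigmoid(-(r_rejected - delta))
    return (chosen_term + rejected_term).mean()

def bco_loss(r_pos, r_neg, neg_weight, delta=0.0):
    """BCO loss (third equation) on unpaired positive / negative sets.

    neg_weight stands in for p_psi(f=1 | x) / p_psi(f=0 | x) per negative
    example, from a separate prompt classifier p_psi (assumed precomputed).
    """
    pos_term = -F.logsigmoid(r_pos - delta).mean()
    neg_term = -(neg_weight * F.logsigmoid(-(r_neg - delta))).mean()
    return pos_term + neg_term

if __name__ == "__main__":
    # Numerical check of the upper bound with random reward logits.
    r_w, r_l = torch.randn(16), torch.randn(16)
    assert pairwise_bce_loss(r_w, r_l) >= dpo_loss(r_w, r_l)
```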
Introduction
For a binary classifier that assigns a reward logit, where { prompt, chosen completion } pairs are mapped to 1 and { prompt, rejected completion } pairs are mapped to 0, minimizing the binary cross-entropy loss between the true and predicted labels upper-bounds the direct preference optimization (DPO) loss.

Reward shift
Consider the case where the reward is shifted by $$ \delta $$. The binary cross-entropy loss still upper-bounds the DPO loss, since the DPO loss depends only on the reward margin, which the shift leaves unchanged.

Paper: Binary Classifier Optimization for Large Language Model Alignment
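The first inequality above follows from an elementary bound on the sigmoid. A sketch of the argument, writing $$ a = r_\theta(x, y_w) $$ and $$ b = r_\theta(x, y_l) $$:

$$ \sigma(a)\left(1 - \sigma(b)\right) = \frac{1}{(1 + e^{-a})(1 + e^{b})} = \frac{1}{1 + e^{-a} + e^{b} + e^{b - a}} < \frac{1}{1 + e^{b - a}} = \sigma(a - b) $$

Taking $$ -\log $$ of both sides and the expectation over $$ (x, y_w, y_l) \sim \mathcal{D} $$ gives the stated upper bound. Substituting $$ a - \delta $$ and $$ b - \delta $$ leaves the margin $$ a - b $$ unchanged, so the shifted binary cross-entropy loss in the second equation is still an upper bound on the same DPO loss.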