Cheatsheet of LaTeX Code for Most Popular Transfer Learning Equations
Navigation
In this blog, we will summarize the LaTeX code of the most fundamental equations of transfer learning (TL). Unlike multitask learning, transfer learning models aim to achieve the best performance on the target domain (minimized target domain test error), not on the source domain. Typical transfer learning methods include domain adaptation (DA), feature subspace alignment, and others. In this post, we will discuss TL equations in more detail, covering subareas such as domain adaptation, H-divergence, and Domain-Adversarial Neural Networks (DANN), which are useful as a quick reference for your research.
 1. Domain Adaptation
 1.1 H-Divergence
 1.2 Bound on Target Domain Error
 1.3 Domain-Adversarial Neural Networks (DANN)
1. Domain Adaptation

1.1 H-Divergence
Equation
LaTeX Code
d_{\mathcal{H}}(\mathcal{D},\mathcal{D}^{'})=2\sup_{h \in \mathcal{H}}|\Pr_{\mathcal{D}}[I(h)]-\Pr_{\mathcal{D}^{'}}[I(h)]|
Explanation
The H-divergence is defined as twice the supremum, over all hypotheses h in the hypothesis class H, of the difference between the probabilities Pr_{D}[I(h)] and Pr_{D'}[I(h)]. In this formulation, given a domain X with two data distributions D and D' over X, I(h) denotes the characteristic function (indicator function) of h on X: x belongs to I(h) if and only if h(x) = 1. You can find more detailed information on domain adaptation and the H-divergence in the paper by Shai Ben-David et al., A theory of learning from different domains.
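To make the definition concrete, here is a minimal sketch (not from the paper; the function name and threshold grid are illustrative) that computes the empirical H-divergence between two 1-D samples over a small finite hypothesis class of threshold classifiers h_t(x) = 1[x > t]:

```python
import numpy as np

def empirical_h_divergence(xs, xt, thresholds):
    """Empirical H-divergence between two 1-D samples, where the
    hypothesis class H is a finite set of thresholds t with
    h_t(x) = 1 iff x > t. Returns 2 * max_t |Pr_D[h_t] - Pr_D'[h_t]|."""
    best = 0.0
    for t in thresholds:
        p_s = np.mean(xs > t)   # empirical Pr_D[I(h_t)]
        p_t = np.mean(xt > t)   # empirical Pr_D'[I(h_t)]
        best = max(best, abs(p_s - p_t))
    return 2.0 * best

# Identical samples give divergence 0; samples with disjoint supports
# that some threshold separates perfectly give the maximum value, 2.
xs = np.array([0.0, 0.1, 0.2, 0.3])
xt = np.array([1.0, 1.1, 1.2, 1.3])
print(empirical_h_divergence(xs, xs, np.linspace(-1, 2, 31)))  # 0.0
print(empirical_h_divergence(xs, xt, np.linspace(-1, 2, 31)))  # 2.0
```

The sup over an infinite hypothesis class is generally intractable; in practice it is often approximated by training a domain classifier (the "proxy A-distance" idea), which is exactly the route DANN takes below.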

1.2 Bound on Target Domain Error
Equation
LaTeX Code
\epsilon_{T}(h) \le \hat{\epsilon}_{S}(h) + \sqrt{\frac{4}{m}(d \log \frac{2em}{d} + \log \frac{4}{\delta })} + d_{\mathcal{H}}(\tilde{\mathcal{D}}_{S}, \tilde{\mathcal{D}}_{T}) + \lambda \\ \lambda = \lambda_{S} + \lambda_{T}
Explanation
I will explain this equation in more detail. The domain adaptation literature proves that the test error on the target domain \epsilon_{T}(h) is bounded by three terms: 1. the empirical estimate of the training error on the source domain \hat{\epsilon}_{S}(h); 2. the divergence between the source and target domains d_{\mathcal{H}}(D_S, D_T); 3. a complexity term determined by the VC dimension d, the source sample size m, and e, the base of the natural logarithm. \lambda denotes a fixed term, the sum of \lambda_{S} and \lambda_{T}, which are the errors of the ideal joint hypothesis on D_S and D_T respectively. From this analysis we can see that if D_S and D_T are similar (the divergence between the source and target distributions is small), the error on the target domain is also bounded; this is why models trained on a source domain can perform well on target domains with similar distributions. You can find more details in the NIPS 2006 paper by Shai Ben-David et al., Analysis of Representations for Domain Adaptation.
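The right-hand side of the bound is easy to evaluate numerically. Below is a small sketch (the function name and all plugged-in numbers are hypothetical, chosen only to illustrate how the terms trade off):

```python
import math

def target_error_bound(emp_source_err, d, m, delta, h_divergence, lam):
    """Evaluate the right-hand side of the target-error bound:
    eps_T(h) <= eps_S_hat(h)
                + sqrt((4/m) * (d*log(2em/d) + log(4/delta)))
                + d_H(D_S, D_T) + lambda."""
    complexity = math.sqrt((4.0 / m) * (d * math.log(2 * math.e * m / d)
                                        + math.log(4.0 / delta)))
    return emp_source_err + complexity + h_divergence + lam

# Hypothetical numbers: growing the source sample m shrinks the
# complexity term, so the bound tightens with more source data.
loose = target_error_bound(0.05, d=10, m=1_000, delta=0.05,
                           h_divergence=0.1, lam=0.02)
tight = target_error_bound(0.05, d=10, m=100_000, delta=0.05,
                           h_divergence=0.1, lam=0.02)
print(loose > tight)  # True
```

Note that only the complexity term shrinks with m; the divergence term d_H and the joint error \lambda are properties of the domains themselves, which is what motivates learning representations that reduce d_H.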

1.3 Domain-Adversarial Neural Networks (DANN)
Equation
LaTeX Code
\min \Big[\frac{1}{m}\sum^{m}_{i=1}\mathcal{L}(f(\textbf{x}^{s}_{i}),y_{i})+\lambda \max\Big(-\frac{1}{m}\sum^{m}_{i=1}\mathcal{L}^{d}(o(\textbf{x}^{s}_{i}),1)-\frac{1}{m^{'}}\sum^{m^{'}}_{i=1}\mathcal{L}^{d}(o(\textbf{x}^{t}_{i}),0)\Big)\Big]
Explanation
In this formulation of Domain-Adversarial Neural Networks (DANN), the authors add a domain adaptation regularizer term to the original loss function on the source domain. The regularizer is derived from the H-divergence between the two representation distributions h(X_{S}) and h(X_{T}). The adversarial network aims to learn representations such that the domain classifier cannot distinguish whether a data point belongs to the source domain S or the target domain T. The function o(.) is the domain regressor, which predicts the domain of an example from its hidden representation. You can find more details in the paper by Hana Ajakan, Pascal Germain, et al., Domain-Adversarial Neural Networks.
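As a numerical illustration (not the authors' implementation; the outputs are hand-picked and the logistic loss is assumed for both L and L^d), the sketch below evaluates the DANN objective for fixed network outputs. It shows that the inner max term favors an accurate domain regressor: the outer minimization over the feature extractor then tries to push that maximum down by making the domains indistinguishable.

```python
import math

def log_loss(p, y):
    """Binary cross-entropy for predicted probability p and label y."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def dann_objective(task_probs, task_labels, dom_probs_s, dom_probs_t, lam):
    """DANN objective for fixed outputs: source task loss plus lambda
    times the inner max term, i.e. the negated domain losses with the
    source labelled 1 and the target labelled 0."""
    m = len(task_probs)
    ms, mt = len(dom_probs_s), len(dom_probs_t)
    task = sum(log_loss(p, y) for p, y in zip(task_probs, task_labels)) / m
    dom = (-sum(log_loss(p, 1) for p in dom_probs_s) / ms
           - sum(log_loss(p, 0) for p in dom_probs_t) / mt)
    return task + lam * dom

# An accurate domain regressor has small domain losses, so its negated
# losses are near 0 and it achieves a HIGHER inner value than a
# regressor at chance (p = 0.5); the max therefore picks the accurate one.
confident = dann_objective([0.9, 0.8], [1, 1], [0.99, 0.99], [0.01, 0.01], 1.0)
chance = dann_objective([0.9, 0.8], [1, 1], [0.5, 0.5], [0.5, 0.5], 1.0)
print(confident > chance)  # True
```

In the actual DANN architecture this min-max is implemented with a gradient reversal between the feature extractor and the domain regressor, so both can be trained with ordinary backpropagation.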