## Few-Shot Learning And Zero-Shot Learning Equations Latex Code

rockingdingo 2022-10-23 #Few-Shot #Zero-Shot #MAML #ProtoNets #Bregman Divergences

Few-Shot Learning And Zero-Shot Learning Equations Latex Code

In this blog, we summarize the LaTeX code for the most fundamental equations of Few-Shot Learning and Zero-Shot Learning. Few-Shot Learning learns from a few labelled examples and generalizes to unseen examples. Typical works include Prototypical Networks and Model-Agnostic Meta-Learning (MAML).

### 1. Prototypical Networks (Protonets)

#### 1.1 Prototypes

See the paper Prototypical Networks for Few-shot Learning for more detail.

##### Latex Code
            c_{k}=\frac{1}{|S_{k}|}\sum_{(x_{i},y_{i}) \in S_{k}} f_{\phi}(x_{i}) \\ p_{\phi}(y=k|x)=\frac{\exp(-d(f_{\phi}(x), c_{k}))}{\sum_{k^{'}} \exp(-d(f_{\phi}(x), c_{k^{'}}))} \\ \min J(\phi)=-\log p_{\phi}(y=k|x)

##### Explanation

Prototypical networks compute an M-dimensional representation c_{k}, or prototype, of each class through an embedding f_{\phi}(.) with parameters \phi. Each prototype is the mean vector of the embedded support points belonging to its class k. Prototypical networks then produce a distribution over classes for a query point x via a softmax over distances to the prototypes in the embedding space, p_{\phi}(y=k|x). The loss J(\phi) is the negative log-likelihood of the true class, computed over the query set.

#### 1.2 Prototypical Networks as Mixture Density Estimation
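As a concrete illustration, the two steps above can be sketched in NumPy: prototypes as mean embeddings of each class's support points, then a softmax over negative squared Euclidean distances for a query point. The helper names `prototypes` and `proto_probs` are our own; in practice the embeddings come from a learned network f_{\phi}.

```python
import numpy as np

def prototypes(embeddings, labels, num_classes):
    # c_k: mean of the embedded support points belonging to class k
    return np.stack([embeddings[labels == k].mean(axis=0)
                     for k in range(num_classes)])

def proto_probs(query, protos):
    # p(y=k|x): softmax over negative squared Euclidean distances
    d = ((query - protos) ** 2).sum(axis=1)
    logits = -d
    logits -= logits.max()          # subtract max for numerical stability
    e = np.exp(logits)
    return e / e.sum()
```

For example, with two support points per class, a query near class 0's prototype gets most of the probability mass, and the negative log of that probability is the per-query loss J(\phi).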

##### Bregman divergences

            d_{\phi}(z,z^{'})=\phi(z) - \phi(z^{'})-(z-z^{'})^{T} \nabla \phi(z^{'})


##### Mixture Density Estimation

            p_{\phi}(y=k|z)=\frac{\pi_{k} \exp(-d(z, \mu (\theta_{k})))}{\sum_{k^{'}} \pi_{k^{'}} \exp(-d(z, \mu (\theta_{k^{'}})))}

##### Explanation

The prototypical networks algorithm is equivalent to performing mixture density estimation on the support set with an exponential family density. A regular Bregman divergence d_{\phi} is defined as above, where \phi is a differentiable, strictly convex function of the Legendre type. Examples of Bregman divergences include squared Euclidean distance and Mahalanobis distance.
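To see why squared Euclidean distance is a Bregman divergence, take \phi(z)=\|z\|^{2}: then \nabla \phi(z^{'})=2z^{'}, and the definition above reduces to \|z-z^{'}\|^{2}. A short NumPy check (the helper `bregman` is our own, not from any library):

```python
import numpy as np

def bregman(phi, grad_phi, z, zp):
    # d_phi(z, z') = phi(z) - phi(z') - (z - z')^T grad_phi(z')
    return phi(z) - phi(zp) - (z - zp) @ grad_phi(zp)

# phi(z) = ||z||^2 recovers the squared Euclidean distance
phi = lambda z: z @ z
grad_phi = lambda z: 2 * z
```

Evaluating `bregman(phi, grad_phi, z, zp)` on any pair of vectors returns exactly `((z - zp) ** 2).sum()`.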

### 2. Model-Agnostic Meta-Learning (MAML)

#### 2.1 MAML Meta-Objective

##### Latex Code
            \min_{\theta} \sum_{\mathcal{T}_{i} \sim p(\mathcal{T})} \mathcal{L}_{\mathcal{T}_{i}}(f_{\theta^{'}_{i}}) = \sum_{\mathcal{T}_{i} \sim p(\mathcal{T})} \mathcal{L}_{\mathcal{T}_{i}}(f_{\theta - \alpha \nabla_{\theta} \mathcal{L}_{\mathcal{T}_{i}} (f_{\theta})})

##### Explanation

Model-Agnostic Meta-Learning (MAML) seeks an initial parameter vector \theta that can be quickly adapted, via per-task gradient steps, to task-specific parameters \theta^{'}_{i}. See the paper Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks for details.
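The meta-objective can be sketched with toy one-dimensional quadratic task losses L_{i}(\theta)=(\theta-c_{i})^{2} (these tasks and helper names are hypothetical, for illustration only): each task takes one inner gradient step from the shared \theta, and the meta-loss sums the task losses at the adapted parameters.

```python
def inner_adapt(theta, grad_task, alpha):
    # One inner gradient step: theta'_i = theta - alpha * grad L_Ti(f_theta)
    return theta - alpha * grad_task(theta)

def maml_meta_loss(theta, tasks, alpha):
    # Sum task losses evaluated at each task's adapted parameters theta'_i
    return sum(loss(inner_adapt(theta, grad, alpha)) for loss, grad in tasks)

def make_task(c):
    # Hypothetical quadratic task: L(theta) = (theta - c)^2, grad = 2(theta - c)
    return (lambda th: (th - c) ** 2), (lambda th: 2.0 * (th - c))
```

With `tasks = [make_task(1.0), make_task(-1.0)]` and `theta = 0.0`, the meta-loss after one inner step (e.g. `alpha = 0.1`) is smaller than the summed losses at the unadapted \theta, which is exactly what minimizing the meta-objective rewards. In real MAML the outer minimization differentiates through the inner step, requiring second-order gradients.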