Multi-Gate Mixture of Experts (MMoE)

Tags: #machine-learning #multi-task

Equation

$$g^{k}(x)=\text{softmax}(W_{gk}x) \\ f^{k}(x)=\sum^{n}_{i=1}g^{k}(x)_{i}f_{i}(x) \\ y_{k}=h^{k}(f^{k}(x))$$

Latex Code

    g^{k}(x)=\text{softmax}(W_{gk}x) \\
    f^{k}(x)=\sum^{n}_{i=1}g^{k}(x)_{i}f_{i}(x) \\
    y_{k}=h^{k}(f^{k}(x))

Explanation

The Multi-Gate Mixture-of-Experts (MMoE) model was first introduced in the KDD 2018 paper "Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts". The model introduces an MMoE layer that models the relationships among K tasks using N shared expert networks. Let's assume the input feature x has dimension D, and that there are K output tasks and N expert networks. For task k, the gating network computes g^{k}(x) = softmax(W_{gk}x), where g^{k}(x) is an N-dimensional vector of relative expert weights produced by the softmax, and W_{gk} is a trainable matrix of size R^{N x D}. f_{i}(x) is the output of the i-th expert, and f^{k}(x) is the representation for task k, obtained as the weighted sum of the N expert outputs with weights g^{k}(x)_{i}. Finally, the task-specific tower network h^{k} maps f^{k}(x) to the output y_{k} of task k.
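
To make the three equations concrete, below is a minimal PyTorch sketch of an MMoE layer. It is an illustrative sketch, not the paper's reference implementation: the class name MMoE, the MLP structure of the experts and towers, and hyperparameters such as expert_hidden and tower_hidden are assumptions chosen for brevity.

# Minimal MMoE sketch (assumption: simple MLP experts/towers, not the official implementation).
import torch
import torch.nn as nn


class MMoE(nn.Module):
    def __init__(self, input_dim, num_experts, num_tasks, expert_hidden=32, tower_hidden=16):
        super().__init__()
        # N expert networks f_i(x), shared across all tasks.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(input_dim, expert_hidden), nn.ReLU())
            for _ in range(num_experts)
        ])
        # K gating networks: g^k(x) = softmax(W_gk x), one per task (no bias, matching W_gk x).
        self.gates = nn.ModuleList([
            nn.Linear(input_dim, num_experts, bias=False)
            for _ in range(num_tasks)
        ])
        # K task-specific tower networks h^k.
        self.towers = nn.ModuleList([
            nn.Sequential(nn.Linear(expert_hidden, tower_hidden), nn.ReLU(),
                          nn.Linear(tower_hidden, 1))
            for _ in range(num_tasks)
        ])

    def forward(self, x):
        # Stack expert outputs into shape (batch, N, expert_hidden).
        expert_out = torch.stack([expert(x) for expert in self.experts], dim=1)
        outputs = []
        for gate, tower in zip(self.gates, self.towers):
            # g^k(x): (batch, N) softmax weights over the N experts.
            weights = torch.softmax(gate(x), dim=-1)
            # f^k(x) = sum_i g^k(x)_i * f_i(x): (batch, expert_hidden).
            mixed = torch.einsum('bn,bnh->bh', weights, expert_out)
            # y_k = h^k(f^k(x)).
            outputs.append(tower(mixed))
        return outputs  # list of K task predictions


# Usage example with random data (hypothetical dimensions).
if __name__ == "__main__":
    model = MMoE(input_dim=16, num_experts=4, num_tasks=2)
    x = torch.randn(8, 16)
    y1, y2 = model(x)
    print(y1.shape, y2.shape)  # torch.Size([8, 1]) torch.Size([8, 1])

Note that the experts are shared across tasks while each task k has its own gate and tower, which is what allows the gates to learn task-specific mixtures of the shared expert representations.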
