Multi-Gate Mixture of Experts (MMoE)
Tags: #machine-learning #multi-task

Equation

$$g^{k}(x)=\text{softmax}(W_{gk}x) \\ f^{k}(x)=\sum^{n}_{i=1}g^{k}(x)_{i}f_{i}(x) \\ y_{k}=h^{k}(f^{k}(x))$$

Latex Code

g^{k}(x)=\text{softmax}(W_{gk}x) \\ f^{k}(x)=\sum^{n}_{i=1}g^{k}(x)_{i}f_{i}(x) \\ y_{k}=h^{k}(f^{k}(x))
Introduction
Explanation
The Multi-gate Mixture-of-Experts (MMoE) model was first introduced in the KDD 2018 paper "Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts". The model introduces an MMoE layer that models the relationships among K tasks using N shared expert networks. Assume the input feature x has dimension D, and that there are K output tasks and N expert networks. For each task k, the gating network computes g^{k}(x) = softmax(W_{gk} x), an N-dimensional vector of relative expert weights, where W_{gk} is a trainable matrix of size N x D (W_{gk} in R^{N x D}). f_{i}(x) denotes the output of the i-th expert, so f^{k}(x), the representation for task k, is the weighted sum of the N expert outputs under the gate weights g^{k}(x). Finally, a task-specific tower network h^{k} maps this representation to the prediction y_{k} = h^{k}(f^{k}(x)).
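The forward pass above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's exact architecture: the dimensions, the single-linear-layer experts, and the linear towers are all simplifying assumptions chosen to keep the gating-and-mixing logic visible.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: input dim D, N experts, K tasks, expert output dim E.
D, N, K, E = 8, 4, 2, 16

# Each expert f_i is assumed to be a single linear map D -> E.
W_expert = rng.normal(size=(N, E, D))
# One gating matrix W_gk per task, each mapping D -> N logits.
W_gate = rng.normal(size=(K, N, D))
# Each tower h^k is assumed to be a linear map E -> 1.
W_tower = rng.normal(size=(K, E))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mmoe_forward(x):
    """x: (D,) input vector -> list of K scalar task outputs y_k."""
    expert_out = np.stack([W_expert[i] @ x for i in range(N)])  # (N, E)
    ys = []
    for k in range(K):
        g = softmax(W_gate[k] @ x)                  # g^k(x): (N,) gate weights
        fk = (g[:, None] * expert_out).sum(axis=0)  # f^k(x): weighted expert sum
        ys.append(float(W_tower[k] @ fk))           # y_k = h^k(f^k(x))
    return ys

x = rng.normal(size=D)
print(mmoe_forward(x))  # K task-specific outputs
```

Note that all N experts are shared across tasks; only the gates g^{k} and towers h^{k} are task-specific, which is what lets MMoE trade off between shared and task-specific capacity.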
Related Documents
- See paper Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts for details.