Progressive Layered Extraction (PLE)

Tags: #machine-learning #multi-task

Equation

$$g^{k}(x)=w^{k}(x)S^{k}(x) \\ w^{k}(x)=\text{softmax}(W^{k}_{g}x) \\ S^{k}(x)=[E^{T}_{(k,1)},E^{T}_{(k,2)},...,E^{T}_{(k,m_{k})},E^{T}_{(s,1)},E^{T}_{(s,2)},...,E^{T}_{(s,m_{s})}]^{T} \\ y^{k}(x)=t^{k}(g^{k}(x)) \\ g^{k,j}(x)=w^{k,j}(g^{k,j-1}(x))S^{k,j}(x)$$

Latex Code

    g^{k}(x)=w^{k}(x)S^{k}(x) \\
    w^{k}(x)=\text{softmax}(W^{k}_{g}x) \\
    S^{k}(x)=[E^{T}_{(k,1)},E^{T}_{(k,2)},...,E^{T}_{(k,m_{k})},E^{T}_{(s,1)},E^{T}_{(s,2)},...,E^{T}_{(s,m_{s})}]^{T} \\
    y^{k}(x)=t^{k}(g^{k}(x)) \\
    g^{k,j}(x)=w^{k,j}(g^{k,j-1}(x))S^{k,j}(x)

Introduction


Explanation

The Progressive Layered Extraction (PLE) model slightly modifies the original MMoE structure by explicitly separating the experts into shared experts and task-specific experts. Assume there are m_{s} shared experts and m_{k} task-specific experts for task k. S^{k}(x) is a selected matrix composed of the (m_{s} + m_{k}) D-dimensional expert output vectors, so its dimension is (m_{s} + m_{k}) \times D. w^{k}(x) denotes the gating function, which outputs a weight vector of size (m_{s} + m_{k}), and W^{k}_{g} is a trainable parameter matrix of dimension (m_{s} + m_{k}) \times D. t^{k} denotes the task-specific tower for task k. Progressive extraction means that the gating network of the j-th extraction layer, g^{k,j}(x), takes the output of the previous extraction layer, g^{k,j-1}(x), as its selector input instead of the raw input x.
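The gating step above can be sketched in a few lines of NumPy. This is a minimal, illustrative version of a single PLE extraction layer for one task: the experts are reduced to linear maps, and all dimensions and parameter values (`D`, `m_k`, `m_s`, the matrices `experts` and `W_g`) are hypothetical toy choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 4        # expert output dimension (toy choice)
m_k = 2      # task-specific experts for task k (toy choice)
m_s = 3      # shared experts (toy choice)
n_exp = m_k + m_s

def softmax(z):
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Experts reduced to random linear maps for illustration only.
experts = [rng.normal(size=(D, D)) for _ in range(n_exp)]
W_g = rng.normal(size=(n_exp, D))   # gating parameters W^k_g

def ple_gate(x):
    # S^k(x): stack the outputs of task-specific and shared experts.
    S = np.stack([E @ x for E in experts])   # shape (n_exp, D)
    # w^k(x) = softmax(W^k_g x): gate weights over the selected experts.
    w = softmax(W_g @ x)                     # shape (n_exp,)
    # g^k(x) = w^k(x) S^k(x): weighted combination of expert outputs,
    # which would then feed the task tower t^k or the next extraction layer.
    return w @ S                             # shape (D,)

x = rng.normal(size=D)
g = ple_gate(x)
print(g.shape)  # (4,)
```

Stacking a second such layer that takes `g` in place of `x` gives the progressive `g^{k,j}` recursion from the last equation.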
