Progressive Layered Extraction (PLE)

Tags: #machine-learning #multi-task

Equation

$$g^{k}(x)=w^{k}(x)S^{k}(x) \\ w^{k}(x)=\text{softmax}(W^{k}_{g}x) \\ S^{k}(x)=[E^{T}_{(k,1)},E^{T}_{(k,2)},...,E^{T}_{(k,m_{k})},E^{T}_{(s,1)},E^{T}_{(s,2)},...,E^{T}_{(s,m_{s})}]^{T} \\ y^{k}(x)=t^{k}(g^{k}(x)) \\ g^{k,j}(x)=w^{k,j}(g^{k,j-1}(x))S^{k,j}(x)$$

Latex Code

    g^{k}(x)=w^{k}(x)S^{k}(x) \\
    w^{k}(x)=\text{softmax}(W^{k}_{g}x) \\
    S^{k}(x)=[E^{T}_{(k,1)},E^{T}_{(k,2)},...,E^{T}_{(k,m_{k})},E^{T}_{(s,1)},E^{T}_{(s,2)},...,E^{T}_{(s,m_{s})}]^{T} \\
    y^{k}(x)=t^{k}(g^{k}(x)) \\
    g^{k,j}(x)=w^{k,j}(g^{k,j-1}(x))S^{k,j}(x)

Introduction


Explanation

The Progressive Layered Extraction (PLE) model slightly modifies the original MMoE structure by explicitly separating the experts into shared experts and task-specific experts. Assume there are m_{s} shared experts and m_{k} task-specific experts for task k. S^{k}(x) is a selected matrix composed of the (m_{s} + m_{k}) D-dimensional expert output vectors, so its dimension is (m_{s} + m_{k}) \times D. w^{k}(x) denotes the gating function, which outputs a weight vector of size (m_{s} + m_{k}), and W^{k}_{g} is a trainable parameter matrix of dimension (m_{s} + m_{k}) \times D. t^{k} denotes the task-specific tower for task k. Progressive extraction means that the gating network of the j-th extraction layer, g^{k,j}(x), takes the output of the previous extraction layer, g^{k,j-1}(x), as its selector input instead of the raw input x.
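The gating step above can be sketched in a few lines of NumPy. This is a minimal, illustrative version of a single PLE extraction layer for one task: the experts are reduced to linear maps, and all dimensions and parameter values (`D`, `m_k`, `m_s`, the matrices `experts` and `W_g`) are hypothetical toy choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 4        # expert output dimension (toy choice)
m_k = 2      # task-specific experts for task k (toy choice)
m_s = 3      # shared experts (toy choice)
n_exp = m_k + m_s

def softmax(z):
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Experts reduced to random linear maps for illustration only.
experts = [rng.normal(size=(D, D)) for _ in range(n_exp)]
W_g = rng.normal(size=(n_exp, D))   # gating parameters W^k_g

def ple_gate(x):
    # S^k(x): stack the outputs of task-specific and shared experts.
    S = np.stack([E @ x for E in experts])   # shape (n_exp, D)
    # w^k(x) = softmax(W^k_g x): gate weights over the selected experts.
    w = softmax(W_g @ x)                     # shape (n_exp,)
    # g^k(x) = w^k(x) S^k(x): weighted combination of expert outputs,
    # which would then feed the task tower t^k or the next extraction layer.
    return w @ S                             # shape (D,)

x = rng.normal(size=D)
g = ple_gate(x)
print(g.shape)  # (4,)
```

Stacking a second such layer that takes `g` in place of `x` gives the progressive `g^{k,j}` recursion from the last equation.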
