Introduction to On Device Recommendation (Edge Recommendation)

rockingdingo 2024-08-25 23:05 #on device #edgerec #Taobao #Alipay #Meituan #Kuaishou

Navigation

In this blog, we give a brief introduction to recent progress in On-Device Recommendation (Edge Recommendation) in real-world applications. Mobile AI systems and applications have become more popular thanks to the growing number of mobile devices and to advances in deep learning methods such as model compression and distillation. In recent years, on-device recommendation has empowered many mobile apps to respond better to users' most recent real-time behaviors on their devices, including clicks, scroll-downs, likes, and many others. We introduce five examples: EdgeRec in Taobao, search-bar background word reranking in Alipay, search result reranking in Meituan-Dianping, short-video recommendation in KuaiShou, and the TensorFlow Lite (TFLite) on-device recommendation implementation.

1. Recommender System on Edge in Mobile Taobao (Taobao)

2. Device-cloud Collaborative Recommendation via Meta Controller (Alipay)

3. Search Result On-Device Reranking in Dianping App (Meituan Dianping)

4. Real-time Short Video Recommendation on Mobile Devices (KuaiShou)

5. Introduction to TFLite On-device Recommendation

1. Recommender System on Edge in Mobile Taobao

1.1 Introduction

Paper url: Recommender System on Edge in Mobile Taobao

Recommender systems (RS) have become a crucial module in commercial systems. Most RS take a waterfall flow form, especially on mobile. Specifically, we target the Taobao home-page feed RS, which is one of the largest e-commerce recommender systems. Today almost all waterfall-flow RS are built on a client-server framework, in which computing overhead in the cloud, together with network bandwidth and latency, delays both system feedback and the perception of user behavior. As a result, the recommended content may not be what users want at the moment, and their willingness to browse and click decreases. Edge computing has the potential to address concerns about response time, network bandwidth, and data privacy. Our work is the first to combine edge computing and RS. On the system side, we design and implement EdgeRec (Recommender System on Edge), which performs reranking on the mobile device and achieves Real-time Perception and Real-time Feedback. On the algorithm side, we propose Heterogeneous User Behavior Sequence Modeling, which captures users' rich behaviors, and Context-aware Reranking with Behavior Attention Networks, which reranks items in light of the user's behavior context. We conduct extensive offline and online evaluations on real Taobao RS traffic before fully deploying EdgeRec into production.

2. Device-cloud Collaborative Recommendation via Meta Controller (Alipay)

2.1 Introduction

Paper url: Device-cloud Collaborative Recommendation via Meta Controller

On-device machine learning enables the lightweight deployment of recommendation models in local clients, which reduces the burden on cloud-based recommenders and simultaneously incorporates more real-time user features. Nevertheless, cloud-based recommendation is still very important in industry because of its powerful model capacity and its efficient candidate generation from a billion-scale item pool. Previous attempts to combine the merits of both paradigms mainly resort to a sequential mechanism, which builds the on-device recommender on top of the cloud-based recommendation. However, such a design is inflexible when user interests change dramatically: the on-device model is limited by its small item cache, while the cloud-based recommender, despite its large item pool, does not respond without new refresh feedback. To overcome this issue, we propose a meta controller that dynamically manages the collaboration between the on-device recommender and the cloud-based recommender, and we introduce a novel, efficient sample construction from the causal perspective to address the lack of a training dataset for the meta controller. On the basis of the counterfactual samples and the extended training, extensive experiments in industrial recommendation scenarios show the promise of the meta controller in device-cloud collaboration.

Figure 3: Illustration of Meta-Controller based Device-Cloud Collaboration

Figure 4: Real-time Reranking of Searchbar Background Words on Alipay Mobile App

3. Search Result On-Device Reranking in Dianping App (Meituan)

3.1 Introduction

Paper url: Search Result On-Device Reranking in Dianping App (Meituan)

In Dianping's search result ranking scenario, on-device reranking technology is adopted to rerank a user's search results for a given input query. The reranking model takes the user's most recent real-time on-device actions as features, such as scroll-downs, clicks, exposures, and duration (stay time on the page). The model architecture is a list-wise, transformer-based reranking model, similar to PRM (Personalized Re-ranking Model). When a user clicks on some nearby restaurants and then goes back to the search result page, the search result list is reranked on the device to reflect the user's most recent interests.
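To make the data flow concrete, here is a minimal sketch of reranking a cached result list from recent on-device clicks. It is a simple heuristic stand-in, not Dianping's list-wise transformer: the `Candidate` fields, the `rerank_on_device` function, and the `boost` weight are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    item_id: str
    category: str        # hypothetical item attribute, e.g. cuisine type
    server_score: float  # score assigned by the cloud-side ranker


def rerank_on_device(candidates, recent_click_categories, boost=0.2):
    """Re-order cached candidates using clicks observed on the device.

    Heuristic stand-in for a learned reranker: each recent click on a
    category adds `boost` to the score of every candidate in that category.
    """
    click_counts = {}
    for category in recent_click_categories:
        click_counts[category] = click_counts.get(category, 0) + 1

    def adjusted_score(c):
        return c.server_score + boost * click_counts.get(c.category, 0)

    # sorted() is stable, so candidates with equal scores keep the server order.
    return sorted(candidates, key=adjusted_score, reverse=True)


cached = [
    Candidate("r1", "hotpot", 0.9),
    Candidate("r2", "sushi", 0.8),
    Candidate("r3", "hotpot", 0.7),
]
# The user clicked two sushi restaurants before returning to the result page.
reranked = rerank_on_device(cached, ["sushi", "sushi"])
reranked_ids = [c.item_id for c in reranked]
```

In a real deployment the adjusted score would come from the on-device model, which would also consume exposure and stay-time signals rather than click counts alone.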

Figure 5: Illustration of Dianping Search Result On-Device Reranking

Figure 6: Real-time Reranking Model Architecture of Dianping Search Result On-Device Reranking

4. Real-time Short Video Recommendation on Mobile Devices (KuaiShou)

4.1 Introduction

Paper url: Real-time Short Video Recommendation on Mobile Devices

Short video applications have attracted billions of users in recent years, fulfilling their various needs with diverse content. Users usually watch short videos on many topics on mobile devices within a short period of time, and they give explicit or implicit feedback very quickly on the videos they watch. The recommender system needs to perceive users' preferences in real time in order to satisfy their changing interests. Traditionally, recommender systems deployed on the server side return a ranked list of videos for each request from the client. The server therefore cannot adjust the recommendation results according to the user's real-time feedback before the next request. Because of client-server transmission latency, it is also unable to make immediate use of users' real-time feedback. As users continue to watch videos and give feedback, the changing context makes the ranking produced by the server-side recommendation system inaccurate. In this paper, we propose to deploy a short video recommendation framework on mobile devices to solve these problems. Specifically, we design and deploy a tiny on-device ranking model to enable real-time re-ranking of server-side recommendation results. We improve its prediction accuracy by exploiting users' real-time feedback on watched videos and client-specific real-time features. With more accurate predictions, we further consider interactions among candidate videos and propose a context-aware re-ranking method based on adaptive beam search. The framework has been deployed on Kuaishou, a billion-user scale short video application, and improved effective view, like and follow by 1.28%, 8.22% and 13.6% respectively.

Figure 7: Illustration of KuaiShou Short Video Mobile Device Recommendation Architecture

Algorithm 1: Context-aware re-ranking with adaptive beam search
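As a rough illustration of the idea behind context-aware re-ranking, the sketch below runs a plain fixed-width beam search over candidate orderings. It is not the paper's adaptive variant: the beam width is constant, and the `gain` function (which halves a video's score when it follows a video on the same topic) is a hypothetical reward invented for this example, as are the toy `videos` data.

```python
def beam_search_rerank(candidates, gain_fn, list_len, beam_width=2):
    """Search for a high-reward ordering of `candidates`.

    gain_fn(prefix, item) returns the reward of appending `item` after the
    already chosen `prefix`, so an item's value can depend on its context.
    """
    beams = [((), 0.0)]  # (sequence so far, cumulative reward)
    for _ in range(min(list_len, len(candidates))):
        expanded = []
        for seq, total in beams:
            for item in candidates:
                if item in seq:
                    continue  # each video is shown at most once
                expanded.append((seq + (item,), total + gain_fn(seq, item)))
        # Keep only the best `beam_width` partial sequences.
        expanded.sort(key=lambda beam: beam[1], reverse=True)
        beams = expanded[:beam_width]
    return list(beams[0][0])


# Hypothetical candidates: video id -> (topic, predicted score).
videos = {"a": ("pets", 0.9), "b": ("pets", 0.7), "c": ("food", 0.5)}


def gain(prefix, item):
    topic, score = videos[item]
    # Toy context effect: a repeated topic is worth half as much.
    if prefix and videos[prefix[-1]][0] == topic:
        return score * 0.5
    return score


ordering = beam_search_rerank(list(videos), gain, list_len=3)
```

Sorting by the predicted score alone would give the order a, b, c; with this toy reward the beam search instead places the food video between the two pet videos, because the context penalty makes that sequence more valuable overall.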

5. Introduction to TFLite On-device Recommendation

5.1 Introduction

Blog url: Introduction to TFLite On-device Recommendation

Generating personalized, high-quality recommendations is crucial to many real-world applications, such as music, videos, merchandise, apps, and news. Currently, a typical recommender system is constructed entirely on the server side, including collecting user activity logs, training recommendation models on the collected logs, and serving the models. While purely server-based recommender systems have proven to be powerful, we explore and showcase a more lightweight approach: serving a recommendation model by deploying it on device. We demonstrate that such an on-device recommendation solution enjoys low-latency inference that is orders of magnitude faster than server-side models. It enables user experiences that cannot be achieved by traditional server-based recommender systems, such as updating rankings and the UI in response to every user tap or interaction.

Moreover, on-device model inference respects user privacy: instead of sending user data to a server for prediction, it keeps all needed data on the device. It is possible to train the model on public data or on an existing proxy dataset to avoid collecting user data for each new use case, which is demonstrated in our solution. For on-device training, we refer interested readers to Federated Learning or TFLite model personalization as alternatives.

Our solution includes the following components:

Source code that constructs and trains high-quality personalized recommendation models for on-device scenarios.

A movie recommendation demo app that runs the model on device.

We also provide source code for preparing training examples and a pretrained model in the GitHub repo.

Model

Recommendation problems are typically formulated as future-activity prediction problems. A recommendation model is therefore trained to predict the user's future activities given their previous activities.

Our published model is constructed with the following architecture. On the context side, each user activity, such as a movie watch, is embedded into an embedding vector. Embedding vectors from past user activities are aggregated by the encoder to generate the context embedding. We support three different types of encoders:

Bag-of-Words: activity embeddings are simply averaged.

CNN: 1-D convolution is applied to activity embeddings, followed by max-pooling.

RNN: LSTM is applied to activity embeddings.

On the label side, the label item, such as the next movie that the user watched or is interested in, is considered "positive", while all other items (e.g. all other movies the user didn't watch) are considered "negative" through negative sampling. Both positive and negative items are embedded, and their dot products with the context embedding produce the logits that are fed to a softmax cross-entropy loss. Other modeling situations where labels are not binary will be supported in the future. After training, the model can be exported and deployed on device for serving. The top-K recommendations are simply the items with the K highest logits, computed between the context embedding and all label embeddings.
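The serving path described above (Bag-of-Words encoding followed by dot-product scoring and top-K selection) can be sketched in a few lines of plain Python. This is only an illustration of the arithmetic, not the TFLite model or its API: the function names, the 2-D toy embeddings, and the use of a single shared embedding table for both the context side and the label side are assumptions made for brevity.

```python
def bow_encode(activity_ids, embeddings):
    """Bag-of-Words encoder: average the embeddings of past activities.

    Assumes `activity_ids` is non-empty.
    """
    vectors = [embeddings[item] for item in activity_ids]
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def top_k(context, label_embeddings, k):
    """Return the k item ids with the highest dot-product logits."""
    logits = {
        item: sum(c * e for c, e in zip(context, emb))
        for item, emb in label_embeddings.items()
    }
    return sorted(logits, key=logits.get, reverse=True)[:k]


# Hypothetical 2-D embeddings; a trained model would learn these values.
movie_embeddings = {
    "action_1": [1.0, 0.0],
    "action_2": [0.9, 0.1],
    "romance_1": [0.0, 1.0],
    "romance_2": [0.1, 0.9],
}

history = ["action_1", "action_2"]  # movies the user watched
context = bow_encode(history, movie_embeddings)
recommendations = top_k(context, movie_embeddings, k=3)
```

Because the whole computation is an average plus one dot product per item, it is cheap enough to rerun after every user interaction, which is what makes the on-device setting attractive. A real app would probably also filter out items the user has already watched.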
