CABaRet: Leveraging Recommendation Systems for Mobile Edge Caching

This is a paper from the SIGCOMM 2018 Workshop (Mobile Edge Communications, MECOMM).

The paper has four authors: Savvas Kastanakis (University of Crete and FORTH, Greece), Pavlos Sermpezis (FORTH, Greece), Vasileios Kotronis (FORTH, Greece), and Xenofontas Dimitropoulos (University of Crete and FORTH, Greece).

ABSTRACT

Joint caching and recommendation has recently been proposed for increasing the efficiency of mobile edge caching. While previous works assume collaboration between mobile network operators and content providers (who control the recommendation systems), this might be challenging in today's economic ecosystem, with existing protocols and architectures. In this paper, we propose an approach that enables cache-aware recommendations without requiring collaboration between the network operator and the content provider. We leverage information provided publicly by the recommendation system, and build a system that provides cache-friendly and high-quality recommendations. We apply our approach to the YouTube service, and conduct measurements on YouTube video recommendations and experiments with video requests, to evaluate the potential gains in the cache hit ratio. Finally, we analytically study the problem of caching optimization under our approach. Our results show that significant caching gains can be achieved in practice: an 8 to 10 times increase in the cache hit ratio from cache-aware recommendations, and an extra 2 times increase from caching optimization.

1 INTRODUCTION

Mobile Edge Caching (MEC) is one of the key technologies for 5G networks [7] that can reduce latency of service delivery and offload traffic from backhaul links. In MEC, caches are located at the edge of the mobile network (e.g., base stations), and thus have limited capacity and serve small (and frequently changing) user populations [10]. These factors, despite the advances in caching policies [10] or delivery techniques [5], limit the possible gains from MEC: capacity is a tiny fraction of today's content catalogs, and traffic is highly variable; hence, a large number of user requests are for non-cached contents, i.e., they are not served at the edge.

A recently proposed solution for increasing the efficiency of MEC is jointly caching and recommending content [2, 4, 13]. Recommendation Systems (RS) are integrated in many popular services (e.g., YouTube, Netflix) and significantly affect the user demand [6, 14]. Therefore, steering recommendations towards cached contents can significantly increase the cache hit ratio, even with small caches or populations.

However, joint caching and recommendation requires collaboration between network operators and Content Providers (CPs). This might be challenging, due to the different scope of these entities, and the constraints of current network protocols and architectures. For example, CPs encrypt traffic (e.g., https) and do not typically share user-related information [11]. 

To bridge this gap, we propose an approach that is applicable in today's networks: the network operator leverages the information made available by the RS and, based on this, provides (independently of the CP) high-quality and cache-friendly recommendations that increase the efficiency of MEC. Specifically, we consider the YouTube service, and design a system/application that (i) obtains video relations from the YouTube API, based on which (ii) it builds extended lists of directly and indirectly related videos, and (iii) carefully steers initial recommendations (and thus user demand) towards cached videos. These operations can take place without any tight collaboration with the CP, thus facilitating the application of joint caching and recommendation approaches by network operators (or other entities), without any need for modifications in architectures or protocols.

Our contributions are summarized as follows:

  • We propose an approach that enables joint caching and recommendation, without requiring collaboration between network operators and CPs (Sec. 2).
  • We design an algorithm (named CABaRet) that leverages available information provided by a RS, and returns cache-aware recommendations (Sec. 3).
  • We perform extensive measurements over the YouTube service. Our results show that significant caching gains can be achieved in practice; even in conservative scenarios, our approach increases the cache hit ratio by a factor of 8 to 10 (Sec. 4).
  • We analytically study the problem of caching optimization under recommendations from CABaRet, and propose an approximation algorithm. We show that when caching is controlled by the network operator, an extra 2 times increase in the cache hit ratio can be achieved (Sec. 5).

Finally, while in this paper we focus on the YouTube service, which provides a public API, our approach can be extended to other video/radio services (e.g., Netflix, Vimeo, Spotify, Pandora) as well, e.g., using offline crawling processes (in case APIs are not available) for discovering content relations.

2 SYSTEM OVERVIEW

The proposed approach can be implemented in a lightweight system/application that runs on mobile devices, and is triggered either by the network operator or by the user. The system is composed of the user interface (UI), the back-end, and the recommendation module, as depicted in Fig. 1.

Fig. 1: System overview.

User Interface (UI). The UI resembles the original content service UI. For instance, in the YouTube case, the UI contains a search bar, a video player, and a list of related videos. The users search, browse, and watch videos through the UI.

Back-end. The back-end is responsible for (i) retrieving the list of cached video IDs (e.g., in the form of a text file), and (ii) streaming videos to the UI. Depending on the scenario, the list of cached video IDs can be already known to the network operator, e.g., in the case of network-controlled caching. Alternatively, they can be requested from the content provider directly, or discovered through offline network measurements (e.g., latency [9], or DNS resolution [1]) by the network operator. The video requested by the user is delivered/streamed from the CP’s (e.g., YouTube’s) origin server, or the CP’s cache, or an edge cache. In case the caches are controlled by the content provider (which is the most prevalent scenario today), the user-service communication can be encrypted (e.g., https requests directly to YouTube) and remain transparent to the network operator. 

Recommendation Module. The recommendation module is triggered upon each content request, and (i) receives as input the video v that the user currently watches, (ii) retrieves from the YouTube API a list of video IDs directly/indirectly related to v, (iii) extracts from the back-end the list of cached video IDs, and (iv) builds a list of related and cached video IDs and recommends it to the user, according to the cache-aware recommendation algorithm of Sec. 3. This recommendation process is lightweight and can return the list of recommendations very fast (e.g., ∼1sec. in our prototype), without affecting the user experience. 

3 CACHE-AWARE RECOMMENDATIONS

In existing approaches, e.g., [2, 4, 13], the "most related" contents that are also cached are recommended to users. However, this requires the system to be aware of the content relations (e.g., similarity scores, user preferences/history, trending videos), i.e., information owned by the content provider. Such data are unlikely to be disclosed to third parties, due to privacy and/or economic reasons (e.g., advertising).

In our approach, the system leverages information about content relations that is made publicly available by the RS of the content service (i.e., YouTube in this paper). In particular, when a user watches a video v, the system requests from the YouTube API a list L of video IDs related to v, i.e., the videos that YouTube would recommend to the user. Then it requests the related video IDs for every video in L and appends them to the end of L, and so on, in a Breadth-First Search (BFS) manner. At the end of the process, the list L contains IDs of videos directly and indirectly related to v, from which the top N videos that are cached and/or highly related to v are finally recommended to the user. The list L is (i) much larger than the list of videos recommended by YouTube, and thus more likely to contain cached videos that are related to v, and (ii) built from video relations provided by YouTube itself, which supports a high quality of recommendations.

We detail our recommendation algorithm (CABaRet) in Sec. 3.1, and discuss the related design implications in Sec. 3.2. 

3.1 The Recommendation Algorithm

Input. The recommendation algorithm receives as input:

  • v: the video ID (or URL) which is currently watched
  • N: the number of videos to be recommended
  • C: the list with the IDs of the cached videos
  • DBFS: the depth to which the BFS proceeds
  • WBFS: the number of related videos that are requested per content from the YouTube API (i.e., the "width" of BFS)

Output. The recommendation algorithm returns as output:

  • R: ordered list of N video IDs to be recommended.

Workflow. CABaRet searches for videos related to video v in a BFS manner as follows (line 1 in Algorithm 1). Initially, it requests the WBFS videos related to v, and adds them to a list L in the order they are returned from the YouTube API. For each video in L, it further requests WBFS related videos, as shown in Fig. 2, and adds them in the end of L. It proceeds similarly for the newly added videos, until the depth DBFS is reached; e.g., if DBFS = 2, then L contains WBFS video IDs related to v, and WBFS · WBFS video IDs related to the related videos of v. 

Fig. 2: CABaRet example with DBFS = 2, WBFS = 3, and N = 6. Cached videos are shown in black.

Then, CABaRet searches for video IDs in L that are also included in the list of cached videos C and adds them to the list R of video IDs to be recommended, until all IDs in L are explored or the list R contains N video IDs, whichever comes first (lines 4-9). If, after this step, R contains fewer than N video IDs, N - |R| video IDs from the head of the list L are added to R; these IDs correspond to the top N - |R| non-cached videos that are directly related to video v (lines 10-15).
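The workflow above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' Algorithm 1: `get_related` is a hypothetical callable standing in for the YouTube API lookup, and skipping IDs that were already seen during the BFS is an assumption that the text does not state.

```python
from typing import Callable, Iterable, List


def cabaret(v: str,
            n: int,
            cached: Iterable[str],
            d_bfs: int,
            w_bfs: int,
            get_related: Callable[[str, int], List[str]]) -> List[str]:
    """Return up to n recommendations for video v, cached videos first.

    get_related(video_id, width) is a hypothetical stand-in for the
    related-videos lookup; it returns IDs in recommendation order.
    """
    cached_set = set(cached)

    # Step 1: BFS up to depth d_bfs, appending related IDs to the list L.
    related: List[str] = []
    seen = {v}
    frontier = [v]
    for _ in range(d_bfs):
        next_frontier: List[str] = []
        for vid in frontier:
            for rel in get_related(vid, w_bfs)[:w_bfs]:
                if rel not in seen:  # assumption: keep each ID only once
                    seen.add(rel)
                    related.append(rel)
                    next_frontier.append(rel)
        frontier = next_frontier

    # Step 2: cached videos found in L, in BFS order, up to n of them.
    recs = [x for x in related if x in cached_set][:n]

    # Step 3: fill the remaining slots with non-cached IDs from the head of L.
    for x in related:
        if len(recs) >= n:
            break
        if x not in cached_set:
            recs.append(x)
    return recs
```

With a toy relation graph in place of the API, cached videos found at depth 2 are placed ahead of the non-cached videos from depth 1, which is the behaviour the text describes.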

3.2 Implications and Design Choices

High-quality recommendations. Using the YouTube recommendations ensures strong relations between videos that are directly related to v (i.e., BFS at depth 1). Moreover, while the YouTube RS finds hundreds of videos highly related to v, only a subset of them are recommended to the user [3]. The rationale behind CABaRet is to explore the related videos that are not communicated to the user. To this end, based on the fact that related videos are similar and have high probability of sharing recommendations (i.e., if video a is related to b, and b to c, then it is probable that c relates to a), CABaRet tries to infer these latent video relations through BFS. Hence, videos found by BFS in depths > 1 are also (indirectly) related to v and probably good recommendations as well. 

To further support the above claim, we collect and analyze datasets of related YouTube videos. Specifically, we consider the set P of the most popular videos in a region, and for each v ∈ P we perform BFS by requesting the list of related videos (similarly to line 1 in CABaRet). We use as parameters WBFS ∈ {10, 20, 50} and DBFS = 2, i.e., we consider the directly related videos (depth 1) and the indirectly related videos at depth 2. We denote by $R_1(v)$ and $R_2(v)$ the sets of videos found at the first and second depth of the BFS, respectively. We calculate the fraction of the videos in $R_1(v)$ that are also contained in $R_2(v)$, i.e., $I(v) = \frac{|R_1(v) \cap R_2(v)|}{|R_1(v)|}$. High values of I(v) indicate a strong similarity of the initial content v with the set of indirectly related contents at depth 2.
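As a small illustration of the metric I(v), the following sketch computes it for one video. The callable `get_related` is again a hypothetical stand-in for the related-videos lookup, and returning 0.0 for a video with no related videos is an assumption made here to avoid a division by zero.

```python
from typing import Callable, List, Set


def overlap_fraction(v: str,
                     w_bfs: int,
                     get_related: Callable[[str, int], List[str]]) -> float:
    """Fraction of depth-1 related videos that reappear at depth 2."""
    # R1(v): videos directly related to v.
    r1: Set[str] = set(get_related(v, w_bfs)[:w_bfs])
    if not r1:
        return 0.0  # assumption: define I(v) as 0 when R1(v) is empty

    # R2(v): videos related to the videos in R1(v).
    r2: Set[str] = set()
    for u in r1:
        r2.update(get_related(u, w_bfs)[:w_bfs])

    return len(r1 & r2) / len(r1)
```

A value close to 1 means that almost every directly related video is also recommended by some other directly related video, which is the pattern the measurements look for.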

Table 1 shows the median values of I(v) over the |P| = 50 most popular contents in the region of Greece (GR), for different BFS widths. As can be seen, I(v) is very high for most of the initial videos v. I(v) increases with WBFS, and when we fully exploit the YouTube API capability, i.e., for WBFS = 50, which is the maximum number of related videos returned by the YouTube API, the median value of I(v) becomes larger than 0.9. Finally, we measured I(v) in other regions as well, and observed that even in large (size/population) regions the I(v) values remain high; e.g., in the United States (US) region, I(v) = 0.8 for WBFS = 50.

Tuning CABaRet. The parameters DBFS and WBFS can be tuned to achieve a desired performance, e.g., in terms of the probability of recommending a cached or highly related video.

For large DBFS , the similarity between v and the videos at the end of the list L is expected to weaken, while for small DBFS the list L is shorter and it is less probable that a cached content is contained in it. Hence, the parameter DBFS can be used to achieve a trade-off between quality of recommendations (small DBFS ) and probability of recommending a cached video (large DBFS). The number of related videos requested per content WBFS, can be interpreted similarly to DBFS. A small WBFS leads to considering only top recommendations per video, while a large WBFS leads to a larger list L. 

Remark: YouTube imposes quotas on the API requests per application per day, which prevents API users from setting the parameters WBFS and DBFS to arbitrarily large values. 

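To see why quotas constrain the parameters, one can count how many lookups a single BFS issues. The sketch below assumes one API request per expanded video and no duplicate IDs, so it gives an upper bound; the actual quota cost per request is not stated in the text and is not modelled here.

```python
from typing import Tuple


def bfs_cost(w_bfs: int, d_bfs: int) -> Tuple[int, int]:
    """Return (number of lookups, maximum number of IDs collected in L)."""
    # One lookup for v, then one per video found at depths 1 .. d_bfs - 1.
    requests = sum(w_bfs ** k for k in range(d_bfs))
    # Each depth k contributes at most w_bfs ** k video IDs.
    max_ids = sum(w_bfs ** k for k in range(1, d_bfs + 1))
    return requests, max_ids
```

For the example of Fig. 2 (WBFS = 3, DBFS = 2) this gives 4 lookups and at most 12 IDs; for WBFS = 50 and DBFS = 2 it gives 51 lookups and at most 2550 IDs, and both numbers grow geometrically with DBFS.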
In practice, CABaRet can be fine-tuned through experimentation with real users, e.g., A/B testing iterations, which is a common approach for tuning recommendation systems [3].

4 MEASUREMENTS AND EVALUATION

We conduct measurements and experiments over the YouTube service, to investigate the performance (in terms of cache hit ratios) of our approach in MEC scenarios. The setup of the scenarios is presented in Sec. 4.1, and the results in Sec. 4.2.

4.1 Setup

The YouTube API provides a number of functions to retrieve information about videos, channels, user ratings, etc. In our measurements, we request the following information:

  • the most popular videos in a region (max. 50)
  • the list of related videos (max. 50) for a given video

Remark: In the remainder, we present results collected during March 2018, for the region of Greece (GR). Nevertheless, our insights hold also in the other regions we tested (e.g., US).

Caching. We assume a MEC cache storing the most popular videos in a region. We populate the list of cached contents with the top C video IDs returned from the YouTube API. 

Recommendations. We consider two classes of scenarios with (i) YouTube and (ii) CABaRet recommendations. In both cases, when a user enters the UI, the 50 most popular videos in her region are recommended to her (as in YouTube’s front page). Upon watching a video v, a list of N = 20 videos is recommended to the user; the list is (i) composed of the top N directly related videos returned from the YouTube API (YouTube scenarios), or (ii) generated by CABaRet with parameters N, WBFS and DBFS (CABaRet scenarios). 

Video Demand. In each experiment, we assume a user that enters the UI and watches one of the initially recommended (i.e., 50 most popular) videos at random. Then, the system recommends a list of N videos $(r_1, r_2, ..., r_N)$, and the user selects with probability $p_i$ to watch $r_i$ next. We set the probabilities $p_i$ to depend on the order of appearance (and not on the content), and consider uniform ($p_i = 1/N$) and Zipf ($p_i \sim 1/i^{\alpha}$) scenarios; the higher the exponent $\alpha$ of the Zipf distribution, the more preference is given by the user to the top recommendations (user preference for top recommendations has been observed in YouTube traffic [9]).
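The two selection models can be written down directly. The sketch below normalises the Zipf weights so that they sum to 1; the function name and the convention that `alpha=None` means the uniform case are illustrative choices, not part of the paper.

```python
from typing import List, Optional


def selection_probs(n: int, alpha: Optional[float] = None) -> List[float]:
    """Probability of picking the i-th of n recommendations (i = 1..n)."""
    if alpha is None:
        # Uniform scenario: p_i = 1 / N.
        return [1.0 / n] * n
    # Zipf scenario: p_i proportional to 1 / i^alpha, normalised to sum to 1.
    weights = [1.0 / (i ** alpha) for i in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]
```

For example, with n = 2 and alpha = 1 the first recommendation is chosen with probability 2/3 and the second with probability 1/3.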

4.2 Results

4.2.1 Single Requests

We first consider scenarios of single requests (similarly to [2, 13]). In each experiment i (i = 1, ..., M), a user watches one of the top popular videos, denoted $v_1(i)$, and then follows a recommendation and watches a video $v_2(i)$. We measure the Cache Hit Ratio (CHR), which we define as the fraction of the second requests of a user that are for a cached video (the first request is always for a cached, top popular video):

$$\mathrm{CHR} = \frac{1}{M} \sum_{i=1}^{M} I_{v_2(i) \in \mathcal{C}}$$

where $I_{v_2(i) \in \mathcal{C}} = 1$ if $v_2(i) \in \mathcal{C}$ and 0 otherwise, and M is the number of experiments.
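A single-request experiment of this kind can be simulated as follows. This is a minimal sketch under stated assumptions: `recommend` is a hypothetical callable that returns the recommendation list for a video (for instance, a CABaRet-style function or the plain related list), and the fixed seed is only there to make runs repeatable.

```python
import random
from typing import Callable, Iterable, List, Sequence


def estimate_chr(popular: Sequence[str],
                 cached: Iterable[str],
                 recommend: Callable[[str], List[str]],
                 probs: Sequence[float],
                 m: int,
                 seed: int = 0) -> float:
    """Fraction of second requests that hit the cache over m experiments."""
    rng = random.Random(seed)
    cached_set = set(cached)
    hits = 0
    for _ in range(m):
        # First request: one of the most popular videos, chosen at random.
        v1 = rng.choice(popular)
        # Second request: position i of the list is chosen with weight p_i.
        recs = recommend(v1)
        v2 = rng.choices(recs, weights=probs[:len(recs)], k=1)[0]
        if v2 in cached_set:
            hits += 1
    return hits / m
```

The sketch assumes that `recommend` returns at least one video and no more videos than there are entries in `probs`.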

CHR vs. BFS parameters. Fig. 3 shows the CHR achieved by CABaRet under various parameters, along with the CHR under regular YouTube recommendations, when caching all the most popular videos (|C| = 50). The efficiency of caching significantly increases with CABaRet, even when only directly related contents are recommended (DBFS = 1), i.e., without loss in recommendation quality. Just reordering the list of YouTube recommendations (as suggested in [9]) brings gains when $p_i$ is not uniformly distributed. However, the added gains from our approach are significantly higher. As expected, the CHR increases for larger WBFS and/or DBFS; e.g., CABaRet with WBFS = 50 and DBFS = 2 achieves 8 to 10 times higher CHR than regular YouTube recommendations. Also, the CHR increases for more skewed $p_i$ distributions, since top recommendations are preferred and CABaRet places cached contents at the top of the recommendation list.

Fig. 3: CHR for different BFS parameters.

In experiments concerning the –larger– US region, the CHR values are lower for both regular YouTube (< 0.5%) and CABaRet (1% - 43%) recommendations, due to the fact that the top popular videos appear with lower frequency in the related lists. However, the relative gains from CABaRet are consistent with (or even higher than) the presented results. 

CHR vs. number of cached videos. We further consider scenarios with a varying number of cached contents $C = |\mathcal{C}|$. In each scenario, we assume that the C most popular contents are cached. Fig. 4 shows the CHR achieved by CABaRet, in comparison to scenarios under regular YouTube recommendations. The results are consistent for all considered values of C; the CHR under CABaRet is significantly higher than in the YouTube case. Moreover, even when caching a small subset of the most popular videos, CABaRet brings significant gains. For example, by caching C = 10 out of the 50 top related contents, CABaRet increases the CHR from 2% and 3.2% to 17% and 50%, for the uniform and Zipf(α=1) scenarios, respectively.

Fig. 4: CHR vs. number of cached contents C (WBFS = 50, DBFS = 2).

4.2.2 Sequential Requests

We now test the performance of our approach in scenarios where users enter the system and watch a sequence of K, K > 2, videos (similarly to [4], and in contrast to the previous case, where they watch only two videos, i.e., K = 2). At each step, the system recommends a list of videos to the user by applying CABaRet on the currently watched video. We denote by $v_k(i)$ the k-th video requested/watched by a user in experiment i. We measure the CHR, which is now defined as

$$\mathrm{CHR} = \frac{1}{M} \sum_{i=1}^{M} \frac{1}{K-1} \sum_{k=2}^{K} I_{v_k(i) \in \mathcal{C}}$$

where $I_{v_k(i) \in \mathcal{C}} = 1$ if $v_k(i) \in \mathcal{C}$ and 0 otherwise. In each scenario, we run M = 100 experiments.

Moving "farther" from the initially requested video (which belongs to the list of most popular and cached videos) through a sequence of requests, we expect the CHR to decrease, due to the lower similarity of the requested and cached videos. However, as Fig. 5 shows, the decrease in the CHR (under CABaRet recommendations) is not large. The CHR remains close to the case of single requests (i.e., K = 2 on the x-axis), indicating that our approach performs well even when we are several steps away from the cached videos. In fact, caching more than the most popular videos appearing on the front page would further limit the decrease in the CHR.

Fig. 5: CHR vs. number of sequential requests K (C = 20, WBFS = 20, DBFS = 2). Note: the y-axis extends up to 40%.

5 CACHING OPTIMIZATION

In this section, we extend our study by considering the scenario where CABaRet and the caches are controlled by the same entity, e.g., the network operator. Network operator-controlled caching is the most commonly considered scenario in related work for MEC, although in most of today's architectures the caches are actually controlled by the CP.

In this scenario, the network operator can optimize caching decisions, thus further increasing the efficiency of CABaRet recommendations. Note that there is still no need for the collaboration between the operator and the CP (e.g., possessing full knowledge of the RS) that was assumed in previous works [2, 4, 13].

In the following, we first analytically formulate and study the problem of optimizing the caching policy, and propose an approximation algorithm with provable performance guarantees (Sec. 5.1). We then evaluate the performance of this joint caching and recommendation approach (Sec. 5.2). 

5.1 Optimization Problem & Algorithm

Let $\mathcal{V}$ be a content catalog of size $V = |\mathcal{V}|$, and let $q = [q_1, ..., q_V]^T$ be the content popularity vector. Let $L(v) \subseteq \mathcal{V}$ be the set of contents that are explored by CABaRet (at line 1) for a content $v \in \mathcal{V}$, and denote $L = \bigcup_{v \in \mathcal{V}} L(v)$.

For some set of cached contents $\mathcal{C} \subseteq \mathcal{V}$ and a content v, CABaRet returns a list of recommendations R(v) (|R(v)| = N), in which at most N contents $c \in \mathcal{C} \cap L(v)$ appear at the top of the list. Therefore, the CHR can be expressed as:

$$\mathrm{CHR}(\mathcal{C}) = \sum_{v \in \mathcal{V}} q_v \sum_{i=1}^{N(\mathcal{C}, v)} p_i \qquad (1)$$

where $N(\mathcal{C}, v) = \min\{|\mathcal{C} \cap L(v)|, N\}$, and $p_i$ is the probability for a user to select the i-th recommended content.
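The CHR expression can be evaluated directly once the explored sets L(v) are known: each content v contributes its popularity multiplied by the total selection probability of the top min{|C ∩ L(v)|, N} positions. The sketch below assumes that `q` maps each content to its popularity and `explored` maps each content to its set L(v); both names are illustrative.

```python
from typing import Dict, Sequence, Set


def chr_objective(cache: Set[str],
                  q: Dict[str, float],
                  explored: Dict[str, Set[str]],
                  probs: Sequence[float]) -> float:
    """Expected cache hit ratio for a given set of cached contents."""
    n = len(probs)  # N: length of the recommendation list
    total = 0.0
    for v, q_v in q.items():
        # N(C, v): number of cached contents placed at the top of R(v).
        k = min(len(cache & explored[v]), n)
        # A hit occurs when the user picks one of the first k positions.
        total += q_v * sum(probs[:k])
    return total
```

For two equally popular contents a and b with L(a) = {x, y}, L(b) = {y, z} and uniform probabilities over N = 2 positions, caching only y gives a CHR of 0.5, while caching x and y gives 0.75.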

Then, the problem of optimizing the caching policy (to be jointly used with CABaRet) is formulated as follows:

$$\max_{\mathcal{C} \subseteq \mathcal{V}} \; \sum_{v \in \mathcal{V}} q_v \sum_{i=1}^{N(\mathcal{C}, v)} p_i \quad \text{s.t.} \quad |\mathcal{C}| \le C \qquad (2)$$

where C is the capacity of the MEC cache. We prove the following for the optimization problem of Eq. (2).

Lemma 1. The optimization problem of Eq. (2): (i) is NP-hard, (ii) cannot be approximated within $1 - \frac{1}{e} + o(1)$ in polynomial time, and (iii) has a monotone (non-decreasing) submodular objective function, and is subject to a cardinality constraint.

Proof. Items (i) and (ii) of the above lemma, are proven by reduction to the maximum set coverage problem, and we prove item (iii) using standard methods (see, e.g., [5, 13]). The detailed proof is omitted due to space limitations. 

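If we design a greedy algorithm that starts from an empty set of cached contents $\mathcal{C}_g = \emptyset$, and at each iteration augments the set $\mathcal{C}_g$ (until $|\mathcal{C}_g| = C$) as follows:

$$\mathcal{C}_g \leftarrow \mathcal{C}_g \cup \arg\max_{v \in \mathcal{V} \setminus \mathcal{C}_g} \mathrm{CHR}(\mathcal{C}_g \cup \{v\}) \qquad (3)$$

then the properties stated in item (iii) guarantee that [8]

$$\mathrm{CHR}(\mathcal{C}_g) \ge \left(1 - \frac{1}{e}\right) \mathrm{CHR}(\mathcal{C}^*) \qquad (4)$$

where $\mathrm{CHR}(\cdot)$ denotes the objective function of Eq. (1) and $\mathcal{C}^*$ is the optimal solution of the problem of Eq. (2).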

Remark: While Eq. (4) gives a lower bound for the performance of the greedy algorithm, in practice greedy algorithms have been shown to perform often very close to the optimal. 

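The greedy procedure can be sketched as follows: start from an empty cache and repeatedly add the content that yields the largest CHR. This is an illustrative sketch rather than the authors' implementation; the objective is restated inside the block to keep it self-contained, the argument names are hypothetical, and breaking ties by sorted content ID is an assumption.

```python
from typing import Dict, Sequence, Set


def chr_objective(cache: Set[str], q: Dict[str, float],
                  explored: Dict[str, Set[str]],
                  probs: Sequence[float]) -> float:
    """CHR of a cache: sum over v of q_v times the mass of the top slots."""
    n = len(probs)
    return sum(q_v * sum(probs[:min(len(cache & explored[v]), n)])
               for v, q_v in q.items())


def greedy_cache(candidates: Set[str], capacity: int,
                 q: Dict[str, float], explored: Dict[str, Set[str]],
                 probs: Sequence[float]) -> Set[str]:
    """Pick up to `capacity` contents, one at a time, by best resulting CHR."""
    cache: Set[str] = set()
    while len(cache) < capacity:
        best, best_val = None, -1.0
        # Sorted iteration makes tie-breaking deterministic (an assumption).
        for c in sorted(candidates - cache):
            val = chr_objective(cache | {c}, q, explored, probs)
            if val > best_val:
                best, best_val = c, val
        if best is None:  # no candidates left
            break
        cache.add(best)
    return cache
```

Each iteration evaluates the objective once per remaining candidate, so the cost grows with the product of the cache capacity and the number of candidate contents.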
5.2 Evaluation of Greedy Caching

Calculating the CHR from Eq. (1) requires running a BFS (CABaRet, line 1) and generating the lists L(v), for every content v ∈ V. In practice, for scalability reasons, the most popular contents (i.e., with high qi) can be considered by the greedy algorithm in the calculation of the objective function Eq. (1), since those contribute more to the objective function. However, any video in the catalog is still candidate to be cached, e.g., a video with low qi can bring a large increase in the CHR through its association with many popular contents. 

In fact, in our experiments, for the calculation of Eq. (1), we consider only the 50 most popular videos, for which we set qi = 1/50. Nevertheless, in the different scenarios we tested, only 10% to 30% of the cached videos (selected by the greedy algorithm) were also in the top 50 most popular. 

In Fig. 6, we compare the achieved CHR when the cache is populated according to the greedy algorithm of Eq. (3) (Greedy Caching), and with the most popular videos (Top Caching). Greedy caching always outperforms top caching, with an increase in the CHR of around a factor of 2 for uniform video selection (for the Zipf(α=1) scenarios we tested, the CHR values are even higher, and the relative performance is 1.5 times higher). This clearly demonstrates that the gains from joint recommendation and caching [2, 13] are applicable even in simple practical scenarios (e.g., CABaRet & greedy caching). Finally, while greedy caching increases the CHR even with regular YouTube recommendations, the CHR is still less than 50% of the CABaRet case with top caching. This further stresses the benefits of CABaRet's cache-aware recommendations.

Fig. 6: CHR vs. number of cached contents (uniform $p_i$, WBFS = 50, DBFS = 2).
