Original English article: Memory optimization for feeds on Android
In Java, a HashSet can only hold objects that derive from Object. In this situation, converting primitive values into their object counterparts (Integer, Long, and so on) creates many intermediate objects, which consume excessive memory, trigger garbage collection, and cause the app to stutter.
Millions of people use Facebook on Android devices every day, scrolling through News Feed, Profile, Events, Pages, and Groups to interact with the people and information they care about. All of these different feed types are powered by a platform created by the Android Feed Platform team, so any optimization we make to the Feed platform could potentially improve performance across our app. We focus on scroll performance, as we want people to have a smooth experience when scrolling through their feeds.
To help us achieve that, we have several automatic tools that run performance tests on the Feed platform, across different scenarios and on different devices, measuring how our code performs in runtime, memory use, frame rate, and more. One of those tools, Traceview, showed a relatively high number of calls to the Long.valueOf() function, which led to objects accumulating in memory and causing the app to stall. This post describes the problem, the potential solutions we weighed, and the set of optimizations we made to improve the Feed platform.
After noticing the high number of calls to the Long.valueOf() function in one of our method profiling reports from Traceview, we ran further tests that confirmed that as we scrolled through News Feed, there was an unexpectedly high number of invocations of this method.
When we looked at the stack trace, we found that the function was not being called explicitly from Facebook's code but implicitly by code inserted by the compiler. This function is called when assigning a primitive long value where a Long object is expected. Java supports both object and primitive representations of the simple types (e.g., integer, long, character) and provides a way to seamlessly convert between them. This feature is called autoboxing, because it "boxes" a primitive type into a corresponding object type. While that's a convenient development feature, it creates new objects unbeknownst to the developer.
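As a minimal illustration of the hidden allocation (this snippet is not from the article), assigning a primitive long where a Long is expected compiles into a Long.valueOf() call:

public class AutoboxingExample {
    public static void main(String[] args) {
        long primitive = 1_000_000L;

        // The compiler rewrites this assignment as Long.valueOf(primitive),
        // allocating a new Long object for values outside the small cache (-128 to 127).
        Long boxed = primitive;

        System.out.println(boxed);
    }
}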
And these objects add up.
In this heap dump taken from a sample app, Long objects have a noticeable presence; while each object is not big by itself, there are so many of them that they occupy a large portion of the app's memory in the heap. This is especially problematic for devices running the Dalvik runtime. Unlike ART, the newer Android runtime environment, Dalvik doesn't have generational garbage collection, which is known to be better at handling many small objects. As we scroll through News Feed and the number of objects grows, the garbage collector will cause the app to pause and sweep unused objects from memory. The more objects that accumulate, the more frequently the garbage collector will have to pause the app, causing it to stutter and stall and making for a poor user experience.
Luckily, tools such as Traceview and Allocation Tracker can help find where these calls are made. After reviewing the origins of these autoboxing occurrences, we found that the majority of them were done while inserting long values into a HashSet<Long> data structure. (We use this data structure to store hash values of News Feed stories, and later to check whether a certain story's hash is already in the set.) HashSet provides quick access to its items, an important feature allowing interaction with it as the user is scrolling through News Feed. Since the hash is calculated and stored into a primitive long variable, and our HashSet works only with objects, we get the inevitable autoboxing when calling setStories.put(lStoryHash).
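A sketch of that pattern follows; the class name, hash computation, and the use of Set.add() (rather than the put() written above) are illustrative assumptions, since Facebook's actual code is not published in the article:

import java.util.HashSet;
import java.util.Set;

public class StoryDeduplicator {
    // Illustrative stand-in for the set of story hashes described above.
    private final Set<Long> setStories = new HashSet<>();

    public boolean markSeen(String storyId) {
        long lStoryHash = computeHash(storyId);   // hash held in a primitive long
        // Autoboxing: this compiles to setStories.add(Long.valueOf(lStoryHash)),
        // allocating one Long object per new story as the user scrolls.
        return setStories.add(lStoryHash);
    }

    public boolean isSeen(String storyId) {
        // The same boxing happens on lookup.
        return setStories.contains(computeHash(storyId));
    }

    private long computeHash(String storyId) {
        // Hypothetical hash function; the real one is not shown in the article.
        return storyId.hashCode() * 1_000_003L;
    }
}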
As a solution, a Set implementation for primitive types can be used, but that turned out not to be as straightforward as we expected.
There are a few existing Java libraries that provide a Set implementation for primitive types. Almost all of these libraries were created more than 10 years ago, when the only Java running on mobile devices was J2ME. So, to determine viability, we needed to test them under Dalvik/ART and ensure they could perform well on more constrained mobile devices. We created a small testing framework to help compare these libraries with the existing HashSet and with one another. The results showed that a few of these libraries had a faster runtime than HashSet and created fewer Long objects, but they still internally allocated a lot of objects. As an example, TLongHashSet, part of the Trove library, allocated about 2 MB worth of objects when tested with 1,000 items:
Testing other libraries, such as PCJ and Colt, showed similar results.
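To give a rough idea of the kind of comparison such a framework performs, here is a simplified sketch (not Facebook's actual framework); it assumes Trove 3.x is on the classpath for gnu.trove.set.hash.TLongHashSet:

import gnu.trove.set.hash.TLongHashSet;
import java.util.HashSet;
import java.util.Set;

public class PrimitiveSetComparison {
    public static void main(String[] args) {
        final int itemCount = 1_000;

        // Standard HashSet<Long>: every add() boxes the primitive into a Long object.
        Set<Long> boxedSet = new HashSet<>();
        long start = System.nanoTime();
        for (long i = 0; i < itemCount; i++) {
            boxedSet.add(i);
        }
        System.out.println("HashSet<Long> add: " + (System.nanoTime() - start) / itemCount + " ns/item");

        // Trove's TLongHashSet stores primitive longs directly, so no Long objects are boxed,
        // although it still allocates sizable internal arrays.
        TLongHashSet primitiveSet = new TLongHashSet();
        start = System.nanoTime();
        for (long i = 0; i < itemCount; i++) {
            primitiveSet.add(i);
        }
        System.out.println("TLongHashSet add: " + (System.nanoTime() - start) / itemCount + " ns/item");
    }
}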
It was possible that the existing solutions didn't fit our needs. We considered whether we could instead create a new Set implementation and optimize it for Android. Looking inside Java's HashSet, there's a relatively simple implementation using a single HashMap to do the heavy lifting.
public class HashSet<E> extends AbstractSet<E> implements Set<E>, ... {

    transient HashMap<E, HashSet<E>> backingMap;

    ...

    @Override
    public boolean add(E object) {
        return backingMap.put(object, this) == null;
    }

    @Override
    public boolean contains(Object object) {
        return backingMap.containsKey(object);
    }

    ...
}
Adding a new item to the HashSet means adding it to the internal HashMap, where the object is the key and the HashSet's instance is the value. To check object membership, HashSet checks whether its internal HashMap contains the object as a key. An alternative to HashSet could be implemented using an Android-optimized map and the same principles.
LongArraySet
You may already be familiar with LongSparseArray, a class in the Android Support Library that acts as a map using a primitive long as a key. Example usage:
LongSparseArray<String> longSparseArray = new LongSparseArray<>();
longSparseArray.put(3L, "Data");
String data = longSparseArray.get(3L); // the value of data is "Data"
LongSparseArray works differently than HashMap, though. When calling mapHashmap.get(KEY5), this is how the value is found in the HashMap:
When retrieving a value using a key on a HashMap, it accesses the value in the array using a hash of the key as the index, a direct access in O(1). Making the same call on a LongSparseArray looks like this:
LongSparseArray searches a sorted keys array for the key's value using a binary search, an operation with a runtime of O(log N). The index of the key in the array is then used to find the value in the values array.
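A simplified sketch of that lookup (not the AOSP source; the parallel keys/values arrays and helper method are illustrative only):

public class SortedKeyLookup {
    public static void main(String[] args) {
        long[] keys = {2L, 5L, 9L};                // kept sorted
        Object[] values = {"A", "B", "C"};         // parallel to keys
        System.out.println(get(keys, values, 5L)); // prints "B"
    }

    static Object get(long[] keys, Object[] values, long key) {
        int lo = 0;
        int hi = keys.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;             // binary search: O(log N)
            if (keys[mid] < key) {
                lo = mid + 1;
            } else if (keys[mid] > key) {
                hi = mid - 1;
            } else {
                return values[mid];                // same index into the values array
            }
        }
        return null;                               // key not present
    }
}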
HashMap allocates one big array in an effort to avoid collisions, which would slow down the search. LongSparseArray allocates two small arrays, making its memory footprint smaller. But to support its search algorithm, LongSparseArray needs to allocate its internal arrays in consecutive memory blocks. Adding more items will require allocating new arrays when there's no more space in the current ones. The way LongSparseArray works makes it less ideal when holding more than 1,000 items, where these differences have a more significant impact on performance. (You can learn more about LongSparseArray in the official documentation and by watching this short video by Google.)
Since the LongSparseArray's keys are of a primitive long type, we can create a data structure with the same approach as HashSet but with a LongSparseArray as the internal map instead of a HashMap.
And so LongArraySet was created.
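A minimal sketch of the idea (the real LongArraySet published by Facebook differs in detail; the import assumes the support-library LongSparseArray mentioned above):

import android.support.v4.util.LongSparseArray;

public class LongArraySet {
    // Mirror HashSet's trick: store the set instance itself as a dummy value
    // in the backing map, here a LongSparseArray keyed by primitive longs.
    private final LongSparseArray<LongArraySet> backingArray = new LongSparseArray<>();

    public boolean add(long value) {
        boolean wasAbsent = backingArray.get(value) == null;
        backingArray.put(value, this);
        return wasAbsent;
    }

    public boolean contains(long value) {
        return backingArray.get(value) != null;
    }

    public boolean remove(long value) {
        if (backingArray.get(value) == null) {
            return false;
        }
        backingArray.delete(value);
        return true;
    }

    public int size() {
        return backingArray.size();
    }
}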
The new data structure looked promising, but the first rule of optimization is "always measure." By using the same testing framework from earlier, we compared the new data structure with HashSet. Each data structure was tested by adding X number of items, checking the existence of each item, and later removing all of them. We ran tests using different numbers of items (X=10, X=100, X=1,000 ...) and averaged the time it took to complete each operation per item.
The runtime results (time shown is in nanoseconds):
We saw a runtime improvement for the contains and remove methods using the new data structure. Also, as the number of items increased in the array set, it took more time to add new items. That's consistent with what we already knew about LongSparseArray: it doesn't perform as well as HashMap when the number of items is more than 1,000. In our use cases, we're dealing with only hundreds of items, so this is a trade-off we're willing to make.
We also saw a big improvement related to memory. In reviewing the heap dumps and Allocation Tracker reports, we noticed a decrease in object allocations. Here's a side-by-side allocation report for the HashSet and LongArraySet implementations, when adding 1,000 items for 20 iterations:
In addition to avoiding all the Long object allocations, LongSparseArray was more memory-efficient internally, with about 30 percent fewer allocations in this scenario.
By understanding how other data structures work, we were able to create a more optimized data structure for our needs. The less the garbage collector has to work, the lower the likelihood of dropped frames. Using the new LongArraySet class, and a similar IntArraySet for the primitive int data type, we were able to cut down a significant number of allocations in our entire app.
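A hypothetical usage example, following the LongArraySet sketch above (the published classes' method names may differ):

public class FeedHashesExample {
    public static void main(String[] args) {
        LongArraySet storyHashes = new LongArraySet();

        // No Long objects are created on insertion or lookup.
        storyHashes.add(42L);
        boolean alreadySeen = storyHashes.contains(42L);

        System.out.println(alreadySeen); // true
    }
}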
This case study demonstrates the importance of measuring our options to ensure that we were introducing an improvement. We also accepted that while this solution is not perfect for all use cases, since this implementation is slower for very large data sets, it was an acceptable trade-off for us to optimize our code.
You can find the source code for the two data structures here. We are excited to continue working on challenges and optimizing our Feed platform and to share our solutions with the community.
If the link above doesn't work, the source code is also available at: http://blog.csdn.net/xiaxl/ar...
My English is not great, so if you spot any translation mistakes, please point them out and I will fix them promptly. Thank you all.