前言
ANR異常是android開發中比較常見的異常之一,最近爲了看ANR異常,差了很多資料,如今作一個總結。
本文將從如下幾個角度來總結ANR異常分析:java
1.ANR類型
2.ANR產生的緣由
3.ANR日誌來源
4.分析ANR問題的套路
1. ANR類型
ANR類型按報錯類型主要有Input(輸入事件)、Broadcast(廣播)、Service(服務), 對應產生ANR的時間分別爲:react
類型 前臺應用 後臺應用
Input 5s 5s
Broadcast 10s 6s
Service 20s 200s
上面的只是原生數據,各個廠商可能會對數據進行修改。而且前臺應用和後臺應用判斷是否ANR的時間是不同的。其中Input時間會直接彈出ANR彈框, 另外的異常若是用戶不操做, 只會打印日誌,不會彈彈框。android
2. ANR產生的緣由
產生ANR的緣由本質都是:主線程未在規定時間內完成代碼的執行, 具體的緣由有:app
1.耗時操做, 如複雜的Layout,龐大的for循環, IO操做等
2.被Binder 對端block
3.被子線程同步鎖block
4.Binder被佔滿致使主線程沒法和System_server進程通訊
5.得不到系統資源(CPU/RAM/IO)
第1點比較好理解, 就是在主線程耗時操做太多;異步
第2點指當前進程的主線程與其餘進程進行Binder通訊時, 因爲Binder通訊會讓主線程掛起,等待對端的返回, 結果對端因爲某些緣由一直未返回, 致使當前進程的主線程沒法往下執行而超市,產生ANR;oop
第3點是指主線程與其餘線程公用鎖可是一直未獲得鎖, 致使超時,產生ANR;測試
第4點是指主線程和System_server進程進行Binder通訊,因爲其餘線程或者其餘進程與System_server進程之間有太多Binder通訊,將Binder池佔滿,致使主線程的Binder通訊一直沒法處理,超時產生ANR;ui
第5點是因爲系統資源緊張,CPU佔用率太高,負載過大,IO操做太多, 致使主線程沒法獲得CPU時間片等系統資源去執行,而產生ANR.spa
其中前四點比較好分析, 經過日誌能夠找到線索並解決, 而第五種定位比較困難, 由於第五種ANR狀況更復雜, 甚至不是軟件的問題。.net
3. ANR日誌來源
若是是第三方數據上報,或者測試人員測試時抓取了日誌, 咱們能夠直接查看日誌, 可是若是使咱們本身發現或者復現了問題,能夠用如下方式獲取日誌:
3.1 traces.txt
ANR優先查看此日誌, AMS在ANR發生的時候,dump相關進程(ANR的進程、systemserver、mediaserver,surfaceFinger等)的當前棧到traces.txt。須要注意的是,traces.txt是抓取的是超時後(如input超時就是5s後)的snapshot,並不必定可以真實的反應出block的點。也存在抓到主線程沒有block,在idle的狀況。
獲取traces.txt的方式:adb pull /data/anr/traces.txt
3.2 applogcat-log
咱們的logcat信息會打印在此處, 裏面有較多的堆棧信息,按照日期,取出裏面最新的日誌。
獲取applogcat-log方式:adb pull /data/log/android_logs
3.3 dropbox
裏面會記錄不一樣的錯誤信息日誌, 日誌命名格式爲 AA_BB@CC.txt, 其中AA表示進程名,BB表示異常類型, CC爲時間戳,如 system_server_anr@13457689.txt, 就表示一個ANR異常的日誌文件, 找到本身對應的時間及進程的日誌信息便可。此日誌可做爲一個線索。
獲取方式:adb pull /data/system/dropbox
4. 分析ANR問題的套路
4.1 在主線程進行耗時操做
這個通常比較好排查,(這個正式APP如今通常也不會有吧。。。), 若是看代碼看不出, 能夠看一下日誌, 打開traces.txt文件, 看到的第一行是這樣:
----- pid 13250 at 2018-01-01 23:56:09 -----
1
這個表示進程 13250 的日誌就從這裏開始了, 通常traces.txt文件的第一個進程就是發生ANR所在進程, 而後繼續往下看, 找到主線程的日誌, 舉個栗子:
"main" prio=5 tid=1 Native
| group="main" sCount=1 dsCount=0 obj=0x7425caf8 self=0xb4827800
| sysTid=671 nice=0 cgrp=default sched=0/0 handle=0xb6febbec
| state=S schedstat=( 0 0 0 ) utm=1853 stm=50 core=2 HZ=100
| stack=0xbe7bd000-0xbe7bf000 stackSize=8MB
| held mutexes=
kernel: (couldn't read /proc/self/task/671/stack)
native: #00 pc 000133cc /system/lib/libc.so (syscall+28)
native: #01 pc 000a9a83 /system/lib/libart.so (art::ConditionVariable::Wait(art::Thread*)+82)
native: #02 pc 001b16f1 /system/lib/libart.so (art::JNI::NewString(_JNIEnv*, unsigned short const*, int)+640)
native: #03 pc 00075887 /system/lib/libandroid_runtime.so (???)
native: #04 pc 008570ab /data/dalvik-cache/arm/system@framework@boot.oat (Java_android_database_CursorWindow_nativeGetString__JII+110)
at android.database.CursorWindow.nativeGetString(Native method)
at android.database.CursorWindow.getString(CursorWindow.java:438)
at android.database.AbstractWindowedCursor.getString(AbstractWindowedCursor.java:51)
at android.database.CursorWrapper.getString(CursorWrapper.java:114)
at com.android.settings.ApnSettings.fillList(ApnSettings.java:259)
at com.android.settings.ApnSettings.onResume(ApnSettings.java:208)
at android.app.Fragment.performResume(Fragment.java:2096)
at android.app.FragmentManagerImpl.moveToState(FragmentManager.java:928)
at android.app.FragmentManagerImpl.moveToState(FragmentManager.java:1067)
at android.app.FragmentManagerImpl.moveToState(FragmentManager.java:1049)
at android.app.FragmentManagerImpl.dispatchResume(FragmentManager.java:1879)
at android.app.Activity.performResume(Activity.java:6113)
at android.app.ActivityThread.performResumeActivity(ActivityThread.java:3015)
at android.app.ActivityThread.handleResumeActivity(ActivityThread.java:3061)
at android.app.ActivityThread.handleLaunchActivity(ActivityThread.java:2415)
at android.app.ActivityThread.access$800(ActivityThread.java:151)
at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1313)
at android.os.Handler.dispatchMessage(Handler.java:102)
at android.os.Looper.loop(Looper.java:135)
at android.app.ActivityThread.main(ActivityThread.java:5345)
at java.lang.reflect.Method.invoke!(Native method)
at java.lang.reflect.Method.invoke(Method.java:372)
at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:947)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:742)
這一段日誌中, 看到第一行的"main"就知道這就是主線程, main是線程名。而後從ApnSettings.java中能夠看到fillList這個可能的耗時操做主線程處理的地方有OnResume和OnReceive,由於調用棧到這裏就結束了。
且代碼中存在使用HandlerThread異步處理這個操做的地方,能夠借鑑下將OnResume和OnReceive這兩處都放到HandlerThread中去作。
4.2 被Binder 對端block
"main" prio=5 tid=1 Native
| group="main" sCount=1 dsCount=0 obj=0x73ee6470 self=0xb4d76500
| sysTid=22831 nice=0 cgrp=default sched=0/0 handle=0xb6f4bc00
| state=S schedstat=( 0 0 0 ) utm=22 stm=22 core=0 HZ=100
| stack=0xbe283000-0xbe285000 stackSize=8MB
| held mutexes=
native: #00 pc 000410ac /system/lib/libc.so (__ioctl+8)
native: #01 pc 000477e5 /system/lib/libc.so (ioctl+14)
native: #02 pc 0001e7c5 /system/lib/libbinder.so (android::IPCThreadState::talkWithDriver(bool)+132)
native: #03 pc 0001ee17 /system/lib/libbinder.so (android::IPCThreadState::waitForResponse(android::Parcel*, int*)+38)
native: #04 pc 0001efcd /system/lib/libbinder.so (android::IPCThreadState::transact(int, unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+124)
native: #05 pc 00019fb7 /system/lib/libbinder.so (android::BpBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+30)
native: #06 pc 00086de9 /system/lib/libandroid_runtime.so (???)
native: #07 pc 00d94629 /data/dalvik-cache/arm/system@framework@boot.oat (Java_android_os_BinderProxy_transactNative__ILandroid_os_Parcel_2Landroid_os_Parcel_2I+140)
at android.os.BinderProxy.transactNative(Native method)
at android.os.BinderProxy.transact(Binder.java:503)
at android.net.INetworkPolicyManager$Stub$Proxy.getNetworkPolicies(INetworkPolicyManager.java:410)
at android.net.NetworkPolicyManager.getNetworkPolicies(NetworkPolicyManager.java:174)
at com.android.settings.net.NetworkPolicyEditor.read(NetworkPolicyEditor.java:57)
at com.android.settings.DataUsageSummary.onCreate(DataUsageSummary.java:361)
at android.app.Fragment.performCreate(Fragment.java:2202)
at android.app.FragmentManagerImpl.moveToState(FragmentManager.java:942)
at android.app.FragmentManagerImpl.moveToState(FragmentManager.java:1148)
at android.app.BackStackRecord.run(BackStackRecord.java:793)
at android.app.FragmentManagerImpl.execPendingActions(FragmentManager.java:1535)
at android.app.FragmentManagerImpl.executePendingTransactions(FragmentManager.java:562)
at com.android.settings.SettingsActivity.switchToFragment(SettingsActivity.java:1084)
at com.android.settings.SettingsActivity.onCreate(SettingsActivity.java:657)
at android.app.Activity.performCreate(Activity.java:6251)
at android.app.Instrumentation.callActivityOnCreate(Instrumentation.java:1110)
at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:2370)
at android.app.ActivityThread.handleLaunchActivity(ActivityThread.java:2477)
at android.app.ActivityThread.-wrap11(ActivityThread.java:-1)
at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1345)
at android.os.Handler.dispatchMessage(Handler.java:102)
at android.os.Looper.loop(Looper.java:148)
at android.app.ActivityThread.main(ActivityThread.java:5438)
at java.lang.reflect.Method.invoke!(Native method)
at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:762)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:652)
主從日誌中看到調用了NetworkPolicyManager.getNetworkPolicies,而後就進行了Binder通訊, 而後調用棧就終止在這裏了,推斷是對端block了, Binder通訊的兩端方法名是一致的, 因此能夠再當前日誌搜索getNetworkPolicies,找到對端:
"Binder_4" prio=5 tid=56 Blocked
| group="main" sCount=1 dsCount=0 obj=0x1321a0a0 self=0xad31e200
| sysTid=2491 nice=0 cgrp=default sched=0/0 handle=0x9fd00930
| state=S schedstat=( 0 0 0 ) utm=46441 stm=46582 core=1 HZ=100
| stack=0x9fc04000-0x9fc06000 stackSize=1014KB
| held mutexes=
at com.android.server.net.NetworkPolicyManagerService.getNetworkPolicies(NetworkPolicyManagerService.java:1696)
- waiting to lock <0x07439315> (a java.lang.Object) held by thread 35
at android.net.INetworkPolicyManager$Stub.onTransact(INetworkPolicyManager.java:145)
at android.os.Binder.execTransact(Binder.java:453
被Tid=35的人拿住了一把鎖(0x07439315),那麼繼續看tid=35是誰,有兩種方法:
搜索tid=35
搜索0x07439315,找到 - locked <0x07439315> (a java.lang.Object)
"NetworkPolicy" prio=5 tid=35 TimedWaiting
| group="main" sCount=1 dsCount=0 obj=0x12d98940 self=0x9f91f700
| sysTid=2415 nice=0 cgrp=default sched=0/0 handle=0xa0f33930
| state=S schedstat=( 0 0 0 ) utm=7681 stm=7783 core=0 HZ=100
| stack=0xa0e31000-0xa0e33000 stackSize=1038KB
| held mutexes=
at java.lang.Object.wait!(Native method)
- waiting on <0x02580c1b> (a java.lang.Object)
at java.lang.Thread.parkFor$(Thread.java:1220)
- locked <0x02580c1b> (a java.lang.Object)
at sun.misc.Unsafe.park(Unsafe.java:299)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2053)
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:372)
at com.android.server.NativeDaemonConnector$ResponseQueue.remove(NativeDaemonConnector.java:634)
at com.android.server.NativeDaemonConnector.executeForList(NativeDaemonConnector.java:426)
at com.android.server.NativeDaemonConnector.execute(NativeDaemonConnector.java:345)
at com.android.server.NativeDaemonConnector.execute(NativeDaemonConnector.java:340)
at com.android.server.NetworkManagementService.setInterfaceQuota(NetworkManagementService.java:1712)
- locked <0x0b0f91b8> (a java.lang.Object)
at com.android.server.net.NetworkPolicyManagerService.setInterfaceQuota(NetworkPolicyManagerService.java:2421)
at com.android.server.net.NetworkPolicyManagerService.updateNetworkRulesLocked(NetworkPolicyManagerService.java:1232)
at com.android.server.net.NetworkPolicyManagerService$14.onReceive(NetworkPolicyManagerService.java:1060)
- locked <0x07439315> (a java.lang.Object)
at android.app.LoadedApk$ReceiverDispatcher$Args.run(LoadedApk.java:881)
at android.os.Handler.handleCallback(Handler.java:739)
at android.os.Handler.dispatchMessage(Handler.java:95)
at android.os.Looper.loop(Looper.java:148)
at android.os.HandlerThread.run(HandlerThread.java:61)
能夠看到,NetworkPolicy在經過NativeDaemonConnector和netd通訊(setInterfaceQuota)
咱們結合log來看下在發生ANR的先後的打印:
06-19 15:29:00.997 1235 1270 I am_anr : [0,22831,com.android.settings,818429509,Input dispatching timed out (Waiting because no window has focus but there is a focused application that may eventually add a window when it finishes starting up.)]
06-19 15:29:05.683 1235 2415 E NetdConnector: NDC Command {55445 bandwidth setiquota seth_w0 9223372036854775807} took too long (4755ms
06-19 15:29:05.723 1235 2491 I dvm_lock_sample: [system_server,1,Binder_4,4919,NetworkPolicyManagerService.java,1696,-,1056,100]
從eventslog中還能夠發現,setting出現了兩次連續的ANR,而上下文中都有相似上面的dvm_lock_sample NetworkPolicyManagerService相關的告警。
所以雖然15:28的此次ANR並無打出有用的棧,可是咱們仍是能夠猜想出這兩次ANR的緣由都是netd的command耗時過久致使的(在主線程的調用路徑上存在必然的block)
那麼,在netd可能沒法修改的狀況下(至少應用層是沒法修改的),咱們只能將可能存在block的操做放到非UI線程中去作。
4.3 主線程被子線程同步鎖block 致使ANR
這種問題和上面降到的Binder對端block的例子有一部分步驟類似, 只要在主線程的日誌中看到這麼一句:
waiting to lock <0x07439315> (a java.lang.Object) held by thread 35
1
說明主線程正在等鎖, held by thread 35 標識同步鎖被線程號 35 所持有, 咱們能夠搜索這個線程號或者0x07439315, 去看看這個線程在作什麼,找到問題緣由。
4.4 系統資源緊張致使ANR
04-01 13:12:11.572 I/InputDispatcher( 220): Application is not responding:Window{2b263310com.Android.email/com.android.email.activity.SplitScreenActivitypaused=false}. 5009.8ms since event, 5009.5ms since waitstarted
04-0113:12:11.572 I/WindowManager( 220): Input event dispatching timedout sending tocom.android.email/com.android.email.activity.SplitScreenActivity
04-01 13:12:14.123 I/Process( 220): Sending signal. PID: 21404 SIG: 3
04-01 13:12:14.123 I/dalvikvm(21404):threadid=4: reacting to signal 3
……
04-0113:12:15.872 E/ActivityManager( 220): ANR in com.android.email(com.android.email/.activity.SplitScreenActivity)
04-0113:12:15.872 E/ActivityManager( 220): Reason:keyDispatchingTimedOut
04-0113:12:15.872 E/ActivityManager( 220): Load: 8.68 / 8.37 / 8.53
04-0113:12:15.872 E/ActivityManager( 220): CPUusage from 4361ms to 699ms ago ----CPU在ANR發生前的使用狀況
04-0113:12:15.872 E/ActivityManager( 220): 5.5%21404/com.android.email: 1.3% user + 4.1% kernel / faults: 10 minor
04-0113:12:15.872 E/ActivityManager( 220): 4.3%220/system_server: 2.7% user + 1.5% kernel / faults: 11 minor 2 major
04-0113:12:15.872 E/ActivityManager( 220): 0.9%52/spi_qsd.0: 0% user + 0.9% kernel
04-0113:12:15.872 E/ActivityManager( 220): 0.5%65/irq/170-cyttsp-: 0% user + 0.5% kernel
04-0113:12:15.872 E/ActivityManager( 220): 0.5%296/com.android.systemui: 0.5% user + 0% kernel
04-0113:12:15.872 E/ActivityManager( 220): 100%TOTAL: 4.8% user + 7.6% kernel + 87% iowait
04-0113:12:15.872 E/ActivityManager( 220): CPUusage from 3697ms to 4223ms later:-- ANR後CPU的使用量======later是標誌
04-0113:12:15.872 E/ActivityManager( 220): 25%21404/com.android.email: 25% user + 0% kernel / faults: 191 minor
04-0113:12:15.872 E/ActivityManager( 220): 16% 21603/__eas(par.hakan: 16% user + 0% kernel
04-0113:12:15.872 E/ActivityManager( 220): 7.2% 21406/GC: 7.2% user + 0% kernel
04-0113:12:15.872 E/ActivityManager( 220): 1.8% 21409/Compiler: 1.8% user + 0% kernel
04-0113:12:15.872 E/ActivityManager( 220): 5.5%220/system_server: 0% user + 5.5% kernel / faults: 1 minor
04-0113:12:15.872 E/ActivityManager( 220): 5.5% 263/InputDispatcher: 0% user + 5.5% kernel
04-0113:12:15.872 E/ActivityManager( 220): 32%TOTAL: 28% user + 3.7% kernel
上述日誌中,有這麼三行:
04-01 13:12:14.123 I/Process( 220): Sending signal. PID: 21404 SIG: 3—發生ANR的時間和生成trace.txt的時間 04-0113:12:15.872 E/ActivityManager( 220): CPUusage from 4361ms to 699ms ago ----CPU在ANR發生前的使用狀況 04-0113:12:15.872 E/ActivityManager( 220): 100%TOTAL: 4.8% user + 7.6% kernel + 87% iowait 04-0113:12:15.872 E/ActivityManager( 220): CPUusage from 3697ms to 4223ms later:-- ANR後CPU的使用量 從這個日誌中能夠看到CPU使用率很高,IO操做佔比很大(87% iowait), 由此致使系統資源緊張,可是也只能推測ANR很是可能由此產生,卻不能100%肯定, 由於這個CPU統計是一個時間段的,不是一個實時的, 具體還須要在日誌中找找其餘線索,以肯定問題。