Android TransactionTooLargeException: analysis, reflections, and a monitoring approach

Date: 2019-11-17



Our company recently ran into a rather interesting crash: android.os.TransactionTooLargeException. You may have seen it before. Its stack traces come in many forms; the following is one of the common ones:

#1 main
android.os.TransactionTooLargeException
java.lang.RuntimeException:Adding window failed
android.view.ViewRootImpl.setView(ViewRootImpl.java:515)
......
Caused by:
android.os.TransactionTooLargeException:
android.os.BinderProxy.transact(Native Method)
android.view.IWindowSession$Stub$Proxy.addToDisplay(IWindowSession.java:684)
android.view.ViewRootImpl.setView(ViewRootImpl.java:504)
android.view.WindowManagerGlobal.addView(WindowManagerGlobal.java:259)
android.view.WindowManagerImpl.addView(WindowManagerImpl.java:69)
android.app.Dialog.show(Dialog.java:307)

What causes this? It can be a very simple piece of code, something like:

Dialog dialog = XXX;
......
dialog.show();

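This simple dialog.show() call is enough to cause the crash. The root cause is explained in detail below: we first analyze the call stacks of three crash logs to locate it, then propose a solution.
Please credit the source when reposting: blog.csdn.net/self_study/…
If you are interested in the technology, join QQ group 544645972 to discuss.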

TransactionTooLargeException: analysis and solution

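Let us look closely at the stack trace of this exception. Because it involves AIDL, WMS, and AMS, the related blog posts are listed here; the analysis below draws on them directly:
A discussion and analysis of why Android cannot update the UI from a worker thread: how an Activity is opened;
java/android design pattern study notes (9), the Proxy pattern: AMS class diagrams and introduction;
Android WindowManager analysis and a QQ password phishing case study: how a UI window is created;
java/android design pattern study notes (8), the Bridge pattern: WMS class diagrams and introduction;
Android IPC (part 2), AIDL: introduction to AIDL and Binder;
Android dynamic proxies and implementing ServiceHook with them: introduction to ServiceHook;
Android TransactionTooLargeException: analysis, reflections, and a monitoring approach: analysis of TransactionTooLargeException and how to monitor it.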

TransactionTooLargeException stack trace analysis

Let us start with the call stack of the exception above. Only the method-call lines are excerpted here:

android.view.ViewRootImpl.setView(ViewRootImpl.java:515)
android.os.BinderProxy.transact(Native Method)
android.view.IWindowSession$Stub$Proxy.addToDisplay(IWindowSession.java:684)
android.view.ViewRootImpl.setView(ViewRootImpl.java:504)
android.view.WindowManagerGlobal.addView(WindowManagerGlobal.java:259)
android.view.WindowManagerImpl.addView(WindowManagerImpl.java:69)
android.app.Dialog.show(Dialog.java:307)

Working from the bottom up, step by step: the first frame is dialog.show(), the application-level method for showing a Dialog, which is perfectly normal. Then the next line:

android.view.WindowManagerImpl.addView(WindowManagerImpl.java:69)

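As I described in the post "Android WindowManager analysis and a QQ password phishing case study", a Dialog's Window is created much like an Activity's: PolicyManager.makeNewWindow is called to create a Window, and WindowManager then adds that Window's DecorView to the Activity's Window so that it is displayed. The biggest difference from an Activity is that a Dialog's Window needs a handle to an Activity, because it has to attach to that Activity, whereas a system Window such as a Toast can be shown directly. The three kinds of Window have different layer ranges, and a Window with a higher layer covers one with a lower layer: application Windows range over 1~99, sub-Windows over 1000~1999, and system Windows over 2000~2999.

So Dialog.show() calls the relevant functions to create a Window, and the process can be understood by analogy with how an Activity creates its Window. The first step calls addView in WindowManagerImpl to add the newly created Window object. WindowManager and Window form a typical Bridge pattern; see my post "java/android design pattern study notes (8), the Bridge pattern" for details.

(UML class diagram of WindowManager and Window from the original post; image not preserved.)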
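To make the three layer ranges concrete, here is a minimal sketch that classifies a window type value using only the ranges quoted above. The class and method names are hypothetical, and the sample values in main (2, 1000, 2005) are what I recall for TYPE_APPLICATION, TYPE_APPLICATION_PANEL, and TYPE_TOAST in WindowManager.LayoutParams, so treat them as assumptions rather than authoritative constants.

```java
/** Sketch: classify a window type value by the layer ranges quoted in the text. */
final class WindowLayerSketch {
    enum Kind { APPLICATION, SUB_WINDOW, SYSTEM, UNKNOWN }

    /** Maps a raw type value to one of the three ranges (1~99, 1000~1999, 2000~2999). */
    static Kind classify(int type) {
        if (type >= 1 && type <= 99) return Kind.APPLICATION;
        if (type >= 1000 && type <= 1999) return Kind.SUB_WINDOW;
        if (type >= 2000 && type <= 2999) return Kind.SYSTEM;
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) {
        // Sample values are assumptions recalled from WindowManager.LayoutParams.
        System.out.println(classify(2));     // an application window
        System.out.println(classify(1000));  // a sub-window
        System.out.println(classify(2005));  // a system window such as a Toast
    }
}
```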

WindowManagerImpl holds a reference to WindowManagerGlobal and delegates every operation to it. WindowManagerGlobal calls ViewRootImpl.setView, which in turn uses IWindowSession. The IWindowSession object, sWindowSession, is obtained through IWindowManager.openSession, and IWindowManager is in fact the Proxy object for WindowManagerService inside the application process. It holds the WMS IBinder and calls into WMS in the system process through AIDL. WMS's openSession returns an IWindowSession.Stub object, but because the call crosses processes, the IWindowSession.Stub returned by the system process shows up in the application process as an IBinder for IWindowSession, which likewise has to be converted into a Proxy object by calling IWindowSession.Stub.asInterface. The code is as follows:

@Override
public android.view.IWindowSession openSession(android.view.IWindowSessionCallback callback, com.android.internal.view.IInputMethodClient client, com.android.internal.view.IInputContext inputContext) throws android.os.RemoteException {
    android.os.Parcel _data = android.os.Parcel.obtain();
    android.os.Parcel _reply = android.os.Parcel.obtain();
    android.view.IWindowSession _result;
    try {
        _data.writeInterfaceToken(DESCRIPTOR);
        _data.writeStrongBinder((((callback != null)) ? (callback.asBinder()) : (null)));
        _data.writeStrongBinder((((client != null)) ? (client.asBinder()) : (null)));
        _data.writeStrongBinder((((inputContext != null)) ? (inputContext.asBinder()) : (null)));
        mRemote.transact(Stub.TRANSACTION_openSession, _data, _reply, 0);
        _reply.readException();
        _result = android.view.IWindowSession.Stub.asInterface(_reply.readStrongBinder());
    } finally {
        _reply.recycle();
        _data.recycle();
    }
    return _result;
}

This yields the IWindowSession$Stub$Proxy object. Why does the call end up in BinderProxy.transact? As covered in the post "Android IPC (part 2), AIDL", the WMS IBinder that an application process obtains through ServiceManager is actually a BinderProxy object. The same holds for IWindowSession, so the call reaches BinderProxy.transact, which looks like this:

public boolean transact(int code, Parcel data, Parcel reply, int flags) throws RemoteException {
    Binder.checkParcel(this, code, data, "Unreasonably large binder buffer");
    if (Binder.isTracingEnabled()) { Binder.getTransactionTracker().addTrace(); }
    return transactNative(code, data, reply, flags);
}
....
public native boolean transactNative(int code, Parcel data, Parcel reply,
        int flags) throws RemoteException;

This method calls into native code. A global search shows that it corresponds to the native function android_os_BinderProxy_transact, which is the key:

static jboolean android_os_BinderProxy_transact(JNIEnv* env, jobject obj,
        jint code, jobject dataObj, jobject replyObj, jint flags) // throws RemoteException
{
    if (dataObj == NULL) {
        jniThrowNullPointerException(env, NULL);
        return JNI_FALSE;
    }

    Parcel* data = parcelForJavaObject(env, dataObj);
    if (data == NULL) {
        return JNI_FALSE;
    }
    Parcel* reply = parcelForJavaObject(env, replyObj);
    if (reply == NULL && replyObj != NULL) {
        return JNI_FALSE;
    }

    IBinder* target = (IBinder*)
        env->GetLongField(obj, gBinderProxyOffsets.mObject);
    if (target == NULL) {
        jniThrowException(env, "java/lang/IllegalStateException", "Binder has been finalized!");
        return JNI_FALSE;
    }

    ALOGV("Java code calling transact on %p in Java object %p with code %" PRId32 "\n",
            target, obj, code);


    bool time_binder_calls;
    int64_t start_millis;
    if (kEnableBinderSample) {
        // Only log the binder call duration for things on the Java-level main thread.
        // But if we don't
        time_binder_calls = should_time_binder_calls();

        if (time_binder_calls) {
            start_millis = uptimeMillis();
        }
    }

    //printf("Transact from Java code to %p sending: ", target); data->print();
    status_t err = target->transact(code, *data, reply, flags);
    //if (reply) printf("Transact from Java code to %p received: ", target); reply->print();

    if (kEnableBinderSample) {
        if (time_binder_calls) {
            conditionally_log_binder_call(start_millis, target, code);
        }
    }

    if (err == NO_ERROR) {
        return JNI_TRUE;
    } else if (err == UNKNOWN_TRANSACTION) {
        return JNI_FALSE;
    }

    signalExceptionForError(env, obj, err, true /*canThrowRemoteException*/, data->dataSize());
    return JNI_FALSE;
}

It calls signalExceptionForError:

void signalExceptionForError(JNIEnv* env, jobject obj, status_t err,
        bool canThrowRemoteException, int parcelSize)
{
    switch (err) {
        case UNKNOWN_ERROR:
            jniThrowException(env, "java/lang/RuntimeException", "Unknown error");
            break;
        ......
        case FAILED_TRANSACTION: {
            ALOGE("!!! FAILED BINDER TRANSACTION !!! (parcel size = %d)", parcelSize);
            const char* exceptionToThrow;
            char msg[128];
            // TransactionTooLargeException is a checked exception, only throw from certain methods.
            // FIXME: Transaction too large is the most common reason for FAILED_TRANSACTION
            // but it is not the only one. The Binder driver can return BR_FAILED_REPLY
            // for other reasons also, such as if the transaction is malformed or
            // refers to an FD that has been closed. We should change the driver
            // to enable us to distinguish these cases in the future.
            if (canThrowRemoteException && parcelSize > 200*1024) {
                // bona fide large payload
                exceptionToThrow = "android/os/TransactionTooLargeException";
                snprintf(msg, sizeof(msg)-1, "data parcel size %d bytes", parcelSize);
            } else {
                // Heuristic: a payload smaller than this threshold "shouldn't" be too
                // big, so it's probably some other, more subtle problem. In practice
                // it seems to always mean that the remote process died while the binder
                // transaction was already in flight.
                exceptionToThrow = (canThrowRemoteException)
                        ? "android/os/DeadObjectException"
                        : "java/lang/RuntimeException";
                snprintf(msg, sizeof(msg)-1,
                        "Transaction failed on small parcel; remote process probably died");
            }
            jniThrowException(env, exceptionToThrow, msg);
        } break;
        .......
    }
}

And here we see the key lines:

exceptionToThrow = "android/os/TransactionTooLargeException";
snprintf(msg, sizeof(msg)-1, "data parcel size %d bytes", parcelSize);

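That's right, this is where the error comes from. If parcelSize is greater than 200K, the error is raised. Tracing parcelSize back shows that it is the size of the second argument of BinderProxy.transact, the data Parcel. In other words, if the Parcel is larger than 200K, this error is reported. The Parcel holds the arguments that the application process passes to the system process, and here those are the Dialog's parameters, such as its message or title. If they are too large, the crash occurs, and the fix is to make the Dialog's parameters smaller. But is that really the fix? Not necessarily. Let us keep looking.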
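The branch in the FAILED_TRANSACTION case is easy to misread, so here is a small Java restatement of just that decision, as a sketch. It mirrors the condition in the native snippet above (strictly greater than 200*1024, and only when a RemoteException may be thrown); the class and method names are hypothetical and nothing here calls real Binder code.

```java
/** Sketch: which exception the FAILED_TRANSACTION branch shown above would pick. */
final class BinderFailureSketch {
    /** Threshold hard-coded in the native snippet: 200*1024 bytes. */
    static final int LARGE_PARCEL_THRESHOLD = 200 * 1024;

    /** Returns the JNI-style class name chosen for a failed transaction. */
    static String exceptionFor(boolean canThrowRemoteException, int parcelSize) {
        if (canThrowRemoteException && parcelSize > LARGE_PARCEL_THRESHOLD) {
            return "android/os/TransactionTooLargeException";
        }
        // A small parcel is reported as a dead remote process, not as "too large".
        return canThrowRemoteException
                ? "android/os/DeadObjectException"
                : "java/lang/RuntimeException";
    }

    public static void main(String[] args) {
        System.out.println(exceptionFor(true, 300 * 1024));
        System.out.println(exceptionFor(true, 4 * 1024));
        System.out.println(exceptionFor(false, 300 * 1024));
    }
}
```

Note that under this logic a failed transaction with a small parcel surfaces as DeadObjectException, which matters for the question raised later in the post.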

Similar crashes

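The crash triggered by Dialog.show() is only the tip of the iceberg. We now know that a call into WMS crashes if the arguments to transact are too large; what about AMS and PMS? The answer is yes. Here are similar crashes from our company: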

PMS permission check

java.lang.reflect.UndeclaredThrowableException:
$Proxy2.checkPermission(Unknown Source)
......
Caused by:
android.os.TransactionTooLargeException:
android.os.BinderProxy.transactNative(Native Method)
android.os.BinderProxy.transact(Binder.java:504)
android.content.pm.IPackageManager$Stub$Proxy.checkPermission(IPackageManager.java:2169)
java.lang.reflect.Method.invoke(Native Method)
java.lang.reflect.Method.invoke(Method.java:372)
androidx.pluginmgr.hook.PackageManagerHook$HookHandler.invoke(PackageManagerHook.java:99)
java.lang.reflect.Proxy.invoke(Proxy.java:397)
$Proxy2.checkPermission(Unknown Source)
android.app.ApplicationPackageManager.checkPermission(ApplicationPackageManager.java:401)
com.lidroid.xutils.util.DeviceInfoUtils.checkPermissions(DeviceInfoUtils.java:315)

This one is triggered by PackageManager.checkPermission, which eventually calls checkPermission in ApplicationPackageManager:

@Override
public int checkPermission(String permName, String pkgName) {
    try {
        return mPM.checkPermission(permName, pkgName, mContext.getUserId());
    } catch (RemoteException e) {
        throw e.rethrowFromSystemServer();
    }
}

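This function calls checkPermission on the mPM field, which is of type IPackageManager. Because of our company's plugin framework, this field has been replaced; see my post "Android dynamic proxies and implementing ServiceHook with them" for details. It ends up being an instance of a dynamically generated class. As that post explains, such classes are named $ProxyXXX, where XXX is a number. So the call goes into checkPermission of the generated class, then into the invoke method of the InvocationHandler, which here is HookHandler, an inner class of PackageManagerHook. It finally reaches the Proxy object of IPackageManager, namely IPackageManager$Stub$Proxy, which calls transact on its IBinder, that is, on BinderProxy. From there the call path is the same as in the WMS case above.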
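The hook chain described above can be reproduced with plain java.lang.reflect.Proxy. The sketch below uses hypothetical stand-ins (PermissionService for IPackageManager, RemoteFailure for RemoteException) and is not the plugin framework's code. It also shows one likely reason the trace above starts with UndeclaredThrowableException: Method.invoke wraps the target's exception in InvocationTargetException, a checked exception the interface method does not declare, so the generated proxy class rewraps it unless the handler unwraps the cause first.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.lang.reflect.UndeclaredThrowableException;

/** Sketch: a dynamic-proxy hook in front of a service interface (all names hypothetical). */
final class ProxyHookSketch {
    /** Stand-in for android.os.RemoteException, a checked exception. */
    static class RemoteFailure extends Exception {
        RemoteFailure(String message) { super(message); }
    }

    /** Stand-in for IPackageManager. */
    interface PermissionService {
        int checkPermission(String permName, String pkgName) throws RemoteFailure;
    }

    /** Stand-in for the real Binder proxy; it always fails so the error path is visible. */
    static final PermissionService FAILING = (permName, pkgName) -> {
        throw new RemoteFailure("transaction failed");
    };

    /** Wraps base in a generated $ProxyN object; unwrap controls how failures surface. */
    static PermissionService hook(PermissionService base, boolean unwrap) {
        InvocationHandler handler = (proxy, method, args) -> {
            try {
                return method.invoke(base, args);
            } catch (InvocationTargetException e) {
                if (unwrap) {
                    throw e.getCause(); // rethrow the declared exception itself
                }
                throw e; // undeclared checked exception: the proxy will rewrap it
            }
        };
        return (PermissionService) Proxy.newProxyInstance(
                PermissionService.class.getClassLoader(),
                new Class<?>[] {PermissionService.class},
                handler);
    }

    /** Calls the service and reports the simple name of whatever exception comes out. */
    static String describe(PermissionService service) {
        try {
            service.checkPermission("android.permission.INTERNET", "com.example");
            return "ok";
        } catch (RemoteFailure | UndeclaredThrowableException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        PermissionService raw = hook(FAILING, false);
        System.out.println(raw.getClass().getName());      // a generated $Proxy class
        System.out.println(describe(raw));                  // failure without unwrapping
        System.out.println(describe(hook(FAILING, true)));  // failure with unwrapping
    }
}
```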

AMS -> WMS: launching an app or opening a page

Another example:

java.lang.RuntimeException:Adding window failed
android.view.ViewRootImpl.setView(ViewRootImpl.java:559)
......
Caused by:
android.os.TransactionTooLargeException:
android.os.BinderProxy.transact(Native Method)
android.view.IWindowSession$Stub$Proxy.addToDisplay(IWindowSession.java:683)
android.view.ViewRootImpl.setView(ViewRootImpl.java:548)
android.view.WindowManagerGlobal.addView(WindowManagerGlobal.java:259)
android.view.WindowManagerImpl.addView(WindowManagerImpl.java:94)
android.app.ActivityThread.handleResumeActivity(ActivityThread.java:3394)
android.app.ActivityThread.handleLaunchActivity(ActivityThread.java:2658)
android.app.ActivityThread.access$800(ActivityThread.java:156)
android.app.ActivityThread$H.handleMessage(ActivityThread.java:1355)
android.os.Handler.dispatchMessage(Handler.java:102)
android.os.Looper.loop(Looper.java:157)
android.app.ActivityThread.main(ActivityThread.java:5883)
java.lang.reflect.Method.invokeNative(Native Method)
java.lang.reflect.Method.invoke(Method.java:515)
com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:871)
com.android.internal.os.ZygoteInit.main(ZygoteInit.java:687)
dalvik.system.NativeStart.main(Native Method)

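Start from the bottom: NativeStart.main and ZygoteInit.main. These run before every application starts, because every application process is forked from the Zygote process. A brief introduction to Zygote: the Zygote service process is also called the incubator process. In Linux user space, the app_process process first does some preparatory work for starting Zygote, such as starting the Runtime environment (instance), parsing arguments, and setting the startSystemServer flag, and then runs the Zygote service code via runtime.start(). Put simply, Zygote takes over the shell of app_process, renames it, and replaces the code that follows with Zygote's main function, making a smooth transition into the Zygote service process. That is why listing all processes with ps in a console shows Zygote rather than app_process.

The runtime.start() call above is really the class function AndroidRuntime::start(), which creates and starts a virtual machine instance to run the main function of the class com.android.internal.os.ZygoteInit. That main function forks a child process to start SystemServer, and the parent process stays on as the actual incubator: whenever the system asks to run an Android application, Zygote receives a socket message and forks a child process to run it. Because Zygote is created at system boot and performs VM initialization, library loading, and the preloading and initialization of class libraries, a new VM instance can be produced quickly when the system needs one.

That is why ZygoteInit shows up once an application has started. ZygoteInit.main then calls ZygoteInit$MethodAndArgsCaller.run. The way it gets there is interesting and worth a closer look: ZygoteInit.main -> ZygoteInit.startSystemServer -> ZygoteInit.handleSystemServerProcess -> RuntimeInit.zygoteInit -> RuntimeInit.applicationInit -> RuntimeInit.invokeStaticMain -> throws a MethodAndArgsCaller exception -> caught by ZygoteInit.main -> MethodAndArgsCaller.run.

Why throw an exception in RuntimeInit.invokeStaticMain and catch it in ZygoteInit.main? This has to do with the function execution model. Programs are made up of functions (assembly programs aside). When an application written in a high-level language such as C, C++, or Java runs, it has its own stack (a first-in, last-out memory region) that holds function return addresses and temporary data. Each function call pushes the return address and related data onto the stack; when the function finishes, they are popped, and the CPU uses the return address to continue with the instruction after the call in the caller. So when an exception is thrown and not caught in the current function, that function exits abnormally and is popped off the stack, and the exception is passed to its caller, and so on until it is caught and handled; otherwise the program crashes.

Recall that both C and Java programs have a single entry point, the main function, and when main returns the whole program exits. By the execution model above, main should be the last function to exit and should sit at the bottom of the stack. Likewise, the entry point of an Android application is ActivityThread.main, so it should sit directly above ZygoteInit.main on the new process's stack so that returning from it exits the application. However, each time Android forks a new process it first calls other functions to set up the child process, so the frames directly above the bottom of the stack belong to those functions rather than to ActivityThread.main. Starting ActivityThread.main by throwing an exception therefore clears the frames above ZygoteInit.main, so that when ActivityThread.main returns, the whole application can exit directly. When ActivityThread.main returns, control goes back to MethodAndArgsCaller.run, which returns straight to ZygoteInit.main, which has nothing else to do and returns as well, so the application exits completely. The Google engineers' comment says as much: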

private static void invokeStaticMain(String className, String[] argv, ClassLoader classLoader)
        throws ZygoteInit.MethodAndArgsCaller {
    ......
    /*
     * This throw gets caught in ZygoteInit.main(), which responds
     * by invoking the exception's run() method. This arrangement
     * clears up all the stack frames that were required in setting
     * up the process.
     */
    throw new ZygoteInit.MethodAndArgsCaller(m, argv);
}

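In other words, it clears the call frames that were needed to set up the process. MethodAndArgsCaller.run then calls ActivityThread.main through method.invoke(). This function should look familiar: it is an application's main function, the entry point when an app is opened. Here it is: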

public static void main(String[] args) {
    Trace.traceBegin(Trace.TRACE_TAG_ACTIVITY_MANAGER, "ActivityThreadMain");
    SamplingProfilerIntegration.start();

    // CloseGuard defaults to true and can be quite spammy. We
    // disable it here, but selectively enable it later (via
    // StrictMode) on debug builds, but using DropBox, not logs.
    CloseGuard.setEnabled(false);

    Environment.initForCurrentUser();

    // Set the reporter for event logging in libcore
    EventLogger.setReporter(new EventLoggingReporter());

    // Make sure TrustedCertificateStore looks in the right place for CA certificates
    final File configDir = Environment.getUserConfigDirectory(UserHandle.myUserId());
    TrustedCertificateStore.setDefaultUserDirectory(configDir);

    Process.setArgV0("<pre-initialized>");

    Looper.prepareMainLooper();

    ActivityThread thread = new ActivityThread();
    thread.attach(false);

    if (sMainThreadHandler == null) {
        sMainThreadHandler = thread.getHandler();
    }

    if (false) {
        Looper.myLooper().setMessageLogging(new
                LogPrinter(Log.DEBUG, "ActivityThread"));
    }

    // End of event ActivityThreadMain.
    Trace.traceEnd(Trace.TRACE_TAG_ACTIVITY_MANAGER);
    Looper.loop();

    throw new RuntimeException("Main thread loop unexpectedly exited");
}

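This function calls Looper.prepareMainLooper(). Android is event-driven, and this Looper receives the application's events (Looper, Handler, and related classes are not covered here). When a message arrives, a Handler processes it. What is that Handler called? Simply H. I covered the H class in the post "A discussion and analysis of why Android cannot update the UI from a worker thread", if you are interested. H.handleMessage then calls ActivityThread.handleLaunchActivity. The ActivityThread.access$800 frame in between is a compiler-generated synthetic accessor that lets the inner class H call a private method of ActivityThread. The same method is called when startActivity opens a page, since opening an app for the first time also has to open its home screen, so the remaining steps are the same as for startActivity. handleLaunchActivity calls handleResumeActivity, where the Activity's PhoneWindow is added through WMS. The steps are the same as for Dialog.show, so I will not repeat the analysis.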
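The stack-clearing trick used by MethodAndArgsCaller, discussed earlier in this section, can be demonstrated in plain Java. The sketch below is only an illustration with hypothetical names (EntryPointCaller, setup, launch): it runs the same entry point once by calling it from the bottom of several nested setup frames and once by throwing it up to the top-level catch block, and compares the stack depth seen from inside the entry point.

```java
/** Sketch: unwinding setup frames by throwing the entry point up to the top-level caller. */
final class StackClearingSketch {
    /** Carries the real entry point up to the catch block, in the spirit of MethodAndArgsCaller. */
    static final class EntryPointCaller extends Exception implements Runnable {
        private final Runnable entryPoint;

        EntryPointCaller(Runnable entryPoint) { this.entryPoint = entryPoint; }

        @Override
        public void run() { entryPoint.run(); }
    }

    /** Number of frames on the current thread's stack at the point of the call. */
    static int depth() {
        return new Throwable().getStackTrace().length;
    }

    /** Simulates nested setup calls; at the bottom it either calls the entry point or throws it. */
    static void setup(int levels, Runnable entryPoint, boolean viaException)
            throws EntryPointCaller {
        if (levels > 0) {
            setup(levels - 1, entryPoint, viaException);
            return;
        }
        if (viaException) {
            throw new EntryPointCaller(entryPoint);
        }
        entryPoint.run();
    }

    /** Runs the entry point and returns the stack depth it observed. */
    static int launch(boolean viaException) {
        int[] observed = new int[1];
        Runnable entryPoint = () -> observed[0] = depth();
        try {
            setup(5, entryPoint, viaException);
        } catch (EntryPointCaller caller) {
            caller.run(); // the setup frames are already gone at this point
        }
        return observed[0];
    }

    public static void main(String[] args) {
        System.out.println("direct call depth:   " + launch(false));
        System.out.println("via exception depth: " + launch(true));
    }
}
```

The second depth is smaller because the setup frames have been popped by the time the entry point runs, so returning from the entry point leads almost straight back to the top-level function.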

Questions and solutions

Why are the two exceptions above so strange? Can a simple PMS.checkPermission call or an app launch really crash? Here is Google's official description of TransactionTooLargeException:

The Binder transaction failed because it was too large.
During a remote procedure call, the arguments and the return value of the call are transferred as 
Parcel objects stored in the Binder transaction buffer. If the arguments or the return value are too 
large to fit in the transaction buffer, then the call will fail and TransactionTooLargeException 
will be thrown.

The Binder transaction buffer has a limited fixed size, currently 1Mb, which is shared by all 
transactions in progress for the process. Consequently this exception can be thrown when there 
are many transactions in progress even when most of the individual transactions are of moderate size.

There are two possible outcomes when a remote procedure call throws TransactionTooLargeException. 
Either the client was unable to send its request to the service (most likely if the arguments were 
too large to fit in the transaction buffer), or the service was unable to send its response back to 
the client (most likely if the return value was too large to fit in the transaction buffer). It is 
not possible to tell which of these outcomes actually occurred. The client should assume that a 
partial failure occurred.

The key to avoiding TransactionTooLargeException is to keep all transactions relatively small. 
Try to minimize the amount of memory needed to create a Parcel for the arguments and the return 
value of the remote procedure call. Avoid transferring huge arrays of strings or large bitmaps. 
If possible, try to break up big requests into smaller pieces.

If you are implementing a service, it may help to impose size or complexity contraints on the 
queries that clients can perform. For example, if the result set could become large, then don't 
allow the client to request more than a few records at a time. Alternately, instead of returning 
all of the available data all at once, return the essential information first and make the client 
ask for additional information later as needed.

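Let us go through this description step by step. In a cross-process call, the arguments and the return value are converted to Parcel objects for transfer, and those Parcels are stored in the Binder transaction buffer. If the arguments or the return value are too large to fit in the buffer, the call fails and TransactionTooLargeException is thrown. The Binder transaction buffer has a fixed size of 1Mb, shared by all transactions in progress for the process, so the exception can be thrown even when most individual transactions are moderate in size but many are in progress. When a remote call throws TransactionTooLargeException there are two possible outcomes: either the Binder client could not send its request to the service (most likely because the arguments were too large for the buffer), or the service could not send its response back to the client (most likely because the return value was too large). It is not possible to tell which of the two occurred, so the client should assume a partial failure. The key to avoiding the exception is to keep every transaction as small as possible: minimize the size of the arguments and return values of remote calls, avoid passing huge arrays of strings or large Bitmaps, and where possible break big requests into smaller ones. If you are implementing a service, it may help to impose size or complexity constraints on the queries clients can make. For example, if a result set could become large, do not let the client request more than a few records at a time; or, instead of returning all the data at once, return the essential information first and let the client ask for more as needed.
Now it is clear: all AIDL calls of an application process share one Binder transaction buffer of only 1Mb, and when the arguments and return values of all in-flight remote calls add up to more than 1Mb, TransactionTooLargeException is thrown. That is why both the WMS and PMS calls above can raise it. Knowing the cause gives us a first approach: look at each place where the exception is thrown and reduce the size of the call arguments, or check the Binder server side of the AIDL interface to see whether the return value exceeds some limit.
Alternatively, see this answer: What to do on TransactionTooLargeException, which lists several common usage patterns that can cause the exception: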

When you get this exception in your application, please analyze your code.

1. Are you exchanging lot of data between your services and application?
2. Using intents to share huge data, (for example, the user selects huge number of files 
from gallery share press share, the URIs of the selected files will be transferred using intents)
3. receiving bitmap files from service
4. waiting for android to respond back with huge data (for example, getInstalledApplications() 
when the user installed lot of applications)
5. using applyBatch() with lot of operations pending
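The point about a shared 1Mb buffer, where a small call can fail because other transactions are still in flight, can be illustrated with a toy model. The sketch below is a deliberately simplified simulation with hypothetical names; the real buffer is managed by the Binder driver, whose allocation rules are more involved than a single counter.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: a toy model of one per-process transaction buffer shared by in-flight calls. */
final class SharedBufferSketch {
    /** Total capacity quoted in the documentation above: 1Mb. */
    static final int CAPACITY = 1024 * 1024;

    private final Map<Integer, Integer> inFlight = new HashMap<>();
    private int used;
    private int nextId;

    /** Reserves space for a transaction; returns its id, or -1 if it does not fit. */
    int begin(int size) {
        if (size > CAPACITY - used) {
            return -1; // in this model, this is where the exception would come from
        }
        used += size;
        inFlight.put(nextId, size);
        return nextId++;
    }

    /** Releases the space of a finished transaction. */
    void end(int id) {
        Integer size = inFlight.remove(id);
        if (size != null) {
            used -= size;
        }
    }

    /** Bytes still available. */
    int free() {
        return CAPACITY - used;
    }

    public static void main(String[] args) {
        SharedBufferSketch buffer = new SharedBufferSketch();
        int first = buffer.begin(100 * 1024);
        for (int i = 0; i < 9; i++) {
            buffer.begin(100 * 1024); // nine more moderate transactions still in flight
        }
        System.out.println("free bytes: " + buffer.free());
        System.out.println("32K call fits: " + (buffer.begin(32 * 1024) >= 0));
        buffer.end(first);
        System.out.println("32K call fits after one finishes: " + (buffer.begin(32 * 1024) >= 0));
    }
}
```

In this model a 32K call is rejected purely because of what else is in flight, which is the situation the documentation warns about.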

Discussion and reflections

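From the analysis of the three similar crashes above, we know that each application process has one Binder transaction buffer (an app with several processes has several buffers). If, at a given moment, the arguments and return values of all of a process's AIDL calls, including calls to system services and the client and server sides of the app's own cross-process communication, add up to more than 1Mb, TransactionTooLargeException results. But here is the problem. When we traced the first crash, the one triggered by Dialog.show(), down to the native layer, we clearly saw this code: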

// TransactionTooLargeException is a checked exception, only throw from certain methods.
// FIXME: Transaction too large is the most common reason for FAILED_TRANSACTION
// but it is not the only one. The Binder driver can return BR_FAILED_REPLY
// for other reasons also, such as if the transaction is malformed or
// refers to an FD that has been closed. We should change the driver
// to enable us to distinguish these cases in the future.
if (canThrowRemoteException && parcelSize > 200*1024) {
    // bona fide large payload
    exceptionToThrow = "android/os/TransactionTooLargeException";
    snprintf(msg, sizeof(msg)-1, "data parcel size %d bytes", parcelSize);
}

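This code reports the error only when the call's arguments exceed 200K, and the size it checks is only the size of the arguments. I searched the whole Android source tree and found that this is the only place where the exception is thrown (Androidxref search: TransactionTooLargeException). That is at odds with Google's official documentation, and what we observe in practice matches the documentation better. Yet the code is what it is: this is the only throw site, unless my reading of the code is wrong, and 200K is clearly hard-coded. I also checked our company's code: Dialog.show() does not pass large data, and neither does PMS.checkPermission(), so the arguments cannot have exceeded 200K. What probably happened is that the buffer was already nearly full, so even a call with small arguments triggered the exception, which favors the 1Mb explanation. But that does not match the source code, and this is what puzzles me. If anyone can shed light on this, I would be very grateful.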

Final monitoring approach and source code

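The final way to deal with this problem is to check the sizes of arguments and return values, and to avoid making a large number of system service calls in a short time. The former is fairly easy to handle; the latter is more troublesome, because you have to comb through the project source for every place that could trigger it. Is there a way to capture the argument size and the frequency of every service call the app makes? Yes. It builds on the previous post, "Android dynamic proxies and implementing ServiceHook with them": a small change to that post's source code is enough. How do we get the argument size of a system service call? From the source analysis above, when transact is called on the BinderProxy object, the second argument, a Parcel, holds our arguments, so we only need to read its size and log it to monitor argument sizes in real time. How do we get the call frequency? Once the sizes are logged, the interval between log lines gives it: a burst of AIDL calls in a short time points to the offending code.
Say we want to monitor the argument size and frequency of every ClipboardService call. The IBinder that ClipboardService returns to the application process is converted into a Proxy object, which holds a reference to that IBinder in a field named mRemote. Each Proxy call simply obtains two Parcel objects, one for the arguments and one for the return value, and then calls transact on mRemote to write the data to the Binder driver: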
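Reading intervals off log lines by eye gets tedious, so the size-and-frequency idea above can also be kept as a running total. The following is a minimal sketch of a sliding-window counter; the class name, the window length, and the idea of feeding it from a transact hook are assumptions for illustration, not part of the original source code.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Sketch: tracks how many calls and bytes were sent within a sliding time window. */
final class TransactionRateSketch {
    /** One observed call: when it happened and how large its data Parcel was. */
    private static final class Sample {
        final long timeMs;
        final int bytes;

        Sample(long timeMs, int bytes) {
            this.timeMs = timeMs;
            this.bytes = bytes;
        }
    }

    private final long windowMs;
    private final Deque<Sample> samples = new ArrayDeque<>();
    private long bytesInWindow;

    TransactionRateSketch(long windowMs) {
        this.windowMs = windowMs;
    }

    /** Records one call and returns the total bytes sent within the window ending at nowMs. */
    long record(long nowMs, int bytes) {
        samples.addLast(new Sample(nowMs, bytes));
        bytesInWindow += bytes;
        // Drop samples that are a full window (or more) older than the current call.
        while (!samples.isEmpty() && nowMs - samples.peekFirst().timeMs >= windowMs) {
            bytesInWindow -= samples.removeFirst().bytes;
        }
        return bytesInWindow;
    }

    /** Number of calls currently inside the window. */
    int callsInWindow() {
        return samples.size();
    }

    public static void main(String[] args) {
        TransactionRateSketch rate = new TransactionRateSketch(1000);
        System.out.println(rate.record(0, 300));
        System.out.println(rate.record(500, 200));
        System.out.println(rate.record(1000, 100));
        System.out.println(rate.callsInWindow());
    }
}
```

A hook could call record with the current time and the Parcel's dataSize() and log a warning when either the byte total or the call count in the window crosses a threshold of your choosing.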

@Override
public void setPrimaryClip(android.content.ClipData clip, java.lang.String callingPackage) throws android.os.RemoteException {
    android.os.Parcel _data = android.os.Parcel.obtain();
    android.os.Parcel _reply = android.os.Parcel.obtain();
    try {
        _data.writeInterfaceToken(DESCRIPTOR);
        if ((clip != null)) {
            _data.writeInt(1);
            clip.writeToParcel(_data, 0);
        } else {
            _data.writeInt(0);
        }
        _data.writeString(callingPackage);
        mRemote.transact(Stub.TRANSACTION_setPrimaryClip, _data, _reply, 0);
        _reply.readException();
    } finally {
        _reply.recycle();
        _data.recycle();
    }
}

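The arguments we pass to ClipboardService are written into the _data Parcel, and when transact is called on the BinderProxy, that Parcel is the second argument, so all we need is to print the size of the second argument. We already have ClipboardService's Proxy object in the application process, so the next step is to use reflection to replace its mRemote field with an instance of a dynamically generated class. Calls to transact then go to the invoke method of our InvocationHandler, where we take out the arguments and print the size of the second one: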

public HookHandler(IBinder base, Class<?> stubClass,
                   InvocationHandler invocationHandler) {
    mInvocationHandler = invocationHandler;

    try {
        Method asInterface = stubClass.getDeclaredMethod("asInterface", IBinder.class);
        this.mBase = asInterface.invoke(null, base);

        Class<?> clazz = mBase.getClass();
        Field mRemote = clazz.getDeclaredField("mRemote");
        mRemote.setAccessible(true);
        // Create a proxy object that stands in for the BinderProxy
        Object binderProxy = Proxy.newProxyInstance(mBase.getClass().getClassLoader(),
                new Class<?>[] {IBinder.class}, new ClipboardHook.TransactionWatcherHook((IBinder) mRemote.get(mBase)));
        mRemote.set(mBase, binderProxy);

    } catch (Exception e) {
        e.printStackTrace();
    }
}
.......
// Used to monitor TransactionTooLargeException errors
public static class TransactionWatcherHook implements InvocationHandler {

    IBinder binder;
    public TransactionWatcherHook(IBinder binderProxy) {
        binder = binderProxy;
    }

    @Override
    public Object invoke(Object o, Method method, Object[] objects) throws Throwable {
        // objects is null for IBinder methods that take no arguments
        if (objects != null && objects.length >= 2 && objects[1] instanceof Parcel) {
            // The second argument is the Parcel object
            Log.e(TAG, "clipboard service invoked, transact's parameter size is " + ((Parcel)objects[1]).dataSize() + " byte");
        }
        try {
            return method.invoke(binder, objects);
        } catch (InvocationTargetException e) {
            // Rethrow the original exception instead of the reflection wrapper
            throw e.getCause();
        }
    }
}

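Only the key code is shown here; the rest is in "Android dynamic proxies and implementing ServiceHook with them". This gives us the argument size in bytes. Here is what it looks like in practice: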

03-06 17:19:37.031 459-459/com.example.servicehook E/ClipboardHook: clipboardhookhandler invoke
03-06 17:19:37.032 459-459/com.example.servicehook E/ClipboardHook: clipboard service invoked, transact's parameter size is 312 B

After adding one character:

03-06 17:19:40.056 459-459/com.example.servicehook E/ClipboardHook: clipboardhookhandler invoke
03-06 17:19:40.057 459-459/com.example.servicehook E/ClipboardHook: clipboard service invoked, transact's parameter size is 316 B

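The size grew by 4 B for one extra character, so the unit is indeed bytes. A rough estimate of how many characters fit in 1Mb at 4 B per character is 1024*1024/4 = 262144. If you are curious, try copying that many characters and see whether it crashes.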
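The 4-byte step in the log is also what one would expect from padding. As far as I understand the native Parcel string encoding, a string is written as a 4-byte length followed by UTF-16 code units plus a terminator, padded to a 4-byte boundary, so the size grows in steps of 4 rather than by 2 for every character. The sketch below estimates the size of a single string under that assumption; it ignores everything else a ClipData writes into the Parcel, and the class and method names are hypothetical.

```java
/** Sketch: estimated Parcel bytes for one string, assuming a padded UTF-16 encoding. */
final class ParcelStringSizeSketch {
    /** Rounds n up to the next multiple of 4. */
    static int pad4(int n) {
        return (n + 3) & ~3;
    }

    /** 4-byte length prefix, then (chars + terminator) * 2 bytes, padded to 4 bytes. */
    static int estimate(int charCount) {
        return 4 + pad4((charCount + 1) * 2);
    }

    /** Largest character count whose estimate still fits into budgetBytes. */
    static int maxChars(int budgetBytes) {
        int payload = (budgetBytes - 4) & ~3; // largest padded payload that still fits
        return payload / 2 - 1;               // remove the terminator
    }

    public static void main(String[] args) {
        for (int chars = 1; chars <= 4; chars++) {
            System.out.println(chars + " chars -> " + estimate(chars) + " bytes");
        }
        System.out.println("max chars in 1Mb: " + maxChars(1024 * 1024));
    }
}
```

Under this assumption the average cost is about 2 bytes per character, so the true ceiling for a single string would be roughly twice what a flat 4 bytes per character suggests.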
Of course, this only monitors the AIDL calls to ClipboardService. Monitoring PMS and WMS works the same way with the same steps, so they are not covered one by one here.
Source code: github.com/zhaozepeng/…