Android Hardware Acceleration (Part 2): RenderThread and OpenGL GPU Rendering

Since Android 4.0, the system enables hardware acceleration by default when rendering views. The earlier post 理解Android硬件加速的小白文 (a beginner's guide to Android hardware acceleration) laid out a simple model of hardware acceleration, but it mainly covered the first half of the pipeline and said little about how OpenGL and the GPU actually process the data. OpenGL's main tasks here are Surface composition and the rendering of graphics and images. This article sketches the second half of the model; it helps a lot in understanding how a View gets rendered, and in making sense of the "mystery curves" in GPU profiling.

One concept should be cleared up first: OpenGL only specifies a standard API and calling conventions, and each hardware platform has its own implementation (drivers and so on), which is generally closed source. This article is based mainly on Android's libagl (6.0), the software implementation of OpenGL that ships as a dynamic library in the Android source tree, combined with Systrace call stacks captured on real devices, comparing the two (against the hardware OpenGL implementations provided by GPU vendors) to guess at how libhgl (hardware OpenGL) works. For an Android app, GPU-based hardware-accelerated drawing can be divided into the following stages:

  • Stage 1: on the UI thread, the app builds the commands and data needed for OpenGL rendering
  • Stage 2: the CPU hands the data to the GPU (shared or copied); PCs usually have dedicated video memory, but on embedded ARM devices the GPU and CPU generally share the same memory
  • Stage 3: the GPU is told to render; in general a real device does not block waiting for the GPU to finish, which would be inefficient, and the CPU returns to other work once the notification is sent. In theory you can also block, and glFinish does exactly that (implementations differ per GPU vendor; the one in the Android source tree is software-only and is a reference at best). The Fence mechanism assists GPU/CPU synchronization; see the sketch after this list
  • Stage 4: swapBuffers, and notify SurfaceFlinger to compose the layers
  • Stage 5: SurfaceFlinger starts composing the layers; if the GPU work submitted earlier has not finished, it first waits for the GPU (the Fence mechanism again). Composition itself also relies on the GPU, but that belongs to the next task
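To make the blocking versus non-blocking hand-off in stage 3 concrete, here is a minimal sketch; it is not Android framework code, and it assumes the widely supported EGL_KHR_fence_sync extension (on a real driver these entry points are usually fetched via eglGetProcAddress):

#define EGL_EGLEXT_PROTOTYPES
#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <GLES2/gl2.h>

// Variant 1: blocking. glFinish returns only after the GPU has executed
// every queued command; simple, but the CPU stalls meanwhile.
void submitBlocking() {
    glDrawArrays(GL_TRIANGLES, 0, 3);
    glFinish();  // CPU blocks here until the GPU is done
}

// Variant 2: non-blocking. Queue a fence behind the draw commands, flush,
// and return; whoever consumes the result later waits on the fence instead
// of the producing thread.
EGLSyncKHR submitWithFence(EGLDisplay dpy) {
    glDrawArrays(GL_TRIANGLES, 0, 3);
    EGLSyncKHR sync = eglCreateSyncKHR(dpy, EGL_SYNC_FENCE_KHR, nullptr);
    glFlush();   // push the commands (and the fence) toward the GPU
    return sync; // consumer calls eglClientWaitSyncKHR(dpy, sync, ...) later
}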

The first stage mostly builds the DrawOp tree (each DrawOp wraps OpenGL drawing commands) and pre-groups similar commands so the GPU can process them more efficiently. This stage is CPU work; the early part runs on the UI thread and the later part on the RenderThread (render thread). The second stage runs mainly on the render thread: the CPU synchronizes (shares) the data with the GPU and then tells the GPU to render. Note that the CPU normally does not block waiting for the GPU to finish; it returns right after the notification, and only blocks if the GPU is so busy that it cannot even acknowledge the CPU's request. After returning, the CPU submits the GraphicBuffer to SurfaceFlinger for composition, at which point the GPU may not yet have finished rendering the image. This is where synchronization comes in, and Android uses the Fence mechanism here: before composing, SurfaceFlinger queries the Fence and, if GPU rendering has not finished, waits for it; when the GPU is done it signals SurfaceFlinger, which composes and submits for display, completing the render-and-display cycle. A simple diagram:

(Figure: Android CPU/GPU communication model)

The construction and optimization of the DrawOp tree was analyzed earlier; this article focuses on how the OpenGL rendering is completed on the GPU. The work happens mostly on the Render thread, which drives the GPU's rendering tasks through the OpenGL API.

Initializing the Android OpenGL environment

When using OpenGL you generally first obtain a suitable OpenGL configuration and then build the rendering environment for it. You must create an OpenGL context (Context), which can be seen as the embodiment of OpenGL (no context, no OpenGL environment), plus a canvas to draw on, the GlSurface; in Android these are abstracted as EglContext and EglSurface. An example:

private void initGL() {
    mEgl = (EGL10) EGLContext.getEGL();
    // Get the display target
    mEglDisplay = mEgl.eglGetDisplay(EGL10.EGL_DEFAULT_DISPLAY);
    // Choose a configuration
    mEglConfig = chooseEglConfig();
    ... // Build the context
    mEglContext = createContext(mEgl, mEglDisplay, mEglConfig);
    ... // Build the drawing Surface
    mEglSurface = mEgl.eglCreateWindowSurface(mEglDisplay, mEglConfig, mSurface, null);
}
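What the example leaves out is how these pieces are used each frame: bind the context and surface with eglMakeCurrent, draw, then eglSwapBuffers. A minimal native sketch of that loop (plain EGL/GLES, not framework code):

#include <EGL/egl.h>
#include <GLES2/gl2.h>

// One frame of a typical EGL render loop, given a display, surface and
// context created as in initGL() above.
void renderFrame(EGLDisplay dpy, EGLSurface surface, EGLContext ctx) {
    // Bind context + surface to this thread; GL calls now target 'surface'.
    eglMakeCurrent(dpy, surface, surface, ctx);

    glClearColor(0.f, 0.f, 0.f, 1.f);
    glClear(GL_COLOR_BUFFER_BIT);
    // ... issue draw calls here ...

    // Hand the finished back buffer to the consumer (SurfaceFlinger on Android).
    eglSwapBuffers(dpy, surface);
}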

How does the Android system configure an OpenGL environment for each window on the app side? When a window is added to the WindowManager, the setView of its ViewRootImpl object is called:

public void setView(View view, WindowManager.LayoutParams attrs, View panelParentView) {
    synchronized (this) {
        ...
        enableHardwareAcceleration(attrs);
    }
}

setView calls enableHardwareAcceleration to set up the hardware-accelerated OpenGL environment:

private void enableHardwareAcceleration(WindowManager.LayoutParams attrs) {
    mAttachInfo.mHardwareAccelerated = false;
    mAttachInfo.mHardwareAccelerationRequested = false;
    ...
    final boolean hardwareAccelerated =
            (attrs.flags & WindowManager.LayoutParams.FLAG_HARDWARE_ACCELERATED) != 0;

    if (hardwareAccelerated) {
        // Hardware acceleration can be enabled; normally this is true
        if (!HardwareRenderer.isAvailable()) {
            return;
        }
        ...
        // Create the hardware-accelerated rendering environment
        mAttachInfo.mHardwareRenderer = HardwareRenderer.create(mContext, translucent);
        if (mAttachInfo.mHardwareRenderer != null) {
            mAttachInfo.mHardwareRenderer.setName(attrs.getTitle().toString());
            mAttachInfo.mHardwareAccelerated =
                    mAttachInfo.mHardwareAccelerationRequested = true;
        }
    }
}

In Android, every displayed Window (Activity, Dialog, PopupWindow and so on) corresponds to one ViewRootImpl object and therefore one AttachInfo object. Then, through

HardwareRenderer.create(mContext, translucent);

the HardwareRenderer object created by HardwareRenderer.create(mContext, translucent) is stored in the ViewRootImpl's AttachInfo, in a one-to-one relationship with the Window. Once the hardware-accelerated environment has been created this way, whenever drawing is needed,

mAttachInfo.mHardwareRenderer.draw(mView, mAttachInfo, this);

takes the rendering further. Stepping back, let's look at how the app initializes the hardware-accelerated environment. Intuitively, it mainly builds the OpenGL Context, the EglSurface, and the RenderThread (starting it if it is not yet running):

static HardwareRenderer create(Context context, boolean translucent) {
    HardwareRenderer renderer = null;
    if (DisplayListCanvas.isAvailable()) {
        renderer = new ThreadedRenderer(context, translucent);
    }
    return renderer;
}

ThreadedRenderer(Context context, boolean translucent) {
    final TypedArray a = context.obtainStyledAttributes(null, R.styleable.Lighting, 0, 0);
    ...
    // Create the root RenderNode
    long rootNodePtr = nCreateRootRenderNode();
    mRootNode = RenderNode.adopt(rootNodePtr);
    // Create the native ThreadProxy
    mNativeProxy = nCreateProxy(translucent, rootNodePtr);
    // Initialize the AssetAtlas (not covered in this article)
    ProcessInitializer.sInstance.init(context, mNativeProxy);
    ...
}

As analyzed before, nCreateRootRenderNode creates a root RenderNode for the ViewRootImpl; by recursing from mRootNode, the UI thread can build all of the ViewTree's OpenGL drawing commands and data. nCreateProxy creates a ThreadProxy (RenderProxy) for the current window, which is mainly used to submit OpenGL-related tasks to the RenderThread, such as initialization, drawing and updates:

class ANDROID_API RenderProxy {
public:
    ANDROID_API RenderProxy(bool translucent, RenderNode* rootNode, IContextFactory* contextFactory);
    ANDROID_API virtual ~RenderProxy();
	...
    ANDROID_API bool initialize(const sp<ANativeWindow>& window);
    ...
    ANDROID_API int syncAndDrawFrame();
    ...
    ANDROID_API DeferredLayerUpdater* createTextureLayer();
    ANDROID_API void buildLayer(RenderNode* node);
    ANDROID_API bool copyLayerInto(DeferredLayerUpdater* layer, SkBitmap& bitmap);
    ...
    ANDROID_API void fence();
    ...
    void destroyContext();

    void post(RenderTask* task);
    void* postAndWait(MethodInvokeRenderTask* task);
	...
};

What does a RenderProxy do when it is created? Mainly two things. First, if the RenderThread has not been started, it starts it. Second, it submits the first task to the RenderThread: creating a CanvasContext for the current window. The CanvasContext has something of an EglContext about it; all drawing commands are funneled through it:

RenderProxy::RenderProxy(bool translucent, RenderNode* rootRenderNode, IContextFactory* contextFactory)
        : mRenderThread(RenderThread::getInstance())
        , mContext(nullptr) {
    // Create the CanvasContext
    SETUP_TASK(createContext);
    args->translucent = translucent;
    args->rootRenderNode = rootRenderNode;
    args->thread = &mRenderThread;
    args->contextFactory = contextFactory;
    mContext = (CanvasContext*) postAndWait(task);
    // Initialize the DrawFrameTask
    mDrawFrameTask.setContext(&mRenderThread, mContext);
}
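SETUP_TASK and postAndWait above hide a simple cross-thread pattern: package the arguments, queue a task to the render thread, and block on a condition variable until it has run. A minimal sketch of that pattern (hypothetical names, not the framework macros):

#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>

// Minimal post-and-wait queue in the spirit of RenderProxy/RenderThread.
class TaskQueue {
public:
    // Called from the UI thread: blocks until the render thread ran the task.
    void* postAndWait(std::function<void*()> task) {
        std::mutex m;
        std::condition_variable cv;
        bool done = false;
        void* result = nullptr;
        {
            std::lock_guard<std::mutex> lock(mQueueLock);
            mQueue.push_back([&] {
                result = task();
                std::lock_guard<std::mutex> l(m);
                done = true;
                cv.notify_one();
            });
        }
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [&] { return done; });
        return result;
    }

    // Called repeatedly in the render thread's loop.
    void runNext() {
        std::function<void()> task;
        {
            std::lock_guard<std::mutex> lock(mQueueLock);
            if (mQueue.empty()) return;
            task = std::move(mQueue.front());
            mQueue.pop_front();
        }
        task();
    }

private:
    std::mutex mQueueLock;
    std::deque<std::function<void()>> mQueue;
};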

As the constructor shows, the OpenGL render thread is a singleton: one RenderThread per process. RenderProxy references it through mRenderThread, and whenever a task needs submitting it inserts a message into the RenderThread's queue through that reference; the RenderThread's job is to pull messages off the queue and execute them, for example issuing OpenGL commands to the GPU and telling the GPU to render. The thread is clearly visible in the CPU tool of Android Profiler (a process with nothing to display does not have one):

(Figure: the RenderThread as seen in Android Profiler)

A quick look at how this singleton RenderThread is created and started:

RenderThread::RenderThread() : Thread(true), Singleton<RenderThread>()
        , mNextWakeup(LLONG_MAX)
        , mDisplayEventReceiver(nullptr)
        , mVsyncRequested(false)
        , mFrameCallbackTaskPending(false)
        , mFrameCallbackTask(nullptr)
        , mRenderState(nullptr)
        , mEglManager(nullptr) {
    Properties::load();
    mFrameCallbackTask = new DispatchFrameCallbacks(this);
    mLooper = new Looper(false);
    run("RenderThread");
}

The RenderThread maintains a message queue and reads and executes messages in a loop. Before it starts looping, it creates the components OpenGL rendering needs: the EglManager, the RenderState, and a VSync receiver (mainly for animations); then the thread enters its loop:

bool RenderThread::threadLoop() {

    // Initialization
    setpriority(PRIO_PROCESS, 0, PRIORITY_DISPLAY);
    initThreadLocals();

    int timeoutMillis = -1;
    for (;;) {
        // Wait until the message queue is non-empty
        int result = mLooper->pollOnce(timeoutMillis);
        nsecs_t nextWakeup;
        // Process our queue, if we have anything
        // Fetch tasks and run them
        while (RenderTask* task = nextTask(&nextWakeup)) {
            task->run();
        }
        ...
    }
    return false;
}

The initialization mainly creates components the EglContext will need. Up to this point it is all tooling; essentially nothing substantive for OpenGL has been built yet:

void RenderThread::initThreadLocals() {
    sp<IBinder> dtoken(SurfaceComposerClient::getBuiltInDisplay(
            ISurfaceComposer::eDisplayIdMain));
    status_t status = SurfaceComposerClient::getDisplayInfo(dtoken, &mDisplayInfo);
    nsecs_t frameIntervalNanos = static_cast<nsecs_t>(1000000000 / mDisplayInfo.fps);
    mTimeLord.setFrameInterval(frameIntervalNanos);
    // Initialize the VSync receiver
    initializeDisplayEventReceiver();
    // The manager
    mEglManager = new EglManager(*this);
    // The state machine
    mRenderState = new RenderState(*this);
    // Debug/analysis tool
    mJankTracker = new JankTracker(frameIntervalNanos);
}

Android5.0以後,有些動畫是能夠徹底在RenderThread完成的,這個時候render渲染線程須要接受Vsync,等信號到來後,回調RenderThread::displayEventReceiverCallback,計算當前動畫狀態,最後調用doFrame繪製當前動畫幀(不詳述),有時間能夠看下Vsync機制

void RenderThread::initializeDisplayEventReceiver() {
    mDisplayEventReceiver = new DisplayEventReceiver();
    status_t status = mDisplayEventReceiver->initCheck();
    mLooper->addFd(mDisplayEventReceiver->getFd(), 0,
            Looper::EVENT_INPUT, RenderThread::displayEventReceiverCallback, this);
}

Next the RenderThread news up an EglManager and a RenderState. Like the DisplayEventReceiver above, both belong to the RenderThread, so within a process they are effectively singletons too:

EglManager::EglManager(RenderThread& thread)
        : mRenderThread(thread)
        , mEglDisplay(EGL_NO_DISPLAY)
        , mEglConfig(nullptr)
        , mEglContext(EGL_NO_CONTEXT)
        , mPBufferSurface(EGL_NO_SURFACE)
        , mAllowPreserveBuffer(load_dirty_regions_property())
        , mCurrentSurface(EGL_NO_SURFACE)
        , mAtlasMap(nullptr)
        , mAtlasMapSize(0) {
    mCanSetPreserveBuffer = mAllowPreserveBuffer;
}

The EglManager's main role is managing the OpenGL context: creating EglSurfaces, selecting the surface currently being operated on, swapBuffers, and generally looking after the scene and its nodes:

class EglManager {
public:
    // Returns true on success, false on failure
    void initialize();
    EGLSurface createSurface(EGLNativeWindowType window);
    void destroySurface(EGLSurface surface);

    bool isCurrent(EGLSurface surface) { return mCurrentSurface == surface; }
    // Returns true if the current surface changed, false if it was already current
    bool makeCurrent(EGLSurface surface, EGLint* errOut = nullptr);
    void beginFrame(EGLSurface surface, EGLint* width, EGLint* height);
    bool swapBuffers(EGLSurface surface, const SkRect& dirty, EGLint width, EGLint height);

    // Returns true iff the surface is now preserving buffers.
    bool setPreserveBuffer(EGLSurface surface, bool preserve);
    void setTextureAtlas(const sp<GraphicBuffer>& buffer, int64_t* map, size_t mapSize);
    void fence();

private:
    friend class RenderThread;

    EglManager(RenderThread& thread);
    // EglContext is never destroyed, method is purposely not implemented
    ~EglManager();
    void createPBufferSurface();
    void loadConfig();
    void createContext();
    void initAtlas();
    RenderThread& mRenderThread;
    EGLDisplay mEglDisplay;
    EGLConfig mEglConfig;
    EGLContext mEglContext;
    EGLSurface mPBufferSurface;
    ...
};

RenderState can be seen as the concrete embodiment of the OpenGL state machine; it is what actually maintains the OpenGL render state and issues the render commands:

RenderState::RenderState(renderthread::RenderThread& thread)
        : mRenderThread(thread)
        , mViewportWidth(0)
        , mViewportHeight(0)
        , mFramebuffer(0) {
    mThreadId = pthread_self();
}

The first message inserted when the RenderProxy is created is SETUP_TASK(createContext), which builds the CanvasContext. It can be viewed as a wrapper around the OpenGL Context and the Surface:

CREATE_BRIDGE4(createContext, RenderThread* thread, bool translucent,
        RenderNode* rootRenderNode, IContextFactory* contextFactory) {
    return new CanvasContext(*args->thread, args->translucent,
            args->rootRenderNode, args->contextFactory);
}

As you can see, the CanvasContext holds the RenderThread, the EglManager, the RootRenderNode and more all at once. It can be regarded as Android's OpenGL context and is the entry point of the upper-layer rendering API:

CanvasContext::CanvasContext(RenderThread& thread, bool translucent,
        RenderNode* rootRenderNode, IContextFactory* contextFactory)
        : mRenderThread(thread)
        , mEglManager(thread.eglManager())
        , mOpaque(!translucent)
        , mAnimationContext(contextFactory->createAnimationContext(mRenderThread.timeLord()))
        , mRootRenderNode(rootRenderNode)
        , mJankTracker(thread.timeLord().frameIntervalNanos())
        , mProfiler(mFrames) {
    mRenderThread.renderState().registerCanvasContext(this);
    mProfiler.setDensity(mRenderThread.mainDisplayInfo().density);
}

At this point initialization is only half done. The other half happens at draw time, in ThreadedRenderer's initialize; after all, if nothing needs drawing, there is no point initializing the OpenGL environment and wasting resources:

private void performTraversals() {
    ...
    if (mAttachInfo.mHardwareRenderer != null) {
        try {
            hwInitialized = mAttachInfo.mHardwareRenderer.initialize(mSurface);
    ...

The mSurface here is a Surface that has already been filled in by WMS; on the native side it corresponds to an ANativeWindow (really a native Surface). As RenderProxy's initialize runs, the EglContext and EglSurface are created in turn. Note that this initialize task runs on the Render thread; all OpenGL-related operations must happen on the Render thread:

CREATE_BRIDGE2(initialize, CanvasContext* context, ANativeWindow* window) {
    return (void*) args->context->initialize(args->window);
}

bool RenderProxy::initialize(const sp<ANativeWindow>& window) {
    SETUP_TASK(initialize);
    args->context = mContext;
    args->window = window.get();
    return (bool) postAndWait(task);
}

bool CanvasContext::initialize(ANativeWindow* window) {
    setSurface(window);
    if (mCanvas) return false;
    mCanvas = new OpenGLRenderer(mRenderThread.renderState());
    mCanvas->initProperties();
    return true;
}

The ANativeWindow* window passed in is the native Surface. During initialization the CanvasContext calls setSurface to create the associated EglContext and the EglSurface canvas for OpenGL, and it also creates an OpenGLRenderer for the current window. The OpenGLRenderer's job is to process the DrawOps built earlier and output the corresponding OpenGL commands.

void CanvasContext::setSurface(ANativeWindow* window) {
    mNativeWindow = window;
    // Create the EglSurface canvas
    if (window) {
        mEglSurface = mEglManager.createSurface(window);
    }
    if (mEglSurface != EGL_NO_SURFACE) {
        const bool preserveBuffer = (mSwapBehavior != kSwap_discardBuffer);
        mBufferPreserved = mEglManager.setPreserveBuffer(mEglSurface, preserveBuffer);
        mHaveNewSurface = true;
        // Bind the context
        makeCurrent();
    }
}

EGLSurface EglManager::createSurface(EGLNativeWindowType window) {
    // Build the EglContext
    initialize();
    // Create the EglSurface
    EGLSurface surface = eglCreateWindowSurface(mEglDisplay, mEglConfig, window, nullptr);
    return surface;
}

void EglManager::initialize() {
    if (hasEglContext()) return;

    mEglDisplay = eglGetDisplay(EGL_DEFAULT_DISPLAY);
    loadConfig();
    createContext();
    createPBufferSurface();
    makeCurrent(mPBufferSurface);
    mRenderThread.renderState().onGLContextCreated();
    initAtlas();
}

void EglManager::createContext() {
    EGLint attribs[] = { EGL_CONTEXT_CLIENT_VERSION, GLES_VERSION, EGL_NONE };
    mEglContext = eglCreateContext(mEglDisplay, mEglConfig, EGL_NO_CONTEXT, attribs);
    LOG_ALWAYS_FATAL_IF(mEglContext == EGL_NO_CONTEXT,
        "Failed to create context, error = %s", egl_error_str());
}

After EglManager::initialize() the EglContext and the Config all exist; eglCreateWindowSurface then creates the EglSurface. This first goes through eglCreateWindowSurface in eglApi.cpp:

EGLSurface eglCreateWindowSurface(  EGLDisplay dpy, EGLConfig config,
                                    NativeWindowType window,
                                    const EGLint *attrib_list) {
        // Configure the native window
        int result = native_window_api_connect(window, NATIVE_WINDOW_API_EGL);
        // In the Android source this calls egl.cpp's eglCreateWindowSurface; the
        // software-emulated path should not differ much from real hardware here.
        // The EGLSurface holds a reference to the Surface, so the consumer can be
        // notified at swap time.
        EGLSurface surface = cnx->egl.eglCreateWindowSurface(
                iDpy, config, window, attrib_list);
        ...	}

egl.cpp is the software-emulated GPU implementation library, but its eglCreateWindowSurface logic differs little from what real GPU platforms do, since this is only abstract plumbing:

static EGLSurface createWindowSurface(EGLDisplay dpy, EGLConfig config,
        NativeWindowType window, const EGLint* /*attrib_list*/)
{
    ...
    egl_surface_t* surface;
    // What is returned is actually an egl_window_surface_v2_t
    surface = new egl_window_surface_v2_t(dpy, config, depthFormat,
            static_cast<ANativeWindow*>(window));
    ...
    return surface;
}

As the code shows, this simply news up an egl_window_surface_v2_t, which wraps an ANativeWindow. Since EGLSurface is a void* typed pointer, an egl_window_surface_v2_t pointer can be assigned to it directly. With that, environment initialization is finished; the rendering environment OpenGL needs is in place, and when a View needs to be displayed or updated, ViewRootImpl's draw carries on from there. Note that one Render thread has one EglContext by default but can have multiple EglSurfaces, switched by rebinding with eglMakeCurrent (see the sketch after the class diagram). So one Window corresponds to one ViewRootImpl -> one AttachInfo -> one ThreadedRenderer object -> one ThreadProxy (RootRenderNode) -> one CanvasContext.cpp (DrawFrameTask, EglManager (shared singleton), EglSurface) -> RenderThread (shared singleton). An app generally keeps just one OpenGL render thread, though you can of course create your own independent render thread and call the OpenGL API yourself. A simple class diagram:

(Figure: class diagram of the objects above)
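To illustrate the one-EglContext/many-EglSurfaces point above, a minimal sketch (plain EGL) of driving two windows from one context by rebinding with eglMakeCurrent:

#include <EGL/egl.h>

// One context, two window surfaces: render to each in turn by rebinding.
// 'winA'/'winB' stand for two ANativeWindows of the same process.
void drawTwoWindows(EGLDisplay dpy, EGLConfig cfg, EGLContext ctx,
                    EGLNativeWindowType winA, EGLNativeWindowType winB) {
    EGLSurface surfA = eglCreateWindowSurface(dpy, cfg, winA, nullptr);
    EGLSurface surfB = eglCreateWindowSurface(dpy, cfg, winB, nullptr);

    eglMakeCurrent(dpy, surfA, surfA, ctx);   // GL calls now target window A
    // ... draw window A ...
    eglSwapBuffers(dpy, surfA);

    eglMakeCurrent(dpy, surfB, surfB, ctx);   // rebind: same context, window B
    // ... draw window B ...
    eglSwapBuffers(dpy, surfB);
}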

With the work above done, the OpenGL rendering environment is ready; in other words the RenderThread has its rendering environment configured, and from here on the UI thread only has to send rendering tasks to the render thread.

Android OpenGL GPU rendering

As analyzed in the earlier beginner's article, ViewRootImpl's draw is the entry point: it calls HardwareRenderer's draw, which first builds the DrawOp tree, then merges and optimizes the DrawOps, and then issues the OpenGL commands to the GPU. Building the DrawOp tree happens on the UI thread; everything afterwards runs on the Render thread:

@Override
void draw(View view, AttachInfo attachInfo, HardwareDrawCallbacks callbacks) {
    // Build the DrawOp tree -- UI thread
    updateRootDisplayList(view, callbacks);
    // Render -- submit the task to the render thread
    int syncResult = nSyncAndDrawFrame(mNativeProxy, frameInfo, frameInfo.length);
    ...
}

As the code above indicates, updateRootDisplayList builds the DrawOp tree on the UI thread, and nSyncAndDrawFrame submits the render task to the render thread. The build flow was analyzed before, and nSyncAndDrawFrame's merging was briefly covered too; what follows picks up from there and looks at how the OpenGL commands are issued to the GPU. There is a synchronization problem here that may block the UI thread, so the sync comes first.

SyncAndDrawFrame synchronization

static int android_view_ThreadedRenderer_syncAndDrawFrame(JNIEnv* env, jobject clazz,
        jlong proxyPtr, jlongArray frameInfo, jint frameInfoSize) {
    RenderProxy* proxy = reinterpret_cast<RenderProxy*>(proxyPtr);
    env->GetLongArrayRegion(frameInfo, 0, frameInfoSize, proxy->frameInfo());
    return proxy->syncAndDrawFrame();
}

int DrawFrameTask::drawFrame() {
    mSyncResult = kSync_OK;
    mSyncQueued = systemTime(CLOCK_MONOTONIC);
    postAndWait();
    return mSyncResult;
}

void DrawFrameTask::postAndWait() {
    AutoMutex _lock(mLock);
    mRenderThread->queue(this);
    // Block and wait; synchronize the resources
    mSignal.wait(mLock);
}

void DrawFrameTask::run() {
    bool canUnblockUiThread;
    bool canDrawThisFrame;
    {
        TreeInfo info(TreeInfo::MODE_FULL, mRenderThread->renderState());
        // The sync step: synchronize the DrawOp tree, layers and image
        // resources built in Java with their native counterparts
        canUnblockUiThread = syncFrameState(info);
        canDrawThisFrame = info.out.canDrawThisFrame;
    }
    // Grab a copy of everything we need
    CanvasContext* context = mContext;
    // If the sync is complete, the UI thread can be released
    if (canUnblockUiThread) {
        unblockUiThread();
    }
    // Draw: issue the OpenGL commands to the GPU
    if (CC_LIKELY(canDrawThisFrame)) {
        context->draw();
    }
    // If the UI thread was kept blocked for the sync, wake it up now
    if (!canUnblockUiThread) {
        unblockUiThread();
    }
}

In short, RenderProxy's syncAndDrawFrame inserts the DrawFrameTask into the RenderThread and blocks, waiting for the RenderThread to synchronize with the UI thread. If the sync succeeds, the UI thread is woken up; otherwise it stays blocked until the Render thread finishes issuing the OpenGL commands. Only after the sync does the RenderThread start the GPU-related rendering work. The sync first:

bool DrawFrameTask::syncFrameState(TreeInfo& info) {
    int64_t vsync = mFrameInfo[static_cast<int>(FrameInfoIndex::Vsync)];
    mRenderThread->timeLord().vsyncReceived(vsync);
    mContext->makeCurrent();
    Caches::getInstance().textureCache.resetMarkInUse(mContext);

    // Key point 1: TextureView-style handling, mainly textures
    for (size_t i = 0; i < mLayers.size(); i++) {
        // Update the layer; this touches layer data and may involve copies
        mContext->processLayerUpdate(mLayers[i].get());
    }
    mLayers.clear();
    // Key point 2: synchronize the DrawOp tree
    mContext->prepareTree(info, mFrameInfo, mSyncQueued);
    ...
    // If prepareTextures is false, we ran out of texture cache space
    return info.prepareTextures;
}

When a TextureView in the Window has an update (considering only system-provided Views, that seems to be the only one; custom Views aside), its graphics buffer must be read from the TextureView's SurfaceTexture and wrapped and bound as an OpenGL texture for the GPU to draw with; this is not detailed here, and a future analysis of TextureView can cover it. The second step synchronizes the DrawOpTree and related data built on the UI thread into the Render Thread. The DisplayListData built from the Java side through ViewRootImpl has not yet been assigned to the RenderNode's mDisplayListData (the object that is ultimately used); it was only stashed by setStagingDisplayList, because repeated measure/layout passes might still change it. The staging logic:

static void android_view_RenderNode_setDisplayListData(JNIEnv* env,
        jobject clazz, jlong renderNodePtr, jlong newDataPtr) {
    RenderNode* renderNode = reinterpret_cast<RenderNode*>(renderNodePtr);
    DisplayListData* newData = reinterpret_cast<DisplayListData*>(newDataPtr);
    renderNode->setStagingDisplayList(newData);
}

void RenderNode::setStagingDisplayList(DisplayListData* data) {
    mNeedsDisplayListDataSync = true;
    delete mStagingDisplayListData;
    mStagingDisplayListData = data;
}
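The staging/commit dance is a small double-buffering pattern: the UI thread only ever writes the staging slot, and the render thread swaps it into the live slot during the sync window, when both threads are at a known safe point. A stripped-down sketch (hypothetical names, in the spirit of setStagingDisplayList and the pushStagingDisplayListChanges commit shown further below):

// DisplayList stands in for DisplayListData.
struct DisplayList { /* recorded drawing commands */ };

class Node {
public:
    // UI thread: stage the freshly recorded display list.
    void setStaging(DisplayList* data) {
        delete mStaging;          // discard a staged list that was never committed
        mStaging = data;
        mNeedsSync = true;
    }

    // Render thread, inside the sync window (UI thread is blocked):
    // commit the staged list as the one rendering will use.
    void pushStagingChanges() {
        if (mNeedsSync) {
            mNeedsSync = false;
            delete mCurrent;
            mCurrent = mStaging;
            mStaging = nullptr;
        }
    }

private:
    DisplayList* mStaging = nullptr;  // written by the UI thread
    DisplayList* mCurrent = nullptr;  // read by the render thread
    bool mNeedsSync = false;
};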

Synchronizing the View's DrawOpTree

void CanvasContext::prepareTree(TreeInfo& info, int64_t* uiFrameInfo, int64_t syncQueued) {
    mRenderThread.removeFrameCallback(this);

    if (!wasSkipped(mCurrentFrameInfo)) {
        mCurrentFrameInfo = &mFrames.next();
    }

    // Sync the Java-side frame timing info to native; this is where the
    // GPU profiling curves come from
    mCurrentFrameInfo->importUiThreadInfo(uiFrameInfo);
    mCurrentFrameInfo->set(FrameInfoIndex::SyncQueued) = syncQueued;
    // A timing checkpoint
    mCurrentFrameInfo->markSyncStart();
    info.damageAccumulator = &mDamageAccumulator;
    info.renderer = mCanvas;
    info.canvasContext = this;

    mAnimationContext->startFrame(info.mode);
    // mRootRenderNode recursively visits every node
    mRootRenderNode->prepareTree(info);
  ...

Through recursive traversal, mRootRenderNode can visit all the nodes:

void RenderNode::prepareTree(TreeInfo& info) {
    bool functorsNeedLayer = Properties::debugOverdraw;
    prepareTreeImpl(info, functorsNeedLayer);
}

void RenderNode::prepareTreeImpl(TreeInfo& info, bool functorsNeedLayer) {
    info.damageAccumulator->pushTransform(this);

    if (info.mode == TreeInfo::MODE_FULL) {
        // Sync the properties
        pushStagingPropertiesChanges(info);
    }

    // Layer
    prepareLayer(info, animatorDirtyMask);
    // Sync the DrawOpTree
    if (info.mode == TreeInfo::MODE_FULL) {
        pushStagingDisplayListChanges(info);
    }
    // Recurse into the child Views
    prepareSubTree(info, childFunctorsNeedLayer, mDisplayListData);
    // Push layer updates
    pushLayerUpdate(info);
    info.damageAccumulator->popTransform();
}

By the time of the sync this is essentially the final state, so it only remains to assign mStagingDisplayListData to mDisplayListData:

void RenderNode::pushStagingDisplayListChanges(TreeInfo& info) {
    if (mNeedsDisplayListDataSync) {
        mNeedsDisplayListDataSync = false;
        ...
        mDisplayListData = mStagingDisplayListData;
        mStagingDisplayListData = nullptr;
        if (mDisplayListData) {
            for (size_t i = 0; i < mDisplayListData->functors.size(); i++) {
                (*mDisplayListData->functors[i])(DrawGlInfo::kModeSync, nullptr);
            }
        }
        damageSelf(info);
    }
}

Recursing through the child Views then completes the sync of every View's RenderNode:

void RenderNode::prepareSubTree(TreeInfo& info, bool functorsNeedLayer, DisplayListData* subtree) {
    if (subtree) {
        TextureCache& cache = Caches::getInstance().textureCache;
        info.out.hasFunctors |= subtree->functors.size();
        // Wrap the bitmaps this RenderNode uses into textures
        for (size_t i = 0; info.prepareTextures && i < subtree->bitmapResources.size(); i++) {
            info.prepareTextures = cache.prefetchAndMarkInUse(
                    info.canvasContext, subtree->bitmapResources[i]);
        }
        // Recurse into the child Views
        for (size_t i = 0; i < subtree->children().size(); i++) {
            ...
            childNode->prepareTreeImpl(info, childFunctorsNeedLayer);
            info.damageAccumulator->popTransform();
        }
    }
}

When DrawFrameTask::syncFrameState returns true (the return value is TreeInfo's prepareTextures, which mainly concerns Bitmap handling), the sync is complete and the UI thread can be woken immediately. If it returns false, the UI-side data has not been fully handed over to the GPU and the UI thread must keep waiting. The source carries the comment "If prepareTextures is false, we ran out of texture cache space": the amount of OpenGL texture an app process may create is limited, and past the limit the texture sync fails. In the 6.0 code the limit has two parts, a per-Bitmap size limit and a limit on the total available memory. The checks:

Texture* TextureCache::getCachedTexture(const SkBitmap* bitmap, AtlasUsageType atlasUsageType) {
    if (CC_LIKELY(mAssetAtlas != nullptr) && atlasUsageType == AtlasUsageType::Use) {
        AssetAtlas::Entry* entry = mAssetAtlas->getEntry(bitmap);
        if (CC_UNLIKELY(entry)) {
            return entry->texture;
        }
    }

    Texture* texture = mCache.get(bitmap->pixelRef()->getStableID());

    // Cache miss
    if (!texture) {
        // Check the per-bitmap limit
        if (!canMakeTextureFromBitmap(bitmap)) {
            return nullptr;
        }

        const uint32_t size = bitmap->rowBytes() * bitmap->height();
        bool canCache = size < mMaxSize;
        // Don't even try to cache a bitmap that's bigger than the cache
        // Evict old, no-longer-used entries per LRU; if that frees enough
        // space it counts as success, otherwise as failure
        while (canCache && mSize + size > mMaxSize) {
            Texture* oldest = mCache.peekOldestValue();
            if (oldest && !oldest->isInUse) {
                mCache.removeOldest();
            } else {
                canCache = false;
            }
        }
        // If it fits in the cache, create a new Texture
        if (canCache) {
            texture = new Texture(Caches::getInstance());
            texture->bitmapSize = size;
            generateTexture(bitmap, texture, false);

            mSize += size;
            TEXTURE_LOGD("TextureCache::get: create texture(%p): name, size, mSize = %d, %d, %d",
                     bitmap, texture->id, size, mSize);
            if (mDebugEnabled) {
                ALOGD("Texture created, size = %d", size);
            }
            mCache.put(bitmap->pixelRef()->getStableID(), texture);
        }
    } else if (!texture->isInUse && bitmap->getGenerationID() != texture->generation) {
        // Texture was in the cache but is dirty, re-upload
        // TODO: Re-adjust the cache size if the bitmap's dimensions have changed
        generateTexture(bitmap, texture, true);
    }

    return texture;
}

The per-Bitmap limit first:

bool TextureCache::canMakeTextureFromBitmap(const SkBitmap* bitmap) {
    if (bitmap->width() > mMaxTextureSize || bitmap->height() > mMaxTextureSize) {
        ALOGW("Bitmap too large to be uploaded into a texture (%dx%d, max=%dx%d)",
                bitmap->width(), bitmap->height(), mMaxTextureSize, mMaxTextureSize);
        return false;
    }
    return true;
}

The per-Bitmap size limit is tied to this definition:

#define GL_MAX_TEXTURE_SIZE               0x0D33
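Note that 0x0D33 is the GL enum value of GL_MAX_TEXTURE_SIZE, not the limit itself; the actual maximum texture dimension comes from the driver at runtime (mMaxTextureSize appears to be populated from this query). A minimal sketch, which requires a current GL context:

#include <GLES2/gl2.h>

// Ask the driver for the largest texture dimension it supports.
GLint queryMaxTextureSize() {
    GLint maxSize = 0;
    glGetIntegerv(GL_MAX_TEXTURE_SIZE, &maxSize);  // e.g. 4096 on many devices
    return maxSize;
}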

If a bitmap's width or height exceeds that limit, the sync may fail. The second cause is exceeding the ceiling on the total amount of texture that can be cached:

#define DEFAULT_TEXTURE_CACHE_SIZE 24.0f  // 24 MB here

If there is enough room, a new Texture is created directly; if not, old, no-longer-used textures are evicted LRU-style. If eviction frees enough space, the Texture is created; otherwise it counts as a failure. Although this is called a GPU cache, it still lives in the same memory and is managed by the CPU. For a sense of scale, one full-screen 1080 x 1920 ARGB_8888 bitmap takes 1080 * 1920 * 4 bytes, roughly 7.9 MB, so the 24 MB cache only holds about three of them. Not knowing GPUs deeply, I am not sure whether this number depends on the GPU. Assuming the texture does need to be created:

void TextureCache::generateTexture(const SkBitmap* bitmap, Texture* texture, bool regenerate) {
    SkAutoLockPixels alp(*bitmap);
    // Create a new texture with glGenTextures
    if (!regenerate) {
        glGenTextures(1, &texture->id);
    }

    texture->generation = bitmap->getGenerationID();
    texture->width = bitmap->width();
    texture->height = bitmap->height();
    // Bind the texture
    Caches::getInstance().textureState().bindTexture(texture->id);

    switch (bitmap->colorType()) {
    ...
    case kN32_SkColorType:
        // 32-bit RGBA or BGRA; resize is true on the first upload, since
        // the recorded size cannot match yet
        uploadToTexture(resize, GL_RGBA, bitmap->rowBytesAsPixels(), bitmap->bytesPerPixel(),
                texture->width, texture->height, GL_UNSIGNED_BYTE, bitmap->getPixels());
    ...
}

The code above creates the texture and then binds the texture image resource to it; the binding code:

void TextureCache::uploadToTexture(bool resize, GLenum format, GLsizei stride, GLsizei bpp,
        GLsizei width, GLsizei height, GLenum type, const GLvoid * data) {
    glPixelStorei(GL_UNPACK_ALIGNMENT, bpp);
    const bool useStride = stride != width
            && Caches::getInstance().extensions().hasUnpackRowLength();
    ...
    if (resize) {
        glTexImage2D(GL_TEXTURE_2D, 0, format, width, height, 0, format, type, temp);
    } else {
        glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height, format, type, temp);
    }
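A side note on what glTexImage2D buys us: the pixels are copied into driver-owned storage, so the client-side buffer can be released as soon as the call returns. A minimal sketch (plain GLES; a malloc'd pixel buffer of w*h*4 bytes stands in for the Bitmap's pixels):

#include <GLES2/gl2.h>
#include <cstdlib>

// Upload pixels into a new texture, then free the CPU-side copy; safe
// because glTexImage2D leaves the driver with its own copy of the data.
GLuint uploadAndFree(void* pixels, int w, int h) {
    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, w, h, 0, GL_RGBA, GL_UNSIGNED_BYTE, pixels);
    free(pixels);  // the texture keeps working; the driver owns its copy
    return tex;
}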

The key call is glTexImage2D, which binds the texture image to the texture. glTexImage2D generally copies the image one more time, after which the Bitmap can be released; once the texture upload has succeeded, the sync counts as successful and the UI thread no longer needs to block. So why must the CPU wait when the sync fails? My understanding: if caching succeeded, glTexImage2D has completed a transfer and backup of the data, so the UI thread no longer needs to keep that Bitmap's data intact; but if it failed, no backup was made for the GPU, and the data must be preserved until glTexImage2D makes one. Then why not make the cache much larger? Probably a trade-off between memory and performance: with a huge cache, too many Bitmaps might be cached for the GPU at one moment while the GPU, perhaps too busy to get to them, has not used them yet, so that memory is effectively wasted; and since the CPU is clearly running well ahead of the GPU at that point, letting the CPU wait a little is acceptable. Some explanations say it prevents the Bitmap from being modified; honestly I have not fully understood that, this is just my own reading, corrections welcome. Note that even if caching fails, the Bitmap is uploaded again when the OpenGL commands are issued, which is probably also why the UI thread blocks. The time spent in this stage corresponds to:

(Figure: the corresponding segment of the GPU profiling bars, the CPU/GPU relationship)

The Render thread issues the OpenGL render commands

Once the sync is complete, the DrawOpTree from before can be processed: converted into standard OpenGL API calls and submitted to OpenGL for rendering. Continuing with the second half of DrawFrameTask, the main step is CanvasContext's draw, which recurses through the DrawOpTree:

void CanvasContext::draw() {

    EGLint width, height;
    // Begin drawing: bind the EglSurface and request the memory it needs
    mEglManager.beginFrame(mEglSurface, &width, &height);
    ...
    Rect outBounds;
    // Recursively invoke the OpenGL APIs in OpenGLRenderer to draw
    mCanvas->drawRenderNode(mRootRenderNode.get(), outBounds);
    bool drew = mCanvas->finish();
    // Even if we decided to cancel the frame, from the perspective of jank
    // metrics the frame was swapped at this point
    mCurrentFrameInfo->markSwapBuffers();
    // Notify and submit the canvas
    if (drew) {
        swapBuffers(dirty, width, height);
    }
    ...
}
  • Step 1: mEglManager.beginFrame marks the current context and requests the drawing memory. A process may hold several windows, that is several EglSurfaces, so the first thing is to mark which one to operate on, in other words which canvas to paint. As the earlier beginner's article noted, in the hardware-accelerated path a buffer slot is reserved at SurfaceFlinger ahead of time, but the memory is not actually allocated then; it is allocated only when real drawing begins. The memory requested here is what the GPU operates on, and also the Layer data later submitted to SurfaceFlinger for composition;
  • Step 2: recursively issue the OpenGL commands and submit them to the GPU for drawing;
  • Step 3: swapBuffers submits the drawn data to SurfaceFlinger for composition (the GPU has quite possibly not finished rendering, but the Render thread can be released early; the Fence mechanism guarantees the synchronization). GPU implementations differ and vendors do not open-source this part; this article guesses at the implementation from the Android source (the software OpenGL) plus Systrace on real devices.

Step 1 first: through the EglManager, the Context binds the current EglSurface and the GPU drawing memory is requested:

void EglManager::beginFrame(EGLSurface surface, EGLint* width, EGLint* height) {

    makeCurrent(surface);
    ...
    eglBeginFrame(mEglDisplay, surface);
}

makeCurrent ends up requesting a buffer from a BnGraphicBufferProducer; for render threads you did not write yourself, that basically means requesting it from SurfaceFlinger:

EGLBoolean eglMakeCurrent(  EGLDisplay dpy, EGLSurface draw,
                            EGLSurface read, EGLContext ctx)
{
    ogles_context_t* gl = (ogles_context_t*)ctx;
    if (makeCurrent(gl) == 0) {
        if (ctx) {
            egl_context_t* c = egl_context_t::context(ctx);
            egl_surface_t* d = (egl_surface_t*)draw;
            egl_surface_t* r = (egl_surface_t*)read;
            ...
            if (d) {
            // This is where buffer allocation comes in
                if (d->connect() == EGL_FALSE) {
                    return EGL_FALSE;
                }
                d->ctx = ctx;
                // Bind the draw surface
                d->bindDrawSurface(gl);
            }
           ...
    return setError(EGL_BAD_ACCESS, EGL_FALSE);
}

On first use, egl_surface_t's connect must be called, which is really the connect of the egl_window_surface_v2_t created earlier; it triggers the allocation of the drawing memory:

EGLBoolean egl_window_surface_v2_t::connect()
{
    // dequeue a buffer
    int fenceFd = -1;
    // Call the nativeWindow's dequeueBuffer to obtain the drawing memory,
    // along with a Fence
    if (nativeWindow->dequeueBuffer(nativeWindow, &buffer,
            &fenceFd) != NO_ERROR) {
        return setError(EGL_BAD_ALLOC, EGL_FALSE);
    }

    // Wait until the dequeued buffer is actually available
    sp<Fence> fence(new Fence(fenceFd));
    ...
    return EGL_TRUE;
}

The nativeWindow above is really the Surface:

int Surface::dequeueBuffer(android_native_buffer_t** buffer, int* fenceFd) {
    ...
    FrameEventHistoryDelta frameTimestamps;
    status_t result = mGraphicBufferProducer->dequeueBuffer(&buf, &fence, reqWidth, reqHeight,
                                                            reqFormat, reqUsage, &mBufferAge,
                                                            enableFrameTimestamps ? &frameTimestamps
                                                                                  : nullptr);
    ... // If reallocation is needed, requestBuffer asks for the allocation
    if ((result & IGraphicBufferProducer::BUFFER_NEEDS_REALLOCATION) || gbuf == nullptr) {
        // Request the allocation
        result = mGraphicBufferProducer->requestBuffer(buf, &gbuf);
    }
    ...
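Taken together, the producer side of the BufferQueue boils down to the loop below. This is a simplified sketch with hypothetical stand-in types (these are internal interfaces, not public NDK API), just to show where the two fences come in:

#include <functional>

// Hypothetical stand-ins for the real BufferQueue types, showing the shape
// of the producer loop (dequeue -> wait fence -> draw -> queue with fence).
struct Fence {
    int fd;
    explicit Fence(int f) : fd(f) {}
    void wait() { /* block until fd signals; no-op if fd == -1 */ }
};

struct Producer {
    // Claim a free slot; 'outFenceFd' guards against the previous reader.
    void dequeueBuffer(int* outSlot, int* outFenceFd) { *outSlot = 0; *outFenceFd = -1; }
    // Hand the slot to the consumer; 'fenceFd' signals when the GPU is done
    // writing (-1 means "already finished", as in the software libagl path).
    void queueBuffer(int slot, int fenceFd) {}
};

void produceOneFrame(Producer& producer, std::function<void(int)> drawInto) {
    int slot = -1, acquireFenceFd = -1;
    producer.dequeueBuffer(&slot, &acquireFenceFd);   // 1. get a buffer slot
    Fence(acquireFenceFd).wait();                     // 2. wait until writable
    drawInto(slot);                                   // 3. render into it
    producer.queueBuffer(slot, /*fenceFd=*/-1);       // 4. submit for composition
}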

Simply put: first claim a buffer slot, and if that slot's buffer needs reallocating, request anonymous shared memory for it. The memory allocated here is the memory the EglSurface (Surface) actually draws into under hardware acceleration; with it in hand, OpenGL can be told to render. The flow above involves the Fence mechanism, essentially an aid to the producer/consumer relationship whose main use is GPU/CPU synchronization, as sketched above; let's set it aside and finish the flow first. CanvasContext's mCanvas is actually an OpenGLRenderer, so next is OpenGLRenderer's drawRenderNode:

void OpenGLRenderer::drawRenderNode(RenderNode* renderNode, Rect& dirty, int32_t replayFlags) {
    // All the usual checks and setup operations (quickReject, setupDraw, etc.)
    // will be performed by the display list itself
    if (renderNode && renderNode->isRenderable()) {
        // compute 3d ordering
        // Compute the Z order
        renderNode->computeOrdering();
        // If merging of ops is disabled, draw directly
        if (CC_UNLIKELY(Properties::drawDeferDisabled)) {
            startFrame();
            ReplayStateStruct replayStruct(*this, dirty, replayFlags);
            renderNode->replay(replayStruct, 0);
            return;
        }
        ...
        DeferredDisplayList deferredList(mState.currentClipRect(), avoidOverdraw);
        DeferStateStruct deferStruct(deferredList, *this, replayFlags);
        // Merge
        renderNode->defer(deferStruct, 0);
        // Handle texture layers
        flushLayers();
        // Set the viewport
        startFrame();
        // Flush: generate and issue the OpenGL commands
        deferredList.flush(*this, dirty);
    } ...

Computing the Z order and merging the DrawOps were touched on before and are not analyzed here. We only look at flushLayers and the final issuing of the OpenGL commands (deferredList.flush, which in effect walks every DrawOp and calls its own draw function). flushLayers mainly handles TextureView; to keep things simple, assume no such views exist and look only at flush:

void DeferredDisplayList::flush(OpenGLRenderer& renderer, Rect& dirty) {
    ...
    replayBatchList(mBatches, renderer, dirty);
    ...
}

static void replayBatchList(const Vector<Batch*>& batchList,
        OpenGLRenderer& renderer, Rect& dirty) {
    for (unsigned int i = 0; i < batchList.size(); i++) {
        if (batchList[i]) {
            batchList[i]->replay(renderer, dirty, i);
        }
    }
}

Replaying the merged DrawOps

virtual void DrawBatch::replay(OpenGLRenderer& renderer, Rect& dirty, int index) override {
    for (unsigned int i = 0; i < mOps.size(); i++) {
        DrawOp* op = mOps[i].op;
        const DeferredDisplayState* state = mOps[i].state;
        renderer.restoreDisplayState(*state);
        op->applyDraw(renderer, dirty);
    }
}

Each merged Batch is walked in turn, and every DrawOp inside the Batch has its replay invoked. Take DrawPointsOp, drawing points, as an example:

class DrawPointsOp : public DrawLinesOp {
public:
    DrawPointsOp(const float* points, int count, const SkPaint* paint)
            : DrawLinesOp(points, count, paint) {}

    virtual void applyDraw(OpenGLRenderer& renderer, Rect& dirty) override {
        renderer.drawPoints(mPoints, mCount, mPaint);
    }
...

which finally calls OpenGLRenderer's drawPoints:

void OpenGLRenderer::drawPoints(const float* points, int count, const SkPaint* paint) {
    ...
    count &= ~0x1;
    // Build the VertexBuffer
    VertexBuffer buffer;
    PathTessellator::tessellatePoints(points, count, paint, *currentTransform(), buffer);
    ...
    int displayFlags = paint->isAntiAlias() ? 0 : kVertexBuffer_Offset;
    // Draw using the buffer and paint
    drawVertexBuffer(buffer, paint, displayFlags);
    mDirty = true;
}

void OpenGLRenderer::drawVertexBuffer(float translateX, float translateY,
        const VertexBuffer& vertexBuffer, const SkPaint* paint, int displayFlags) {
    ...
    Glop glop;
    GlopBuilder(mRenderState, mCaches, &glop)
            .setRoundRectClipState(currentSnapshot()->roundRectClipState)
            .setMeshVertexBuffer(vertexBuffer, shadowInterp)
            .setFillPaint(*paint, currentSnapshot()->alpha)
            ...
            .build();
    renderGlop(glop);
}

void OpenGLRenderer::renderGlop(const Glop& glop, GlopRenderType type) {
    ...
    mRenderState.render(glop);
    ...

Vertex is a basic OpenGL concept. drawVertexBuffer calls RenderState's render, which submits the draw commands to the GPU (they are not drawn immediately; the GPU has a command buffer of its own, and only a manual glFinish or glFlush forces immediate rendering). RenderState can be seen as the abstraction of the OpenGL state machine, and its render function looks like this:

void RenderState::render(const Glop& glop) {
    const Glop::Mesh& mesh = glop.mesh;
    const Glop::Mesh::Vertices& vertices = mesh.vertices;
    const Glop::Mesh::Indices& indices = mesh.indices;
    const Glop::Fill& fill = glop.fill;
    // ---------------------------------------------
    // ---------- Program + uniform setup ----------
    // ---------------------------------------------
    mCaches->setProgram(fill.program);

    if (fill.colorEnabled) {
        fill.program->setColor(fill.color);
    }

    fill.program->set(glop.transform.ortho,
            glop.transform.modelView,
            glop.transform.meshTransform(),
            glop.transform.transformFlags & TransformFlags::OffsetByFudgeFactor);

    // Color filter uniforms
    if (fill.filterMode == ProgramDescription::kColorBlend) {
        const FloatColor& color = fill.filter.color;
        glUniform4f(mCaches->program().getUniform("colorBlend"),
                color.r, color.g, color.b, color.a);
    }
     ....
	 // ---------- Mesh setup ----------
    // vertices
    const bool force = meshState().bindMeshBufferInternal(vertices.bufferObject)
            || (vertices.position != nullptr);
    meshState().bindPositionVertexPointer(force, vertices.position, vertices.stride);

    // indices
    meshState().bindIndicesBufferInternal(indices.bufferObject);
    ...
    // ------------------------------------
    // ---------- GL state setup ----------
    // ------------------------------------
    blend().setFactors(glop.blend.src, glop.blend.dst);
    // ------------------------------------
    // ---------- Actual drawing ----------
    // ------------------------------------
    if (indices.bufferObject == meshState().getQuadListIBO()) {
        // Since the indexed quad list is of limited length, we loop over
        // the glDrawXXX method while updating the vertex pointer
        GLsizei elementsCount = mesh.elementCount;
        const GLbyte* vertexData = static_cast<const GLbyte*>(vertices.position);
        while (elementsCount > 0) {
            ...
            glDrawElements(mesh.primitiveMode, drawCount, GL_UNSIGNED_SHORT, nullptr);
            elementsCount -= drawCount;
            vertexData += (drawCount / 6) * 4 * vertices.stride;
        }
    }
    ...
}

As you can see, after step-by-step setup, transformation and preprocessing, everything ends up converted into glXXX calls that generate the corresponding OpenGL commands for the GPU and tell the GPU to draw. There are two ways to handle the hand-off: either the CPU blocks until the GPU finishes drawing and then submits the result to SurfaceFlinger for composition, or the CPU returns immediately and submits to SurfaceFlinger right away, and SurfaceFlinger waits for the GPU at composition time if drawing has not finished. The software implementation takes the first approach; hardware implementations generally take the second. Note: all the preparation before OpenGL drawing, including the memory handed to the GPU, is allocated by the CPU in the app's private memory space, whereas the canvas memory the GPU really draws into, the buffer submitted to SurfaceFlinger for composition, comes from anonymous shared memory; the two are not the same. The time spent in this part is essentially the CPU synchronizing commands to the GPU, which in the GPU profiling curves is:

(Figure: the "issue OpenGL commands" segment of the GPU profiling bars)

The Render thread submits the graphics buffer in swapBuffers (with the Fence mechanism)

In Android, GraphicBuffer synchronization relies mainly on the Fence mechanism, whose greatest strength is that it can synchronize across GPU, CPU and HWC. GPU processing is generally asynchronous: when an OpenGL API call returns, the commands are not necessarily executed by the GPU right away; they are cached in the local GL command buffer and only actually handed to the GPU when the buffer fills up, and the CPU may have no idea when that happens, unless it forces a flush with glFinish() and blocks until all the commands have executed. That obviously reduces CPU/GPU parallelism; at the very least the render thread sits blocked. Asynchronous handling is more efficient: the CPU returns right after submitting the commands, without waiting for the GPU, so the render thread is freed for the next message. But then the graphics are submitted to SurfaceFlinger for composition before they have been fully processed, and SurfaceFlinger needs to know when this GraphicBuffer has actually been filled by the GPU. That is exactly where the Fence mechanism does its work. Fence is not analyzed in depth here, since it touches a lot of machinery; only a simple schematic:

(Figure: Fence mechanism schematic)
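On real devices the release fence typically comes from EGL's native fence extension. A hedged sketch, assuming EGL_ANDROID_native_fence_sync is available (on a real driver these entry points are fetched via eglGetProcAddress; error handling omitted):

#define EGL_EGLEXT_PROTOTYPES
#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <GLES2/gl2.h>

// Create a native fence fd that signals when all GL commands issued so far
// have completed -- the kind of fd that can accompany queueBuffer.
int createNativeFenceFd(EGLDisplay dpy) {
    EGLSyncKHR sync = eglCreateSyncKHR(dpy, EGL_SYNC_NATIVE_FENCE_ANDROID, nullptr);
    glFlush();  // the fence only signals once the commands are flushed
    int fd = eglDupNativeFenceFDANDROID(dpy, sync);  // dup the fd for the consumer
    eglDestroySyncKHR(dpy, sync);                    // the fd keeps the fence alive
    return fd;  // pass along with queueBuffer; -1 would mean "already signaled"
}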

After the earlier commands have been issued, the CPU usually sends one final command telling the GPU that the current stream is complete and can be processed. The GPU generally needs to return an acknowledgement, which does not mean execution has finished, only that the notification got through; if the GPU is too busy to reply, the CPU blocks and waits, and once the notification arrives the blocked Render thread is woken to handle the next message. This stage happens inside swapBuffers. Google's explanation:

Once Android finishes submitting all its display list to the GPU, the system issues one final command to tell the graphics driver that it's done with the current frame. At this point, the driver can finally present the updated image to the screen.

It’s important to understand that the GPU executes work in parallel with the CPU. The Android system issues draw commands to the GPU, and then moves on to the next task. The GPU reads those draw commands from a queue and processes them.

In situations where the CPU issues commands faster than the GPU consumes them, the communications queue between the processors can become full. When this occurs, the CPU blocks, and waits until there is space in the queue to place the next command. This full-queue state arises often during the Swap Buffers stage, because at that point, a whole frame’s worth of commands have been submitted

Going by the Android source alone, the software libagl can be treated as synchronous and does not need to consider the Fence mechanism:

EGLBoolean egl_window_surface_v2_t::swapBuffers()
{
    ...
    // This is queueBuffer; note the fence passed here is -1
    nativeWindow->queueBuffer(nativeWindow, buffer, -1);
    buffer = 0;
    // dequeue a new buffer
    int fenceFd = -1;
    // Why this blocking wait here? Is it waiting for the GPU to finish?
    // In effect it trades one buffer for another
    if (nativeWindow->dequeueBuffer(nativeWindow, &buffer, &fenceFd) == NO_ERROR) {
        sp<Fence> fence(new Fence(fenceFd));
        // fence->wait
        if (fence->wait(Fence::TIMEOUT_NEVER)) {
            nativeWindow->cancelBuffer(nativeWindow, buffer, fenceFd);
            return setError(EGL_BAD_ALLOC, EGL_FALSE);
        }
        ...

As shown, the source first submits the buffer to SurfaceFlinger and then dequeues a new buffer for handling the next request. Moreover the Fence passed to queueBuffer is -1, meaning that at swapBuffers time the software OpenGL library needs no Fence at all (there is simply no GPU/CPU synchronization to worry about). queueBuffer triggers a Layer callback and sends a message to SurfaceFlinger asking it to run; this is asynchronous and does not block. The callback entry is Layer's onFrameAvailable:

void Layer::onFrameAvailable(const BufferItem& item) {
    {
    ... // Triggered as the Layer callback after queueBuffer
    mFlinger->signalLayerUpdate();
}

And dequeueBuffer, as long as the slot limit allows, does not block either, so in principle it should not cost much. Yet judging by the emulator, swapBuffers seems rather expensive (the yellow part in the figure below is the swapBuffers time). I do not quite understand this: the emulator path should be synchronous, the buffer swap should not implicitly send commands to the GPU for execution, and there should be no blocking wait, so why does it take so long? This is Genymotion 6.0; perhaps it is Genymotion-specific:

(Figure: emulator GPU profiling; the yellow segment is swapBuffers)

A look at the Genymotion Systrace:

(Figure: Genymotion Systrace)

The function calls in the Systrace basically match egl.cpp, but why do queueBuffer and dequeueBuffer take so long? I do not understand it; pointers welcome. For real hardware, where the Fence generally must be handled, egl_window_surface_v2_t::swapBuffers() should be overridden, at the very least to pass a valid Fence across:

nativeWindow->queueBuffer(nativeWindow, buffer, fenceId /* should no longer be -1 */);

That is, queueBuffer's fence id can no longer be -1, because a valid Fence is needed to synchronize the GPU and the CPU. Now the Systrace from a real device (Nexus 5, Android 6.0):

(Figure: OpenGL rendering Systrace on a real device, Nexus 5, Android 6.0)

The calls on the real device differ a great deal from the emulator, for example around dequeue and enqueue; the specifics presumably depend on each vendor's implementation. The same on a Nexus 6P running 8.0:

(Figure: Nexus 6P, Android 8.0)

At first I assumed swapBuffers would call glFinish() or glFlush somewhere, and the resulting block would explain the extra time, but the source does not support that: nothing is triggered directly in enqueue or dequeue, and even if it were, it would be asynchronous. Generally, after tasks are issued to the driver, a double-buffered swap implicitly submits the queued commands for execution; my guess is that vendors implement this themselves, and without the actual code it is hard to be sure. Insights from anyone building ROMs are welcome. This span of time shows up in the GPU profiling bars as what the documentation calls the CPU waiting for the GPU. My own understanding: it is waiting time, but not waiting for the GPU to finish rendering, only for an ACK-like signal; otherwise there would be no CPU/GPU parallelism at all:

(Figure: the swapBuffers segment of the GPU profiling bars)

Would dequeueBuffer block and add to the cost? Probably not either. The time spent in this swapBuffers span deserves another look when there is time.

Summary

  • The UI thread builds the OpenGL DrawOpTree
  • The Render thread merges and optimizes the DrawOpTree and synchronizes the data
  • The Render thread converts the DrawOps into standard OpenGL commands and issues them to the GPU
  • The Render thread notifies the GPU through swapBuffers (to be investigated further) and submits the canvas data to SurfaceFlinger

Author: 看書的小蝸牛, "Android硬件加速(二)-RenderThread與OpenGL GPU渲染"

For reference only; corrections welcome.
