TaintDroid Internals: Variable-Level Taint Tracking in the DVM (Part 2)

1 Recap

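In the previous part we analyzed in detail how TaintDroid performs variable-level taint tracking for DVM method arguments and local variables. We now continue with how TaintDroid tracks taint for static fields, instance fields, and arrays.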

2 Class data structures in the DVM

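Because the DVM is modeled on the JVM, the ancestor of every class in the DVM is also Object, which is defined in dalvik/vm/oo/Object.h. In fact, not only Object but all of the DVM's basic object types are defined in Object.h.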

As is well known, objects fall into three kinds:

  1. Class Objects, which are instances of java.lang.Class; their common base is ClassObject;

  2. Array Objects, which are created by the new-array instruction; their common base is ArrayObject;

  3. Data Objects, which are all objects other than the two kinds above; their common base is DataObject.

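There is one special case to note: String Objects. String objects are currently equivalent to Data Objects, but because strings are used so heavily in the DVM, it defines a dedicated struct StringObject, which derives directly from Object.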

With these data structures in mind, TaintDroid's taint tracking for static fields, instance fields, and arrays becomes much easier to approach.

3 Modifications to the data structures

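The simplest and most direct way to track taint for a class's instance and static fields is to modify the relevant data structures, and that is exactly what TaintDroid does.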

1) First, ClassObject (which derives from Object) is modified:

struct ClassObject : Object {
    /* leave space for instance data; we could access fields directly if we freeze the definition of java/lang/Class */
#ifdef WITH_TAINT_TRACKING
    // x2 space for interleaved taint tags
    u4              instanceData[CLASS_FIELD_SLOTS*2];
#else
    u4              instanceData[CLASS_FIELD_SLOTS];
#endif /*WITH_TAINT_TRACKING*/
    ……
};

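TaintDroid changes u4 instanceData[CLASS_FIELD_SLOTS] to u4 instanceData[CLASS_FIELD_SLOTS * 2], where CLASS_FIELD_SLOTS defaults to 4. The doubled space stores each instance field's taint tag interleaved with the field itself. Recall that instance fields come in two kinds: 1) primitive types such as int; 2) references to objects. It follows that TaintDroid also allocates a tag for every reference, representing the taint of that reference. Keeping this in mind helps a great deal when analyzing more complex propagation logic later.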
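The interleaved layout described above can be pictured with a small model. This is only a sketch: InterleavedData, setField, fieldValue and fieldTag are hypothetical names invented for illustration, not TaintDroid code, and the model assumes every field occupies a single 32-bit slot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

typedef uint32_t u4;

// Hypothetical model (not TaintDroid's API): every 32-bit field slot is
// immediately followed by the 32-bit taint tag that belongs to it.
static const size_t kFieldSlots = 4;  // mirrors CLASS_FIELD_SLOTS

struct InterleavedData {
    u4 slots[kFieldSlots * 2];  // value0, tag0, value1, tag1, ...
};

// Field i lives at index 2*i; its tag sits right after it at 2*i + 1.
inline void setField(InterleavedData& d, size_t i, u4 value, u4 tag) {
    d.slots[2 * i] = value;
    d.slots[2 * i + 1] = tag;
}

inline u4 fieldValue(const InterleavedData& d, size_t i) {
    return d.slots[2 * i];
}

inline u4 fieldTag(const InterleavedData& d, size_t i) {
    return d.slots[2 * i + 1];
}
```

Because a reference is also just a 32-bit slot in this layout, it gets a tag of its own in exactly the same way as an int field.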

2) Second, the static field struct StaticField (which derives from Field) is modified:

struct StaticField : Field {
    JValue          value;          /* initially set from DEX for primitives */
#ifdef WITH_TAINT_TRACKING
    Taint           taint;
#endif
};

A Taint taint member is added after the JValue member. The Taint type is defined in vm/interp/Taint.h as follows:

typedef struct Taint { u4 tag; } Taint;
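As a rough illustration of how a tag stored next to a static field's value can travel with it, consider the sketch below. MockStaticField, putStatic and getStatic are hypothetical stand-ins, the value is reduced to a single u4, and the store and load policy shown is an assumption about the general idea rather than TaintDroid's actual sput/sget handlers.

```cpp
#include <cassert>
#include <cstdint>

typedef uint32_t u4;
typedef struct Taint { u4 tag; } Taint;

// Simplified stand-in for StaticField: only a 32-bit value is modeled.
struct MockStaticField {
    u4 value;
    Taint taint;
};

// sput-like store (assumed policy): the source register's tag is written
// into the field's taint member together with the value.
inline void putStatic(MockStaticField& f, u4 value, u4 srcTag) {
    f.value = value;
    f.taint.tag = srcTag;
}

// sget-like load (assumed policy): returns the value and hands the stored
// tag back so the caller can attach it to the destination register.
inline u4 getStatic(const MockStaticField& f, u4* dstTag) {
    *dstTag = f.taint.tag;
    return f.value;
}
```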

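With these changes in place, taint tracking for instance and static fields only requires adjusting the functions that operate on these data structures. Take computeFieldOffsets as an example. It is defined in dalvik/vm/oo/Class.cpp; because the function is long, only the parts relevant to the change are shown: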

……
if (clazz->super != NULL)
        fieldOffset = clazz->super->objectSize;
    else
        fieldOffset = OFFSETOF_MEMBER(DataObject, instanceData);
……
/*Start by moving all reference fields to the front */
for (i = 0; i < clazz->ifieldCount; i++) {
        InstField* pField = &clazz->ifields[i];
        char c = pField->signature[0];
 
        if (c != '[' && c != 'L') {
            while (j > i) {
                InstField* refField = &clazz->ifields[j--];
                char rc = refField->signature[0];
                if (rc == '[' || rc == 'L') {
                    swapField(pField, refField);
                    c = rc;
                    clazz->ifieldRefCount++;
                    break;
                }
            }
            /* We may or may not have swapped a field.*/
        } else {
            /* This is a reference field.*/
            clazz->ifieldRefCount++;
        }
        /*If we've hit the end of the reference fields, break.*/
        if (c != '[' && c != 'L')
            break;
 
        pField->byteOffset = fieldOffset;
#ifdef WITH_TAINT_TRACKING
        fieldOffset += sizeof(u4) + sizeof(u4); /* interleaved tag */
#else
        fieldOffset += sizeof(u4);
#endif
        LOGVV("  --- offset1 '%s'=%d", pField->name,pField->byteOffset);
}
……
 
/* Alignment is good, shuffle any double-wide fields forward, and finish assigning field offsets to all fields.*/
for ( ; i < clazz->ifieldCount; i++) {
        InstField* pField = &clazz->ifields[i];
        char c = pField->signature[0];
 
        if (c != 'D' && c != 'J') {
            while (j > i) {
                InstField* doubleField = &clazz->ifields[j--];
                char rc = doubleField->signature[0];
                if (rc == 'D' || rc == 'J') {
                    swapField(pField, doubleField);
                    c = rc;
                    break;
                }
            }
        } else {
        }
        pField->byteOffset = fieldOffset;
#ifdef WITH_TAINT_TRACKING
        fieldOffset += sizeof(u4) + sizeof(u4); /* room for tag */
        if (c == 'J' || c == 'D')
            fieldOffset += sizeof(u4) + sizeof(u4); /* keep 64-bit aligned */
#else
        fieldOffset += sizeof(u4);
        if (c == 'J' || c == 'D')
            fieldOffset += sizeof(u4);
#endif /* ndef WITH_TAINT_TRACKING */
    }

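Clearly, because TaintDroid doubles the space for instance fields (interleaving the taint tags), each 32-bit field advances the offset by 2 * sizeof(u4) when field offsets are computed. Note also that double and long fields advance it by 4 * sizeof(u4).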
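The arithmetic can be checked with a worked example. The sketch below reproduces only the stride rule from the excerpt; fieldStride and assignOffsets are hypothetical helpers, the starting offset is arbitrary, and the field reordering and alignment steps of the real computeFieldOffsets are deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

typedef uint32_t u4;

// Bytes consumed by one instance field, following the excerpt's arithmetic:
// 'J' (long) and 'D' (double) are wide, everything else takes 32 bits.
inline size_t fieldStride(char sig, bool withTaint) {
    size_t stride = withTaint ? 2 * sizeof(u4) : sizeof(u4);
    if (sig == 'J' || sig == 'D')
        stride *= 2;
    return stride;
}

// Assigns byte offsets to the fields in the order given, starting at `base`.
// Returns the offset just past the last field.
inline size_t assignOffsets(const char* sigs, size_t count, size_t base,
                            bool withTaint, size_t* offsets) {
    size_t off = base;
    for (size_t i = 0; i < count; i++) {
        offsets[i] = off;
        off += fieldStride(sigs[i], withTaint);
    }
    return off;
}
```

For a reference, an int and a long laid out from an assumed base offset of 8, taint tracking yields offsets 8, 16 and 24 with the object ending at 40, against 8, 12 and 16 with the object ending at 24 in the unmodified layout.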

That completes the analysis of taint tracking for instance and static fields. Arrays are next.

3) Modification to the array object ArrayObject (which derives from Object):

struct ArrayObject : Object {
    /* number of elements; immutable after init */
    u4              length;
#ifdef WITH_TAINT_TRACKING
    Taint           taint;
#endif
    u8              contents[1];
};

A Taint taint member is added after the length member. This is done for performance reasons: storing a tag for every element would cost too much, so TaintDroid allocates only one tag per ArrayObject.

Likewise, once the ArrayObject struct is modified, the functions that operate on ArrayObject must be updated accordingly. Take allocArray in oo/Array.cpp as an example:

static ArrayObject* allocArray(ClassObject* arrayClass, size_t length,
    size_t elemWidth, int allocFlags)
{
    ……
    ArrayObject* newArray = (ArrayObject*)dvmMalloc(totalSize, allocFlags);
    if (newArray != NULL) {
        DVM_OBJECT_INIT(newArray, arrayClass);
        newArray->length = length;
#ifdef WITH_TAINT_TRACKING
        newArray->taint.tag = TAINT_CLEAR;
#endif
        dvmTrackAllocation(arrayClass, totalSize);
    }
    return newArray;
}

When a new array is allocated, TaintDroid sets its taint member to TAINT_CLEAR (that is, no taint).
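The consequence of keeping a single tag per array can be sketched as follows. MockArray and storeElement are hypothetical, kTaintClear merely stands in for TAINT_CLEAR, and the OR-on-store policy is an assumption made for illustration rather than a quotation of TaintDroid's aput handler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

typedef uint32_t u4;
static const u4 kTaintClear = 0;  // stands in for TAINT_CLEAR

// Simplified array model: one tag covers the whole array.
struct MockArray {
    u4 taintTag;
    std::vector<u4> contents;
    // A freshly allocated array starts out untainted, as in allocArray.
    explicit MockArray(size_t length)
        : taintTag(kTaintClear), contents(length, 0) {}
};

// Assumed store policy: the stored value's tag is ORed into the array tag.
inline void storeElement(MockArray& a, size_t index, u4 value, u4 valueTag) {
    a.contents.at(index) = value;
    a.taintTag |= valueTag;
}
```

Under this model, overwriting a tainted element with clean data does not clear the array's tag. That imprecision is the price of not storing one tag per element.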

4) The structure of the special class StringObject. Its struct is as follows:

struct StringObject : Object {
    /* variable #of u4 slots; u8 uses 2 slots */
    u4              instanceData[1];
    /** Returns this string's length in characters. */
    int length() const;
    /**
     * Returns this string's length in bytes when encoded as modified UTF-8.
     * Does not include a terminating NUL byte.
     */
    int utfLength() const;
    /** Returns this string's char[] as an ArrayObject. */
    ArrayObject* array() const;
    /** Returns this string's char[] as a u2*. */
    const u2* chars() const;
};

Because StringObject provides an array() method that returns an ArrayObject pointer, a StringObject's taint is read and written through StringObject.array()->taint.tag.
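A minimal sketch of that access path, using mock types, is shown below. MockArrayObject, MockStringObject, getStringTaint and addStringTaint are hypothetical names; only the array()->taint.tag chain comes from the text above.

```cpp
#include <cassert>
#include <cstdint>

typedef uint32_t u4;
typedef struct Taint { u4 tag; } Taint;

// Mock of the backing char[]: only the members needed here are modeled.
struct MockArrayObject {
    u4 length;
    Taint taint;
};

// Mock string: array() returns the backing array, as StringObject does.
struct MockStringObject {
    MockArrayObject* backing;
    MockArrayObject* array() const { return backing; }
};

// The string itself has no tag; its taint is the tag of its backing array.
inline u4 getStringTaint(const MockStringObject* s) {
    return s->array()->taint.tag;
}

inline void addStringTaint(MockStringObject* s, u4 tag) {
    s->array()->taint.tag |= tag;
}
```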

4 A closer look at DVM taint propagation logic

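In the previous part we analyzed the DVM opcode that adds two operands (OP_ADD_INT_2ADDR). We picked such an easy target because we were not yet familiar with how taint is stored for static fields, instance fields, and arrays. Now let us take on something harder, an array-related opcode: OP_AGET_OBJECT (i.e. aget-object). Its assembly implementation is in dalvik/vm/mterp/armv*te_taint/OP_AGET_OBJECT.S: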

%verify "executed"
%include "armv5te_taint/OP_AGET.S"

Moving on to OP_AGET.S:

%default { "load":"ldr", "shift":"2" }   // shift amount is 2 bits, i.e. index * 4
%verify "executed"
    /*
     *Array get, 32 bits or less.  vAA <- vBB[vCC].
     *
     *Note: using the usual FETCH/and/shift stuff, this fits in exactly 17
     *instructions.  We use a pair of FETCH_Bs instead.
     *
     *for: aget, aget-object, aget-boolean, aget-byte, aget-char, aget-short
     */
    /* op vAA, vBB, vCC */
    FETCH_B(r2, 1, 0)                   @ r2<- BB
    mov     r9, rINST, lsr #8           @ r9<- AA
    FETCH_B(r3, 1, 1)                   @ r3<- CC
    GET_VREG(r0, r2)                    @ r0<- vBB (array object)
    GET_VREG(r1, r3)                    @ r1<- vCC (requested index)
    cmp     r0, #0                      @ null array object?
    beq     common_errNullObject        @ yes, bail
// begin WITH_TAINT_TRACKING
    bl                .L${opcode}_taint_prop_1
// end WITH_TAINT_TRACKING
    ldr     r3, [r0, #offArrayObject_length]    @ r3<- arrayObj->length
    add     r0, r0, r1, lsl #$shift     @ r0<- arrayObj + index*width
    cmp     r1, r3                      @ compare unsigned index, length
// begin WITH_TAINT_TRACKING
//    bcs     common_errArrayIndex        @ index >= length, bail        // in subroutine
//    FETCH_ADVANCE_INST(2)               @ advance rPC, load rINST // in subroutine
    bl                .L${opcode}_taint_prop_2
// end WITH_TAINT_TRACKING
    $load   r2, [r0, #offArrayObject_contents]  @ r2<- vBB[vCC]
    GET_INST_OPCODE(ip)                 @ extract opcode from rINST
    SET_VREG(r2, r9)                    @ vAA<- r2
    GOTO_OPCODE(ip)                     @ jump to next instruction
 
%break
 
.L${opcode}_taint_prop_1:
    ldr            r2, [r0, #offArrayObject_taint]   @ r2<- taint tag of array object vBB
    SET_TAINT_FP(r10)
    GET_VREG_TAINT(r3, r3, r10)                  @ r3<- taint tag of index vCC
    orr            r2, r3, r2                  @ r2<- r2 | r3
    bx            lr
 
.L${opcode}_taint_prop_2:
    bcs     common_errArrayIndex        @ index >= length, bail
    FETCH_ADVANCE_INST(2)               @ advance rPC, load rINST
    SET_TAINT_FP(r3)
    SET_VREG_TAINT(r2, r9, r3)            @ taint tag of vAA<- r2 (the combined tag)
    bx      lr

Clearly the key parts are the two code fragments .L${opcode}_taint_prop_1 and .L${opcode}_taint_prop_2. Briefly, they do the following:

1) taint_prop_1 first fetches the taint tag of the array object vBB. Note that offArrayObject_taint is defined in dalvik/vm/mterp/common/asm-constants.h:

#ifdef WITH_TAINT_TRACKING
MTERP_OFFSET(offArrayObject_taint,        ArrayObject, taint, 12) // follows directly from the ArrayObject layout
#endif
 
The MTERP_OFFSET macro is defined as follows:
# define MTERP_OFFSET(_name, _type, _field, _offset)                        \
    if (OFFSETOF_MEMBER(_type, _field) != _offset) {                        \
        ALOGE("Bad asm offset %s (%d), should be %d",                        \
            #_name, _offset, OFFSETOF_MEMBER(_type, _field));               \
        failed = true;                                                      \
    }

Once vBB's taint tag has been obtained, the taint tag of the index vCC is fetched, the two are ORed together, and the result is left in register r2;

2) taint_prop_2 then assigns the tag held in r2 to the taint tag of vAA. This completes taint propagation for aget-object.
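The two steps can be restated as a small C++ model of the handler. This is a sketch under stated assumptions: MockReg, MockIntArray and agetWithTaint are hypothetical names, registers and arrays are reduced to plain structs, and a false return stands in for the branches to common_errNullObject and common_errArrayIndex.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

typedef uint32_t u4;

// A virtual register together with its taint tag.
struct MockReg {
    u4 value;
    u4 tag;
};

// Simplified array: length, one array-wide tag, and 32-bit contents.
struct MockIntArray {
    u4 length;
    u4 taintTag;
    const u4* contents;
};

// Models vAA <- vBB[vCC]. Returns false where the interpreter would bail.
inline bool agetWithTaint(MockReg* vAA, const MockIntArray* arr,
                          const MockReg& vCC) {
    if (arr == NULL)
        return false;                       // null array object
    u4 tag = arr->taintTag | vCC.tag;       // taint_prop_1: array tag | index tag
    if (vCC.value >= arr->length)
        return false;                       // unsigned index >= length
    vAA->tag = tag;                         // taint_prop_2: store tag for vAA
    vAA->value = arr->contents[vCC.value];  // the actual element load
    return true;
}
```

Note that the index's tag contributes to the result: an element selected by a tainted index is treated as tainted even when the array itself is clean.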

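This concludes our analysis of the DVM's variable-level taint tracking. The next step is to analyze method-level taint tracking in the native layer. Here is a question for readers: why can variable-level taint tracking be implemented in the DVM, while the native layer only supports method-level taint tracking?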

Authors: 簡行 and 走位 @ 阿里聚安全. For more technical articles, visit the 阿里聚安全 blog.
