Understanding T-tree and T*-tree, and Implementing a Simple In-Memory Database

Date: 2019-12-07


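Contents

Introduction to the T*-tree
T*-tree Node Structure and C Implementation
T*-tree Insertion, Deletion, Lookup, and Rotation
Implementing a Simple Key-Value In-Memory Database
Attachments and Code
Planned Features
References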

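T-tree and T*-tree are very similar. The main difference is that a T*-tree node carries one extra pointer, which points to the node's successor. The successor pointer reduces the cost of searching and traversing the tree, because the next node in key order can be reached directly.
Note: the T-tree header files used in this article, ttree.h and ttree_defs.h, come from Dan Kruchinin <dkruchinin@acm.org>, GitHub: dkruchinin/libttree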

Introduction to the T*-tree

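In computer science, a T-tree is a type of binary tree (each node has a left subtree and a right subtree) used by main-memory databases such as Datablitz, eXtremeDB, MySQL Cluster, Oracle TimesTen, and MobileLite. A T-tree is a balanced index tree data structure optimized for the case where both the index and the actual data are kept entirely in memory, just as a B-tree is an index structure optimized for block-oriented secondary storage such as hard disks. T-trees seek the performance benefits of in-memory tree structures such as AVL trees while avoiding the large storage overhead that is common to them. T-trees do not keep copies of the indexed data fields inside the index nodes themselves. Instead, they take advantage of the fact that the actual data is always in main memory together with the index, so the nodes contain only pointers to the actual data fields.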

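Although T-trees were once widely used in main-memory databases, more recent research indicates that on modern hardware they do not actually perform better than B-trees. The main reason is that the traditional assumption that every memory reference costs the same no longer holds: the speed gap between CPU cache access and main-memory access keeps growing, and CPU caches keep getting larger.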

T*-tree Node Structure and C Implementation

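A T-tree node usually consists of pointers to the parent node and to the left and right child nodes, an ordered array of data pointers, and some extra control data. Nodes with two subtrees are called internal nodes, nodes without subtrees are called leaf nodes, and nodes with only one subtree are called half-leaf nodes. If a value lies between a node's current minimum and maximum values, that node is called the bounding node for the value. For each internal node, its left subtree contains a leaf or half-leaf node that holds the predecessor of the node's smallest data value (the greatest lower bound, GLB), and its right subtree contains a node that holds the successor of its largest data value (the least upper bound, LUB). The node holding the GLB or LUB may be far away from the internal node in question, or it may be directly adjacent.

The data inside each T-tree node is ordered, and across nodes everything in a node's left subtree is smaller than the node's minimum while everything in its right subtree is larger than the node's maximum. As a result, the leftmost item of the leftmost node is the smallest item in the whole tree, and the largest item of the rightmost node is the largest. Leaf and half-leaf nodes hold between 1 and the maximum number of elements; internal nodes keep their occupancy between a predefined minimum and the maximum number of elements.

[Figure: T-tree node structure]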
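The node kinds and the bounding test described above can be sketched on a stripped-down node. This is only an illustration under simplifying assumptions, not libttree code: struct demo_node, demo_classify, and demo_is_bounding are hypothetical names, and keys are plain ints stored sorted in a small array instead of data pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified node: int keys kept sorted in keys[0..count-1]. */
struct demo_node {
    struct demo_node *left, *right;
    int keys[4];
    int count;
};

enum demo_kind { DEMO_LEAF, DEMO_HALF_LEAF, DEMO_INTERNAL };

/* Classify a node by how many children it has. */
static enum demo_kind demo_classify(const struct demo_node *n)
{
    assert(n != NULL);
    if (n->left && n->right)
        return DEMO_INTERNAL;
    if (n->left || n->right)
        return DEMO_HALF_LEAF;
    return DEMO_LEAF;
}

/* A node bounds `key` when min <= key <= max for that node. */
static int demo_is_bounding(const struct demo_node *n, int key)
{
    assert(n != NULL && n->count > 0);
    return n->keys[0] <= key && key <= n->keys[n->count - 1];
}
```

A search only has to look inside a node once that node is known to bound the key; everywhere else it can decide the direction from the node's minimum and maximum alone.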

The code for T-tree and T-tree node insertion, deletion, rotation, and lookup comes from GitHub: dkruchinin/libttree

typedef struct ttree_node {
    struct ttree_node *parent;     /**< Pointer to node's parent */
    struct ttree_node *successor;  /**< Pointer to node's successor */
    union {
        struct ttree_node *sides[2];
        struct  {
            struct ttree_node *left;   /**< Pointer to node's left child  */
            struct ttree_node *right;  /**< Pointer to node's right child */
        };
    };
    union {
        uint32_t pad;
        struct {
            signed min_idx     :12;  /**< Index of minimum item in node's array */
            signed max_idx     :12;  /**< Index of maximum item in node's array */
            signed bfc         :4;   /**< Node's balance factor */
            unsigned node_side :4;  /**< Node's side(TNODE_LEFT, TNODE_RIGHT or TNODE_ROOT) */
        };
    };

    void *keys[TNODE_ITEMS_MIN];   /**< Start of the node's keys array */
} TtreeNode;
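The union of sides[2] with the left/right pair is what lets the tree code write n->sides[side] instead of branching on the direction. The sketch below shows just that aliasing on a hypothetical cut-down node; struct demo_tnode, demo_child, and demo_opposite are made-up names, and it assumes that the left side is numbered 0 and the right side 1.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for this sketch: left is side 0, right is side 1. */
enum { DEMO_LEFT = 0, DEMO_RIGHT = 1 };

/* Hypothetical cut-down node: only the child-pointer union is kept. */
struct demo_tnode {
    union {
        struct demo_tnode *sides[2];
        struct {
            struct demo_tnode *left;
            struct demo_tnode *right;
        };  /* anonymous struct: needs C11 or a compiler extension */
    };
};

/* Return the child on `side` without branching on left/right. */
static struct demo_tnode *demo_child(struct demo_tnode *n, int side)
{
    assert(side == DEMO_LEFT || side == DEMO_RIGHT);
    return n->sides[side];
}

/* With sides numbered 0 and 1, the opposite side is a logical NOT. */
static int demo_opposite(int side)
{
    return !side;
}
```

This is why one rotation routine parameterized by side can serve for both left and right rotations.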

T*-tree Insertion, Deletion, Lookup, and Rotation

Insertion

int ttree_insert(Ttree *ttree, void *item)
{
    TtreeCursor cursor;

    /*
     * If the tree already contains an item with the same key and the
     * tree is not allowed to hold duplicate keys, signal an error.
     */
    if (ttree_lookup(ttree, ttree_item2key(ttree, item), &cursor) && ttree->keys_are_unique) {
        return -1;
    }

    ttree_insert_at_cursor(&cursor, item);
    return 0;
}

void ttree_insert_at_cursor(TtreeCursor *cursor, void *item)
{
    Ttree *ttree = cursor->ttree;
    TtreeNode *at_node, *n;
    TtreeCursor tmp_cursor;
    void *key;

    TTREE_ASSERT(cursor->ttree != NULL);
    //TTREE_ASSERT(cursor->state == CURSOR_PENDING);
    key = ttree_item2key(ttree, item);

    n = at_node = cursor->tnode;
    if (!ttree->root) { /* The root node has to be created. */
        at_node = allocate_ttree_node(ttree);
        at_node->keys[first_tnode_idx(ttree)] = key;
        at_node->min_idx = at_node->max_idx = first_tnode_idx(ttree);
        ttree->root = at_node;
        tnode_set_side(at_node, TNODE_ROOT);
        ttree_cursor_open_on_node(cursor, ttree, at_node, TNODE_SEEK_START);
        return;
    }
    if (cursor->side == TNODE_BOUND) {
        if (tnode_is_full(ttree, n)) {
            /*
             * If node is full its max item should be removed and
             * new key should be inserted into it. Removed key becomes
             * new insert value that should be put in successor node.
             */
            void *tmp = n->keys[n->max_idx--];

            increase_tnode_window(ttree, n, &cursor->idx);
            n->keys[cursor->idx] = key;
            key = tmp;

            ttree_cursor_copy(&tmp_cursor, cursor);
            cursor = &tmp_cursor;

            /*
             * If the current node has no successor or no right child,
             * a new node has to be created. It becomes the right child
             * of the current node.
             */
            if (!n->successor || !n->right) {
                cursor->side = TNODE_RIGHT;
                cursor->idx = first_tnode_idx(ttree);
                goto create_new_node;
            }

            at_node = n->successor;
            /*
             * If the successor has no free slots, the new value is inserted
             * into a newly created node that becomes the left child of the
             * current node's successor.
             */
            if (tnode_is_full(ttree, at_node)) {
                cursor->side = TNODE_LEFT;
                cursor->idx = first_tnode_idx(ttree);
                goto create_new_node;
            }

            /*
             * If we're here, the successor has free slots and the key
             * will be inserted into one of them.
             */
            cursor->idx = at_node->min_idx;
            cursor->tnode = at_node;
        }

        increase_tnode_window(ttree, at_node, &cursor->idx);
        at_node->keys[cursor->idx] = key;
        cursor->state = CURSOR_OPENED;
        return;
    }

create_new_node:
    n = allocate_ttree_node(ttree);
    n->keys[cursor->idx] = key;
    n->min_idx = n->max_idx = cursor->idx;
    n->parent = at_node;
    at_node->sides[cursor->side] = n;
    tnode_set_side(n, cursor->side);
    cursor->tnode = n;
    cursor->state = CURSOR_OPENED;
    fixup_after_insertion(ttree, n, cursor);
}
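The overflow rule in the listing above (a full bounding node gives up its maximum, takes the new key, and the evicted maximum moves on to the successor) can be illustrated on a plain sorted array. This is a simplified sketch and not the libttree implementation: struct demo_bucket, DEMO_CAP, and demo_insert are hypothetical, and the keys are ints packed from index 0 instead of a sliding min_idx/max_idx window.

```c
#include <assert.h>
#include <string.h>

#define DEMO_CAP 4

/* Hypothetical node: sorted ints in keys[0..count-1], capacity DEMO_CAP. */
struct demo_bucket {
    int keys[DEMO_CAP];
    int count;
};

/*
 * Insert `key` in sorted position. If the node is full, its maximum is
 * evicted first and returned through `evicted`; a caller would then push
 * that value into the successor node. Returns 1 if a key was evicted.
 * Precondition: a full node must bound the key (key <= current maximum).
 */
static int demo_insert(struct demo_bucket *b, int key, int *evicted)
{
    int spilled = 0;
    int pos;

    if (b->count == DEMO_CAP) {
        assert(key <= b->keys[b->count - 1]);
        *evicted = b->keys[--b->count];
        spilled = 1;
    }
    /* Find the first slot whose key is greater than the new key. */
    for (pos = 0; pos < b->count && b->keys[pos] <= key; pos++)
        ;
    /* Shift the tail right by one to open the slot. */
    memmove(&b->keys[pos + 1], &b->keys[pos],
            (size_t)(b->count - pos) * sizeof(int));
    b->keys[pos] = key;
    b->count++;
    return spilled;
}
```

Evicting the maximum rather than the minimum is what makes the successor pointer useful here: the evicted value is smaller than everything in the successor, so it can go straight to that node's minimum position.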

Deletion

void *ttree_delete(Ttree *ttree, void *key)
{
    TtreeCursor cursor;
    void *ret;

    ret = ttree_lookup(ttree, key, &cursor);
    if (!ret) {
        return ret;
    }

    ttree_delete_at_cursor(&cursor);
    return ret;
}

void *ttree_delete_at_cursor(TtreeCursor *cursor)
{
    Ttree *ttree = cursor->ttree;
    TtreeNode *tnode, *n;
    void *ret;

    TTREE_ASSERT(cursor->ttree != NULL);
    TTREE_ASSERT(cursor->state == CURSOR_OPENED);
    tnode = cursor->tnode;
    ret = ttree_key2item(ttree, tnode->keys[cursor->idx]);
    decrease_tnode_window(ttree, tnode, &cursor->idx);
    cursor->state = CURSOR_CLOSED;
    if (UNLIKELY(cursor->idx > tnode->max_idx)) {
        cursor->idx = tnode->max_idx;
    }

    /*
     * If, after a key was removed, the T*-tree node still contains more
     * than the minimum allowed number of items, the process is complete.
     */
    if (tnode_num_keys(tnode) > min_tnode_entries(ttree)) {
        return ret;
    }
    if (is_internal_node(tnode)) {
        int idx;

        /*
         * If it is an internal node, we have to recover number
         * of items from it by moving one item from its successor.
         */
        n = tnode->successor;
        idx = tnode->max_idx + 1;
        increase_tnode_window(ttree, tnode, &idx);
        tnode->keys[idx] = n->keys[n->min_idx++];
        if (UNLIKELY(cursor->idx > tnode->max_idx)) {
            cursor->idx = tnode->max_idx;
        }
        if (!tnode_is_empty(n) && is_leaf_node(n)) {
            return ret;
        }

        /*
         * If we're here, then successor is either a half-leaf
         * or an empty leaf.
         */
        tnode = n;
    }
    if (!is_leaf_node(tnode)) {
        int items, diff;

        n = tnode->left ? tnode->left : tnode->right;
        items = tnode_num_keys(n);

        /*
         * If the half-leaf cannot be merged with a leaf,
         * the process is complete.
         */
        if (items > (ttree->keys_per_tnode - tnode_num_keys(tnode))) {
            return ret;
        }

        if (tnode_get_side(n) == TNODE_RIGHT) {
            /*
             * Merge current node with its right leaf. Items from the leaf
             * are placed after the maximum item in a node.
             */
            diff = (ttree->keys_per_tnode - tnode->max_idx - items) - 1;
            if (diff < 0) {
                /* source and destination overlap, so memmove is required */
                memmove(tnode->keys + tnode->min_idx + diff,
                        tnode->keys + tnode->min_idx, sizeof(void *) *
                        tnode_num_keys(tnode));
                tnode->min_idx += diff;
                tnode->max_idx += diff;
                if (cursor->tnode == tnode) {
                    cursor->idx += diff;
                }
            }
            memcpy(tnode->keys + tnode->max_idx + 1, n->keys + n->min_idx,
                   sizeof(void *) * items);
            tnode->max_idx += items;
        }
        else {
            /*
             * Merge current node with its left leaf. Items from the leaf
             * are placed before the minimum item in a node.
             */
            diff = tnode->min_idx - items;
            if (diff < 0) {
                register int i;

                for (i = tnode->max_idx; i >= tnode->min_idx; i--) {
                    tnode->keys[i - diff] = tnode->keys[i];
                }

                tnode->min_idx -= diff;
                tnode->max_idx -= diff;
                if (cursor->tnode == tnode) {
                    cursor->idx -= diff;
                }
            }

            memcpy(tnode->keys + tnode->min_idx - items, n->keys + n->min_idx,
                   sizeof(void *) * items);
            tnode->min_idx -= items;
        }

        n->min_idx = 1;
        n->max_idx = 0;
        tnode = n;
    }
    if (!tnode_is_empty(tnode)) {
        return ret;
    }

    /* if we're here, then current node will be removed from the tree. */
    n = tnode->parent;
    if (!n) {
        ttree->root = NULL;
        free(tnode);
        return ret;
    }

    n->sides[tnode_get_side(tnode)] = NULL;
    fixup_after_deletion(ttree, tnode, NULL);
    free(tnode);
    return ret;
}
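The first half of the deletion listing, where an internal node that has become too small refills itself with the minimum key of its successor, can be sketched as follows. The sketch makes simplifying assumptions and is not libttree code: struct demo_win, DEMO_MIN, DEMO_MAX, and both functions are hypothetical, keys are ints packed from index 0, and merging or freeing an emptied successor is left out.

```c
#include <assert.h>
#include <string.h>

#define DEMO_MAX 4
#define DEMO_MIN 2

/* Hypothetical node: sorted ints in keys[0..count-1]. */
struct demo_win {
    int keys[DEMO_MAX];
    int count;
};

/* Remove keys[idx], keeping the rest sorted and contiguous. */
static void demo_remove_at(struct demo_win *n, int idx)
{
    assert(idx >= 0 && idx < n->count);
    memmove(&n->keys[idx], &n->keys[idx + 1],
            (size_t)(n->count - idx - 1) * sizeof(int));
    n->count--;
}

/*
 * Delete keys[idx] from an internal node. If the node is then left with
 * DEMO_MIN keys or fewer (the same threshold the listing uses), refill
 * it with the successor's minimum, which is the smallest key greater
 * than everything in `n`. Returns 1 if a key was borrowed.
 */
static int demo_delete_internal(struct demo_win *n, int idx,
                                struct demo_win *succ)
{
    demo_remove_at(n, idx);
    if (n->count > DEMO_MIN || succ->count == 0)
        return 0;
    n->keys[n->count++] = succ->keys[0];
    demo_remove_at(succ, 0);
    return 1;
}
```

Because the borrowed key is the least upper bound of the node, appending it at the end keeps the node sorted without any search.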

Lookup

void *ttree_lookup(Ttree *ttree, void *key, TtreeCursor *cursor)
{
    TtreeNode *n, *marked_tn, *target;
    int side = TNODE_BOUND, cmp_res, idx;
    void *item = NULL;
    enum ttree_cursor_state st = CURSOR_PENDING;

    /*
     * Classical T-tree search algorithm is O(log(2N/M) + log(M - 2))
     * Where N is total number of items in the tree and M is a number of
     * items per node. In worst case each node on the path requires 2
     * comparison(with its min and max items) plus binary search in the last
     * node(bound node) excluding its first and last items.
     *
     * A different approach is used here, the one suggested in
     * "Tobin J. Lehman , Michael J. Carey, A Study of Index Structures for
     * Main Memory Database Management Systems".
     * It reduces O(log(2N/M) + log(M - 2)) to true O(log(N)).
     * This algorithm compares the search
     * key only with the minimum item in each node. If the search key is
     * greater, the current node is marked for future consideration.
     */
    target = n = ttree->root;
    marked_tn = NULL;
    idx = first_tnode_idx(ttree);
    if (!n) {
        goto out;
    }
    while (n) {
        target = n;
        cmp_res = ttree->cmp_func(key, tnode_key_min(n));
        if (cmp_res < 0)
            side = TNODE_LEFT;
        else if (cmp_res > 0) {
            marked_tn = n; /* mark current node for future consideration. */
            side = TNODE_RIGHT;
        }
        else { /* ok, key is found, search is completed. */
            side = TNODE_BOUND;
            idx = n->min_idx;
            item = ttree_key2item(ttree, tnode_key_min(n));
            st = CURSOR_OPENED;
            goto out;
        }

        n = n->sides[side];
    }
    if (marked_tn) {
        int c = ttree->cmp_func(key, tnode_key_max(marked_tn));

        if (c <= 0) {
            side = TNODE_BOUND;
            target = marked_tn;
            if (!c) {
                item = ttree_key2item(ttree, tnode_key_max(target));
                idx = target->max_idx;
                st = CURSOR_OPENED;
            }
            else { /* make internal binary search */
                struct tnode_lookup tnl;

                tnl.key = key;
                tnl.low_bound = target->min_idx + 1;
                tnl.high_bound = target->max_idx - 1;
                item = lookup_inside_tnode(ttree, target, &tnl, &idx);
                st = (item != NULL) ? CURSOR_OPENED : CURSOR_PENDING;
            }

            goto out;
        }
    }

    /*
     * If we're here, the item wasn't found. The only thing left to do
     * is to determine the position where the search key may be placed.
     * If the target node is not full, the key may be placed at its
     * min or max position.
     */
    if (!tnode_is_full(ttree, target)) {
        side = TNODE_BOUND;
        idx = ((marked_tn != target) || (cmp_res < 0)) ?
            target->min_idx : (target->max_idx + 1);
        st = CURSOR_PENDING;
    }

out:
    if (cursor) {
        ttree_cursor_open_on_node(cursor, ttree, target, TNODE_SEEK_START);
        cursor->side = side;
        cursor->idx = idx;
        cursor->state = st;
    }

    return item;
}
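The search strategy named in the comment of the listing (compare only with each node's minimum, remember the last node whose minimum was smaller than the key, then search inside that node) can be sketched on a simplified tree. This is an illustration under stated assumptions rather than the library routine: struct demo_snode and demo_search are hypothetical, keys are ints, every node holds at least one key, and no cursor is produced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node with sorted int keys in keys[0..count-1], count >= 1. */
struct demo_snode {
    struct demo_snode *left, *right;
    int keys[4];
    int count;
};

/*
 * Descend comparing `key` only with each node's minimum. Whenever the key
 * is greater than the minimum, remember the node: the last remembered node
 * is the only one that can bound the key. Returns a pointer to the stored
 * key, or NULL if the key is absent.
 */
static const int *demo_search(const struct demo_snode *root, int key)
{
    const struct demo_snode *n = root, *marked = NULL;
    int lo, hi;

    while (n) {
        if (key < n->keys[0]) {
            n = n->left;
        } else if (key > n->keys[0]) {
            marked = n;
            n = n->right;
        } else {
            return &n->keys[0];
        }
    }
    if (!marked)
        return NULL;

    /* Binary search inside the marked node; its minimum was already ruled out. */
    lo = 1;
    hi = marked->count - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;

        if (marked->keys[mid] == key)
            return &marked->keys[mid];
        if (marked->keys[mid] < key)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return NULL;
}
```

The descent costs one comparison per node instead of two, which is where the saving over the classical min-and-max test comes from.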

Rotation

static void __rotate_single(TtreeNode **target, int side)
{
    TtreeNode *p, *s;
    int opside = opposite_side(side);

    p = *target;
    TTREE_ASSERT(p != NULL);
    s = p->sides[side];
    TTREE_ASSERT(s != NULL);
    tnode_set_side(s, tnode_get_side(p));
    p->sides[side] = s->sides[opside];
    s->sides[opside] = p;
    tnode_set_side(p, opside);
    s->parent = p->parent;
    p->parent = s;
    if (p->sides[side]) {
        p->sides[side]->parent = p;
        tnode_set_side(p->sides[side], side);
    }
    if (s->parent) {
        if (s->parent->sides[side] == p)
            s->parent->sides[side] = s;
        else
            s->parent->sides[opside] = s;
    }

    *target = s;
}

/*
 * There are two cases of single rotation possible:
 * 1) Right rotation (side = TNODE_LEFT)
 *         [P]             [L]
 *        /  \            /  \
 *      [L]  x1    =>   x2   [P]
 *     /  \                 /  \
 *    x2  x3               x3  x1
 *
 * 2) Left rotation (side = TNODE_RIGHT)
 *      [P]                [R]
 *     /  \               /  \
 *    x1  [R]      =>   [P]   x2
 *       /  \          /  \
 *     x3   x2        x1  x3
 */
static void rotate_single(TtreeNode **target, int side)
{
    TtreeNode *n;

    __rotate_single(target, side);
    n = (*target)->sides[opposite_side(side)];

    /*
     * Recalculate balance factors of nodes after rotation.
     * Let X be the root node of the rotated subtree and Y its child.
     * After a single rotation Y is the new root of the subtree and X is
     * its child. Y may become either balanced or overweighted to the
     * same side as before, but by 1 level less.
     * X moves 1 level down and may have a new child, so its balance
     * should be recalculated too. If it is still an internal node and
     * its new parent was not overweighted to the side opposite to X,
     * X is overweighted to the side opposite to its new parent;
     * otherwise it's balanced. If X is either a half-leaf or a leaf,
     * balance recalculation is obvious.
     */
    if (is_internal_node(n)) {
        n->bfc = (n->parent->bfc != side2bfc(side)) ? side2bfc(side) : 0;
    }
    else {
        n->bfc = !!(n->right) - !!(n->left);
    }

    (*target)->bfc += side2bfc(opposite_side(side));
    TTREE_ASSERT((abs(n->bfc) < 2) && (abs((*target)->bfc) < 2));
}

/*
 * There are two possible cases of double rotation:
 * 1) Left-right rotation: (side == TNODE_LEFT)
 *      [P]                     [r]
 *     /  \                    /  \
 *   [L]  x1                [L]   [P]
 *  /  \          =>       / \    / \
 * x2  [r]                x2 x4  x3 x1
 *    /  \
 *  x4   x3
 *
 * 2) Right-left rotation: (side == TNODE_RIGHT)
 *      [P]                     [l]
 *     /  \                    /  \
 *    x1  [R]               [P]   [R]
 *       /  \     =>        / \   / \
 *      [l] x2             x1 x3 x4 x2
 *     /  \
 *    x3  x4
 */
static void rotate_double(TtreeNode **target, int side)
{
    int opside = opposite_side(side);
    TtreeNode *n = (*target)->sides[side];

    __rotate_single(&n, opside);

    /*
     * Balance recalculation is very similar to recalculation after
     * simple single rotation.
     */
    if (is_internal_node(n->sides[side])) {
        n->sides[side]->bfc = (n->bfc == side2bfc(opside)) ? side2bfc(side) : 0;
    }
    else {
        n->sides[side]->bfc =
            !!(n->sides[side]->right) - !!(n->sides[side]->left);
    }

    TTREE_ASSERT(abs(n->sides[side]->bfc) < 2);
    n = n->parent;
    __rotate_single(target, side);
    if (is_internal_node(n)) {
        n->bfc = ((*target)->bfc == side2bfc(side)) ? side2bfc(opside) : 0;
    }
    else {
        n->bfc = !!(n->right) - !!(n->left);
    }

    /*
     * new root node of subtree is always ideally balanced
     * after double rotation.
     */
    TTREE_ASSERT(abs(n->bfc) < 2);
    (*target)->bfc = 0;
}

static void rebalance(Ttree *ttree, TtreeNode **node, TtreeCursor *cursor)
{
    int lh = left_heavy(*node);
    int sum = abs((*node)->bfc + (*node)->sides[opposite_side(lh)]->bfc);

    if (sum >= 2) {
        rotate_single(node, opposite_side(lh));
        goto out;
    }

    rotate_double(node, opposite_side(lh));

    /*
     * T-tree rotation rules differ from AVL rules in only one aspect:
     * after a double rotation, a leaf may become the new root of the
     * subtree while both its left and right children are half-leaves.
     * If the new root node contains only one item, N - 1 items should
     * be moved into it from one of its children.
     * (N is the number of items in the selected child node.)
     */
    if ((tnode_num_keys(*node) == 1) &&
        is_half_leaf((*node)->left) && is_half_leaf((*node)->right)) {
        TtreeNode *n;
        int offs, nkeys;

        /*
         * If right child contains more items than left, they will be moved
         * from the right child. Otherwise from the left one.
         */
        if (tnode_num_keys((*node)->right) >= tnode_num_keys((*node)->left)) {
            /*
             * Right child was selected. So first N - 1 items will be copied
             * and inserted after parent's first item.
             */
            n = (*node)->right;
            nkeys = tnode_num_keys(n);
            (*node)->keys[0] = (*node)->keys[(*node)->min_idx];
            offs = 1;
            (*node)->min_idx = 0;
            (*node)->max_idx = nkeys - 1;
            if (!cursor) {
                goto no_cursor;
            }
            else if (cursor->tnode == n) {
                if (cursor->idx < n->max_idx) {
                    cursor->tnode = *node;
                    cursor->idx = (*node)->min_idx +
                        (cursor->idx - n->min_idx + 1);
                }
                else {
                    cursor->idx = first_tnode_idx(ttree);
                }
            }
        }
        else {
            /*
             * Left child was selected. So its N - 1 items
             * (starting after the min one)
             * will be copied and inserted before parent's single item.
             */
            n = (*node)->left;
            nkeys = tnode_num_keys(n);
            (*node)->keys[ttree->keys_per_tnode - 1] =
                (*node)->keys[(*node)->min_idx];
            (*node)->min_idx = offs = ttree->keys_per_tnode - nkeys;
            (*node)->max_idx = ttree->keys_per_tnode - 1;
            if (!cursor) {
                goto no_cursor;
            }
            else if (cursor->tnode == n) {
                if (cursor->idx > n->min_idx) {
                    cursor->tnode = *node;
                    cursor->idx = (*node)->min_idx + (cursor->idx - n->min_idx);
                }
                else {
                    cursor->idx = first_tnode_idx(ttree);
                }
            }

            n->max_idx = n->min_idx++;
        }

no_cursor:
        memcpy((*node)->keys + offs,
               n->keys + n->min_idx, sizeof(void *) * (nkeys - 1));
        n->keys[first_tnode_idx(ttree)] = n->keys[n->max_idx];
        n->min_idx = n->max_idx = first_tnode_idx(ttree);
    }

out:
    if (ttree->root->parent) {
        ttree->root = *node;
    }
}
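The pointer surgery of a single rotation is easier to follow without the balance factors and side tags. The sketch below performs the same relinking on a hypothetical bare node; struct demo_rnode and demo_rotate are made-up names, and the node stores one int key only so the test can tell nodes apart.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bare node: parent link plus two children. */
struct demo_rnode {
    struct demo_rnode *parent;
    struct demo_rnode *sides[2];   /* [0] = left, [1] = right */
    int key;
};

/*
 * Rotate the subtree rooted at *target so that its child on `side`
 * becomes the new subtree root. side = 0 gives a right rotation,
 * side = 1 a left rotation.
 */
static void demo_rotate(struct demo_rnode **target, int side)
{
    struct demo_rnode *p = *target;
    struct demo_rnode *s = p->sides[side];
    int opside = !side;

    assert(s != NULL);
    p->sides[side] = s->sides[opside];     /* inner grandchild moves to p */
    if (p->sides[side])
        p->sides[side]->parent = p;
    s->sides[opside] = p;
    s->parent = p->parent;
    p->parent = s;
    if (s->parent) {                       /* relink from the old parent */
        if (s->parent->sides[0] == p)
            s->parent->sides[0] = s;
        else
            s->parent->sides[1] = s;
    }
    *target = s;
}
```

A double rotation is two of these in sequence: first on the child in the opposite direction, then on the subtree root.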

Implementing a Simple Key-Value In-Memory Database

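To implement a simple key-value in-memory database, a hash table is used to link each key to its value. Each key is inserted into the T-tree and is also stored in the hash table.
The hash table is built with the uthash macros (uthash.h); see the uthash user guide for documentation.
Inserting a key-value pair into the hash table:
Before inserting, call HASH_FIND_INT to check whether the key already exists. If it does not, a new entry is added; if it does, no new entry is added (add_user below then only overwrites the stored name).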

void add_user(int user_id, char *name) {
    struct my_struct *s;
    HASH_FIND_INT(users, &user_id, s);  /* id already in the hash? */
    if (s==NULL) {
      s = (struct my_struct *)malloc(sizeof *s);
      s->id = user_id;
      HASH_ADD_INT( users, id, s );  /* id: name of key field */
    }
    strcpy(s->name, name);
}

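Parsing a file of key-value records

An example file in key-value format is used: xlarge.del
The file is opened with fopen() and closed with fclose() once it has been read.
Because each line will be split with strtok later, a buffer named file_line is allocated with malloc.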

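FILE * fp;
    fp = fopen("/home/vory/programing/c/key_value_mmdb/xlarge.del","r");
    if (fp == NULL) {               /* stop early if the file cannot be opened */
        perror("fopen");
        exit(EXIT_FAILURE);
    }
    file_line = malloc(1000 * sizeof(char));
    memset(file_line, 0, 1000 * sizeof(char));
    ......
    
    fclose(fp);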

fgets reads one line at a time:

char * buf;
    buf = malloc(sizeof(input));
    memset(buf,0,sizeof(input));

    while(fgets(input ,256,fp)){

        strcpy(buf,input);
        ......
        
    }

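strtok splits each line into several strings. It does so by overwriting the delimiter that ends each token with '\0', so the string it is given is modified in place and the original content is lost. For that reason, allocate a new buffer with malloc, strcpy() the source string into it, and run strtok() on the copy. free() the copy when it is no longer needed.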

strcpy(buf,source_string);
        
        token = strtok(buf,delim);
//        printf("%s\n",token);
        parameter[parametercount] = malloc(sizeof(input));
        strcpy(parameter[parametercount],token);
        parametercount++;
        token = strtok(NULL,delim);
//        printf("%s\n",token);
        parameter[parametercount] = malloc(sizeof(input));
        strcpy(parameter[parametercount],token);
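The copy-then-tokenize pattern described above can be packaged as a small helper. This is a minimal sketch with a hypothetical function name, demo_count_tokens; it only counts tokens, which is enough to show that the caller's string stays intact and that a comma inside the value produces an extra token.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Count the tokens in `line` without modifying it: strtok() runs on a
 * heap copy, which is freed before returning. Returns -1 if the copy
 * cannot be allocated.
 */
static int demo_count_tokens(const char *line, const char *delim)
{
    size_t len;
    char *copy;
    char *tok;
    int count = 0;

    assert(line != NULL && delim != NULL);
    len = strlen(line) + 1;              /* include the terminating '\0' */
    copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, line, len);
    for (tok = strtok(copy, delim); tok != NULL; tok = strtok(NULL, delim))
        count++;
    free(copy);
    return count;
}
```

Note that strtok keeps its position in hidden static state, so it is not safe to interleave two tokenizations or to call it from several threads.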

The example file xlarge.del looks roughly like this: each line has two parts, a KEY and a VALUE, separated by a comma.

41231234,"Teenage Caveman"
3061234,"Banger Sisters, The"
18861234,"Hope Floats"
29381234,"No Looking Back"
1288,"Erotic Confessions: Volume 8"
2954,"Normal Life"
43901234,"Utomlyonnye solntsem"
20801234,"Island of Dr. Moreau, The"
3019,"One Hell of a Guy"
6712341,"Adventures of Pluto Nash, The"
33031234,"Pronto"
34701234,"Ripper, The"
106612341,"Devotion"
39481234,"Starship Troopers"
32381234,"Polish Wedding"
30551234,"Oscar and Lucinda"
42391,"Tomcats"
1661123411,"Gojira ni-sen mireniamu"
10611234,"Devil in a Blue Dress"
61612341,"Bully"
102612341,"Defenders: Taking the First, The"
1650,"Go Fish"
43512341,"Black Rose of Harlem"

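strtok does not stop at the first delimiter (','): every ',' it reaches while tokenizing is overwritten with '\0'. For a line like the one below, the value is therefore cut short, and the parser has to be changed.

6712341,"Adventures of Pluto Nash, The"
This line contains two commas. strtok treats both as delimiters, so the value ends at the second comma.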

Parsing each line read from the file: a line is split into at most 2 parameters ([KEY] [VALUE]).

void parse_file(char * string){
    char * buf;
    char * delim;
    delim = NULL;
    delim = ",";
    char * token;
    token = NULL;
    
    buf = malloc(1000*sizeof(char));
    memset(buf,0, 1000*sizeof(char));
    
    if (!parf0){
        parf0 =malloc(500*sizeof(char));
    }
    
    memset(parf0,0, 500*sizeof(char));
    
    if (!parf1){
        parf1 =malloc(500*sizeof(char));        
    }
    
    memset(parf1,0, 500*sizeof(char));

    strcpy(buf, string);
    token = strtok(buf, delim);
    if(token != NULL) {
        strcpy(parf0, token);
    }
    token = strtok(NULL, delim);
    if (token != NULL){
        strcpy(parf1, token);
    }
    free(buf);
}

The parser above truncates values that contain a comma. It was revised on 2019-03-20 as follows:

void parse_file(char * string){
    char * buf;
    char * delim;
    delim = NULL;
    delim = ",";
    char * token;
    token = NULL;

    buf = malloc(1000*sizeof(char));
    memset(buf,0, 1000*sizeof(char));

    if (!parf0){
        parf0 =malloc(500*sizeof(char));
    }
    
    memset(parf0,0, 500*sizeof(char));
    if (!parf1){
        parf1 =malloc(500*sizeof(char));
    }
    
    memset(parf1,0, 500*sizeof(char));
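    strcpy(buf, string);
    char *comma = strstr(buf, delim);   /* first ',' and everything after it */
    if (comma == NULL) {                /* malformed line: no separator */
        free(buf);
        return;
    }
    strcpy(parf1, comma);


    token = strtok(buf, delim);
    if(token != NULL) {
        strcpy(parf0, token);
    }
    /* Shift left by two characters to drop the leading comma and quote. */
    int len = (int)strlen(parf1);
    int fori = 0;
    for (fori = 0; fori < len - 2; fori++){
        parf1[fori] = parf1[fori + 2];
    }

    /* Cut at the closing quote; this also drops the trailing newline. */
    parf1[fori >= 2 ? fori - 2 : 0] = '\0';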
    free(buf);
}

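That is, strstr() first returns a pointer to the substring that starts at the first ','. strtok then overwrites that comma with '\0' to isolate the key. Finally, a simple loop shifts every character of parf1 two positions to the left, overwriting the leading comma and opening quote, and a '\0' is written over the closing quote.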
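The same split can also be written with strchr and explicit lengths, which avoids the in-place shifting and checks buffer sizes. This is an alternative sketch, not the code used in the project: demo_split_line and its parameters are hypothetical, and it assumes lines of the form KEY,"VALUE" with an optional trailing newline.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Split `line` of the form  KEY,"VALUE"  at the first comma only.
 * The key text goes to `key`, the value (without quotes or trailing
 * newline) to `val`. Returns 0 on success, -1 if the line has no comma
 * or a field does not fit its buffer.
 */
static int demo_split_line(const char *line, char *key, size_t keysz,
                           char *val, size_t valsz)
{
    const char *comma, *start, *end;
    size_t klen, vlen;

    assert(line != NULL && key != NULL && val != NULL);
    comma = strchr(line, ',');
    if (comma == NULL)
        return -1;
    klen = (size_t)(comma - line);
    start = comma + 1;
    end = start + strcspn(start, "\r\n");     /* stop before the newline */
    if (end - start >= 2 && start[0] == '"' && end[-1] == '"') {
        start++;                              /* drop the opening quote */
        end--;                                /* drop the closing quote */
    }
    vlen = (size_t)(end - start);
    if (klen >= keysz || vlen >= valsz)
        return -1;
    memcpy(key, line, klen);
    key[klen] = '\0';
    memcpy(val, start, vlen);
    val[vlen] = '\0';
    return 0;
}
```

Since only the first comma is treated as a separator, any further commas stay inside the value.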

strtol converts a string to a long int:
long int strtol(const char *nptr, char **endptr, int base);
strtol can parse integers in other bases as well as decimal, depending on the base argument; with base 10 it parses decimal integers.

all_items[bcount].key = strtol(parf0,NULL ,10);
        bcount++;
        hash_user_id = strtol(parf0,NULL ,10);
        strcpy(hash_name,parf1);
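The fragment above passes NULL for endptr, so a malformed key silently becomes 0. A checked conversion could look like the sketch below; demo_parse_long is a hypothetical helper, and accepting a trailing newline is an assumption made because fgets keeps it.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Parse a base-10 long from `text`. Returns 0 and stores the value in
 * *out on success; returns -1 if no digits were consumed, the value is
 * out of range, or unexpected characters follow the number.
 */
static int demo_parse_long(const char *text, long *out)
{
    char *end;
    long v;

    assert(text != NULL && out != NULL);
    errno = 0;
    v = strtol(text, &end, 10);
    if (end == text)            /* nothing was parsed */
        return -1;
    if (errno == ERANGE)        /* out of range for long */
        return -1;
    if (*end != '\0' && *end != '\n')
        return -1;              /* trailing garbage */
    *out = v;
    return 0;
}
```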

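Parsing commands typed on the command line ([COMMANDS] [KEY] [VALUE])

Lines read from the file have the form [KEY] [VALUE], but input from the command line has the form [COMMANDS] [KEY] [VALUE].
copy_string, par1, par2, and par3 are defined elsewhere.
With the function below, a sentence typed on stdin is split at spaces into at most 3 parts, stored in par1, par2, and par3. If the sentence contains only one space (that is, it has the form [COMMANDS] [KEY]), it is split into 2 parts, stored in par1 and par2.
Every malloc() needs a matching free(); par1, par2, and par3 are freed elsewhere.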

void parseinput(char *string){
    char * delim;
    delim = " ";
    copy_string = malloc(150*sizeof(char));
    memset(copy_string,0,150* sizeof(char));
    char * token;
    token = NULL;
    
    par1 = malloc(50*sizeof(char));
    par2 = malloc(50*sizeof(char));
    par3 = malloc(50*sizeof(char));
    memset(par1,0,50*sizeof(char));
    memset(par2,0,50*sizeof(char));
    memset(par3,0,50*sizeof(char));

    strcpy(copy_string,string);
    printf("%s is copystring .\n ",copy_string);
    printf("%s is string . \n",string);

    token = strtok(copy_string,delim);
    if (token != NULL){
        printf("%s is token1 \n",token);
        strcpy(par1,token);
    }

    token = strtok(NULL,delim);

    if (token != NULL){
        printf("%s is token2 \n",token);

        strcpy(par2,token);
    }
    token = strtok(NULL,delim);
    if (token != NULL){
        printf("%s is token3 \n",token);

        strcpy(par3,token);
    }
    free(copy_string);
}

Defining an enum of COMMANDS and handling each command in a switch statement

enum commands{
    FIND=0,
    INSERT,
    DELETE,
    PRINT,
};

switch (COMMANDS) {
            case FIND:
//                printf("NOW IN FIND SWITCH \n");
                finds = find_user(strtol(par2, NULL, 10));
                if (finds != NULL) {

                    strcpy(out_buff, finds->name);
                } else {
                    strcpy(out_buff, "key  NOT FOUND!");
                }
                printf("FIND OUT PUT IS :%s\n", out_buff);
                break;
            case INSERT:
                printf("RECEIVE INSERT\n");
                finds = find_user(strtol(par2, NULL, 10));
                if (finds == NULL) {
                    printf("The key you want to insert is not in MMDB\n .......Inserting now......\n");
                } else {
                    printf("key already EXISTS!!!\n");
                    break;
                }
                /* The tree stores this pointer, so the storage behind insertkey must stay valid. */
                *insertkey = strtol(par2, NULL, 10);
                printf("insertkey = %ld\n", *insertkey);
                ret = ttree_insert(&ttree, insertkey);
                if (ret < 0) {
                    fprintf(stdout, "Failed to insert key %ld!\n", strtol(par2, NULL, 10));
                    break;
                }

                /* insert into the hash table only after the T*-tree insert succeeded */
                strcpy(hash_name, par3);
                hash_user_id = strtol(par2, NULL, 10);
                add_user(hash_user_id, hash_name);
                printf("SUCCESSFULLY INSERTED %ld\n", hash_user_id);
                break;
            case DELETE:
                *insertkey = strtol(par2, NULL, 10);

                finds = find_user(*insertkey);
                if (finds == NULL) {
                    printf("KEY DOESN'T EXIST\n");
                    break;
                }

                /* pass the key pointer itself, not its address, and delete only once */
                res = ttree_delete(&ttree, insertkey);
                delete_user(finds);
                if (res == NULL) {
                    printf("Failed to delete item %ld\n", *insertkey);
                } else {
                    printf("key %ld deleted!\n", *insertkey);
                }

                break;
            case PRINT:
                printf("go print\n");


                for (s = test1_users; s != NULL; s = s->hh.next) {


                    memset(printid, 0, 500 * sizeof(char));
                    memset(printname, 0, 500 * sizeof(char));
                    strcpy(printname, s->name);

                    sprintf(printidstring,"%ld",s->user_id);
                    strcpy(printid, printidstring);
                    strcat(print_very_long, printid);
                    strcat(print_very_long, printname);
                }
                printf("%s",print_very_long);

                break;
            default:
                printf("this is default\n");
                strcpy(out_buff, "switch go to default");
                break;
        }
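The switch above needs the first word of the input line turned into an enum value, a step the excerpt does not show. One possible way to do it is sketched below. Everything here is an assumption: the enum demo_command (the four commands plus an extra "unknown" value), the function demo_parse_command, and the upper-case spellings of the command words.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical enum: the four commands plus an "unknown" value. */
enum demo_command {
    DEMO_FIND = 0,
    DEMO_INSERT,
    DEMO_DELETE,
    DEMO_PRINT,
    DEMO_UNKNOWN
};

/* Map the first word of an input line to a command (case-sensitive). */
static enum demo_command demo_parse_command(const char *word)
{
    /* The order must match the enum so the index doubles as the value. */
    static const char *const names[] = { "FIND", "INSERT", "DELETE", "PRINT" };
    size_t i;

    assert(word != NULL);
    for (i = 0; i < sizeof names / sizeof names[0]; i++) {
        if (strcmp(word, names[i]) == 0)
            return (enum demo_command)i;
    }
    return DEMO_UNKNOWN;
}
```

Returning a dedicated unknown value lets the default branch of the switch report bad input instead of running a command by accident.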

Initializing the T*-tree

#define ttree_init(ttree, num_keys, is_unique, cmpf, data_struct, key_field) __ttree_init(ttree, num_keys, is_unique, cmpf, offsetof(data_struct, key_field))

int __ttree_init(Ttree *ttree, int num_keys, bool is_unique, ttree_cmp_func_fn cmpf, size_t key_offs);
...........

    ret = ttree_init(&ttree, 8, false, __cmpfunc, struct item, key);
    if (ret < 0) {
        fprintf(stderr, "Failed to initialize T*-tree. [ERR=%d]\n", ret);
        free(all_items);
        exit(EXIT_FAILURE);
    }
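The initialization call passes a comparison function and, through offsetof, the position of the key field inside each item. The excerpt shows neither __cmpfunc nor struct item, so the sketch below uses hypothetical stand-ins (struct demo_item, demo_cmp_long, demo_item2key) and assumes that the comparison callback receives two pointers to long keys.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record type standing in for the article's struct item. */
struct demo_item {
    char payload[16];
    long key;
};

/* Three-way comparison of two long keys; avoids overflow from subtraction. */
static int demo_cmp_long(void *a, void *b)
{
    long x = *(long *)a;
    long y = *(long *)b;

    return (x > y) - (x < y);
}

/* Item-to-key conversion: add the key field's offset to the item address. */
static void *demo_item2key(void *item, size_t key_offs)
{
    assert(item != NULL);
    return (char *)item + key_offs;
}
```

Storing only the offset means the tree never needs to know the item type; it just adds the offset to whatever pointer it was given and hands the result to the comparison function.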

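Inserting every line into the T*-tree and every key-value pair into the hash table

Each line is parsed inside a loop; the loop ends once every line of the file has been read.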

while (fgets(file_line, 1000, fp)) {
        parse_file(file_line);
        all_items[bcount].key = strtol(parf0, NULL, 10);
        bcount++;    /* advance to the next slot of all_items */
        hash_name = malloc(500 * sizeof(char));
        memset(hash_name, 0, 500 * sizeof(char));
        hash_user_id = strtol(parf0, NULL, 10);
        strcpy(hash_name, parf1);
        s = find_user(hash_user_id);
        if (s == NULL) { add_user(hash_user_id, hash_name); }
        free(hash_name);
        memset(file_line, 0, 1000 * sizeof(char));
    }
    
    
    for (i = 0; i < num_keys; i++) {
        ret = ttree_insert(&ttree, &all_items[i]);
        if (ret < 0) {
            fprintf(stderr, "Failed to insert item %d with key %ld! [ERR=%d]\n", i, all_items[i].key, ret);
            free(all_items);
            exit(EXIT_FAILURE);
        }
    }

Printing all keys of the T*-tree

for (i = 0; i < num_keys; i++) {
        printf("%ld ", all_items[i].key);
    }

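Printing all keys of the T*-tree in sorted order

Keys are printed in ascending order. The traversal is iterative rather than recursive: it starts at the leftmost node and follows the successor pointers.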

printf("\nSorted keys:\n");
    printf("{ ");
    tnode = ttree_node_leftmost(ttree.root);
    while (tnode) {
        tnode_for_each_index(tnode, i) {

            printf("%ld ", *(long *) tnode_key(tnode, i));
        }
        tnode = tnode->successor;
    }
    printf("}\n");
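The traversal in the listing depends on two things: finding the leftmost node and then following successor links. A stripped-down version that collects the keys into an array shows the idea. The names struct demo_tn, demo_leftmost, and demo_collect_sorted are hypothetical, keys are ints packed from index 0, and the successor links are assumed to have been set up already.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node: children, successor link, and sorted int keys. */
struct demo_tn {
    struct demo_tn *left, *right, *successor;
    int keys[4];
    int count;
};

/* Walk down the left spine to the node holding the smallest keys. */
static struct demo_tn *demo_leftmost(struct demo_tn *n)
{
    while (n && n->left)
        n = n->left;
    return n;
}

/*
 * Copy every key into `out` in ascending order by following successor
 * links; no recursion or explicit stack is needed. Returns the number
 * of keys written (at most `cap`).
 */
static int demo_collect_sorted(struct demo_tn *root, int *out, int cap)
{
    struct demo_tn *n;
    int written = 0, i;

    assert(out != NULL && cap >= 0);
    for (n = demo_leftmost(root); n != NULL; n = n->successor)
        for (i = 0; i < n->count && written < cap; i++)
            out[written++] = n->keys[i];
    return written;
}
```

In a plain T-tree without successor pointers, the same walk would need a recursive in-order traversal or parent-pointer backtracking.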

Calling free() before the program exits to release memory

free(par1);
free(par2);
free(par3);
fclose(fp);
ttree_destroy(&ttree);
free(all_items);

Attachments and Code

Code on GitHub

All the code in this article is in my GitHub: Slarsar's github/key_value_mmdb
Reference code for the T*-tree: Github/bernardobreder/demo-ttree
T-tree header files used as reference: Github/dkruchinin/libttree
Hash table header file: Github/troydhanson/uthash

Planned Features

Database operation log......

Loading the database from a different directory......

Database backup......

Remote operation over TCP/IP......

References

[1].Tobin J. Lehman and Michael J. Carey. 1986. A Study of Index Structures for Main Memory Database Management Systems. In Proceedings of the 12th International Conference on Very Large Data Bases (VLDB '86), Wesley W. Chu, Georges Gardarin, Setsuo Ohsuga, and Yahiko Kambayashi (Eds.). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 294-303.
[2].Kong-Rim Choi and Kyung-Chang Kim, "T*-tree: a main memory database index structure for real time applications," Proceedings of 3rd International Workshop on Real-Time Computing Systems and Applications, Seoul, South Korea, 1996, pp. 81-88. doi: 10.1109/RTCSA.1996.554964
[3].Wikipedia, "T-tree"
[4].An Open-source T*-tree Library