Troubleshooting high CPU load when using the PHP trie_filter extension

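Today a colleague was running a script that uses the PHP trie_filter extension to match keywords in a loop, and noticed CPU utilization reaching 99%. Tracing the process with strace showed it repeatedly opening, reading, and closing the .tree file, which pointed to trie_filter_load being called over and over. Normally this file should be read into memory once, with keyword matching then done in memory. To confirm this, I traced through the extension source, shown below: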

PHP_FUNCTION(trie_filter_load)
{
    Trie *trie;
    char *path;
    int path_len;

    if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "s",
                &path, &path_len) == FAILURE) {
        RETURN_NULL();
    }

    trie = trie_new_from_file(path);
    if (!trie) {
        php_error_docref(NULL TSRMLS_CC, E_WARNING,
                "Unable to load %s", path);
        RETURN_NULL();
    }

    ZEND_REGISTER_RESOURCE(return_value, trie, le_trie_filter);
}

 

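The trie variable holds a Trie * pointer to a struct, which by itself does not prove that the file contents have been read into memory. Continuing into the libdatrie source, I found the following in datrie/trie.c: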

Trie *
trie_new_from_file (const char *path)
{
    Trie       *trie;
    FILE       *trie_file;

    trie_file = fopen (path, "r");
    if (!trie_file)
        return NULL;

    trie = trie_fread (trie_file);
    fclose (trie_file);
    return trie;
}

At first glance, the Trie struct reads and stores the file contents. Next, look at trie_fread:

Trie *
trie_fread (FILE *file)
{
    Trie       *trie;

    trie = (Trie *) malloc (sizeof (Trie));
    if (!trie)
        return NULL;

    if (NULL == (trie->alpha_map = alpha_map_fread_bin (file)))
        goto exit_trie_created;
    if (NULL == (trie->da   = da_fread (file)))
        goto exit_alpha_map_created;
    if (NULL == (trie->tail = tail_fread (file)))
        goto exit_da_created;

    trie->is_dirty = FALSE;
    return trie;

exit_da_created:
    da_free (trie->da);
exit_alpha_map_created:
    alpha_map_free (trie->alpha_map);
exit_trie_created:
    free (trie);
    return NULL;
}

The alpha_map_fread_bin function reads the binary file into trie->alpha_map:

AlphaMap *
alpha_map_fread_bin (FILE *file)
{
    long        save_pos;
    uint32      sig;
    int32       total, i;
    AlphaMap   *alpha_map;

    /* check signature */
    save_pos = ftell (file);
    if (!file_read_int32 (file, (int32 *) &sig) || ALPHAMAP_SIGNATURE != sig)
        goto exit_file_read;

    if (NULL == (alpha_map = alpha_map_new ()))
        goto exit_file_read;

    /* read number of ranges */
    if (!file_read_int32 (file, &total))
        goto exit_map_created;

    /* read character ranges */
    for (i = 0; i < total; i++) {
        int32   b, e;

        if (!file_read_int32 (file, &b) || !file_read_int32 (file, &e))
            goto exit_map_created;
        alpha_map_add_range (alpha_map, b, e);
    }

    return alpha_map;

exit_map_created:
    alpha_map_free (alpha_map);
exit_file_read:
    fseek (file, save_pos, SEEK_SET);
    return NULL;
}

Bool
file_read_int32 (FILE *file, int32 *o_val)
{
    unsigned char   buff[4];

    if (fread (buff, 4, 1, file) == 1) {
        *o_val = (buff[0] << 24) | (buff[1] << 16) |  (buff[2] << 8) | buff[3];
        return TRUE;
    }

    return FALSE;
}
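
file_read_int32 above assembles each 32-bit value from four bytes stored most significant byte first (big-endian). As a small standalone illustration of that decoding step, here is a sketch; read_be32 and decode_bytes are names made up for this example and are not part of libdatrie. Unlike the library code, the sketch casts each byte to uint32_t before shifting, which is an adjustment of mine to keep the top byte out of a signed int's sign bit.

```c
#include <stdint.h>
#include <stdio.h>

/* Read one 32-bit big-endian integer; returns 1 on success, 0 on a short read. */
static int read_be32(FILE *file, uint32_t *out)
{
    unsigned char buff[4];

    if (fread(buff, 4, 1, file) != 1)
        return 0;

    /* Cast before shifting so the top byte never lands in a sign bit. */
    *out = ((uint32_t)buff[0] << 24) | ((uint32_t)buff[1] << 16) |
           ((uint32_t)buff[2] << 8)  |  (uint32_t)buff[3];
    return 1;
}

/* Demo helper (hypothetical): write 4 bytes to a temp file, then decode them. */
static int decode_bytes(const unsigned char bytes[4], uint32_t *out)
{
    FILE *f = tmpfile();
    int ok;

    if (!f)
        return 0;
    if (fwrite(bytes, 1, 4, f) != 4) {
        fclose(f);
        return 0;
    }
    rewind(f);
    ok = read_be32(f, out);
    fclose(f);
    return ok;
}
```

Calling decode_bytes with the bytes 0x12, 0x34, 0x56, 0x78 stores 0x12345678 in the output, regardless of the host machine's byte order.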

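This confirms that trie_filter_load reads the tree file into memory after opening it. Assigning the return value of trie_filter_load to a static variable therefore avoids repeatedly opening, reading, and closing the tree file. After this change, tracing the process again showed normal behavior.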
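
The actual fix was made in the PHP script. To stay in the same language as the source excerpts above, the following is a rough C analogue of the same load-once idea, not the code that was changed. Dictionary, load_dictionary, get_dictionary, and load_count are all hypothetical names for illustration and are not part of trie_filter or libdatrie.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the loaded trie; not a libdatrie type. */
typedef struct {
    long size;   /* number of bytes read from the file */
} Dictionary;

/* Counts how often the file is actually opened (demonstration only). */
static int load_count = 0;

/* Expensive path: open, read, and close the file on every call. */
static Dictionary *load_dictionary(const char *path)
{
    FILE *f = fopen(path, "rb");
    Dictionary *d;
    int c;

    if (!f)
        return NULL;
    load_count++;

    d = malloc(sizeof *d);
    if (!d) {
        fclose(f);
        return NULL;
    }
    d->size = 0;
    while ((c = fgetc(f)) != EOF)
        d->size++;
    fclose(f);
    return d;
}

/* Cheap path: load on first use, then reuse the pointer kept in a static. */
static Dictionary *get_dictionary(const char *path)
{
    static Dictionary *cached = NULL;

    if (!cached)
        cached = load_dictionary(path);
    return cached;
}
```

Calling get_dictionary inside the matching loop opens the file only on the first call; later calls return the same pointer. This sketch ignores a different path passed after the first load, which is acceptable only when a single dictionary file is in use.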
