CMA stands for contiguous memory allocator. It is a region set aside to make allocating physically contiguous memory easy, and we usually describe this region in the device tree as reserved-memory.

Early Linux kernels had no CMA implementation. If a driver needed a large block of physically contiguous memory, the only option was to reserve a dedicated chunk and then map it with ioremap inside the driver as private memory. The consequence was that part of the memory was carved out and could never be used as general-purpose system memory. Devices such as camera and audio need large contiguous buffers for DMA while they are running, but when those devices are idle the reserved memory cannot be used by any other module.

So how can the operating system make full use of physical memory? Ideally, when a device needs a large contiguous block of physical memory it can obtain one easily, and when the device is idle that memory can be handed out to other parts of the system like ordinary memory. CMA was introduced to solve exactly this problem: memory defined as a CMA region is still managed by the operating system, and when a driver module requests a large contiguous block, the memory-management subsystem migrates the pages currently living in the CMA region to free up a contiguous range for the driver. When the driver releases that block, it is returned to the operating system and can be allocated to other users again.

In an earlier article, 《對於MIGRATE_MOVABLE的理解》, I explained that the buddy system manages memory blocks of different sizes by migrate type. One of those types is MIGRATE_CMA: pages of this type must be movable, which guarantees that they can be migrated away so that an allocation for DMA can succeed.
enum {
	MIGRATE_UNMOVABLE,
	MIGRATE_RECLAIMABLE,
	MIGRATE_MOVABLE,
	MIGRATE_PCPTYPES,	/* the number of types on the pcp lists */
	MIGRATE_RESERVE = MIGRATE_PCPTYPES,
#ifdef CONFIG_CMA
	/*
	 * MIGRATE_CMA migration type is designed to mimic the way
	 * ZONE_MOVABLE works. Only movable pages can be allocated
	 * from MIGRATE_CMA pageblocks and page allocator never
	 * implicitly change migration type of MIGRATE_CMA pageblock.
	 *
	 * The way to use it is to change migratetype of a range of
	 * pageblocks to MIGRATE_CMA which can be done by
	 * __free_pageblock_cma() function. What is important though
	 * is that a range of pageblocks must be aligned to
	 * MAX_ORDER_NR_PAGES should biggest page be bigger then
	 * a single pageblock.
	 */
	MIGRATE_CMA,
#endif
#ifdef CONFIG_MEMORY_ISOLATION
	MIGRATE_ISOLATE,	/* can't allocate from here */
#endif
	MIGRATE_TYPES
};
Defining a CMA region
By usage scope, CMA regions come in two kinds. One is the general-purpose (shared) CMA region, which serves allocations for the whole system; the other is a dedicated CMA region, defined for a single module that does not want to share the area with anyone else. We can define several CMA regions in the dts, each one simply being a reserved-memory node. A shared CMA region is configured like this:
reserved_memory: reserved-memory {
	#address-cells = <2>;
	#size-cells = <2>;
	ranges;

	/* global autoconfigured region for contiguous allocations */
	linux,cma {
		compatible = "shared-dma-pool";
		alloc-ranges = <0x0 0x00000000 0x0 0xffffffff>;
		reusable;
		alignment = <0x0 0x400000>;
		size = <0x0 0x2000000>;
		linux,cma-default;
	};
};
There are three key points in the dts configuration of a CMA region:

First, it must carry the reusable property, meaning that besides serving DMA, the region can also be reused by the memory-management subsystem.

Second, it must not carry the no-map property. That property tells the kernel not to create a page-table mapping for the region; general-purpose memory has to be mapped before it can be used, and since CMA memory may be handed out as general-purpose memory, the mapping must exist.

Third, a shared CMA region needs the linux,cma-default property, which marks it as the shared (default) CMA region.

A dedicated CMA region is configured as follows:
reserved_memory: reserved-memory {
	#address-cells = <2>;
	#size-cells = <2>;
	ranges;

	priv_mem: priv_region {
		compatible = "shared-dma-pool";
		alloc-ranges = <0x0 0x00000000 0x0 0xffffffff>;
		reusable;
		alignment = <0x0 0x400000>;
		size = <0x0 0xC00000>;
	};
};
The dedicated CMA region is first defined under reserved-memory; note that the only difference from the shared configuration above is that the dedicated region does not carry the linux,cma-default; property. How is it then consumed? See below:
qcom,testmodule {
	compatible = "qcom,testmodule";
	memory-region = <&priv_mem>;
};
The consuming module defines a memory-region property, which hands the corresponding CMA handle to the module through the dts. Inside the module we can then do:
struct page *page = NULL;

page = cma_alloc(dev_get_cma_area(dev), mem_size, 0, GFP_KERNEL);
Here dev_get_cma_area() is used to look up the CMA handle for the device. If none is found, for example because the module does not define a memory-region at all, the shared CMA handle is returned instead. Remember the linux,cma-default; property above: the shared CMA region serves as the default CMA.
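The article does not show how priv_mem actually gets attached to the device, so the following is only a minimal sketch of what a consumer driver's probe() might look like, assuming the standard of_reserved_mem_device_init() helper is used to bind the memory-region to the device. The driver name, buffer size and variable names are made up for illustration, and the fourth argument of cma_alloc() is gfp_t or a no_warn flag depending on the kernel version; the platform_driver registration boilerplate is omitted.

#include <linux/cma.h>
#include <linux/dma-contiguous.h>
#include <linux/of_reserved_mem.h>
#include <linux/platform_device.h>

#define TESTMODULE_NR_PAGES	16	/* arbitrary example size, in pages */

static int testmodule_probe(struct platform_device *pdev)
{
	struct device *dev = &pdev->dev;
	struct page *page;
	int ret;

	/*
	 * Bind the memory-region from the dts node to this device. For a
	 * "shared-dma-pool" region marked reusable, this makes
	 * dev_get_cma_area(dev) return priv_mem instead of the default CMA.
	 */
	ret = of_reserved_mem_device_init(dev);
	if (ret)
		dev_warn(dev, "no private CMA, using default area (%d)\n", ret);

	/* count is in pages */
	page = cma_alloc(dev_get_cma_area(dev), TESTMODULE_NR_PAGES, 0,
			 GFP_KERNEL);
	if (!page)
		return -ENOMEM;

	/* ... hand page_to_phys(page) to the hardware ... */

	cma_release(dev_get_cma_area(dev), page, TESTMODULE_NR_PAGES);
	return 0;
}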
CMA memory allocation and release

When a kernel module wants to use CMA memory, the interfaces it calls are still the DMA ones:
extern void *dma_alloc_coherent(struct device *dev, size_t size,
				dma_addr_t *dma_handle, gfp_t flag);
extern void dma_free_coherent(struct device *dev, size_t size,
			      void *cpu_addr, dma_addr_t dma_handle);
On a platform with CMA enabled, both interfaces eventually land in the following implementation:
struct page *dma_alloc_from_contiguous(struct device *dev, size_t count,
				       unsigned int align, bool no_warn)
{
	if (align > CONFIG_CMA_ALIGNMENT)
		align = CONFIG_CMA_ALIGNMENT;

	return cma_alloc(dev_get_cma_area(dev), count, align, no_warn);
}

bool dma_release_from_contiguous(struct device *dev, struct page *pages,
				 int count)
{
	return cma_release(dev_get_cma_area(dev), pages, count);
}
So in the end it is the CMA code that performs the actual allocation and release.
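As a quick usage sketch (not from the original article), this is roughly how a driver would allocate and free a coherent DMA buffer through the interfaces above; the 1 MiB size and the function names are arbitrary examples.

#include <linux/dma-mapping.h>
#include <linux/sizes.h>

static void *buf_cpu;		/* CPU virtual address of the buffer */
static dma_addr_t buf_dma;	/* bus/DMA address handed to the device */

static int example_alloc_dma_buffer(struct device *dev)
{
	/* On a CMA-enabled platform this typically ends up in
	 * dma_alloc_from_contiguous() -> cma_alloc(). */
	buf_cpu = dma_alloc_coherent(dev, SZ_1M, &buf_dma, GFP_KERNEL);
	if (!buf_cpu)
		return -ENOMEM;
	return 0;
}

static void example_free_dma_buffer(struct device *dev)
{
	/* Returns the pages via dma_release_from_contiguous(). */
	dma_free_coherent(dev, SZ_1M, buf_cpu, buf_dma);
}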
generic dma coherent
For the DMA framework, once CMA is enabled and a CMA region is configured, allocations are served from CMA. The kernel still keeps the older scheme around for compatibility, selected by CONFIG_HAVE_GENERIC_DMA_COHERENT.
obj-$(CONFIG_DMA_CMA)			+= contiguous.o
obj-$(CONFIG_HAVE_GENERIC_DMA_COHERENT)	+= coherent.o removed.o
The generic dma coherent implementation also relies on a reserved-memory node in the dts, except that the node carries the no-map property. The memory is then stripped out of the system entirely and can never be used by the buddy allocator, but the DMA core can still remap it (create page-table mappings for it) and use it.
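From the driver's point of view nothing changes: the region is still bound with of_reserved_mem_device_init() and allocated with dma_alloc_coherent(); only the backend differs. A minimal sketch, assuming a hypothetical no-map reserved-memory node is referenced by the device's memory-region:

#include <linux/dma-mapping.h>
#include <linux/of_reserved_mem.h>
#include <linux/platform_device.h>
#include <linux/sizes.h>

static int example_bind_coherent_pool(struct platform_device *pdev)
{
	struct device *dev = &pdev->dev;
	dma_addr_t dma_handle;
	void *vaddr;
	int ret;

	/* For a no-map "shared-dma-pool" this attaches a per-device
	 * coherent pool instead of a CMA area. */
	ret = of_reserved_mem_device_init(dev);
	if (ret)
		return ret;

	/* Served from the remapped reserved region, not from the buddy
	 * allocator / CMA. */
	vaddr = dma_alloc_coherent(dev, SZ_1M, &dma_handle, GFP_KERNEL);
	if (!vaddr)
		return -ENOMEM;

	dma_free_coherent(dev, SZ_1M, vaddr, dma_handle);
	return 0;
}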
CMA in the Linux kernel, i.e. contiguous memory area management, is configured through CONFIG_CMA and CONFIG_CMA_DEBUG. As the name says, what it manages are blocks of memory that are contiguous in physical address space, which is different from what we usually get through the buddy allocator and virtual addresses. kmalloc on top of the buddy allocator can also return physically contiguous memory, but under long-running workloads such an allocation may eventually fail. That is why the kernel designers added CMA: a dedicated range of contiguous physical memory for the scenarios that really need it, such as DMA. Because physical memory is limited and so is the set of users, this range is tightly constrained: the size, base address and alignment of each CMA area are all restricted.

cma_declare_contiguous() declares such an area, i.e. its base, size, limit and so on. cma_init_reserved_mem() takes a block out of the reserved memory and turns it into a CMA area. Note that the number of areas is bounded by MAX_CMA_AREAS; in other words, at most MAX_CMA_AREAS CMA areas (users) can exist, and the CMA code manages exactly these MAX_CMA_AREAS areas. Afterwards cma_init_reserved_areas() activates them. In normal operation, cma_alloc() allocates CMA memory and cma_release() frees it.

Let us first look at the global constraints the kernel places on CMA memory, i.e. the implementation of cma_declare_contiguous():

/**
 * cma_declare_contiguous() - reserve custom contiguous area
 * @base: Base address of the reserved area optional, use 0 for any
 * @size: Size of the reserved area (in bytes),
 * @limit: End address of the reserved memory (optional, 0 for any).
 * @alignment: Alignment for the CMA area, should be power of 2 or zero
 * @order_per_bit: Order of pages represented by one bit on bitmap.
 * @fixed: hint about where to place the reserved area
 * @res_cma: Pointer to store the created cma region.
 *
 * This function reserves memory from early allocator. It should be
 * called by arch specific code once the early allocator (memblock or bootmem)
 * has been activated and all other subsystems have already allocated/reserved
 * memory. This function allows to create custom reserved areas.
 *
 * If @fixed is true, reserve contiguous area at exactly @base. If false,
 * reserve in range from @base to @limit.
 */
int __init cma_declare_contiguous(phys_addr_t base,
			phys_addr_t size, phys_addr_t limit,
			phys_addr_t alignment, unsigned int order_per_bit,
			bool fixed, struct cma **res_cma)
{
	phys_addr_t memblock_end = memblock_end_of_DRAM();
	phys_addr_t highmem_start;
	int ret = 0;

#ifdef CONFIG_X86
	/*
	 * high_memory isn't direct mapped memory so retrieving its physical
	 * address isn't appropriate. But it would be useful to check the
	 * physical address of the highmem boundary so it's justifiable to get
	 * the physical address from it. On x86 there is a validation check for
	 * this case, so the following workaround is needed to avoid it.
	 */
	highmem_start = __pa_nodebug(high_memory);
#else
	highmem_start = __pa(high_memory);
#endif
	pr_debug("%s(size %pa, base %pa, limit %pa alignment %pa)\n",
		__func__, &size, &base, &limit, &alignment);

	if (cma_area_count == ARRAY_SIZE(cma_areas)) {
		pr_err("Not enough slots for CMA reserved regions!\n");
		return -ENOSPC;
	}

	if (!size)
		return -EINVAL;

	if (alignment && !is_power_of_2(alignment))
		return -EINVAL;

	/*
	 * Sanitise input arguments.
	 * Pages both ends in CMA area could be merged into adjacent unmovable
	 * migratetype page by page allocator's buddy algorithm. In the case,
	 * you couldn't get a contiguous memory, which is not what we want.
	 */
	alignment = max(alignment, (phys_addr_t)PAGE_SIZE <<
			  max_t(unsigned long, MAX_ORDER - 1, pageblock_order));
	base = ALIGN(base, alignment);
	size = ALIGN(size, alignment);
	limit &= ~(alignment - 1);

	if (!base)
		fixed = false;

	/* size should be aligned with order_per_bit */
	if (!IS_ALIGNED(size >> PAGE_SHIFT, 1 << order_per_bit))
		return -EINVAL;

	/*
	 * If allocating at a fixed base the request region must not cross the
	 * low/high memory boundary.
	 */
	if (fixed && base < highmem_start && base + size > highmem_start) {
		ret = -EINVAL;
		pr_err("Region at %pa defined on low/high memory boundary (%pa)\n",
			&base, &highmem_start);
		goto err;
	}

	/*
	 * If the limit is unspecified or above the memblock end, its effective
	 * value will be the memblock end. Set it explicitly to simplify further
	 * checks.
	 */
	if (limit == 0 || limit > memblock_end)
		limit = memblock_end;

	/* Reserve memory */
	if (fixed) {
		if (memblock_is_region_reserved(base, size) ||
		    memblock_reserve(base, size) < 0) {
			ret = -EBUSY;
			goto err;
		}
	} else {
		phys_addr_t addr = 0;

		/*
		 * All pages in the reserved area must come from the same zone.
		 * If the requested region crosses the low/high memory boundary,
		 * try allocating from high memory first and fall back to low
		 * memory in case of failure.
		 */
		if (base < highmem_start && limit > highmem_start) {
			addr = memblock_alloc_range(size, alignment,
						    highmem_start, limit,
						    MEMBLOCK_NONE);
			limit = highmem_start;
		}

		if (!addr) {
			addr = memblock_alloc_range(size, alignment, base,
						    limit,
						    MEMBLOCK_NONE);
			if (!addr) {
				ret = -ENOMEM;
				goto err;
			}
		}

		/*
		 * kmemleak scans/reads tracked objects for pointers to other
		 * objects but this address isn't mapped and accessible
		 */
		kmemleak_ignore_phys(addr);
		base = addr;
	}

	ret = cma_init_reserved_mem(base, size, order_per_bit, res_cma);
	if (ret)
		goto err;

	pr_info("Reserved %ld MiB at %pa\n", (unsigned long)size / SZ_1M,
		&base);
	return 0;

err:
	pr_err("Failed to reserve %ld MiB\n", (unsigned long)size / SZ_1M);
	return ret;
}

/**
 * cma_init_reserved_mem() - create custom contiguous area from reserved memory
 * @base: Base address of the reserved area
 * @size: Size of the reserved area (in bytes),
 * @order_per_bit: Order of pages represented by one bit on bitmap.
 * @res_cma: Pointer to store the created cma region.
 *
 * This function creates custom contiguous area from already reserved memory.
 */
int __init cma_init_reserved_mem(phys_addr_t base, phys_addr_t size,
				 unsigned int order_per_bit,
				 struct cma **res_cma)
{
	struct cma *cma;
	phys_addr_t alignment;

	/* Sanity checks */
	if (cma_area_count == ARRAY_SIZE(cma_areas)) {
		pr_err("Not enough slots for CMA reserved regions!\n");
		return -ENOSPC;
	}

	if (!size || !memblock_is_region_reserved(base, size))
		return -EINVAL;

	/* ensure minimal alignment required by mm core */
	alignment = PAGE_SIZE <<
			max_t(unsigned long, MAX_ORDER - 1, pageblock_order);

	/* alignment should be aligned with order_per_bit */
	if (!IS_ALIGNED(alignment >> PAGE_SHIFT, 1 << order_per_bit))
		return -EINVAL;

	if (ALIGN(base, alignment) != base || ALIGN(size, alignment) != size)
		return -EINVAL;

	/*
	 * Each reserved area must be initialised later, when more kernel
	 * subsystems (like slab allocator) are available.
	 */
	cma = &cma_areas[cma_area_count];
	cma->base_pfn = PFN_DOWN(base);
	cma->count = size >> PAGE_SHIFT;
	cma->order_per_bit = order_per_bit;
	*res_cma = cma;
	cma_area_count++;
	totalcma_pages += (size / PAGE_SIZE);

	return 0;
}

The reserved areas end up in the cma_areas[] array, and note that their size is accounted into totalcma_pages. Because all reserved areas are kept in the fixed-size cma_areas[] array, the number of areas that can be managed is quite limited. cma_init_reserved_areas() then takes this early-reserved memory and puts it onto the MIGRATE_CMA lists managed by the zone:

static int __init cma_init_reserved_areas(void)
{
	int i;

	for (i = 0; i < cma_area_count; i++) {
		int ret = cma_activate_area(&cma_areas[i]);

		if (ret)
			return ret;
	}

	return 0;
}
core_initcall(cma_init_reserved_areas);

static int __init cma_activate_area(struct cma *cma)
{
	int bitmap_size = BITS_TO_LONGS(cma_bitmap_maxno(cma)) * sizeof(long);
	unsigned long base_pfn = cma->base_pfn, pfn = base_pfn;
	unsigned i = cma->count >> pageblock_order;
	struct zone *zone;

	cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL);

	if (!cma->bitmap)
		return -ENOMEM;

	WARN_ON_ONCE(!pfn_valid(pfn));
	zone = page_zone(pfn_to_page(pfn));

	do {
		unsigned j;

		base_pfn = pfn;
		for (j = pageblock_nr_pages; j; --j, pfn++) {
			WARN_ON_ONCE(!pfn_valid(pfn));
			/*
			 * alloc_contig_range requires the pfn range
			 * specified to be in the same zone. Make this
			 * simple by forcing the entire CMA resv range
			 * to be in the same zone.
			 */
			if (page_zone(pfn_to_page(pfn)) != zone)
				goto err;
		}
		init_cma_reserved_pageblock(pfn_to_page(base_pfn));
	} while (--i);

	mutex_init(&cma->lock);

#ifdef CONFIG_CMA_DEBUGFS
	INIT_HLIST_HEAD(&cma->mem_head);
	spin_lock_init(&cma->mem_head_lock);
#endif

	return 0;

err:
	kfree(cma->bitmap);
	cma->count = 0;
	return -EINVAL;
}

#ifdef CONFIG_CMA
/* Free whole pageblock and set its migration type to MIGRATE_CMA. */
void __init init_cma_reserved_pageblock(struct page *page)
{
	unsigned i = pageblock_nr_pages;
	struct page *p = page;

	do {
		__ClearPageReserved(p);
		set_page_count(p, 0);
	} while (++p, --i);

	set_pageblock_migratetype(page, MIGRATE_CMA);

	if (pageblock_order >= MAX_ORDER) {
		i = pageblock_nr_pages;
		p = page;

		do {
			set_page_refcounted(p);
			__free_pages(p, MAX_ORDER - 1);
			p += MAX_ORDER_NR_PAGES;
		} while (i -= MAX_ORDER_NR_PAGES);
	} else {
		set_page_refcounted(page);
		__free_pages(page, pageblock_order);
	}
	adjust_managed_page_count(page, pageblock_nr_pages);
}
#endif

void adjust_managed_page_count(struct page *page, long count)
{
	spin_lock(&managed_page_count_lock);
	page_zone(page)->managed_pages += count;
	totalram_pages += count;
#ifdef CONFIG_HIGHMEM
	if (PageHighMem(page))
		totalhigh_pages += count;
#endif
	spin_unlock(&managed_page_count_lock);
}
EXPORT_SYMBOL(adjust_managed_page_count);
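To tie the declaration path together, here is a hedged sketch (not from the article) of how early platform code might declare a CMA area with cma_declare_contiguous() once memblock is up; the 32 MiB size and the function name are arbitrary examples, and the argument list matches the version quoted above (newer kernels add a name parameter).

#include <linux/cma.h>
#include <linux/init.h>
#include <linux/printk.h>
#include <linux/sizes.h>

static struct cma *example_cma;	/* illustrative area pointer */

static void __init example_reserve_cma(void)
{
	int ret;

	/*
	 * base = 0 and limit = 0 mean "anywhere below the memblock end";
	 * alignment = 0 lets CMA enforce its own minimum alignment
	 * (MAX_ORDER / pageblock_order); fixed = false lets memblock
	 * pick the actual address.
	 */
	ret = cma_declare_contiguous(0, SZ_32M, 0, 0, 0, false, &example_cma);
	if (ret)
		pr_warn("example CMA reservation failed: %d\n", ret);
}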