嘗試作一個.NET簡單、高效、避免OOM的Excel工具

園友好,最近晚輩嘗試作一個.NET簡單、高效、避免OOM的Excel工具 MiniExcelgit

主要目前主流框架大多將資料全載入到內存方便操做,但這會致使內存消耗問題,MiniExcel 嘗試以 Stream 角度寫底層算法邏輯,能讓本來1000多MB佔用下降到幾MB,避免內存不夠狀況。適合像是低規格 azure app service 或是讀取大文件等情境。github

特色

  • 低內存耗用,避免OOM(out of memoery)、頻繁 Full GC 狀況
  • 支持即時操做每行數據
  • 兼具搭配 LINQ 延遲查詢特性,能辦到低消耗、快速分頁等複雜查詢
    圖片:與主流框架對比的消耗、效率差
  • 輕量,不依賴任何套件,DLL小於100KB
  • 簡便操做的 Dapper API 風格

性能測試

Test1,000,000x10.xlsx 作基準與主流框架作性能測試,總共 1千萬筆 "HelloWorld",文件大小 23 MB算法

Benchmarks 邏輯能夠在 MiniExcel.Benchmarks 查看或是提交 PR,運行指令sql

dotnet run -p .\benchmarks\MiniExcel.Benchmarks\ -c Release -f netcoreapp3.1 -- -f * --join

最後一次運行結果 :數據庫

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i7-7700 CPU 3.60GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
  [Host]     : .NET Framework 4.8 (4.8.4341.0), X64 RyuJIT
  Job-ZYYABG : .NET Framework 4.8 (4.8.4341.0), X64 RyuJIT
IterationCount=3  LaunchCount=3  WarmupCount=3
Method 最大內存耗用 平均時間 Gen 0 Gen 1 Gen 2
'MiniExcel QueryFirst' 0.109 MB 726.4 us - - -
'ExcelDataReader QueryFirst' 15.24 MB 10,664,238.2 us 566000.0000 1000.0000 -
'MiniExcel Query' 17.3 MB 14,179,334.8 us 367000.0000 96000.0000 7000.0000
'ExcelDataReader Query' 17.3 MB 22,565,088.7 us 1210000.0000 2000.0000 -
'Epplus QueryFirst' 1,452 MB 18,198,015.4 us 535000.0000 132000.0000 9000.0000
'Epplus Query' 1,451 MB 23,647,471.1 us 1451000.0000 133000.0000 9000.0000
'OpenXmlSDK Query' 1,412 MB 52,003,270.1 us 978000.0000 353000.0000 11000.0000
'OpenXmlSDK QueryFirst' 1,413 MB 52,348,659.1 us 978000.0000 353000.0000 11000.0000
'ClosedXml QueryFirst' 2,158 MB 66,188,979.6 us 2156000.0000 575000.0000 9000.0000
'ClosedXml Query' 2,184 MB 191,434,126.6 us 2165000.0000 577000.0000 10000.0000

Method 最大內存耗用 平均時間 Gen 0 Gen 1 Gen 2
'MiniExcel Create Xlsx' 15 MB 11,531,819.8 us 1020000.0000 - -
'Epplus Create Xlsx' 1,204 MB 22,509,717.7 us 1370000.0000 60000.0000 30000.0000
'OpenXmlSdk Create Xlsx' 2,621 MB 42,473,998.9 us 1370000.0000 460000.0000 50000.0000
'ClosedXml Create Xlsx' 7,141 MB 140,939,928.6 us 5520000.0000 1500000.0000 80000.0000

安裝

Install-Package MiniExcel -Version 0.10.0
# or 
dotnet add package MiniExcel --version 0.10.0

其餘請查看 from NuGetapi

Query 查詢 Excel 返回強型別 IEnumerable 數據 [Try it]

推薦使用 Stream.Query 效率會相對較好。mvc

public class UserAccount
{
    public Guid ID { get; set; }
    public string Name { get; set; }
    public DateTime BoD { get; set; }
    public int Age { get; set; }
    public bool VIP { get; set; }
    public decimal Points { get; set; }
}

var rows = MiniExcel.Query<UserAccount>(path);

// or

using (var stream = File.OpenRead(path))
    var rows = stream.Query<UserAccount>();

Query 查詢 Excel 返回Dynamic IEnumerable 數據 [Try it]

  • Key 系統預設爲 A,B,C,D...Z
MiniExcel 1
Github 2
var rows = MiniExcel.Query(path).ToList();

// or 
using (var stream = File.OpenRead(path))
{
    var rows = stream.Query().ToList();
                
    Assert.Equal("MiniExcel", rows[0].A);
    Assert.Equal(1, rows[0].B);
    Assert.Equal("Github", rows[1].A);
    Assert.Equal(2, rows[1].B);
}

查詢數據以第一行數據當Key [Try it]

note : 同名以右邊數據爲準app

Input Excel :框架

Column1 Column2
MiniExcel 1
Github 2
var rows = MiniExcel.Query(useHeaderRow:true).ToList();

// or

using (var stream = File.OpenRead(path))
{
    var rows = stream.Query(useHeaderRow:true).ToList();

    Assert.Equal("MiniExcel", rows[0].Column1);
    Assert.Equal(1, rows[0].Column2);
    Assert.Equal("Github", rows[1].Column1);
    Assert.Equal(2, rows[1].Column2);
}

Query 查詢支援延遲加載(Deferred Execution),能配合LINQ First/Take/Skip辦到低消耗、高效率複雜查詢

Query Firstless

var row = MiniExcel.Query(path).First();
Assert.Equal("HelloWorld", row.A);

// or

using (var stream = File.OpenRead(path))
{
    var row = stream.Query().First();
    Assert.Equal("HelloWorld", row.A);
}

創建 Excel 文件 [Try it]

  1. 必須是 non-abstract 類別有 public parameterless constructor
  2. MiniExcel SaveAs 支援 IEnumerable參數``延遲查詢,除非必要請不要使用 ToList 等方法讀取所有數據到內存

圖片 : 是否呼叫 ToList 的內存差異

Anonymous or strongly type:

var path = Path.Combine(Path.GetTempPath(), $"{Guid.NewGuid()}.xlsx");
MiniExcel.SaveAs(path, new[] {
    new { Column1 = "MiniExcel", Column2 = 1 },
    new { Column1 = "Github", Column2 = 2}
});

Datatable:

var path = Path.Combine(Path.GetTempPath(), $"{Guid.NewGuid()}.xlsx");
var table = new DataTable();
{
    table.Columns.Add("Column1", typeof(string));
    table.Columns.Add("Column2", typeof(decimal));
    table.Rows.Add("MiniExcel", 1);
    table.Rows.Add("Github", 2);
}

MiniExcel.SaveAs(path, table);

Dapper:

using (var connection = GetConnection(connectionString))
{
    var rows = connection.Query(@"select 'MiniExcel' as Column1,1 as Column2 union all select 'Github',2");
    MiniExcel.SaveAs(path, rows);
}

IEnumerable<IDictionary<string, object>>

var values = new List<Dictionary<string, object>>()
{
    new Dictionary<string,object>{{ "Column1", "MiniExcel" }, { "Column2", 1 } },
    new Dictionary<string,object>{{ "Column1", "Github" }, { "Column2", 2 } }
};
MiniExcel.SaveAs(path, values);

output :

Column1 Column2
MiniExcel 1
Github 2

SaveAs 支援 Stream [Try it]

using (var stream = File.Create(path))
{
    stream.SaveAs(values);
}

Excel Column Name/Ignore Attribute

e.g

input excel :

Test1 Test2 Test3 Test4 Test5 Test6 Column1 Column2
Test1 Test2 Test3 Test4 Test5 Test6 Column1 Column2
public class ExcelAttributeDemo
{
    [ExcelColumnName("Column1")]
    public string Test1 { get; set; }
    [ExcelColumnName("Column2")]
    public string Test2 { get; set; }
    [ExcelIgnore]
    public string Test3 { get; set; }
    public string Test4 { get; set; }
    public string Test5 { get; }
    public string Test6 { get; private set; }
}

var rows = MiniExcel.Query<ExcelAttributeDemo>(path).ToList();
Assert.Equal("Column1", rows[0].Test1);
Assert.Equal("Column2", rows[0].Test2);
Assert.Null(rows[0].Test3);
Assert.Equal("Test4", rows[0].Test4);
Assert.Null(rows[0].Test5);
Assert.Null(rows[0].Test6);

例子 : SQLite & Dapper 讀取大數據新增到數據庫

note : 請不要呼叫 call ToList/ToArray 等方法,這會將全部數據讀到內存內

using (var connection = new SQLiteConnection(connectionString))
{
    connection.Open();
    using (var transaction = connection.BeginTransaction())
    using (var stream = File.OpenRead(path))
    {
	   var rows = stream.Query();
	   foreach (var row in rows)
			 connection.Execute("insert into T (A,B) values (@A,@B)", new { row.A, row.B }, transaction: transaction);
	   transaction.Commit();
    }
}

效能:
image

例子 : ASP.NET Core 3.1 or MVC 5 下載 Excel Xlsx API Demo

public class ExcelController : Controller
{
    public IActionResult Download()
    {
        var values = new[] {
            new { Column1 = "MiniExcel", Column2 = 1 },
            new { Column1 = "Github", Column2 = 2}
        };
        var stream = new MemoryStream();
        stream.SaveAs(values);
        return File(stream,
            "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
            "demo.xlsx");
    }
}

Excel 類別自動判斷

MiniExcel 預設會根據擴展名或是 Stream 類別判斷是 xlsx 仍是 csv,但會有失準時候,請自行指定。

stream.SaveAs(excelType:ExcelType.CSV);
//or
stream.SaveAs(excelType:ExcelType.XLSX);
//or
stream.Query(excelType:ExcelType.CSV);
//or
stream.Query(excelType:ExcelType.XLSX);

Dynamic Query 轉換 IDictionary<string,object> 數據

foreach(IDictionary<string,object> row = MiniExcel.Query(path))
{
    //..
}

野人獻曝 Github : https://github.com/shps951023/MiniExcel

相關文章
相關標籤/搜索