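I believe many developers have run into a requirement to export data to Excel. I started with the most primitive approach, concatenating HTML (building the data to export into a TABLE tag), and later happily moved on to open-source components such as NPOI and EPPlus. Not long ago, though, a project of mine required exporting several data sources of roughly 600,000 rows each into Excel. Let's not debate whether the requirement itself is reasonable; that is what the customer wanted. I tried NPOI and then EPPlus and hit the same problem with both: OutOfMemoryException. Was the 12 GB of RAM in my machine really not enough?
Memory did overflow, yet several GB were still free when it happened. The site is built on .NET, and during development it is hosted by the IIS Express that ships with Visual Studio, so I assumed the limit came from Visual Studio itself (older Visual Studio versions launch the 32-bit IIS Express by default, which would likely explain a ceiling far below physical memory). Searching online showed that this problem does trouble other people too, but turned up no solution. Fortunately there is still Google: once past the firewall, I found the answer on Stack Overflow: Open XML. This is not a new technology. When Office 2007 was designed, Microsoft adopted XML + ZIP as the on-disk format for Excel, Word, PowerPoint and the other components so that they could interoperate better with other applications. The xlsx, docx and pptx files we use are essentially ZIP archives containing well-organized XML files. In other words, we can generate or modify compliant XML files, compress them into a ZIP package, and the result is a file that Office recognizes.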
A picture says it best: [figure]
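To make the "ZIP archive of XML files" idea concrete, here is a minimal sketch that builds a one-sheet .xlsx by hand using only System.IO.Compression, then lists the parts inside it. The class and method names (XlsxByHand, Create, ListParts) are made up for illustration and are not part of any library; the part names and content types follow the usual SpreadsheetML package layout. It is a sketch of the principle rather than a recommended way to produce workbooks.

```csharp
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;

public static class XlsxByHand
{
    // Writes one UTF-8 text entry (without BOM) into the archive.
    private static void AddEntry(ZipArchive zip, string name, string xml)
    {
        using (var writer = new StreamWriter(zip.CreateEntry(name).Open(), new UTF8Encoding(false)))
        {
            writer.Write(xml);
        }
    }

    // Hypothetical helper: creates a minimal workbook with one sheet and two cells.
    public static void Create(string path)
    {
        const string main = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";
        const string rel = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";
        const string pkgRel = "http://schemas.openxmlformats.org/package/2006/relationships";

        using (var file = new FileStream(path, FileMode.Create))
        using (var zip = new ZipArchive(file, ZipArchiveMode.Create))
        {
            // Declares the content type of every part in the package.
            AddEntry(zip, "[Content_Types].xml",
                "<Types xmlns=\"http://schemas.openxmlformats.org/package/2006/content-types\">" +
                "<Default Extension=\"rels\" ContentType=\"application/vnd.openxmlformats-package.relationships+xml\"/>" +
                "<Default Extension=\"xml\" ContentType=\"application/xml\"/>" +
                "<Override PartName=\"/xl/workbook.xml\" ContentType=\"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml\"/>" +
                "<Override PartName=\"/xl/worksheets/sheet1.xml\" ContentType=\"application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml\"/>" +
                "</Types>");
            // Package-level relationship: points at the workbook part.
            AddEntry(zip, "_rels/.rels",
                "<Relationships xmlns=\"" + pkgRel + "\">" +
                "<Relationship Id=\"rId1\" Type=\"" + rel + "/officeDocument\" Target=\"xl/workbook.xml\"/>" +
                "</Relationships>");
            // The workbook lists its sheets by relationship id.
            AddEntry(zip, "xl/workbook.xml",
                "<workbook xmlns=\"" + main + "\" xmlns:r=\"" + rel + "\">" +
                "<sheets><sheet name=\"Sheet1\" sheetId=\"1\" r:id=\"rId1\"/></sheets></workbook>");
            AddEntry(zip, "xl/_rels/workbook.xml.rels",
                "<Relationships xmlns=\"" + pkgRel + "\">" +
                "<Relationship Id=\"rId1\" Type=\"" + rel + "/worksheet\" Target=\"worksheets/sheet1.xml\"/>" +
                "</Relationships>");
            // The sheet itself: one row with an inline string and a number.
            AddEntry(zip, "xl/worksheets/sheet1.xml",
                "<worksheet xmlns=\"" + main + "\"><sheetData>" +
                "<row r=\"1\"><c r=\"A1\" t=\"inlineStr\"><is><t>Hello</t></is></c><c r=\"B1\"><v>42</v></c></row>" +
                "</sheetData></worksheet>");
        }
    }

    // Returns the names of all parts stored in the package.
    public static string[] ListParts(string path)
    {
        using (var zip = new ZipArchive(File.OpenRead(path), ZipArchiveMode.Read))
        {
            return zip.Entries.Select(e => e.FullName).ToArray();
        }
    }
}
```

Calling XlsxByHand.Create("demo.xlsx") and then XlsxByHand.ListParts("demo.xlsx") returns the five part names written above. Renaming any real .xlsx file to .zip and opening it shows the same kind of layout.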
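Quite a few people on cnblogs have already introduced Open XML. I would like to offer one more perspective, and it also seems that nobody has written about it for quite a while.
What is Office Open XML?
Let's look at Wikipedia's definition: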
Office Open XML (also informally known as OOXML or Microsoft Open XML (MOX)[2]) is a zipped, XML-based file format developed by Microsoft[3] for representing spreadsheets, charts, presentations and word processing documents. The format was initially standardized by Ecma (as ECMA-376), and by the ISO and IEC (as ISO/IEC 29500) in later versions.
Starting with Microsoft Office 2007, the Office Open XML file formats have become the default[4] target file format of Microsoft Office.[5][6] Microsoft Office 2010 provides read support for ECMA-376, read/write support for ISO/IEC 29500 Transitional, and read support for ISO/IEC 29500 Strict.[7] Microsoft Office 2013 and Microsoft Office 2016 additionally support both reading and writing of ISO/IEC 29500 Strict.[8]
Reference: https://en.wikipedia.org/wiki/Office_Open_XML
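Since Office 2007, this XML-based format has been the default save format of Microsoft Office. In fact, the Office 2007 part of NPOI and all of EPPlus, which we use all the time, are built on Open XML.
If they all use Open XML, why do NPOI and EPPlus run out of memory?
Both components offer very thorough support for the Office suite, and they load the data into memory and process it in one go. When the data set is too large, an overflow becomes likely. Reports online mention EPPlus failing at a little over 200,000 rows and NPOI at a little over 110,000; the exact threshold depends on the number of columns and on the cell contents. Either way, we may well face this kind of large Excel export again. We do not need sophisticated features; we just want to export a plain list to Excel, and that is entirely achievable.
How does Open XML avoid the memory overflow?
NPOI and EPPlus may overflow when exporting a very large list because they keep all the data in memory, which is what lets them support all sorts of complex features. For a simple list that merely has a huge number of rows, writing it to disk through a file stream solves the problem.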
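How do we use Open XML?
We need the Office Open XML SDK, which can be downloaded from Microsoft: https://www.microsoft.com/en-hk/download/details.aspx?id=30425
The recommended route is to add the reference to the project directly through NuGet in Visual Studio (the package is DocumentFormat.OpenXml).
There is also sample code on GitHub: https://github.com/OfficeDev/Open-XML-SDK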
Implementation
Enough talk. Let's see how to export an Excel list with Open XML.
In principle, we use Open XML to write the tags to the local disk one at a time.
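As a compact illustration of that principle, the following sketch streams rows into a worksheet with OpenXmlWriter and then registers the sheet in the workbook. It assumes the DocumentFormat.OpenXml NuGet package is referenced. The class name StreamingExportSketch, the Write method, the two-column layout and the sheet name "Data" are all made up for this example and are not taken from the export class discussed in this post.

```csharp
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

public static class StreamingExportSketch
{
    // Hypothetical helper: streams rowCount rows of two columns into a new workbook.
    public static void Write(string path, int rowCount)
    {
        using (SpreadsheetDocument doc = SpreadsheetDocument.Create(path, SpreadsheetDocumentType.Workbook))
        {
            WorkbookPart workbookPart = doc.AddWorkbookPart();
            WorksheetPart sheetPart = workbookPart.AddNewPart<WorksheetPart>();

            // Each element is written to the part as soon as it is emitted,
            // so only one small element object is alive at a time.
            using (OpenXmlWriter writer = OpenXmlWriter.Create(sheetPart))
            {
                writer.WriteStartElement(new Worksheet());
                writer.WriteStartElement(new SheetData());
                for (int r = 1; r <= rowCount; r++)
                {
                    writer.WriteStartElement(new Row { RowIndex = (uint)r });

                    // Column A: a number.
                    writer.WriteStartElement(new Cell { CellReference = "A" + r, DataType = CellValues.Number });
                    writer.WriteElement(new CellValue(r.ToString()));
                    writer.WriteEndElement(); // Cell

                    // Column B: a string stored directly in the cell (t="str").
                    writer.WriteStartElement(new Cell { CellReference = "B" + r, DataType = CellValues.String });
                    writer.WriteElement(new CellValue("row " + r));
                    writer.WriteEndElement(); // Cell

                    writer.WriteEndElement(); // Row
                }
                writer.WriteEndElement(); // SheetData
                writer.WriteEndElement(); // Worksheet
            }

            // Register the sheet in workbook.xml so Excel can find it.
            using (OpenXmlWriter writer = OpenXmlWriter.Create(workbookPart))
            {
                writer.WriteStartElement(new Workbook());
                writer.WriteStartElement(new Sheets());
                writer.WriteElement(new Sheet
                {
                    Name = "Data",
                    SheetId = 1U,
                    Id = workbookPart.GetIdOfPart(sheetPart)
                });
                writer.WriteEndElement(); // Sheets
                writer.WriteEndElement(); // Workbook
            }
        }
    }
}
```

A call such as StreamingExportSketch.Write("big.xlsx", 600000) exercises the same path as a large export: memory use should stay roughly flat because rows are never collected into an in-memory document tree.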
Here are a few methods excerpted from the export class I wrote, with explanations:
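/// <summary>
/// Creates the SpreadsheetDocument at the given path on disk.
/// </summary>
/// <param name="fileName">Full path of the .xlsx file to create.</param>
private void OpenWorkDocument(string fileName)
{
    document = SpreadsheetDocument.Create(fileName, SpreadsheetDocumentType.Workbook);
    // The methods below rely on document.WorkbookPart, so it must exist before they run
    // (for example via document.AddWorkbookPart()); this excerpt does not show where that happens.
}
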
/// <summary>
/// Writes a DataTable to a new worksheet. A DataTable is simply the data source used here; adapt as needed.
/// </summary>
public void AddSheet(DataTable dt, string sheetName)
{
    if (dt == null || dt.Rows.Count == 0)
    {
        throw new ArgumentNullException(nameof(dt), "data source can not be null");
    }
    if (document == null)
    {
        // A missing document is a state problem rather than a bad argument.
        throw new InvalidOperationException("please init document first");
    }

    // This list of attributes is used when writing a start element.
    List<OpenXmlAttribute> attributes;

    // The key to avoiding the overflow: OpenXmlWriter streams the XML into the part on disk.
    WorksheetPart workSheetPart = document.WorkbookPart.AddNewPart<WorksheetPart>();
    OpenXmlWriter writer = OpenXmlWriter.Create(workSheetPart);

    // The tedious part of Open XML: the XML content has to be assembled through the SDK.
    writer.WriteStartElement(new Worksheet());
    writer.WriteStartElement(new SheetViews());   // sheetViews
    writer.WriteStartElement(new SheetView()      // sheetView
    {
        TabSelected = true,
        WorkbookViewId = 0U                       // zero-based index
    });

    // Freeze the header row. TopLeftCell is the first cell of the scrollable pane
    // below the split, which is why it is A2; A1 did not work when I tried it.
    Pane pane = new Pane()
    {
        State = new EnumValue<PaneStateValues>(PaneStateValues.Frozen),
        VerticalSplit = new DoubleValue((double)1),
        TopLeftCell = new StringValue("A2"),
        ActivePane = new EnumValue<PaneValues>(PaneValues.BottomLeft)
    };

    // For elements that only describe document structure, we can set the properties on the
    // object and write it in one go, because such objects use very little memory. We could also
    // write every child and grandchild node through WriteStartElement, but that is tedious
    // and error-prone.
    writer.WriteStartElement(pane);               // Pane
    writer.WriteEndElement();                     // Pane
    writer.WriteStartElement(new Selection()
    {
        Pane = new EnumValue<PaneValues>(PaneValues.BottomLeft)
    });
    writer.WriteEndElement();                     // close Selection
    writer.WriteEndElement();                     // close sheetView
    writer.WriteEndElement();                     // close sheetViews

    writer.WriteStartElement(new SheetData());
    var rowIndex = 0;

    // Header row (written once, before the data rows).
    attributes = new List<OpenXmlAttribute>();
    attributes.Add(new OpenXmlAttribute("r", null, (rowIndex + 1).ToString()));   // row index
    writer.WriteStartElement(new Row(), attributes);
    foreach (DataColumn col in dt.Columns)
    {
        attributes = new List<OpenXmlAttribute>();
        // Excel normally stores every string in sharedStrings.xml; the cell then holds the
        // index into that file and its t (type) attribute is "s". Handling sharedStrings.xml
        // would complicate the export, so t is set to "str" instead (the type the schema
        // defines for formula string results) and Excel reads the string straight from the cell.
        attributes.Add(new OpenXmlAttribute("t", null, "str"));
        // s selects the style by its index in the stylesheet.
        attributes.Add(new OpenXmlAttribute("s", null, FORMAT_INDEX_HEADER.ToString()));
        // r gives the cell position; it does not seem to be required. Positions are 1-based.
        attributes.Add(new OpenXmlAttribute("r", "", string.Format("{0}{1}", GetColumnName(col.Ordinal + 1), rowIndex + 1)));
        writer.WriteStartElement(new Cell(), attributes);
        writer.WriteElement(new CellValue(col.ColumnName));
        writer.WriteEndElement();                 // Cell
    }
    writer.WriteEndElement();                     // header Row
    rowIndex++;

    // Data rows. Writing through the writer does not trigger the out-of-memory exception.
    foreach (DataRow row in dt.Rows)
    {
        attributes = new List<OpenXmlAttribute>();
        attributes.Add(new OpenXmlAttribute("r", null, (rowIndex + 1).ToString()));   // row index
        writer.WriteStartElement(new Row(), attributes);
        foreach (DataColumn col in dt.Columns)
        {
            attributes = new List<OpenXmlAttribute>();
            switch (col.DataType.ToString())
            {
                case "System.Int32":
                    attributes.Add(new OpenXmlAttribute("s", null, FORMAT_INDEX_INT.ToString()));
                    attributes.Add(new OpenXmlAttribute("t", null, "n"));     // number
                    break;
                case "System.Double":
                case "System.Decimal":
                case "System.Single":             // float; "System.Float" is not a real type name
                    attributes.Add(new OpenXmlAttribute("s", null, FORMAT_INDEX_DEC.ToString()));
                    attributes.Add(new OpenXmlAttribute("t", null, "n"));     // number
                    break;
                default:
                    attributes.Add(new OpenXmlAttribute("s", null, FORMAT_INDEX_STR.ToString()));
                    attributes.Add(new OpenXmlAttribute("t", null, "str"));   // string
                    break;
            }
            // Cell reference, e.g. "B7".
            attributes.Add(new OpenXmlAttribute("r", null, string.Format("{0}{1}", GetColumnName(col.Ordinal + 1), rowIndex + 1)));
            writer.WriteStartElement(new Cell(), attributes);
            writer.WriteElement(new CellValue(row[col.Ordinal].ToString()));
            writer.WriteEndElement();             // Cell
        }
        writer.WriteEndElement();                 // Row
        rowIndex++;
    }

    writer.WriteEndElement();                     // end SheetData
    writer.WriteEndElement();                     // end Worksheet
    writer.Close();

    if (document.WorkbookPart.Workbook == null)
    {
        document.WorkbookPart.Workbook = new Workbook();
        document.WorkbookPart.Workbook.Append(new Sheets());
    }

    // Once the data is written, register a sheet reference in workbook.xml;
    // this is the sheet name shown on the tab at the bottom of Excel.
    var sheet = new Sheet()
    {
        Name = !String.IsNullOrWhiteSpace(sheetName) ? sheetName : ("Sheet " + DateTime.Now.ToString("ms")),
        SheetId = UInt32Value.FromUInt32((uint)m_sheetIndex++),
        Id = document.WorkbookPart.GetIdOfPart(workSheetPart)
    };
    document.WorkbookPart.Workbook.Sheets.Append(sheet);
}

// Builds the stylesheet. Indexes start at 0 and increase by 1; if an index is skipped
// (for example jumping from 1 straight to 3), the style may not be resolved correctly.
private Stylesheet GenerateStylesheet()
{
    Stylesheet styleSheet = null;

    Fonts fonts = new Fonts(
        new Font(                                 // index 0 - default
            new FontSize() { Val = 11 }
        ),
        new Font(                                 // index 1 - header
            new FontSize() { Val = 11 },
            new Bold(),
            new Color() { Rgb = "FFFFFF" }
        ));

    Fills fills = new Fills(
        new Fill(new PatternFill() { PatternType = PatternValues.None }),      // index 0 - default
        new Fill(new PatternFill() { PatternType = PatternValues.Gray125 }),   // index 1 - default
        new Fill(new PatternFill(new ForegroundColor { Rgb = new HexBinaryValue() { Value = "0070c0" } })
            { PatternType = PatternValues.Solid })                             // index 2 - header background
    );

    Borders borders = new Borders(
        new Border(),                             // index 0 - default
        new Border(                               // index 1 - thin border on all sides
            new LeftBorder(new Color() { Auto = true }) { Style = BorderStyleValues.Thin },
            new RightBorder(new Color() { Auto = true }) { Style = BorderStyleValues.Thin },
            new TopBorder(new Color() { Auto = true }) { Style = BorderStyleValues.Thin },
            new BottomBorder(new Color() { Auto = true }) { Style = BorderStyleValues.Thin },
            new DiagonalBorder())
    );

    // Kept as in the original. Note that IDs below 164 are reserved for Excel's built-in
    // formats, and custom formats conventionally start at 164.
    NumberingFormats numbers = new NumberingFormats(
        new NumberingFormat() { NumberFormatId = 0, FormatCode = new StringValue("#,##0.00") },
        new NumberingFormat() { NumberFormatId = 1, FormatCode = new StringValue("0") }
    );

    // A cell's s attribute selects one of these entries by position (0, 1, 2, ...),
    // so the FORMAT_INDEX_* constants used for s must match the positions below.
    CellFormats cellFormats = new CellFormats(
        // default
        new CellFormat() { FormatId = FORMAT_INDEX_DEFUALT },
        // body string
        new CellFormat { FormatId = FORMAT_INDEX_STR, FontId = 0, FillId = 0, BorderId = 1, ApplyBorder = true },
        // body decimal
        new CellFormat { FormatId = FORMAT_INDEX_DEC, FontId = 0, FillId = 0, BorderId = 1, NumberFormatId = 0, ApplyBorder = true },
        // header
        new CellFormat { FormatId = FORMAT_INDEX_HEADER, FontId = 1, FillId = 2, BorderId = 1, ApplyFill = true },
        // body int
        new CellFormat { FormatId = FORMAT_INDEX_INT, FontId = 0, FillId = 0, BorderId = 1, NumberFormatId = 1, ApplyBorder = true }
    );

    styleSheet = new Stylesheet(numbers, fonts, fills, borders, cellFormats);
    return styleSheet;
}

// Adds the styles part to the workbook and saves the generated stylesheet into it.
private void WriteWorkbookStyle()
{
    if (document != null)
    {
        WorkbookStylesPart stylePart = document.WorkbookPart.AddNewPart<WorkbookStylesPart>();
        var styleSheet = GenerateStylesheet();
        styleSheet.Save(stylePart);
    }
}
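AddSheet calls a GetColumnName helper that is not included in the excerpts above. The sketch below is one possible implementation, assuming it takes a 1-based column index and returns the Excel column letters. The containing class name ColumnNames is made up for this example, and the author's actual helper may differ.

```csharp
using System;
using System.Text;

public static class ColumnNames
{
    // Converts a 1-based column index to Excel column letters (1 -> A, 27 -> AA).
    public static string GetColumnName(int columnIndex)
    {
        if (columnIndex < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(columnIndex), "column index is 1-based");
        }

        var name = new StringBuilder();
        while (columnIndex > 0)
        {
            // Column letters form a base-26 system without a zero digit,
            // so shift by one before taking the remainder.
            int remainder = (columnIndex - 1) % 26;
            name.Insert(0, (char)('A' + remainder));
            columnIndex = (columnIndex - 1) / 26;
        }
        return name.ToString();
    }
}
```

For example, ColumnNames.GetColumnName(1) returns "A", GetColumnName(26) returns "Z", and GetColumnName(27) returns "AA".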
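Setting styles and freezing the first row can both be done quite simply this way. If you need to add charts and the like, I still recommend open-source solutions such as NPOI or EPPlus; workbooks that contain charts are rarely very large.
That is all for this introduction to Open XML. If anything here is wrong, please point it out.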