From an Excel Export Crash to Learning Apache POI




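Once the data grew beyond 65,536 rows, generating the file with HSSFWorkbook or XSSFWorkbook made the program die with java.lang.OutOfMemoryError: Java heap space. The way out is to use SXSSFWorkbook instead.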
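The 65,536 figure is the per-sheet row limit of the binary .xls format that HSSFWorkbook writes. .xlsx sheets (XSSF/SXSSF) allow 1,048,576 rows. The sketch below assumes a recent Apache POI (4.x or later) on the classpath; `RowLimits` and its methods are hypothetical names. It reads both limits from POI's `SpreadsheetVersion` enum and checks that HSSF refuses a row index past the .xls limit.

```java
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.SpreadsheetVersion;
import org.apache.poi.ss.usermodel.Sheet;

import java.io.IOException;

public class RowLimits {

    // Maximum rows per sheet for the binary .xls format (HSSF)
    public static int xlsMaxRows() {
        return SpreadsheetVersion.EXCEL97.getMaxRows();
    }

    // Maximum rows per sheet for the OOXML .xlsx format (XSSF / SXSSF)
    public static int xlsxMaxRows() {
        return SpreadsheetVersion.EXCEL2007.getMaxRows();
    }

    // Returns true if HSSF rejects row index 65536 (indexes are 0-based, so 65535 is the last valid one)
    public static boolean hssfRejectsRow65536() throws IOException {
        try (HSSFWorkbook wb = new HSSFWorkbook()) {
            Sheet sheet = wb.createSheet();
            try {
                sheet.createRow(65536);
                return false;
            } catch (IllegalArgumentException e) {
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("xls max rows:  " + xlsMaxRows());   // 65536
        System.out.println("xlsx max rows: " + xlsxMaxRows());  // 1048576
        System.out.println("HSSF rejects row 65536: " + hssfRejectsRow65536());
    }
}
```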

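Well, I knew nothing at all about what POI was, let alone XSSFWorkbook and friends. Tracking down this problem only taught me that POI is a library for reading and writing Office files; how it works internally, and how to actually code against it, left me completely baffled. @_@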


Why should I use Apache POI?

Source: Apache POI
A major use of the Apache POI api is for Text Extraction applications such as web spiders, index builders, and content management systems.

So why should you use POIFS, HSSF or XSSF?

You'd use POIFS if you had a document written in OLE 2 Compound Document Format, probably written using MFC, that you needed to read in Java. Alternatively, you'd use POIFS to write OLE 2 Compound Document Format if you needed to inter-operate with software running on the Windows platform. We are not just bragging when we say that POIFS is the most complete and correct implementation of this file format to date!
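To see what "OLE 2 Compound Document" means in practice: an .xls file written by HSSF is such a container, and POIFS can open it and list its top-level entries. This is a minimal sketch assuming POI 4.x or later on the classpath; `PoifsDemo` and its methods are hypothetical names. The worksheet data sits in a stream named `Workbook`; other entries (such as document properties) may or may not be present, so only that one is relied on.

```java
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.poifs.filesystem.DirectoryNode;
import org.apache.poi.poifs.filesystem.Entry;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class PoifsDemo {

    // Writes a tiny .xls workbook into memory and returns its bytes
    public static byte[] tinyXls() throws IOException {
        try (HSSFWorkbook wb = new HSSFWorkbook();
             ByteArrayOutputStream bos = new ByteArrayOutputStream()) {
            wb.createSheet("demo").createRow(0).createCell(0).setCellValue("hello");
            wb.write(bos);
            return bos.toByteArray();
        }
    }

    // Opens the bytes as an OLE 2 compound document and collects the top-level entry names
    public static List<String> topLevelEntries(byte[] xls) throws IOException {
        List<String> names = new ArrayList<>();
        try (POIFSFileSystem fs = new POIFSFileSystem(new ByteArrayInputStream(xls))) {
            DirectoryNode root = fs.getRoot();
            for (Entry entry : root) { // a directory is iterable over its child entries
                names.add(entry.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(topLevelEntries(tinyXls()));
    }
}
```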

You'd use HSSF if you needed to read or write an Excel file using Java (XLS). You'd use XSSF if you need to read or write an OOXML Excel file using Java (XLSX). The combined SS interface allows you to easily read and write all kinds of Excel files (XLS and XLSX) using Java. Additionally there is a specialized SXSSF implementation which allows to write very large Excel (XLSX) files in a memory optimized way.
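A minimal sketch of the combined SS interface, assuming POI 4.x or later on the classpath (`SsInterfaceDemo` and its methods are hypothetical names). Only the constructor call picks HSSF or XSSF. Everything else goes through `Workbook`, `Sheet` and `Row`. On the way back, `WorkbookFactory` detects the format from the bytes.

```java
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

public class SsInterfaceDemo {

    // The only format-specific line: choose XLSX (XSSF) or XLS (HSSF)
    public static Workbook newWorkbook(boolean xlsx) {
        return xlsx ? new XSSFWorkbook() : new HSSFWorkbook();
    }

    // Writes one cell through the common SS interfaces and returns the file bytes
    public static byte[] writeHello(boolean xlsx) throws Exception {
        try (Workbook wb = newWorkbook(xlsx);
             ByteArrayOutputStream bos = new ByteArrayOutputStream()) {
            Sheet sheet = wb.createSheet("demo");
            Row row = sheet.createRow(0);
            row.createCell(0).setCellValue("hello");
            wb.write(bos);
            return bos.toByteArray();
        }
    }

    // Re-opens the bytes; WorkbookFactory returns an HSSFWorkbook or XSSFWorkbook as appropriate
    public static String readBack(byte[] bytes) throws Exception {
        try (Workbook wb = WorkbookFactory.create(new ByteArrayInputStream(bytes))) {
            String value = wb.getSheetAt(0).getRow(0).getCell(0).getStringCellValue();
            return wb.getClass().getSimpleName() + ":" + value;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readBack(writeHello(false))); // HSSFWorkbook:hello
        System.out.println(readBack(writeHello(true)));  // XSSFWorkbook:hello
    }
}
```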

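OLE (Object Linking and Embedding): Microsoft's technology for linking and embedding objects between documents.

MFC (Microsoft Foundation Classes): Microsoft's C++ class library framework.

OOXML (Office Open XML): the technical specification Microsoft developed for Office 2007. It has since become an international document format standard, compatible with the earlier international standard OpenDocument Format and with the Chinese document standard 標文通 (Uniform Office Format, UOF).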
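Concretely, an OOXML file is a ZIP package of XML parts. The sketch below is a minimal illustration assuming POI 4.x or later on the classpath (`OoxmlParts` and its methods are hypothetical names). It writes a one-cell .xlsx into memory and lists the part names with the JDK's own `ZipInputStream`.

```java
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class OoxmlParts {

    // Writes a one-cell .xlsx into memory
    public static byte[] tinyXlsx() throws IOException {
        try (XSSFWorkbook wb = new XSSFWorkbook();
             ByteArrayOutputStream bos = new ByteArrayOutputStream()) {
            wb.createSheet("demo").createRow(0).createCell(0).setCellValue("hello");
            wb.write(bos);
            return bos.toByteArray();
        }
    }

    // Treats the .xlsx as a plain ZIP archive and collects the names of its parts
    public static List<String> partNames(byte[] xlsx) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(xlsx))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Expect parts such as [Content_Types].xml, xl/workbook.xml and xl/worksheets/sheet1.xml
        partNames(tinyXlsx()).forEach(System.out::println);
    }
}
```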

Excel workbooks (SS=HSSF+XSSF)





Test environment:

Operating system: Windows 10 Pro

Processor: Intel Core(TM) i5-4200M CPU @ 2.5GHz

JVM: Java HotSpot(TM) 64-Bit Server VM (25.72-b15, mixed mode)

Java: version 1.8.0_72, vendor Oracle Corporation


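package my.poi;

import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Random;

public class XSSF {

    // Placeholder values: the original post does not show these constants
    private static final String FILE_PATH = "./";
    private static final String FILE_NAME_PREFIX = "xssf_";
    private static final String SHEET_NAME_PREFIX = "sheet";
    private static final String SEPARATOR = "_";
    private static final String FILE_NAME_SUFFIX = ".xlsx";

    public static void main(String[] args) throws IOException, InterruptedException {
        // Thread.sleep(3 * 1000);  // pause so a memory monitor can be attached before generation starts
        (new XSSF()).generateXLSX(8, 5000, 500); // 8 sheets x 5,000 rows x 500 columns = 20,000,000 cells
    }

    public void generateXLSX(int sheetNum, int rowNum, int column) throws IOException {
        String fileName = FILE_PATH + FILE_NAME_PREFIX + (rowNum * sheetNum) + SEPARATOR + new Random().nextLong() + FILE_NAME_SUFFIX;
        // try-with-resources closes both the output stream and the workbook
        try (OutputStream out = new FileOutputStream(fileName);
             Workbook workbook = generateSheet(sheetNum, rowNum, column)) {
            workbook.write(out);
        }
    }

    private Workbook generateSheet(int sheetNum, int rowNum, int column) throws IOException {
        // XSSFWorkbook keeps every row and cell of every sheet in the heap until write() is called
        Workbook workbook = new XSSFWorkbook();  // it really comes down to which class you instantiate @_@
        for (int sheetIndex = 0; sheetIndex < sheetNum; sheetIndex++) {
            String sheetName = SHEET_NAME_PREFIX + SEPARATOR + sheetIndex;
            Sheet sheet = workbook.createSheet(sheetName);
            for (int i = 0; i < rowNum; i++) {
                Row row = sheet.createRow(i);
                for (int j = 0; j < column; j++) {
                    Cell cell = row.createCell(j);
                    cell.setCellValue(sheetName + "-" + i + "-" + j);
                }
            }
        }
        return workbook;
    }
}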

package my.poi;

import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Random;

public class SXSSF {

    // Placeholder values: the original post does not show these constants
    private static final String FILE_PATH = "./";
    private static final String FILE_NAME_PREFIX = "sxssf_";
    private static final String SHEET_NAME_PREFIX = "sheet";
    private static final String SEPARATOR = "_";
    private static final String FILE_NAME_SUFFIX = ".xlsx";

    public static void main(String[] args) throws IOException, InterruptedException {
        // Thread.sleep(3 * 1000);  // pause so a memory monitor can be attached before generation starts
        (new SXSSF()).generateXLSX(8, 5000, 500);
    }

    public void generateXLSX(int sheetNum, int rowNum, int column) throws IOException {
        String fileName = FILE_PATH + FILE_NAME_PREFIX + (rowNum * sheetNum) + SEPARATOR + new Random().nextLong() + FILE_NAME_SUFFIX;
        try (OutputStream out = new FileOutputStream(fileName);
             SXSSFWorkbook workbook = generateSheet(sheetNum, rowNum, column)) {
            try {
                workbook.write(out);
            } finally {
                // SXSSF spills flushed rows into temporary files; delete them explicitly
                workbook.dispose();
            }
        }
    }

    private SXSSFWorkbook generateSheet(int sheetNum, int rowNum, int column) throws IOException {
        // Default window: only the last 100 rows of each sheet stay in the heap
        SXSSFWorkbook workbook = new SXSSFWorkbook();  // it really comes down to which class you instantiate @_@
        for (int sheetIndex = 0; sheetIndex < sheetNum; sheetIndex++) {
            String sheetName = SHEET_NAME_PREFIX + SEPARATOR + sheetIndex;
            Sheet sheet = workbook.createSheet(sheetName);
            for (int i = 0; i < rowNum; i++) {
                Row row = sheet.createRow(i);
                for (int j = 0; j < column; j++) {
                    Cell cell = row.createCell(j);
                    cell.setCellValue(sheetName + "-" + i + "-" + j);
                }
            }
        }
        return workbook;
    }
}



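Results for 8 sheets x 5,000 rows x 500 columns:

Metric              | SXSSFWorkbook                                              | XSSFWorkbook
Heap size           | 1,198,522,368 bytes                                        | 3,165,650,944 bytes
Heap used           | 434,768,696 bytes                                          | 2,687,171,360 bytes
Heap max            | 3,193,962,496 bytes                                        | 3,193,962,496 bytes
Execution time      | 2 min                                                      | 2 h+ (did not finish; I had to go to bed before it completed)
Generated file size | 98.7 MB (103,504,645 bytes)                                | expected to be about the same (never actually produced)
Heap contents       | classes: 1,345, instances: 10,322,964, bytes: 422,870,624 | classes: 943, instances: 52,630,412, bytes: 2,687,336,880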
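The heap rows above (size, used, max) were read from a memory monitor. Roughly the same three numbers can be logged from inside the program through `java.lang.Runtime`. This is a minimal sketch with hypothetical names. The mapping (`totalMemory()` for size, `totalMemory() - freeMemory()` for used, `maxMemory()` for max) is an approximation of what monitoring tools display, not necessarily identical to it.

```java
public class HeapStats {

    // Heap currently reserved by the JVM (roughly the "Heap size" row)
    public static long committed() {
        return Runtime.getRuntime().totalMemory();
    }

    // Heap actually occupied: reserved minus free (roughly the "Heap used" row)
    public static long used() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Upper bound the heap may grow to, from -Xmx or the JVM default (roughly the "Heap max" row)
    public static long max() {
        return Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        System.out.printf("Size: %,d bytes%nUsed: %,d bytes%nMax:  %,d bytes%n",
                committed(), used(), max());
    }
}
```

Calling these before and after `generateSheet(...)` gives a crude in-process version of the comparison in the table.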



Why SXSSF uses less memory than XSSF



From the official documentation, SXSSF (Streaming Usermodel API):

SXSSF (package: org.apache.poi.xssf.streaming) is an API-compatible streaming extension of XSSF to be used when very large spreadsheets have to be produced, and heap space is limited. SXSSF achieves its low memory footprint by limiting access to the rows that are within a sliding window, while XSSF gives access to all rows in the document. Older rows that are no longer in the window become inaccessible, as they are written to the disk.
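To make the sliding window concrete, here is a toy model in plain Java. It is not POI's implementation (`SlidingWindowModel` and its methods are hypothetical). It only mimics the documented behaviour: rows live in a sorted in-memory map. Once the map holds more than `windowSize` rows, the lowest-indexed row is written to a stand-in "disk" buffer and dropped from memory, so it can no longer be fetched.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of a row window; class and method names are hypothetical, not POI API
public class SlidingWindowModel {
    private final int windowSize;                                  // -1 means "never auto-flush"
    private final TreeMap<Integer, String> rows = new TreeMap<>(); // row index -> content, in memory
    private final StringBuilder disk = new StringBuilder();        // stands in for SXSSF's temp file
    private int flushed = 0;

    public SlidingWindowModel(int windowSize) {
        this.windowSize = windowSize;
    }

    // Adds a row; while the window is over capacity, the lowest-indexed row is written out and dropped
    public void createRow(int index, String content) {
        rows.put(index, content);
        while (windowSize >= 0 && rows.size() > windowSize) {
            Map.Entry<Integer, String> oldest = rows.pollFirstEntry();
            disk.append(oldest.getKey()).append(':').append(oldest.getValue()).append('\n');
            flushed++;
        }
    }

    // Like getRow() on an SXSSF sheet: a flushed row is no longer in memory, so this returns null
    public String getRow(int index) {
        return rows.get(index);
    }

    public int inMemory() { return rows.size(); }

    public int flushedCount() { return flushed; }

    public static void main(String[] args) {
        SlidingWindowModel sheet = new SlidingWindowModel(100);
        for (int r = 0; r < 1000; r++) {
            sheet.createRow(r, "row-" + r);
        }
        System.out.println(sheet.inMemory());     // 100
        System.out.println(sheet.flushedCount()); // 900
        System.out.println(sheet.getRow(899));    // null
        System.out.println(sheet.getRow(900));    // row-900
    }
}
```

Memory use is bounded by the window size, not by the total row count. That is why the SXSSF run above stays far below the XSSF run.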

You can specify the window size at workbook construction time via new SXSSFWorkbook(int windowSize) or you can set it per-sheet via SXSSFSheet#setRandomAccessWindowSize(int windowSize)

When a new row is created via createRow() and the total number of unflushed records would exceed the specified window size, then the row with the lowest index value is flushed and cannot be accessed via getRow() anymore.

The default window size is 100 and defined by SXSSFWorkbook.DEFAULT_WINDOW_SIZE.
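A minimal sketch of the per-sheet override, assuming POI 4.x or later on the classpath (`WindowSizeDemo` and its methods are hypothetical names). The workbook is created with the default window, `SXSSFWorkbook.DEFAULT_WINDOW_SIZE`. One sheet then narrows its window to 10 rows with `SXSSFSheet#setRandomAccessWindowSize`. After 50 rows, only rows 40 to 49 should still be reachable through `getRow()`.

```java
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.xssf.streaming.SXSSFSheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

import java.io.IOException;

public class WindowSizeDemo {

    // The workbook-wide default row window (documented as 100)
    public static int defaultWindow() {
        return SXSSFWorkbook.DEFAULT_WINDOW_SIZE;
    }

    // Writes 50 rows to a sheet whose window is narrowed to 10 rows and reports
    // { row 39 already flushed?, row 40 still in memory? }
    public static boolean[] probe() throws IOException {
        try (SXSSFWorkbook wb = new SXSSFWorkbook(defaultWindow())) {
            try {
                SXSSFSheet sheet = (SXSSFSheet) wb.createSheet("narrow");
                sheet.setRandomAccessWindowSize(10); // override the window for this sheet only
                for (int r = 0; r < 50; r++) {
                    Row row = sheet.createRow(r);
                    row.createCell(0).setCellValue(r);
                }
                return new boolean[] { sheet.getRow(39) == null, sheet.getRow(40) != null };
            } finally {
                wb.dispose(); // delete the temp file that received the flushed rows
            }
        }
    }

    public static void main(String[] args) throws IOException {
        boolean[] result = probe();
        System.out.println("default window: " + defaultWindow());
        System.out.println("row 39 flushed: " + result[0] + ", row 40 in memory: " + result[1]);
    }
}
```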

A windowSize of -1 indicates unlimited access. In this case all records that have not been flushed by a call to flushRows() are available for random access.

Note that SXSSF allocates temporary files that you must always clean up explicitly, by calling the dispose method.

SXSSFWorkbook defaults to using inline strings instead of a shared strings table. This is very efficient, since no document content needs to be kept in memory, but is also known to produce documents that are incompatible with some clients. With shared strings enabled all unique strings in the document has to be kept in memory. Depending on your document content this could use a lot more resources than with shared strings disabled.
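Shared strings are switched on through the four-argument constructor `SXSSFWorkbook(XSSFWorkbook template, int rowAccessWindowSize, boolean compressTmpFiles, boolean useSharedStringsTable)`. The sketch below assumes POI 4.x or later on the classpath; `SharedStringsDemo`, its methods and the sample labels are hypothetical. It writes the same repetitive column both ways, with gzip-compressed temp files, and reads each result back with XSSF to confirm the content is identical. The byte counts it prints depend on the data and POI version, so none are claimed here.

```java
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class SharedStringsDemo {

    // Streams 1,000 rows that reuse three labels, with the shared strings table on or off
    public static byte[] write(boolean useSharedStrings) throws IOException {
        // Arguments: template workbook, row window, gzip temp files, use shared strings table
        try (SXSSFWorkbook wb = new SXSSFWorkbook(new XSSFWorkbook(), 100, true, useSharedStrings);
             ByteArrayOutputStream bos = new ByteArrayOutputStream()) {
            try {
                Sheet sheet = wb.createSheet("data");
                String[] labels = {"north", "south", "east"};
                for (int r = 0; r < 1000; r++) {
                    sheet.createRow(r).createCell(0).setCellValue(labels[r % labels.length]);
                }
                wb.write(bos);
            } finally {
                wb.dispose(); // remove the temp files SXSSF created while streaming
            }
            return bos.toByteArray();
        }
    }

    // Reads the first cell back with XSSF; inline and shared strings should yield the same text
    public static String firstCell(byte[] xlsx) throws IOException {
        try (XSSFWorkbook wb = new XSSFWorkbook(new ByteArrayInputStream(xlsx))) {
            return wb.getSheetAt(0).getRow(0).getCell(0).getStringCellValue();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] inline = write(false);
        byte[] shared = write(true);
        System.out.println("inline strings: " + inline.length + " bytes, first cell = " + firstCell(inline));
        System.out.println("shared strings: " + shared.length + " bytes, first cell = " + firstCell(shared));
    }
}
```

With shared strings on, every distinct label is held in memory for the whole run. That costs little for three labels but a lot for the 20 million unique strings in the benchmark above.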

Please note that there are still things that still may consume a large amount of memory based on which features you are using, e.g. merged regions, hyperlinks, comments, ... are still only stored in memory and thus may require a lot of memory if used extensively.

Carefully review your memory budget and compatibility needs before deciding whether to enable shared strings or not.


package my.poi;

import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.util.CellReference;
import org.apache.poi.xssf.streaming.SXSSFSheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;
import org.junit.Assert;
import org.junit.Test;

import java.io.FileOutputStream;

public class SXSSFTest {

    @Test
    public void autoFlush() throws Throwable {
        // keep 100 rows in memory, exceeding rows will be flushed to disk
        try (SXSSFWorkbook wb = new SXSSFWorkbook(100)) {
            Sheet sh = wb.createSheet();
            for (int rownum = 0; rownum < 1000; rownum++) {
                Row row = sh.createRow(rownum);
                for (int cellnum = 0; cellnum < 10; cellnum++) {
                    Cell cell = row.createCell(cellnum);
                    String address = new CellReference(cell).formatAsString();
                    cell.setCellValue(address);
                }
            }

            // Rows with rownum < 900 are flushed and not accessible
            for (int rownum = 0; rownum < 900; rownum++) {
                Assert.assertNull(sh.getRow(rownum));
            }

            // the last 100 rows are still in memory
            for (int rownum = 900; rownum < 1000; rownum++) {
                Assert.assertNotNull(sh.getRow(rownum));
            }

            try (FileOutputStream out = new FileOutputStream("sxssf.xlsx")) {
                wb.write(out);
            }

            // dispose of temporary files backing this workbook on disk
            wb.dispose();
        }
    }

    @Test
    public void manuallyFlush() throws Throwable {
        // turn off auto-flushing and accumulate all rows in memory
        try (SXSSFWorkbook wb = new SXSSFWorkbook(-1)) {
            SXSSFSheet sh = (SXSSFSheet) wb.createSheet();
            for (int rownum = 0; rownum < 1000; rownum++) {
                Row row = sh.createRow(rownum);
                for (int cellnum = 0; cellnum < 10; cellnum++) {
                    Cell cell = row.createCell(cellnum);
                    String address = new CellReference(cell).formatAsString();
                    cell.setCellValue(address);
                }

                // manually control how rows are flushed to disk
                if (rownum % 100 == 0) {
                    sh.flushRows(100); // retain 100 last rows and flush all others
                    // ((SXSSFSheet)sh).flushRows() is a shortcut for ((SXSSFSheet)sh).flushRows(0),
                    // this method flushes all rows
                }
            }

            try (FileOutputStream out = new FileOutputStream("sxssf.xlsx")) {
                wb.write(out);
            }
            // dispose of temporary files backing this workbook on disk
            wb.dispose();
        }
    }
}





Maven dependencies (org.apache.poi):

<!-- https://mvnrepository.com/artifact/org.apache.poi/poi -->
<!-- https://mvnrepository.com/artifact/org.apache.poi/poi-ooxml -->
<!-- https://mvnrepository.com/artifact/org.apache.poi/poi-ooxml-schemas -->
<!-- https://mvnrepository.com/artifact/org.apache.poi/poi-scratchpad -->
<!-- https://mvnrepository.com/artifact/org.apache.poi/poi-excelant -->

