Last time I briefly covered reading large xls files with POI. This post covers how to read files in the xlsx format.
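First prepare a large Excel file (an xlsx of about 5 MB), then shrink the JVM heap to 100 MB (JVM argument -Xmx100m) to simulate an OOM,
and use these arguments to dump the heap when the OOM occurs: -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=d://dump.hprof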
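Before reproducing the OOM it can help to confirm that the heap limit was actually applied. The following is a minimal sketch, not part of the original setup; the class name HeapCheck and the method maxHeapMb are hypothetical names chosen for illustration. It only uses Runtime.maxMemory() from the JDK.

```java
class HeapCheck {
    /** Returns the maximum heap the JVM will try to use, in megabytes. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        // Run with: java -Xmx100m HeapCheck
        // The reported value may be slightly below 100, depending on the garbage collector.
        System.out.println("Max heap: " + maxHeapMb() + " MB");
    }
}
```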
Add the jars needed for parsing xlsx to the Gradle build:
compile 'org.apache.poi:poi:3.15'
compile 'org.apache.poi:poi-ooxml:3.15'
compile 'xerces:xercesImpl:2.11.0'
Then read the xlsx file:
public static void main(String[] args) throws IOException {
    InputStream is = new FileInputStream("d://large.xlsx");
    // Loads the whole workbook into memory
    Workbook wb = new XSSFWorkbook(is);
}
Running it gives:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:3236)
    at java.io.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:178)
    at org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource$FakeZipEntry.<init>(ZipInputStreamZipEntrySource.java:136)
    at org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource.<init>(ZipInputStreamZipEntrySource.java:56)
    at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:99)
    at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:342)
    at org.apache.poi.util.PackageHelper.open(PackageHelper.java:37)
    at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:285)
    at blog.excel.Xlsx.main(Xlsx.java:17)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
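This also throws an OOM. The cause is the same as with xls: when processing an xlsx file this way, POI reads all of the data into memory, which exhausts the heap.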
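The stack trace shows POI copying zip entries into byte arrays. An xlsx file is a ZIP archive of XML parts, so a file that is 5 MB on disk can be much larger once unpacked. As a rough way to see this, here is a minimal sketch that lists the uncompressed size of each part using only java.util.zip. The class name XlsxPartSizes and the method uncompressedSizes are hypothetical, and the default path simply reuses the one from this post.

```java
import java.io.IOException;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

class XlsxPartSizes {
    /** Maps each part name in the archive to its uncompressed size in bytes (-1 if unknown). */
    static Map<String, Long> uncompressedSizes(String path) throws IOException {
        Map<String, Long> sizes = new LinkedHashMap<>();
        try (ZipFile zip = new ZipFile(path)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                sizes.put(entry.getName(), entry.getSize());
            }
        }
        return sizes;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default path, taken from the example above
        String path = args.length > 0 ? args[0] : "d://large.xlsx";
        uncompressedSizes(path).forEach(
                (name, size) -> System.out.println(name + " : " + size + " bytes"));
    }
}
```

The worksheet parts under xl/worksheets/ and the shared strings part are usually the large ones. Note that this only reports the raw XML size; the in-memory object model built by XSSFWorkbook typically needs more than that.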
POI also provides a streaming way to read xlsx files, which reduces memory usage:
import java.io.InputStream;

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.apache.poi.xssf.model.SharedStringsTable;
import org.apache.poi.xssf.usermodel.XSSFRichTextString;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

public class ExampleEventUserModel {

    public void processOneSheet(String filename) throws Exception {
        OPCPackage pkg = OPCPackage.open(filename);
        XSSFReader r = new XSSFReader(pkg);
        SharedStringsTable sst = r.getSharedStringsTable();
        XMLReader parser = fetchSheetParser(sst);

        // Get the first sheet (relationship id rId1)
        InputStream sheet2 = r.getSheet("rId1");
        InputSource sheetSource = new InputSource(sheet2);
        parser.parse(sheetSource);
        sheet2.close();
    }

    public XMLReader fetchSheetParser(SharedStringsTable sst) throws SAXException {
        XMLReader parser = XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");
        ContentHandler handler = new SheetHandler(sst);
        parser.setContentHandler(handler);
        return parser;
    }

    /**
     * SAX handler for the sheet XML
     */
    private static class SheetHandler extends DefaultHandler {
        private SharedStringsTable sst;
        private String lastContents;
        private boolean nextIsString;

        private SheetHandler(SharedStringsTable sst) {
            this.sst = sst;
        }

        // Called at the start of each element
        public void startElement(String uri, String localName, String name, Attributes attributes) throws SAXException {
            // c => cell
            if (name.equals("c")) {
                // Print the cell reference, e.g. A1
                System.out.print(attributes.getValue("r") + " - ");
                // Read the cell type; "s" means the value is an index into the shared strings table
                String cellType = attributes.getValue("t");
                if (cellType != null && cellType.equals("s")) {
                    nextIsString = true;
                } else {
                    nextIsString = false;
                }
            }
            // Reset the buffered text
            lastContents = "";
        }

        // Called at the end of each element
        public void endElement(String uri, String localName, String name) throws SAXException {
            // Resolve a shared string index to its text
            if (nextIsString) {
                int idx = Integer.parseInt(lastContents);
                lastContents = new XSSFRichTextString(sst.getEntryAt(idx)).toString();
                nextIsString = false;
            }
            // v => cell content
            if (name.equals("v")) {
                System.out.println(lastContents);
            }
        }

        // Called with the text between element tags
        public void characters(char[] ch, int start, int length) throws SAXException {
            lastContents += new String(ch, start, length);
        }
    }

    public static void main(String[] args) throws Exception {
        ExampleEventUserModel example = new ExampleEventUserModel();
        example.processOneSheet("d://large.xlsx");
    }
}
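To make the roles of the c element, its t attribute, and the v element easier to see, here is a minimal sketch that runs the same kind of handler over a tiny hand-written worksheet fragment, using only the JDK's built-in SAX parser. It does not use POI: the class name SheetXmlDemo, the method readCells, the sample XML, and the plain list standing in for the shared strings table are all assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SheetXmlDemo {
    /** Streams over the sheet XML and returns entries such as "A1=42". */
    static List<String> readCells(String sheetXml, List<String> sharedStrings) throws Exception {
        List<String> cells = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private String ref;                 // current cell reference, e.g. "A1"
            private boolean isSharedString;     // true when t="s"
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if (qName.equals("c")) {
                    ref = attrs.getValue("r");
                    isSharedString = "s".equals(attrs.getValue("t"));
                }
                text.setLength(0); // reset the buffer for each new element
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals("v")) {
                    String value = text.toString();
                    if (isSharedString) {
                        // The v element holds an index into the shared strings
                        value = sharedStrings.get(Integer.parseInt(value));
                    }
                    cells.add(ref + "=" + value);
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(sheetXml)), handler);
        return cells;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<worksheet><sheetData><row r=\"1\">"
                + "<c r=\"A1\"><v>42</v></c>"
                + "<c r=\"B1\" t=\"s\"><v>0</v></c>"
                + "</row></sheetData></worksheet>";
        // prints [A1=42, B1=hello]
        System.out.println(readCells(xml, Arrays.asList("hello")));
    }
}
```

Because the parser only hands over one element at a time, memory use stays roughly constant no matter how many rows the sheet contains, as long as the handler does not accumulate every cell the way this small demo does.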
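In the same way, this approach can stream large xlsx files, but it is limited to reading the data inside them; the file cannot be modified. A later post will cover how to write large files.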