Java: Creating and Reading Office Word Documents with POI XWPFDocument (Basics)

Date: 2019-11-11


Note: corrections are welcome if anything here is wrong.

API reference: http://poi.apache.org/apidocs/org/apache/poi/xwpf/usermodel/XWPFDocument.html

Main reference 1: http://www.cnblogs.com/Springmoon-venn/p/5494602.html

Main reference 2: http://elim.iteye.com/blog/2049110

Main reference 3: http://doc.okbase.net/oh_Maxy/archive/154764.html

1. Basic elements

It is best to create the documents with Office Word. (WPS and Word produce somewhat different structures.)

IBodyElement ------------------- body element, the common type when iterating over paragraphs and tables

XWPFComment ------------------- comment (as I understand it, a review annotation)

XWPFSDT

XWPFFooter ------------------- footer

XWPFFootnotes ------------------- footnotes

XWPFHeader ------------------- header

XWPFHyperlink ------------------- hyperlink

XWPFNumbering ------------------- numbering definitions (used for numbered and bulleted lists)

XWPFParagraph ------------------- paragraph

XWPFPictureData ------------------- picture

XWPFStyles ------------------- styles (used when setting up multi-level headings)

XWPFTable ------------------- table

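2. Body paragraphs

A document contains multiple paragraphs, a paragraph contains a list of runs, and a run is the smallest unit of the document.

Get all paragraphs: List<XWPFParagraph> paragraphs = word.getParagraphs();

Get all runs in a paragraph: List<XWPFRun> xwpfRuns = xwpfParagraph.getRuns();

Get a single run from that list: XWPFRun run = xwpfRuns.get(index);

XWPFRun represents a stretch of text that shares the same properties.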
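The document, paragraph, run hierarchy described above can be walked with two nested loops. The following is only a minimal sketch: the class name RunWalker and the helper methods are made up for illustration, and the document is built in memory so that nothing depends on a file on disk.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

public class RunWalker {

    /** Builds a small in-memory document: one paragraph with a plain run and a bold run. */
    static XWPFDocument sample() {
        XWPFDocument doc = new XWPFDocument();
        XWPFParagraph paragraph = doc.createParagraph();
        paragraph.createRun().setText("plain ");
        XWPFRun bold = paragraph.createRun();
        bold.setBold(true);
        bold.setText("bold");
        return doc;
    }

    /** Collects the text of every run, paragraph by paragraph. */
    static List<String> runTexts(XWPFDocument doc) {
        List<String> texts = new ArrayList<>();
        for (XWPFParagraph paragraph : doc.getParagraphs()) {
            for (XWPFRun run : paragraph.getRuns()) {
                texts.add(run.text());
            }
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        try (XWPFDocument doc = sample()) {
            System.out.println(runTexts(doc));
        }
    }
}
```

Each formatting change inside a paragraph starts a new run, which is why the bold text ends up in a run of its own.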

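3. Body tables

A document contains multiple tables, a table contains multiple rows, a row contains multiple cells, and the content of each cell is like a complete document.

Get all tables: List<XWPFTable> xwpfTables = doc.getTables();

Get all rows in a table: List<XWPFTableRow> xwpfTableRows = xwpfTable.getRows();

Get all cells in a row: List<XWPFTableCell> xwpfTableCells = xwpfTableRow.getTableCells();

Get the content of a cell: List<XWPFParagraph> paragraphs = xwpfTableCell.getParagraphs();

From there on it works the same as body paragraphs.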

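Notes:

A table cell is like a complete docx document, only without headers and footers. It can contain nested tables, which xwpfTableCell.getTables() returns, and so on.
getParagraphs() and getTables() return two separate lists, so with these two calls alone there is no way to tell that a table sits between two paragraphs unless the document layout is fixed and known in advance. To recover the order, iterate doc.getBodyElements(), which returns paragraphs and tables together as IBodyElement objects in document order.

In other words, the two separate lists cannot say whether a table comes after the first paragraph, after the second, and so on.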
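When the position of a table relative to the paragraphs matters, XWPFDocument.getBodyElements() is worth trying, since it returns paragraphs and tables in one list in document order. A minimal sketch follows; the class name BodyOrder and its helpers are hypothetical, and the sample document is created in memory.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.xwpf.usermodel.IBodyElement;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFTable;

public class BodyOrder {

    /** Builds a document with a paragraph, then a 1x1 table, then another paragraph. */
    static XWPFDocument sample() {
        XWPFDocument doc = new XWPFDocument();
        doc.createParagraph().createRun().setText("before");
        doc.createTable(1, 1);
        doc.createParagraph().createRun().setText("after");
        return doc;
    }

    /** Describes every body element in the order it appears in the document. */
    static List<String> describe(XWPFDocument doc) {
        List<String> order = new ArrayList<>();
        for (IBodyElement element : doc.getBodyElements()) {
            if (element instanceof XWPFParagraph) {
                order.add("paragraph: " + ((XWPFParagraph) element).getText());
            } else if (element instanceof XWPFTable) {
                order.add("table: " + ((XWPFTable) element).getRows().size() + " row(s)");
            } else {
                // Anything else, for example a content control
                order.add(element.getElementType().name());
            }
        }
        return order;
    }

    public static void main(String[] args) throws Exception {
        try (XWPFDocument doc = sample()) {
            describe(doc).forEach(System.out::println);
        }
    }
}
```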

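3. Headers

A document can have multiple headers, and a header can contain paragraphs and tables.

Get the document's headers: List<XWPFHeader> headerList = doc.getHeaderList();

Get all paragraphs in a header: List<XWPFParagraph> paras = header.getParagraphs();

Get all tables in a header: List<XWPFTable> tables = header.getTables();

After that everything works the same way.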

4. Footers

Footers work much like headers, and the page-number marker can be read from them.

Now for the practical part:

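5. Reading with XWPFDocument: paragraphs + tables

a. Get all paragraphs of the document

// The backslash must be escaped in a Java string literal; "D:\table.docx" would contain a tab character.
InputStream is = new FileInputStream("D:\\table.docx");
XWPFDocument doc = new XWPFDocument(is);
List<XWPFParagraph> paras = doc.getParagraphs();

Get the paragraph text

for (XWPFParagraph para : paras) {
    // Properties of the current paragraph
    // CTPPr pr = para.getCTP().getPPr();
    System.out.println(para.getText());
}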

b. Get all tables in the document

List<XWPFTable> tables = doc.getTables();
List<XWPFTableRow> rows;
List<XWPFTableCell> cells;
for (XWPFTable table : tables) {
    // Table properties
    CTTblPr pr = table.getCTTbl().getTblPr();
    // Rows of the table
    rows = table.getRows();
    for (XWPFTableRow row : rows) {
        // Cells of the row
        cells = row.getTableCells();
        for (XWPFTableCell cell : cells) {
            System.out.println(cell.getText());
        }
    }
}

6. Generating a Word file with XWPFDocument

Create an empty XWPFDocument with new, fill it with content, then write it to the target output stream.

Create a new document

XWPFDocument doc = new XWPFDocument();
// Create a paragraph
XWPFParagraph para = doc.createParagraph();
// An XWPFRun represents a region with the same properties: a piece of text
XWPFRun run = para.createRun();
run.setBold(true); // bold
run.setText("Bold text");
run = para.createRun();
run.setColor("FF0000");
run.setText("Red text.");
// Write the document to the output stream; try-with-resources closes the stream
try (OutputStream os = new FileOutputStream("D:\\simpleWrite.docx")) {
    doc.write(os);
}
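The same write call works with any OutputStream, so a document can also be produced and read back entirely in memory, which is handy for quick experiments. This is a sketch under that assumption; the class name RoundTrip and both helper methods are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class RoundTrip {

    /** Creates a one-paragraph document and returns it as docx bytes. */
    static byte[] toBytes(String text) {
        try (XWPFDocument doc = new XWPFDocument();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            doc.createParagraph().createRun().setText(text);
            doc.write(out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Parses docx bytes and returns the text of the first paragraph. */
    static String firstParagraph(byte[] docx) {
        try (XWPFDocument doc = new XWPFDocument(new ByteArrayInputStream(docx))) {
            return doc.getParagraphs().get(0).getText();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] docx = toBytes("hello");
        System.out.println(firstParagraph(docx));
    }
}
```

Swapping the ByteArrayOutputStream for a FileOutputStream gives the file-based version shown above.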

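Create a new table

// XWPFDocument doc = new XWPFDocument();
// Create a table with 5 rows and 5 columns
XWPFTable table = doc.createTable(5, 5);
// A column added this way is not returned by getTableCells() for the 5 rows created initially,
// but cells added through a row are.
// table.addNewCol(); // add a column to the table, making 6 columns
table.createRow(); // add a row to the table, making 6 rows
List<XWPFTableRow> rows = table.getRows();
// Table properties: reuse the existing tblPr if there is one instead of adding a second
CTTblPr tablePr = table.getCTTbl().getTblPr() != null
        ? table.getCTTbl().getTblPr()
        : table.getCTTbl().addNewTblPr();
// Table width
CTTblWidth width = tablePr.isSetTblW() ? tablePr.getTblW() : tablePr.addNewTblW();
width.setType(STTblWidth.DXA); // interpret the value as twentieths of a point
width.setW(BigInteger.valueOf(8000));
XWPFTableRow row;
List<XWPFTableCell> cells;
XWPFTableCell cell;
int rowSize = rows.size();
int cellSize;
for (int i = 0; i < rowSize; i++) {
    row = rows.get(i);
    // Add a cell
    row.addNewTableCell();
    // Set the row height
    row.setHeight(500);
    // Row properties
    // CTTrPr rowPr = row.getCtRow().addNewTrPr();
    // This way the newly added cell can be retrieved.
    // List list = row.getCtRow().getTcList();
    cells = row.getTableCells();
    cellSize = cells.size();
    for (int j = 0; j < cellSize; j++) {
        cell = cells.get(j);
        // Set the cell color
        if ((i + j) % 2 == 0) {
            cell.setColor("ff0000"); // red
        } else {
            cell.setColor("0000ff"); // blue
        }
        // Cell properties: setColor() has already created tcPr, so reuse it
        CTTcPr cellPr = cell.getCTTc().isSetTcPr()
                ? cell.getCTTc().getTcPr()
                : cell.getCTTc().addNewTcPr();
        cellPr.addNewVAlign().setVal(STVerticalJc.CENTER);
        if (j == 3) {
            // Set the width
            cellPr.addNewTcW().setW(BigInteger.valueOf(3000));
        }
        cell.setText(i + "," + j);
    }
}
// The file is created automatically if it does not exist
try (OutputStream os = new FileOutputStream("D:\\table.docx")) {
    // Write to the file
    doc.write(os);
}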

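7. Replacing paragraph content

/**
 * Replaces the variables in a paragraph.
 * @param para the paragraph to process
 * @param params the replacement values
 */
private void replaceInPara(XWPFParagraph para, Map<String, Object> params) {
    List<XWPFRun> runs;
    Matcher matcher;
    // this.matcher(String) is a helper that is not shown in this article; it returns a
    // Matcher for the placeholder pattern, with the variable name in group 1.
    if (this.matcher(para.getParagraphText()).find()) {
        runs = para.getRuns();
        for (int i = 0; i < runs.size(); i++) {
            XWPFRun run = runs.get(i);
            String runText = run.toString();
            matcher = this.matcher(runText);
            if (matcher.find()) {
                while ((matcher = this.matcher(runText)).find()) {
                    runText = matcher.replaceFirst(String.valueOf(params.get(matcher.group(1))));
                }
                // setText(String) appends to the run's existing text instead of replacing it,
                // so remove the current run and insert a new one by hand.
                para.removeRun(i);
                para.insertNewRun(i).setText(runText);
            }
        }
    }
}

Calling setText(String) on an XWPFRun does not overwrite the existing text; the new text is appended after what the run already holds. So the value cannot simply be set in place: remove the current run first, then insert a new run by hand.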
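The string side of the replacement can be tried without POI at all. The sketch below assumes placeholders written as ${name}; that syntax, the class name PlaceholderDemo and the method fill are assumptions made for the example, because the matcher helper used above is not shown in this article.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlaceholderDemo {

    // Assumed placeholder syntax: ${name}, with the name captured in group 1
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(.+?)\\}");

    /** Replaces every ${name} in the text with the matching value from params. */
    static String fill(String text, Map<String, Object> params) {
        Matcher matcher = PLACEHOLDER.matcher(text);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            Object value = params.get(matcher.group(1));
            // quoteReplacement keeps '$' and '\' in a value from being read as group references
            matcher.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(value)));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> params = new HashMap<>();
        params.put("name", "Li");
        params.put("amount", "$5");
        System.out.println(fill("Dear ${name}, total: ${amount}", params));
    }
}
```

A name that is missing from the map comes out as the text "null", the same as with String.valueOf in the method above.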

// Extract the pictures from a Word docx file

String path = "D://abc.docx";
File file = new File(path);
try (FileInputStream fis = new FileInputStream(file);
     XWPFDocument document = new XWPFDocument(fis)) {
    XWPFWordExtractor xwpfWordExtractor = new XWPFWordExtractor(document);
    String text = xwpfWordExtractor.getText();
    System.out.println(text);
    List<XWPFPictureData> picList = document.getAllPictures();
    for (XWPFPictureData pic : picList) {
        System.out.println(pic.getPictureType() + File.separator + pic.suggestFileExtension() + File.separator + pic.getFileName());
        byte[] bytev = pic.getData();
        // The target folder must already exist; the stream is closed after each picture
        try (FileOutputStream fos = new FileOutputStream("D:\\abc\\docxImage\\" + pic.getFileName())) {
            fos.write(bytev);
        }
    }
} catch (IOException e) {
    e.printStackTrace();
}

8. Multi-level heading structure

/**
 * Writes a Word file using custom styles; based on code from Stack Overflow.
 *
 * @throws IOException
 */
public static void writeSimpleDocxFile() throws IOException {
    XWPFDocument docxDocument = new XWPFDocument();
    // The original code used a custom English name; with Chinese Word it is better to use
    // the heading names Word itself provides, otherwise the levels may get mixed up
    addCustomHeadingStyle(docxDocument, "標題 1", 1);
    addCustomHeadingStyle(docxDocument, "標題 2", 2);
    // Heading 1
    XWPFParagraph paragraph = docxDocument.createParagraph();
    XWPFRun run = paragraph.createRun();
    run.setText("Heading 1");
    paragraph.setStyle("標題 1");
    // Heading 2
    XWPFParagraph paragraph2 = docxDocument.createParagraph();
    XWPFRun run2 = paragraph2.createRun();
    run2.setText("Heading 2");
    paragraph2.setStyle("標題 2");
    // Body text
    XWPFParagraph paragraphX = docxDocument.createParagraph();
    XWPFRun runX = paragraphX.createRun();
    runX.setText("Body text");
    // Write the document to a file
    FileOutputStream fos = new FileOutputStream("D:/myDoc2.docx");
    docxDocument.write(fos);
    fos.close();
}

/**
 * Adds a custom heading style. This uses the code from Stack Overflow.
 *
 * @param docxDocument the target document
 * @param strStyleId the style name
 * @param headingLevel the heading level
 */
private static void addCustomHeadingStyle(XWPFDocument docxDocument, String strStyleId, int headingLevel) {

    CTStyle ctStyle = CTStyle.Factory.newInstance();
    ctStyle.setStyleId(strStyleId);

    CTString styleName = CTString.Factory.newInstance();
    styleName.setVal(strStyleId);
    ctStyle.setName(styleName);

    CTDecimalNumber indentNumber = CTDecimalNumber.Factory.newInstance();
    indentNumber.setVal(BigInteger.valueOf(headingLevel));

    // lower number > style is more prominent in the formats bar
    ctStyle.setUiPriority(indentNumber);

    CTOnOff onoffnull = CTOnOff.Factory.newInstance();
    ctStyle.setUnhideWhenUsed(onoffnull);

    // style shows up in the formats bar
    ctStyle.setQFormat(onoffnull);

    // style defines a heading of the given level
    CTPPr ppr = CTPPr.Factory.newInstance();
    ppr.setOutlineLvl(indentNumber);
    ctStyle.setPPr(ppr);

    XWPFStyle style = new XWPFStyle(ctStyle);

    // is a null op if already defined
    XWPFStyles styles = docxDocument.createStyles();

    style.setType(STStyleType.PARAGRAPH);
    styles.addStyle(style);

}

Create the document object

XWPFDocument docxDocument = new XWPFDocument();

Create a paragraph object

XWPFParagraph paragraphX = docxDocument.createParagraph();

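XWPFParagraph paragraph properties

//paragraphX.addRun(runX0);// does not seem to do anything useful
//paragraphX.removeRun(1);// remove a run (text) by its index
paragraphX.setAlignment(ParagraphAlignment.LEFT);// alignment
//paragraphX.setBorderBetween(Borders.LIGHTNING_1);// border (I tried several values and none had a visible effect)
//paragraphX.setFirstLineIndent(100);// first-line indent: effect unclear
//paragraphX.setFontAlignment(3);// font alignment: 1 left, 2 center, 3 right
//paragraphX.setIndentationFirstLine(567);// first-line indent: 567 == 1 cm
//paragraphX.setIndentationHanging(567);// hanging indent: removes indentation from the first line of the paragraph, moving the first line back toward the start of the text flow
//paragraphX.setIndentationLeft(2);// effect unclear (indentation values are in twentieths of a point, as in 567 == 1 cm above, so 2 is probably too small to see)
//paragraphX.setIndentationRight(2);// effect unclear
//paragraphX.setIndentFromLeft(2);// effect unclear
//paragraphX.setIndentFromRight(2);// effect unclear
//paragraphX.setNumID(new BigInteger("3"));// set the paragraph numbering: it has an effect I do not understand (the whole paragraph is just indented by 4 characters)
//paragraphX.setPageBreak(true);// page break before the paragraph
//paragraphX.setSpacingAfter(1);// spacing added after the last line of this paragraph, in absolute units: effect unclear
//paragraphX.setSpacingBeforeLines(2);// spacing added before the first line of this paragraph, in line units: effect unclear
//paragraphX.setStyle("標題 3");// paragraph style: use together with addCustomHeadingStyle(docxDocument, "標題 3", 3)
paragraphX.setVerticalAlignment(TextAlignment.BOTTOM);// vertical text alignment (I guess the effect is more noticeable inside a table)
paragraphX.setWordWrapped(true);// whether a consumer may break Latin text that exceeds the text extent of a line in the middle of a word (character-level break) or must move the whole word to the next line (word-level break) (I did not understand this one: passing false even threw an exception)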
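Several of the values above, such as the 567 used for a 1 cm first-line indent, are measured in twips (twentieths of a point, 1440 per inch). A small conversion helper makes these numbers less mysterious. This is only a sketch, and the class name Twips and its methods are made up for illustration.

```java
public class Twips {

    private static final double TWIPS_PER_POINT = 20.0;
    private static final double POINTS_PER_INCH = 72.0;
    private static final double CM_PER_INCH = 2.54;

    /** Converts points to twips, rounded to the nearest whole twip. */
    static int fromPoints(double points) {
        return (int) Math.round(points * TWIPS_PER_POINT);
    }

    /** Converts centimeters to twips, rounded to the nearest whole twip. */
    static int fromCentimeters(double cm) {
        return (int) Math.round(cm / CM_PER_INCH * POINTS_PER_INCH * TWIPS_PER_POINT);
    }

    public static void main(String[] args) {
        System.out.println(fromCentimeters(1.0)); // 567
        System.out.println(fromPoints(12));       // 240
    }
}
```

With such a helper a call like paragraphX.setIndentationFirstLine(Twips.fromCentimeters(1.0)) reads more clearly than the bare number.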