Working with PDFs in Java (the PDFBox and iText libraries)

Date: 2019-11-06

Java has many libraries for working with PDF files; PDFBox and iText are two of them.

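PDFBox provides the following features:

Extracts text, including Unicode characters.
Integrates easily with text search engines such as Jakarta Lucene.
Encrypts and decrypts PDF documents.
Imports and exports form data in PDF and XFDF formats.
Appends content to existing PDF documents.
Splits one PDF document into several documents.
Overlays one PDF document on another.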

Below is a test program that uses PDFBox to extract text:

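import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.URL;

// These imports assume PDFBox 1.x (org.apache.pdfbox); adjust the packages for other versions.
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;

public class PdfBoxTest {

    public void getText(String file) throws Exception{
        // Whether to sort the extracted text by position
        boolean sort = false;
        // PDF file name
        String pdfFile = file;
        // Name of the output text file
        String textFile = null;
        // Character encoding of the output file
        String encoding = "UTF-8";
        // First page to extract
        int startPage = 1;
        // Last page to extract
        int endPage = Integer.MAX_VALUE;
        // Writer for the output text file
        Writer output = null;
        // The PDF document held in memory
        PDDocument document = null;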

        try{
            try{
                // First try to load the file as a URL; if that throws, load it from the local file system
                URL url = new URL(pdfFile);
                document = PDDocument.load(url);
                String fileName = url.getFile();

                if(fileName.length() > 4){
                    // Name the generated .txt file after the original PDF
                    File outputFile = new File(fileName.substring(0, fileName.length()-4) + ".txt");
                    textFile = outputFile.getName();
                }
            }catch(Exception e){
                // Loading as a URL failed, so load from the file system instead
                document = PDDocument.load(pdfFile);
                if(pdfFile.length() > 4){
                    textFile = pdfFile.substring(0, pdfFile.length() - 4) + ".txt";
                }
            }
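            // Fail with a clear message if no output name could be derived (name of 4 characters or fewer)
            if(textFile == null){
                throw new IllegalArgumentException("Cannot derive a .txt file name from: " + pdfFile);
            }
            // Output writer that writes the extracted text to textFile
            output = new OutputStreamWriter(new FileOutputStream(textFile),encoding);
            // PDFTextStripper does the text extraction
            PDFTextStripper stripper = new PDFTextStripper();
            // Set whether to sort by position
            stripper.setSortByPosition(sort);
            // Set the first page
            stripper.setStartPage(startPage);
            // Set the last page
            stripper.setEndPage(endPage);
            // Call PDFTextStripper.writeText to extract the text and write it out
            stripper.writeText(document, output);
        }finally{
            // Close the writer and the document even if extraction failed
            if(output != null){
                output.close();
            }
            if(document != null){
                document.close();
            }
        }
    }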
    /**
     * @param args command-line arguments (unused)
     */
    public static void main(String[] args) {
        PdfBoxTest test = new PdfBoxTest();
        try{
            // Not a valid URL, so getText falls back to loading from the file system
            test.getText("E://test.pdf");
        }catch(Exception e){
            e.printStackTrace();
        }
    }
}
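The two small decisions inside getText above (is the argument a URL, and what should the .txt file be called) can be tried out with the standard library alone. The following is only a sketch: PdfSourceNames, isUrl and toTextFileName are hypothetical names, not part of PDFBox. Unlike the original, it swaps the extension only when the name really ends in .pdf, instead of always dropping the last four characters.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Locale;

public class PdfSourceNames {

    /** Returns true if the string parses as a URL with a protocol the JVM knows. */
    public static boolean isUrl(String source) {
        try {
            // Same check the original relies on; this constructor is deprecated
            // in recent JDKs but still available.
            new URL(source);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    /** Derives the output .txt name from a PDF name (hypothetical helper). */
    public static String toTextFileName(String pdfName) {
        if (pdfName == null || pdfName.isEmpty()) {
            throw new IllegalArgumentException("pdfName must not be empty");
        }
        // Replace the extension only if it is really ".pdf" (case-insensitive)
        if (pdfName.toLowerCase(Locale.ROOT).endsWith(".pdf")) {
            return pdfName.substring(0, pdfName.length() - 4) + ".txt";
        }
        // Otherwise keep the whole name and append ".txt"
        return pdfName + ".txt";
    }

    public static void main(String[] args) {
        System.out.println(isUrl("E://test.pdf"));
        System.out.println(toTextFileName("E://test.pdf"));
    }
}
```

For the path used in main above, "E://test.pdf", the protocol "e" is unknown to the JVM, so the string is not treated as a URL and the file-system branch would be taken.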

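iText is a project hosted on the well-known open-source site SourceForge; it is a Java class library for generating PDF documents. With iText you can not only generate PDF or RTF documents but also convert XML and HTML files to PDF.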

Below is an example that generates a PDF with iText:

import java.io.FileNotFoundException;
import java.io.FileOutputStream;

// These imports assume iText 2.x (com.lowagie packages), matching the font code in this article.
import com.lowagie.text.Document;
import com.lowagie.text.DocumentException;
import com.lowagie.text.Paragraph;
import com.lowagie.text.pdf.PdfWriter;

public class ITextTest {
    public static void main(String args[]){
        writePdf();
    }

    public static void writePdf(){
        Document document = new Document();
        try {
            // Bind the document to an output file; nothing is written until the document is opened
            PdfWriter.getInstance(document, new FileOutputStream("Helloworld.pdf"));
            document.open();
            document.add(new Paragraph("Hello World"));
        } catch (DocumentException e) {
            e.printStackTrace();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } finally {
            // Close only if the document was actually opened
            if (document.isOpen()) {
                document.close();
            }
        }
    }

}

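iText's default font settings do not support Chinese. You need to download the Far East font package iTextAsian.jar and put it on the classpath; otherwise Chinese text cannot be written to the PDF document. With the following code, Chinese can be used in the document: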

// STSong-Light with the UniGB-UCS2-H encoding comes from iTextAsian.jar; the font is not embedded in the PDF
BaseFont bfChinese = BaseFont.createFont("STSong-Light", "UniGB-UCS2-H", BaseFont.NOT_EMBEDDED);
com.lowagie.text.Font fontChinese = new com.lowagie.text.Font(bfChinese, 12, com.lowagie.text.Font.NORMAL);
Paragraph paragraph = new Paragraph("你好", fontChinese);
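If a document mixes Chinese and non-Chinese text, one possible way to decide when the Chinese font is needed is to check the text for Han characters first. This is only a sketch using the standard library: HanDetector and containsHan are hypothetical names, not part of iText, and the check covers Han ideographs only, not every script the Far East font package supports.

```java
public class HanDetector {

    /** Returns true if the text contains at least one Han (CJK ideograph) code point. */
    public static boolean containsHan(String text) {
        return text.codePoints()
                   .anyMatch(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN);
    }

    public static void main(String[] args) {
        // "\u4f60\u597d" is the greeting used in the font example, written with
        // Unicode escapes so the source compiles regardless of file encoding
        System.out.println(containsHan("\u4f60\u597d"));
        System.out.println(containsHan("Hello World"));
    }
}
```

A caller could then pass the Chinese font to Paragraph only when containsHan returns true and use the default font otherwise.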

References: http://yuleihome.iteye.com/blog/181348

http://www.cnblogs.com/hejycpu/archive/2009/01/19/1378380.html

http://gohands.iteye.com/blog/160534

In summary, this article used PDFBox and iText to demonstrate simple ways to read a PDF and to generate one, respectively.