Online preview of Word documents as PDF in Java (front end: PDF.js; back end: OpenOffice and Aspose)

Date: 2019-11-12


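Background

Previously, users clicked to download a Word file to their machine and then opened it with Office or WPS. The requirement changed: the document should be previewable online, with no need to download it first and open it afterwards.

I then searched online. There is plenty of material and the approaches vary; roughly these options came up:

1. Convert Word to HTML, then HTML to PDF

2. OpenOffice + swftools + Flexmapper + jodconverter

3. kkFileView

After weighing them, I decided to go with OpenOffice + PDF.js.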

Environment setup

1. Install OpenOffice. Download: http://www.openoffice.org/download/index.html

After installation, open cmd, change to the installation directory, and run: soffice "-accept=socket,host=localhost,port=8100;urp;StarOffice.ServiceManager" -nologo -headless -nofirststartwizard
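
Before wiring up the Java side, it can help to confirm that the soffice process is actually accepting connections. The following is a minimal sketch and not part of the original project: the class name OpenOfficePortCheck and the one-second timeout are assumptions, and it only tests that something is listening on the port, not that it speaks the OpenOffice protocol.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: checks whether a TCP port accepts connections.
final class OpenOfficePortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused or timed out: nothing reachable on that port
            return false;
        }
    }

    public static void main(String[] args) {
        // 8100 is the port passed to soffice in the command above
        System.out.println("OpenOffice listening: " + isListening("localhost", 8100, 1000));
    }
}
```

If this prints false, the conversion code later in the article would be expected to fail with a ConnectException.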

2. PDF.js. Download: http://mozilla.github.io/pdf.js/

After downloading, unzip the archive; it contains the build and web folders that are added to the project later.

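Code implementation

The code is split into back end and front end:

Back end: Java uses OpenOffice to convert the Word document into a PDF file and returns it as a stream

Front end: add the unzipped PDF.js files to the project, adjust the paths, and let PDF.js render the stream returned by the back end directly

Back end

The project uses Spring Boot. Add these dependencies to the pom file: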

<!-- OpenOffice: Word to PDF -->
        <dependency>
            <groupId>com.artofsolving</groupId>
            <artifactId>jodconverter</artifactId>
            <version>2.2.1</version>
        </dependency>
        <dependency>
            <groupId>org.openoffice</groupId>
            <artifactId>jurt</artifactId>
            <version>3.0.1</version>
        </dependency>
        <dependency>
            <groupId>org.openoffice</groupId>
            <artifactId>ridl</artifactId>
            <version>3.0.1</version>
        </dependency>
        <dependency>
            <groupId>org.openoffice</groupId>
            <artifactId>juh</artifactId>
            <version>3.0.1</version>
        </dependency>
        <dependency>
            <groupId>org.openoffice</groupId>
            <artifactId>unoil</artifactId>
            <version>3.0.1</version>
        </dependency>

Configure the OpenOffice service host and port in application.properties

openoffice.host=127.0.0.1
openoffice.port=8100

Converting a doc file to a pdf file

import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.ConnectException;

import javax.servlet.http.HttpServletResponse;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;

import com.xxx.utils.Doc2PdfUtil;

@Controller
@RequestMapping("/doc2PdfController")
public class Doc2PdfController {
    @Value("${openoffice.host}")
    private String OpenOfficeHost;
    @Value("${openoffice.port}")
    private Integer OpenOfficePort;
    
    private Logger logger = LoggerFactory.getLogger(Doc2PdfController.class);
    
    @RequestMapping("/doc2pdf")
    public void doc2pdf(String fileName, HttpServletResponse response) {
        Doc2PdfUtil doc2PdfUtil = new Doc2PdfUtil(OpenOfficeHost, OpenOfficePort);
        try {
            // Convert the doc to pdf; returns the pdf file
            File pdfFile = doc2PdfUtil.doc2Pdf(fileName);
            response.setContentType("application/pdf");
            // try-with-resources closes both streams even if copying fails
            try (BufferedInputStream in = new BufferedInputStream(new FileInputStream(pdfFile));
                 OutputStream out = response.getOutputStream()) {
                byte[] buffer = new byte[1024];
                int read;
                while ((read = in.read(buffer)) != -1) {
                    out.write(buffer, 0, read);
                }
                out.flush();
            }
        } catch (ConnectException e) {
            logger.error("****Doc2PdfUtil doc-to-pdf conversion failed****", e);
        } catch (IOException e) {
            // Also reached when the pdf file does not exist
            logger.error("****Failed to stream the pdf to the response****", e);
        }
    }
}

import java.io.File;
import java.net.ConnectException;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.artofsolving.jodconverter.DocumentConverter;
import com.artofsolving.jodconverter.openoffice.connection.OpenOfficeConnection;
import com.artofsolving.jodconverter.openoffice.connection.SocketOpenOfficeConnection;
import com.artofsolving.jodconverter.openoffice.converter.StreamOpenOfficeDocumentConverter;

public class Doc2PdfUtil {
    private String OpenOfficeHost; // OpenOffice service host
    private Integer OpenOfficePort; // OpenOffice service port
    
    public Doc2PdfUtil(){
    }

    public Doc2PdfUtil(String OpenOfficeHost, Integer OpenOfficePort){
        this.OpenOfficeHost = OpenOfficeHost;
        this.OpenOfficePort = OpenOfficePort;
    }
    
    private Logger logger = LoggerFactory.getLogger(Doc2PdfUtil.class);
    
    /**
     * Converts a doc file to pdf.
     * @param fileName file path without extension; ".doc" and ".pdf" are appended
     * @return the pdf file (it will not exist if the source doc was missing)
     * @throws ConnectException if the OpenOffice service cannot be reached
     */
    public File doc2Pdf(String fileName) throws ConnectException{
        File docFile = new File(fileName + ".doc");
        File pdfFile = new File(fileName + ".pdf");
        if (docFile.exists()) {
            if (!pdfFile.exists()) {
                OpenOfficeConnection connection = new SocketOpenOfficeConnection(OpenOfficeHost, OpenOfficePort);
                try {
                    connection.connect();
                    DocumentConverter converter = new StreamOpenOfficeDocumentConverter(connection);
                    // The core step: convert doc to pdf
                    converter.convert(docFile, pdfFile);
                    logger.info("****pdf conversion succeeded, PDF output: " + pdfFile.getPath() + "****");
                } catch (java.net.ConnectException e) {
                    logger.error("****pdf conversion failed: the OpenOffice service is not running****", e);
                    throw e;
                } catch (com.artofsolving.jodconverter.openoffice.connection.OpenOfficeException e) {
                    logger.error("****pdf converter error: failed to read the file to convert****", e);
                    throw e;
                } finally {
                    // Release the connection whether or not the conversion succeeded
                    if (connection.isConnected()) {
                        connection.disconnect();
                    }
                }
            }
        } else {
            logger.warn("****pdf conversion failed: the doc file to convert does not exist****");
        }
        return pdfFile;
    }
}

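Front end

Add the two folders build and web from pdfjs-2.0.943-dist to the project as a whole, rename viewer.html to viewer.jsp and move it to a suitable location, and remove the default PDF file compressed.tracemonkey-pldi-09.pdf, since our generated files will be used instead.

Points to note for viewer.jsp and viewer.js:

1. Update the referenced js and css paths.

2. viewer.jsp loads pdf/web/viewer.js, and viewer.js configures a default PDF file path. Because our PDF is generated dynamically, this has to change: define a variable DEFAULT_URL in the jsp and use it in the js.

3. The jsp contains an ajax call that fetches the PDF stream and assigns it to DEFAULT_URL before viewer.js loads it, so the /pdf/web/viewer.js script tag must come after the ajax code.

4. In viewer.js, replace compressed.tracemonkey-pldi-09.pdf with our variable DEFAULT_URL, and change the pdf.worker.js path to the matching path.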

<%@ page language="java" contentType="text/html; charset=utf-8"
    pageEncoding="utf-8"%>
<%@ taglib uri="http://java.sun.com/jsp/jstl/core" prefix="c"%>
<!DOCTYPE html>
<!--
Copyright 2012 Mozilla Foundation

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Adobe CMap resources are covered by their own copyright but the same license:

    Copyright 1990-2015 Adobe Systems Incorporated.

See https://github.com/adobe-type-tools/cmap-resources
-->
<html dir="ltr" mozdisallowselectionprint>
  <head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1">
    <meta name="google" content="notranslate">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <c:set var="qtpath" value="${pageContext.request.contextPath}"/>
    <script>
        var qtpath = '${qtpath}';
        var fileName = '${fileName}';
    </script>
    
    <title>PDF.js viewer</title>


    <link rel="stylesheet" href="${qtpath}/res/pdf/web/viewer.css">


<!-- This snippet is used in production (included from viewer.html) -->
<link rel="resource" type="application/l10n" href="${qtpath}/res/pdf/web/locale/locale.properties">
<script type="text/javascript" src="${qtpath}/res/js/jquery/jquery-2.1.4.min.js"></script>
<script type="text/javascript">
    var DEFAULT_URL = ""; // Note: the variable removed from viewer.js is redefined here
    var PDFData = "";
    $.ajax({
        type:"post",
        async:false,  // synchronous, so PDFData is filled before viewer.js loads
        mimeType: 'text/plain; charset=x-user-defined',
        url:'${qtpath}/doc2PdfController/doc2pdf',
        data:{'fileName':fileName},
        success:function(data){
           PDFData = data;
        }
    });
    var rawLength = PDFData.length;
    // Convert to a Uint8Array that pdf.js can parse directly, see pdf.js-4068
    var array = new Uint8Array(new ArrayBuffer(rawLength));
    for(var i = 0; i < rawLength; i++) {
      array[i] = PDFData.charCodeAt(i) & 0xff;
    }
    DEFAULT_URL = array;
</script>
<script type="text/javascript" src="${qtpath}/res/pdf/build/pdf.js"></script>
<script type="text/javascript" src="${qtpath}/res/pdf/web/viewer.js"></script>

  </head>

  ...

Result

----------------------------------------

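I thought online preview of doc files was done, but after deploying to the test environment I hit a major pitfall. Our doc files are not created in Office locally and then uploaded; colleagues generate them from FreeMarker ftl templates. Files generated this way are not standard Microsoft doc files at all: underneath they are XML data, and OpenOffice simply throws an error when asked to convert them to PDF.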
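
One way to spot such files before conversion is to look at their first bytes. The sketch below is an illustration and not the approach used in this project; the class and enum names are hypothetical. It relies on the fact that binary .doc files are OLE2 compound documents starting with the bytes D0 CF 11 E0 A1 B1 1A E1, and it assumes that a template-generated file of the kind described here begins with an XML declaration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper: guesses the real container format of a ".doc" file.
final class DocFormatSniffer {

    enum Kind { OLE2_BINARY, ZIP_CONTAINER, XML_TEXT, UNKNOWN }

    private static final int[] OLE2_MAGIC = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};
    private static final int[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};
    private static final int[] UTF8_BOM = {0xEF, 0xBB, 0xBF};
    private static final int[] XML_DECL = {'<', '?', 'x', 'm', 'l'};

    /** Classifies a file from its leading bytes (8 bytes are enough). */
    static Kind sniff(byte[] head) {
        if (startsWith(head, 0, OLE2_MAGIC)) return Kind.OLE2_BINARY;
        if (startsWith(head, 0, ZIP_MAGIC)) return Kind.ZIP_CONTAINER;
        // Skip an optional UTF-8 byte order mark before looking for "<?xml"
        int offset = startsWith(head, 0, UTF8_BOM) ? UTF8_BOM.length : 0;
        if (startsWith(head, offset, XML_DECL)) return Kind.XML_TEXT;
        return Kind.UNKNOWN;
    }

    /** Reads up to the first 8 bytes of the file and classifies them. */
    static Kind sniff(Path file) throws IOException {
        byte[] head = new byte[8];
        int total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while (total < head.length && (n = in.read(head, total, head.length - total)) != -1) {
                total += n;
            }
        }
        return sniff(Arrays.copyOf(head, total));
    }

    private static boolean startsWith(byte[] data, int offset, int[] prefix) {
        if (data.length - offset < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[offset + i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```

A result of XML_TEXT for a file named .doc would suggest it is one of the template-generated files rather than a binary Word document.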

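After a long search online I found no solution to this problem, so the only option was to drop OpenOffice and try something else (generating the doc with FreeMarker ftl was definitely not going to change).

Some blogs generate the PDF by going from Word to HTML to PDF, and others generate the PDF from the FreeMarker ftl XML, but both felt too cumbersome. All I wanted was to convert the existing doc (even though it is generated by FreeMarker ftl) to PDF.

I kept reading blogs and came across another approach: converting doc to PDF with Aspose. I tried it locally without much hope and, to my surprise, it worked. The Aspose approach is very simple: import the jar, write a few lines of code, and it is done, and conversion is much faster than with OpenOffice. It is odd that such a simple, handy tool did not show up when I first searched for Word-to-PDF conversion.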

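Converting doc to pdf with Aspose

I searched the Maven repository for aspose and added the dependency to pom.xml, but the jar could not be downloaded. In the end I downloaded the Aspose jar from CSDN and published it to our repository with mvn deploy.

pom.xml

<!-- Word to PDF: not available in the Maven repository, so the local jar must be published to a private repository -->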
        <dependency>
            <groupId>com.aspose.words</groupId>
            <artifactId>aspose-words-jdk16</artifactId>
            <version>14.9.0</version>
        </dependency>

import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.ConnectException;

import javax.servlet.http.HttpServletResponse;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;

import com.xxx.utils.Doc2PdfUtil;

@Controller
@RequestMapping("/doc2PdfController")
public class Doc2PdfController {
    
    private Logger logger = LoggerFactory.getLogger(Doc2PdfController.class);
    
    @RequestMapping("/doc2pdf")
    public void doc2pdf(String fileName, HttpServletResponse response) {
        String docPath = fileName + ".doc";
        String pdfPath = fileName + ".pdf";
        File pdfFile = Doc2PdfUtil.doc2Pdf(docPath, pdfPath);
        response.setContentType("application/pdf");
        // try-with-resources closes both streams even if copying fails
        try (BufferedInputStream in = new BufferedInputStream(new FileInputStream(pdfFile));
             OutputStream out = response.getOutputStream()) {
            byte[] buffer = new byte[1024];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
            out.flush();
        } catch (IOException e) {
            // Also reached when the pdf file was not produced
            logger.error("****Failed to stream the pdf to the response****", e);
        }
    }
}

Doc2PdfUtil.java

import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileOutputStream;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.aspose.words.License;
import com.aspose.words.SaveFormat;

public class Doc2PdfUtil {
    
    private static Logger logger = LoggerFactory.getLogger(Doc2PdfUtil.class);
    
    /**
     * Converts a doc file to pdf.
     * @param docPath path of the doc file, including .doc
     * @param pdfPath path of the pdf file, including .pdf
     * @return the pdf file
     */
    public static File doc2Pdf(String docPath, String pdfPath){
        File pdfFile = new File(pdfPath);
        try {
            String s = "<License><Data><Products><Product>Aspose.Total for Java</Product><Product>Aspose.Words for Java</Product></Products><EditionType>Enterprise</EditionType><SubscriptionExpiry>20991231</SubscriptionExpiry><LicenseExpiry>20991231</LicenseExpiry><SerialNumber>8bfe198c-7f0c-4ef8-8ff0-acc3237bf0d7</SerialNumber></Data><Signature>sNLLKGMUdF0r8O1kKilWAGdgfs2BvJb/2Xp8p5iuDVfZXmhppo+d0Ran1P9TKdjV4ABwAgKXxJ3jcQTqE/2IRfqwnPf8itN8aFZlV3TJPYeD3yWE7IT55Gz6EijUpC7aKeoohTb4w2fpox58wWoF3SNp6sK6jDfiAUGEHYJ9pjU=</Signature></License>";
            ByteArrayInputStream is = new ByteArrayInputStream(s.getBytes(java.nio.charset.StandardCharsets.UTF_8));
            License license = new License();
            license.setLicense(is);
            com.aspose.words.Document document = new com.aspose.words.Document(docPath);
            // Close the output stream once the document has been written
            try (FileOutputStream out = new FileOutputStream(pdfFile)) {
                document.save(out, SaveFormat.PDF);
            }
        } catch (Exception e) {
            logger.error("****aspose doc-to-pdf conversion failed****", e);
        }
        return pdfFile;
    }
}
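
The OpenOffice version earlier skips conversion when the PDF already exists, while the Aspose version converts on every request. If repeated conversions become a concern, a freshness check along these lines could be placed in front of the call. This is a sketch under assumptions: PdfCacheCheck and needsConversion are hypothetical names, and comparing modification times is just one possible policy.

```java
import java.io.File;

// Hypothetical helper: decides whether a doc file should be (re)converted.
final class PdfCacheCheck {

    /**
     * Returns true when the doc exists and the pdf is missing, empty,
     * or older than the doc.
     */
    static boolean needsConversion(File docFile, File pdfFile) {
        if (!docFile.isFile()) {
            return false; // nothing to convert
        }
        if (!pdfFile.isFile() || pdfFile.length() == 0) {
            return true; // no usable pdf yet
        }
        // Reconvert only if the source changed after the pdf was written
        return pdfFile.lastModified() < docFile.lastModified();
    }
}
```

Unlike a plain existence check, this also picks up a doc file that was regenerated after its PDF had been produced.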

Download link for aspose-words-jdk16-14.9.0.jar

　　https://download.csdn.net/download/u013279345/10868189

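Fixing garbled text on Linux when Windows works fine

When using com.aspose.words to convert a Word template to a PDF file, conversion works on the Windows development machine and Chinese text is not garbled. Once the service is deployed to the production (Linux) server, however, the Chinese text in the generated PDF comes out garbled. It took a long search online to find the cause; the fix is shared here.

1. Root cause analysis

That it works on Windows but fails on Linux shows the problem is not in the code or in the input/output stream encoding; the root cause is the difference between the two environments. Garbled output means the Linux environment lacks the fonts the document needs. Copying the fonts from a Windows machine where conversion works to the Linux machine, installing them, and restarting the server makes the garbled text go away.

2. Copy the Windows fonts to Linux and install them

After installing as described and restarting the Linux server, the garbled-text problem is resolved.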

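1. From Windows

The Windows font library is located at C:\Windows\fonts, which contains all fonts available on Windows.

2. To Linux

The Linux font library is /usr/share/fonts.

Create a new directory under it, for example windows (the name is up to you as long as you recognize it; this requires permissions, so you can run it with sudo).

Then copy the font files you need from the Windows font library into the new directory (only the *.ttc and *.ttf files are needed).

Copy all fonts:
   sudo cp *.ttc /usr/share/fonts/windows/
   sudo cp *.ttf /usr/share/fonts/windows/

Change the permissions of these fonts:
    sudo chmod 755 /usr/share/fonts/windows/*

Then enter the Linux font directory:
cd /usr/share/fonts/windows/

Next, build the scale file from the fonts in the current directory:
    sudo mkfontscale

Then build the dir file:
   sudo mkfontdir

Then run:
   sudo fc-cache

Restart the Linux operating system and the fonts are ready to use.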
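
After copying, it may be worth confirming from Java that the font files are present and readable by the account running the service. The sketch below is only one assumed way to check this: FontInventory is a hypothetical helper, and it merely lists files; it does not verify that Aspose actually picks the fonts up.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: lists readable .ttf/.ttc files under a directory.
final class FontInventory {

    /** Returns the sorted file names of readable .ttf and .ttc files under fontDir. */
    static List<String> listFontFiles(Path fontDir) throws IOException {
        try (Stream<Path> paths = Files.walk(fontDir)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(Files::isReadable)
                    .map(path -> path.getFileName().toString())
                    .filter(FontInventory::hasFontExtension)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    private static boolean hasFontExtension(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        return lower.endsWith(".ttf") || lower.endsWith(".ttc");
    }
}
```

Calling FontInventory.listFontFiles(Paths.get("/usr/share/fonts/windows")) from the service account should return the copied font names; an empty list would point to a missing copy or a permissions problem.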

The Linux garbled-text fix is reposted from:

https://blog.csdn.net/hanchuang213/article/details/64905214

https://blog.csdn.net/shanelooli/article/details/7212812