Converting Word to HTML with POI

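At present there are four ways to parse a Word document and display it in HTML.

They are:

POI (fairly tedious)

A plugin (paid, or limited to 500 calls per day, so not suitable for large companies)

Converting the Word document to PDF and rendering it on the page through Flash (not great: cumbersome and hard to work with)

Using H5 (HTML5), which I am not very familiar with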

 

Having settled on POI, let's get started.

Step one: pull in the jars with Maven.

<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>3.14</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-scratchpad</artifactId>
    <version>3.14</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>3.14</version>
</dependency>
<dependency>
    <groupId>fr.opensagres.xdocreport</groupId>
    <artifactId>xdocreport</artifactId>
    <version>1.0.6</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml-schemas</artifactId>
    <version>3.14</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>ooxml-schemas</artifactId>
    <version>1.3</version>
</dependency>

 

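POI runs into version problems when parsing, which makes some objects impossible to call. Word 2003 (.doc) and Word 2007 (.docx) therefore need different conversion methods: the binary .doc format is read with HWPFDocument, while .docx is read with XWPFDocument.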
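Since the two formats need different code paths, it can help to decide which one a file is before handing it to a converter. The following is a minimal sketch, not part of the original approach: the class `WordFormatSniffer` and its `Format` enum are hypothetical names, and the check only looks at the leading bytes (the OLE2 signature used by binary .doc files and the ZIP signature used by .docx packages). A match says only that the container is OLE2 or ZIP, not that the file is definitely a Word document.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper: guesses the Word container format from the leading bytes. */
final class WordFormatSniffer {

    enum Format { DOC, DOCX, UNKNOWN }

    // OLE2 compound file signature, the container used by binary .doc files.
    private static final int[] OLE2 = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};
    // ZIP local file header signature, the container used by .docx packages.
    private static final int[] ZIP = {0x50, 0x4B, 0x03, 0x04};

    /** Reads up to 8 bytes from the stream and compares them with the known signatures. */
    static Format detect(InputStream in) throws IOException {
        byte[] head = new byte[8];
        int read = 0;
        while (read < head.length) {
            int n = in.read(head, read, head.length - read);
            if (n < 0) {
                break; // stream ended before 8 bytes were available
            }
            read += n;
        }
        if (startsWith(head, read, OLE2)) {
            return Format.DOC;
        }
        if (startsWith(head, read, ZIP)) {
            return Format.DOCX;
        }
        return Format.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, int length, int[] signature) {
        if (length < signature.length) {
            return false;
        }
        for (int i = 0; i < signature.length; i++) {
            if ((data[i] & 0xFF) != signature[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // In-memory sample that starts with the ZIP signature.
        byte[] zipLike = {0x50, 0x4B, 0x03, 0x04, 0, 0, 0, 0};
        System.out.println(detect(new ByteArrayInputStream(zipLike)));
    }
}
```

Note that `detect` consumes the bytes it inspects, so the caller would reopen the file (or use a stream that supports mark/reset) before passing it to HWPFDocument or XWPFDocument.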

First, the 2007 format (.docx)

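    // Imports use the xdocreport 1.0.x and POI 3.14 package names
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.InputStream;
    import java.io.OutputStreamWriter;
    import java.io.Writer;
    import java.nio.charset.StandardCharsets;
    import org.apache.poi.xwpf.converter.core.BasicURIResolver;
    import org.apache.poi.xwpf.converter.core.FileImageExtractor;
    import org.apache.poi.xwpf.converter.xhtml.XHTMLConverter;
    import org.apache.poi.xwpf.converter.xhtml.XHTMLOptions;
    import org.apache.poi.xwpf.usermodel.XWPFDocument;
    import org.junit.Test;

    @Test
    public void word2007ToHtml() throws Exception {
        String filepath = "e:/files/";
        String sourceFileName = filepath + "前言.docx";
        String targetFileName = filepath + "1496717486420.html";
        String imagePathStr = filepath + "image/";
        // try-with-resources closes both streams even if the conversion fails
        try (InputStream in = new FileInputStream(sourceFileName);
             Writer out = new OutputStreamWriter(
                     new FileOutputStream(targetFileName), StandardCharsets.UTF_8)) {
            XWPFDocument document = new XWPFDocument(in);
            XHTMLOptions options = XHTMLOptions.create();
            // Folder where extracted images are stored
            options.setExtractor(new FileImageExtractor(new File(imagePathStr)));
            // Path prefix used for images inside the generated HTML
            options.URIResolver(new BasicURIResolver("image"));
            XHTMLConverter xhtmlConverter = (XHTMLConverter) XHTMLConverter.getInstance();
            xhtmlConverter.convert(document, out, options);
        }
    }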

Next, the 2003 format (.doc), which I have not tested

    import java.io.ByteArrayOutputStream;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.parsers.ParserConfigurationException;
    import javax.xml.transform.OutputKeys;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerException;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;
    import org.apache.poi.hwpf.HWPFDocument;
    import org.apache.poi.hwpf.converter.PicturesManager;
    import org.apache.poi.hwpf.converter.WordToHtmlConverter;
    import org.apache.poi.hwpf.usermodel.PictureType;
    import org.junit.Test;
    import org.w3c.dom.Document;

    @Test
    public void test() {
        DocxToHtml("E:/files/1496635038432.doc", "E:/files/1496635038432.html");
    }

    // Despite its name, this method converts the binary .doc format (HWPF), not .docx.
    public static void DocxToHtml(String fileAllName, String outPutFile) {
        // Open the input file; try-with-resources closes the stream afterwards
        try (InputStream in = new FileInputStream(fileAllName)) {
            // Load the stream into a Word document object
            HWPFDocument wordDocument = new HWPFDocument(in);
            // Build an empty DOM document for the converter to fill
            DocumentBuilderFactory domBuilderFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder domBuilder = domBuilderFactory.newDocumentBuilder();
            Document dom = domBuilder.newDocument();
            // Create the converter that writes into that DOM document
            WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(dom);
            // Return only the suggested name for each image; nothing is written to disk here
            wordToHtmlConverter.setPicturesManager(new PicturesManager() {
                @Override
                public String savePicture(byte[] content, PictureType pictureType,
                        String suggestedName, float widthInches, float heightInches) {
                    return suggestedName;
                }
            });
            // Run the conversion
            wordToHtmlConverter.processDocument(wordDocument);
            // Optional: save the pictures embedded in the document
            // (needs org.apache.poi.hwpf.usermodel.Picture and java.io.FileOutputStream)
            /*
            for (Picture pic : wordDocument.getPicturesTable().getAllPictures()) {
                try (FileOutputStream fos = new FileOutputStream("E:/test/" + pic.suggestFullFileName())) {
                    pic.writeImageContent(fos);
                }
            }
            */
            // Take the populated DOM from the converter and wrap it as a transform source
            Document htmlDocument = wordToHtmlConverter.getDocument();
            DOMSource domSource = new DOMSource(htmlDocument);
            // Serialize into an in-memory byte buffer
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            StreamResult streamResult = new StreamResult(out);
            TransformerFactory tf = TransformerFactory.newInstance();
            Transformer serializer = tf.newTransformer();
            // Output settings: encoding, indentation, HTML output method
            serializer.setOutputProperty(OutputKeys.ENCODING, "GB2312");
            serializer.setOutputProperty(OutputKeys.INDENT, "yes");
            serializer.setOutputProperty(OutputKeys.METHOD, "html");
            serializer.transform(domSource, streamResult);
            // Decode with the charset the serializer used, not the platform default
            writeFile(new String(out.toByteArray(), "GB2312"), outPutFile);
        } catch (IOException | TransformerException | ParserConfigurationException e) {
            e.printStackTrace();
        }
    }
    
    
    // Writes the HTML string to the given path as GB2312, matching the serializer encoding.
    // Needs java.io.BufferedWriter, FileOutputStream, IOException and OutputStreamWriter.
    public static void writeFile(String content, String path) {
        try (FileOutputStream fos = new FileOutputStream(path);
             BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(fos, "GB2312"))) {
            bw.write(content);
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }
    }
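The charset named in the serializer's ENCODING property, the charset used to turn the serialized bytes back into a String, and the charset used when writing the file all need to agree, otherwise non-ASCII text comes out garbled. The sketch below illustrates this with a small round trip; `CharsetRoundTrip` and `roundTrip` are hypothetical names, and it uses UTF-8 and ISO-8859-1 only because every JVM is guaranteed to ship them.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo: text survives only when encode and decode use the same charset. */
final class CharsetRoundTrip {

    /** Encodes the text with one charset and decodes the resulting bytes with another. */
    static String roundTrip(String text, Charset writeAs, Charset readAs) {
        byte[] bytes = text.getBytes(writeAs);
        return new String(bytes, readAs);
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as escapes so the source file stays ASCII.
        String text = "\u8868\u683c";
        String same = roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        String mixed = roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println("matching charsets preserved the text: " + same.equals(text));
        System.out.println("mismatched charsets preserved the text: " + mixed.equals(text));
    }
}
```

Calling `new String(bytes)` without a charset falls back to the platform default, so the result can differ from one machine to another; naming the charset explicitly avoids that.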

 

These two methods convert Word to HTML. Note that in IE8 the table borders are not displayed.
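One possible workaround for the missing borders is to add an explicit CSS rule to the generated HTML before writing it out. This is only a sketch under assumptions: the cause of the IE8 behaviour has not been diagnosed here, the fix is untested in IE8, and `TableBorderPatch`, `addTableBorders` and the style rule itself are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical post-processing step: injects a border rule into generated HTML. */
final class TableBorderPatch {

    // Assumed style rule; adjust colours and widths as needed.
    private static final String STYLE =
            "<style>table{border-collapse:collapse;}td,th{border:1px solid #000;}</style>";

    private static final Pattern HEAD_CLOSE =
            Pattern.compile("</head>", Pattern.CASE_INSENSITIVE);

    /** Inserts the style block just before the closing head tag, or prepends it if there is none. */
    static String addTableBorders(String html) {
        Matcher m = HEAD_CLOSE.matcher(html);
        if (m.find()) {
            return html.substring(0, m.start()) + STYLE + html.substring(m.start());
        }
        return STYLE + html;
    }

    public static void main(String[] args) {
        String html = "<html><head></head><body><table><tr><td>1</td></tr></table></body></html>";
        System.out.println(addTableBorders(html));
    }
}
```

With the 2003 method this could be applied to the string passed to writeFile; with the 2007 method the HTML would first have to be captured in a StringWriter.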

I will keep refining this method.
