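At the moment there are several ways to parse a Word document and display it as HTML:
POI (fairly tedious)
A third-party plugin (paid, or limited to 500 calls per day, so not suitable for a large company)
Convert the Word file to PDF and show it on the page through Flash (not great: tedious and not very workable)
Render it with HTML5 (I am not familiar enough with HTML5 to tune it)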
Since I picked POI, let's get started.
Step one: pull in the jars with Maven.
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>3.14</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-scratchpad</artifactId>
    <version>3.14</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>3.14</version>
</dependency>
<dependency>
    <groupId>fr.opensagres.xdocreport</groupId>
    <artifactId>xdocreport</artifactId>
    <version>1.0.6</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml-schemas</artifactId>
    <version>3.14</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>ooxml-schemas</artifactId>
    <version>1.3</version>
</dependency>
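POI has format and version issues when parsing: some classes simply cannot be used on the wrong kind of file. Word 2003 (.doc, the binary format, handled by HWPF in poi-scratchpad) and Word 2007 (.docx, the OOXML format, handled by XWPF in poi-ooxml) therefore need different conversion methods.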
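Because the two formats need different code paths, it can help to decide which one a file is before handing it to POI. The following is only a minimal sketch using the JDK alone, not part of the original post: the class name `WordFormatSniffer` and its `Format` enum are hypothetical. It looks at the leading bytes of the file: binary .doc files live in an OLE2 container, and .docx files are ZIP archives. Other Office files share these signatures (for example .xls or any ZIP), so the result only narrows the choice.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper: guesses the Word container format from the file header.
public class WordFormatSniffer {

    public enum Format { DOC, DOCX, UNKNOWN }

    // OLE2 compound file signature (used by binary .doc).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // ZIP local file header signature "PK\u0003\u0004" (used by .docx).
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    // Classifies a buffer holding the first bytes of a file.
    public static Format detect(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return Format.DOC;
        }
        if (startsWith(header, ZIP_MAGIC)) {
            return Format.DOCX;
        }
        return Format.UNKNOWN;
    }

    // Reads up to 8 bytes from the file and classifies them.
    public static Format detect(Path file) throws IOException {
        byte[] header = new byte[OLE2_MAGIC.length];
        int total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            while (total < header.length) {
                int n = in.read(header, total, header.length - total);
                if (n < 0) {
                    break; // file is shorter than the header we wanted
                }
                total += n;
            }
        }
        return detect(Arrays.copyOf(header, total));
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

A caller could then route `Format.DOCX` to the XWPF-based conversion and `Format.DOC` to the HWPF-based one, and reject `Format.UNKNOWN` with a clear error message.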
First, Word 2007 (.docx):
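import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStreamWriter;

import org.apache.poi.xwpf.converter.core.BasicURIResolver;
import org.apache.poi.xwpf.converter.core.FileImageExtractor;
import org.apache.poi.xwpf.converter.xhtml.XHTMLConverter;
import org.apache.poi.xwpf.converter.xhtml.XHTMLOptions;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.junit.Test;

@Test
public void word2007ToHtml() throws Exception {
    String filepath = "e:/files/";
    String sourceFileName = filepath + "前言.docx";
    String targetFileName = filepath + "1496717486420.html";
    // filepath already ends with a slash, so do not add another one
    String imagePathStr = filepath + "image/";

    // try-with-resources closes both streams even if the conversion fails
    try (InputStream in = new FileInputStream(sourceFileName);
         OutputStreamWriter outputStreamWriter =
                 new OutputStreamWriter(new FileOutputStream(targetFileName), "utf-8")) {
        XWPFDocument document = new XWPFDocument(in);
        XHTMLOptions options = XHTMLOptions.create();
        // folder where extracted images are stored
        options.setExtractor(new FileImageExtractor(new File(imagePathStr)));
        // image path prefix used inside the generated HTML
        options.URIResolver(new BasicURIResolver("image"));
        XHTMLConverter xhtmlConverter = (XHTMLConverter) XHTMLConverter.getInstance();
        xhtmlConverter.convert(document, outputStreamWriter, options);
    }
}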
Next, Word 2003 (.doc), which I have not tested:
import java.io.*;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.converter.PicturesManager;
import org.apache.poi.hwpf.converter.WordToHtmlConverter;
import org.apache.poi.hwpf.usermodel.PictureType;
import org.junit.Test;
import org.w3c.dom.Document;

@Test
public void test() {
    DocxToHtml("E:/files/1496635038432.doc", "E:/files/1496635038432.html");
}

// Despite its name, this method handles the binary .doc format (HWPF).
public static void DocxToHtml(String fileAllName, String outPutFile) {
    // open the input file by path and name; closed automatically
    try (InputStream in = new FileInputStream(fileAllName)) {
        // turn the stream into a Word document object
        HWPFDocument wordDocument = new HWPFDocument(in);
        // build the DOM builder factory
        DocumentBuilderFactory domBuilderFactory = DocumentBuilderFactory.newInstance();
        // create the DOM builder
        DocumentBuilder domBuilder = domBuilderFactory.newDocumentBuilder();
        // create an empty DOM document
        Document dom = domBuilder.newDocument();
        // create the converter that writes into that DOM document
        WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(dom);
        // override the picture handling: this only returns the name used in
        // the generated HTML, it does not save the picture itself
        wordToHtmlConverter.setPicturesManager(new PicturesManager() {
            public String savePicture(byte[] content, PictureType pictureType,
                                      String suggestedName, float widthInches,
                                      float heightInches) {
                return suggestedName;
            }
        });
        // convert the Word document
        wordToHtmlConverter.processDocument(wordDocument);
        // save the pictures in the document
        /*
        List<?> pics = wordDocument.getPicturesTable().getAllPictures();
        if (pics != null) {
            for (int i = 0; i < pics.size(); i++) {
                Picture pic = (Picture) pics.get(i);
                try {
                    pic.writeImageContent(new FileOutputStream("E:/test/" + pic.suggestFullFileName()));
                } catch (FileNotFoundException e) {
                    e.printStackTrace();
                }
            }
        }
        */
        // fetch the DOM document filled in by the converter
        Document htmlDocument = wordToHtmlConverter.getDocument();
        // wrap the DOM as a transformation source
        DOMSource domSource = new DOMSource(htmlDocument);
        // in-memory byte output stream
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // target of the transformation
        StreamResult streamResult = new StreamResult(out);
        // create the serializer
        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer serializer = tf.newTransformer();
        // configure the serialized output
        serializer.setOutputProperty(OutputKeys.ENCODING, "GB2312");
        serializer.setOutputProperty(OutputKeys.INDENT, "yes");
        serializer.setOutputProperty(OutputKeys.METHOD, "html");
        serializer.transform(domSource, streamResult);
        // decode with the same charset the serializer used, then write the file
        writeFile(new String(out.toByteArray(), "GB2312"), outPutFile);
    } catch (IOException | TransformerException | ParserConfigurationException e) {
        e.printStackTrace();
    }
}

public static void writeFile(String content, String path) {
    // closing the writer also closes the underlying file stream
    try (BufferedWriter bw = new BufferedWriter(
            new OutputStreamWriter(new FileOutputStream(new File(path)), "GB2312"))) {
        bw.write(content);
    } catch (IOException ioe) {
        ioe.printStackTrace();
    }
}
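The file-writing step does not depend on POI at all, so it can be tried in isolation. Below is a small sketch of the same idea with the charset passed in as a parameter, assuming Java 7 or later. The class name `HtmlFileWriter` is hypothetical and not from the original code. Unlike the version above, it lets the `IOException` propagate rather than printing it, and it creates the target directory if it is missing.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: writes text to a file with an explicit charset.
public class HtmlFileWriter {

    public static void writeFile(String content, Path path, Charset charset) throws IOException {
        // Make sure the target directory exists before opening the file.
        Path parent = path.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        // try-with-resources flushes and closes the writer, even on failure.
        try (BufferedWriter writer = Files.newBufferedWriter(path, charset)) {
            writer.write(content);
        }
    }
}
```

One difference worth knowing: a writer obtained from `Files.newBufferedWriter` throws an `IOException` when the text contains characters the charset cannot encode, which may be preferable to silently writing replacement characters. Whatever charset is chosen should match the encoding declared in the generated HTML.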
Both methods convert a Word document to HTML. Note that in IE8 the table borders are not displayed.
I will keep improving this approach.