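I recently had to import a large data file in a project, so here is a summary of what I learned along the way.
The data file was available in two formats, xls or txt, and was over 200 MB.
Since I already had experience with jxl and POI, I chose the xls file first. In practice, however, the import kept failing because Java ran out of heap memory, and increasing the heap size several times did not help.
The cause is that JXL reads and parses the entire file in one go. I therefore had to find another route: use Java's basic IO streams and parse the file myself, reading one line at a time and inserting it.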
    Class.forName(jdbc_driver);
    String sql = "insert into pmc values(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)";
    // try-with-resources closes the connection, statement and reader automatically
    try (Connection conn = DriverManager.getConnection(jdbc_url, jdbc_user, jdbc_pwd);
         PreparedStatement stmt = conn.prepareStatement(sql);
         BufferedReader br = new BufferedReader(
                 new InputStreamReader(new FileInputStream(filePath)))) {
        String str;
        // Only one line is held in memory at a time
        while ((str = br.readLine()) != null) {
            String[] rowData = str.split("\\|");
            if (rowData.length >= 20) {
                for (int i = 0; i < 20; i++) {
                    stmt.setString(i + 1, rowData[i]);
                }
                stmt.execute(); // one database round trip per row
            }
        }
    }
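One detail of the line parsing is worth checking against your own data: `String.split(regex)` discards trailing empty strings, so a row whose last columns are empty yields fewer fields and would be skipped by a length check like the one above. The sketch below is only an illustration under assumptions: `PipeSplitDemo`, `EXPECTED_COLUMNS` (4 here instead of the 20 columns of the `pmc` table) and the sample data are all made up for the demo. It compares the default split with `split(regex, -1)`, which keeps trailing empty fields.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class PipeSplitDemo {
    // Hypothetical column count for the demo; the import in this post expects 20.
    static final int EXPECTED_COLUMNS = 4;

    // Default split: trailing empty fields are dropped.
    static String[] splitDefault(String line) {
        return line.split("\\|");
    }

    // A negative limit keeps trailing empty fields.
    static String[] splitKeepEmpty(String line) {
        return line.split("\\|", -1);
    }

    // Streams the input line by line and counts rows with enough columns.
    static int countValidRows(Reader source) {
        int valid = 0;
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                if (splitKeepEmpty(line).length >= EXPECTED_COLUMNS) {
                    valid++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return valid;
    }

    public static void main(String[] args) {
        System.out.println(splitDefault("x|y||").length);   // prints 2
        System.out.println(splitKeepEmpty("x|y||").length); // prints 4
        String data = "a|b|c|d\nx|y||\nshort|row\n";
        System.out.println(countValidRows(new StringReader(data))); // prints 2
    }
}
```

Whether rows with empty trailing columns should be imported or rejected depends on the data, so treat the `-1` limit as an option to consider rather than a required fix.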
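That solved the memory problem, but the import turned out to be far too slow. I switched to addBatch and executed a batch insert once every 1000 records. The final code looks like this: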
    // Imports needed by the enclosing class
    import java.io.BufferedReader;
    import java.io.FileInputStream;
    import java.io.FileNotFoundException;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.text.SimpleDateFormat;

    private static int batchsize = 1000;

    // jdbc_driver, jdbc_url, jdbc_user and jdbc_pwd are fields of the enclosing class
    public void importFormTxt(String filePath) {
        String sql = "insert into pmc values(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)";
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy.MM.dd HH:mm:ss");
        try {
            Class.forName(jdbc_driver);
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
            return;
        }
        // try-with-resources closes each resource that was opened, which also
        // avoids a NullPointerException in cleanup when opening one of them fails
        try (Connection conn = DriverManager.getConnection(jdbc_url, jdbc_user, jdbc_pwd);
             PreparedStatement stmt = conn.prepareStatement(sql);
             BufferedReader br = new BufferedReader(
                     new InputStreamReader(new FileInputStream(filePath)))) {
            int rowNum = 0;  // number of valid rows added so far
            int batchNo = 1; // number of the batch currently being filled
            long tmpT1 = System.currentTimeMillis();
            System.out.println("import PMC start at:" + fmt.format(tmpT1));
            String str;
            while ((str = br.readLine()) != null) {
                String[] rowData = str.split("\\|");
                if (rowData.length >= 20) {
                    rowNum++;
                    for (int i = 0; i < 20; i++) {
                        stmt.setString(i + 1, rowData[i]);
                    }
                    stmt.addBatch();
                }
                // Send the batch once it holds batchsize rows
                if (rowNum == batchNo * batchsize) {
                    ++batchNo;
                    stmt.executeBatch();
                    System.out.println("insert into " + rowNum + " success!");
                    stmt.clearBatch();
                }
            }
            // Send the rows left over in the last, partially filled batch
            if ((batchNo - 1) * batchsize < rowNum) {
                stmt.executeBatch();
                System.out.println("insert into " + rowNum + " success!");
                stmt.clearBatch();
            }
            long tmpT2 = System.currentTimeMillis();
            System.out.println("import PMC end at:" + fmt.format(tmpT2));
            System.out.println("use time:" + (tmpT2 - tmpT1) / 1000 + "s");
        } catch (FileNotFoundException e) {
            System.out.println("no file found");
        } catch (IOException e) {
            System.out.println("read file failure");
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }

It suddenly struck me that the most basic parts of Java are enough to solve very practical problems. Sometimes a third-party jar only makes the problem more complicated.
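The flush logic in `importFormTxt` (send a batch every 1000 rows, then send whatever is left at the end) can be tried out without a database. The following is a minimal sketch of that pattern, not the code used in the project: `BatchFlushDemo`, `processInBatches` and the `sink` callback are hypothetical names, and the callback stands in for `executeBatch()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchFlushDemo {
    /**
     * Hands rows to the sink in groups of at most batchSize
     * and returns how many groups were sent.
     */
    static <T> int processInBatches(Iterable<T> rows, int batchSize,
                                    Consumer<List<T>> sink) {
        List<T> pending = new ArrayList<>(batchSize);
        int flushes = 0;
        for (T row : rows) {
            pending.add(row);
            if (pending.size() == batchSize) {
                sink.accept(new ArrayList<>(pending)); // full batch
                pending.clear();
                flushes++;
            }
        }
        if (!pending.isEmpty()) {
            sink.accept(new ArrayList<>(pending)); // leftover rows
            flushes++;
        }
        return flushes;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            rows.add(i);
        }
        int flushes = processInBatches(rows, 1000,
                batch -> System.out.println("flushed " + batch.size() + " rows"));
        System.out.println("flushes: " + flushes); // prints "flushes: 3"
    }
}
```

With 2500 rows and a batch size of 1000 the sink receives batches of 1000, 1000 and 500 rows, which corresponds to the final `executeBatch()` call after the read loop.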