Importing large data files with Java

A previous project of mine needed to import a large data file. Here is a summary of how the solution evolved.

The data file was available in two formats, xls and txt (200 MB+).

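Because I had used jxl and POI before, I started with the xls file. During implementation, however, it kept failing because the Java heap was exhausted, and increasing the heap size several times did not help.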

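The cause is that JXL reads the whole file into memory and parses it in one go. So I had to find another route: use Java's most basic IO streams and do the parsing myself, reading one line at a time and inserting it.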

FileInputStream fis = null;
InputStreamReader isr = null;
BufferedReader br = null;
Connection conn = null;
PreparedStatement stmt = null;
try {
	Class.forName(jdbc_driver);
	conn = DriverManager.getConnection(jdbc_url, jdbc_user, jdbc_pwd);
	String sql = "insert into pmc values(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)";
	stmt = conn.prepareStatement(sql);

	String str = "";

	fis = new FileInputStream(filePath);
	isr = new InputStreamReader(fis);
	br = new BufferedReader(isr);

	// Read one line at a time so the whole file is never held in memory.
	while ((str = br.readLine()) != null) {
		String[] rowData = str.split("\\|"); // fields are separated by '|'

		// Skip lines that do not carry all 20 columns.
		if (rowData.length >= 20) {
			for (int i = 0; i < 20; i++) {
				stmt.setString(i + 1, rowData[i]);
			}
			stmt.execute(); // one statement execution per row
		}
	}
	// catch/finally omitted here; they match the final version further down
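
One detail in the loop above is worth checking against the real data: String.split("\\|") drops trailing empty strings, so a row whose last columns are empty produces fewer than 20 elements and is silently skipped by the length check. Whether that ever happens depends on the file, which the post does not show. A minimal sketch of the difference, where the class and method names (PipeSplitSketch, splitRow, hasEnoughColumns) are hypothetical and exist only for illustration:

```java
final class PipeSplitSketch {

    /** Splits on '|' and keeps trailing empty fields (limit -1). */
    static String[] splitRow(String line) {
        return line.split("\\|", -1);
    }

    /** True when the row carries at least the expected number of columns. */
    static boolean hasEnoughColumns(String line, int expected) {
        return splitRow(line).length >= expected;
    }

    public static void main(String[] args) {
        String line = "a|b||"; // four columns, the last two empty
        System.out.println(line.split("\\|").length); // 2: trailing empties dropped
        System.out.println(splitRow(line).length);    // 4: trailing empties kept
    }
}
```

If empty trailing columns are legitimate in the source file, passing -1 as the limit keeps such rows in the import; if they indicate damaged rows, the original behaviour of skipping them may be what is wanted.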

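That solved the heap problem, but the import turned out to be far too slow. I switched to addBatch and send the inserts in batches of 1000 records. The final code looks like this: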

private static int batchsize = 1000;
public void importFormTxt(String filePath) {
		FileInputStream fis = null;
		InputStreamReader isr = null;
		BufferedReader br = null;
		Connection conn = null;
		PreparedStatement stmt = null;
		try {
			Class.forName(jdbc_driver);
			conn = DriverManager.getConnection(jdbc_url, jdbc_user, jdbc_pwd);
			String sql = "insert into pmc values(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)";
			stmt = conn.prepareStatement(sql);
			
			String str = "";
			
			fis = new FileInputStream(filePath);// FileInputStream

			isr = new InputStreamReader(fis);

			br = new BufferedReader(isr);
			
			int rowNum = 0;
			int batchNo = 1;
			long tmpT1 = System.currentTimeMillis();
			System.out.println("import PMC start at:"+(new SimpleDateFormat("yyyy.MM.dd HH:mm:ss")).format(tmpT1));
			while ((str = br.readLine()) != null) {
				String[] rowData = str.split("\\|");
				
				if(rowData.length>=20){
					rowNum++;
					for(int i = 0; i < 20; i++){
						stmt.setString(i+1, rowData[i]);
					}
					stmt.addBatch();
				}
				
				if(rowNum == batchNo * batchsize){
					++batchNo;
					stmt.executeBatch();
					System.out.println("insert into "+rowNum+" success!");
					stmt.clearBatch();
				}
			}
			if ((batchNo - 1) * batchsize < rowNum) {
				stmt.executeBatch();
				System.out.println("insert into "+rowNum+" success!");
				stmt.clearBatch();
			}
			long tmpT2 = System.currentTimeMillis();
			System.out.println("import PMC end at:"+(new SimpleDateFormat("yyyy.MM.dd HH:mm:ss")).format(tmpT2));
			System.out.println("use time:"+(tmpT2-tmpT1)/1000+"s");
			
		} catch (FileNotFoundException e) {
			System.out.println("no file found");
		} catch (IOException e) {
			System.out.println("read file failure");
		} catch (ClassNotFoundException e) {
			e.printStackTrace();
		} catch (SQLException e) {
			e.printStackTrace();
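		} finally {
			// Close each resource on its own: an earlier failure can leave some of
			// them null, and one failed close() must not skip the rest.
			try { if (br != null) br.close(); } catch (IOException e) { e.printStackTrace(); }
			try { if (isr != null) isr.close(); } catch (IOException e) { e.printStackTrace(); }
			try { if (fis != null) fis.close(); } catch (IOException e) { e.printStackTrace(); }
			try { if (stmt != null) stmt.close(); } catch (SQLException e) { e.printStackTrace(); }
			try { if (conn != null) conn.close(); } catch (SQLException e) { e.printStackTrace(); }
		}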
	}
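
On a newer JDK the same batching idea can be written more compactly. What follows is only a sketch under several assumptions that are not in the original: Java 7 or later for try-with-resources, a caller that supplies the Connection and the BufferedReader, a driver that benefits from turning auto-commit off (this varies by database), and the hypothetical names BatchImportSketch and importRows. It tracks the open batch with a single pending counter instead of the rowNum/batchNo pair, and rolls back uncommitted rows if anything fails.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class BatchImportSketch {
    static final int COLUMNS = 20;
    static final String SQL =
        "insert into pmc values(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)";

    /** Streams '|'-separated lines into the table; returns the number of rows batched. */
    static int importRows(BufferedReader reader, Connection conn, int batchSize)
            throws IOException, SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // assumption: one commit per batch is cheaper
        int rows = 0;
        int pending = 0; // rows added since the last executeBatch()
        try (PreparedStatement stmt = conn.prepareStatement(SQL)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] rowData = line.split("\\|", -1); // -1 keeps trailing empty fields
                if (rowData.length < COLUMNS) {
                    continue; // skip incomplete lines, as the original does
                }
                for (int i = 0; i < COLUMNS; i++) {
                    stmt.setString(i + 1, rowData[i]);
                }
                stmt.addBatch();
                rows++;
                if (++pending == batchSize) {
                    stmt.executeBatch();
                    conn.commit();
                    pending = 0;
                }
            }
            if (pending > 0) { // flush the last, partially filled batch
                stmt.executeBatch();
                conn.commit();
            }
        } catch (IOException | SQLException e) {
            conn.rollback(); // discard rows that were batched but not committed
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
        return rows;
    }
}
```

A caller could open the file with an explicit charset, for example Files.newBufferedReader(Paths.get(filePath), StandardCharsets.UTF_8), instead of relying on the platform default that new InputStreamReader(fis) uses. The file's real encoding is not stated in the post, so UTF-8 here is an assumption.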
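
It suddenly struck me that the most basic parts of Java are enough to solve a very practical problem. Sometimes a third-party jar only makes the problem more complicated.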