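We often use String's split() method to split strings. There are three things worth noting:
1. When the delimiter is a period ("."), it must be escaped:
This is because String.split treats its argument as a regular expression, and in a regular expression a period matches any character.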
// Wrong:
// String[] words = tmp.split(".");
// Correct:
String[] words = tmp.split("\\.");
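So whenever the delimiter has a special meaning in regular expressions, take extra care: it must be escaped for the split to work as intended.
2. If the string ends with several consecutive delimiters and the empty fields between them must be kept, call split(String regex, int limit) with a negative limit: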
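As an alternative to escaping by hand, the standard library's Pattern.quote() turns any string into a literal pattern. The sketch below is only an illustration; the class name QuoteSplitDemo and the helper splitLiteral are hypothetical names chosen for this example.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class QuoteSplitDemo {
    // Splits on a literal delimiter by quoting it, so regex
    // metacharacters such as "." or "|" lose their special meaning.
    static String[] splitLiteral(String text, String delimiter) {
        return text.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitLiteral("www.example.com", "."))); // [www, example, com]
        System.out.println(Arrays.toString(splitLiteral("a|b|c", "|")));           // [a, b, c]
        // Unescaped "." matches every character, so every piece is empty
        // and all of them are dropped as trailing empty strings.
        System.out.println("www.example.com".split(".").length);                   // 0
    }
}
```

Quoting is convenient when the delimiter comes from configuration or user input and its content is not known in advance.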
String abc = "a,b,c,,,";
String[] str = abc.split(",");
System.out.println(Arrays.toString(str) + " " + str.length);
String[] str2 = abc.split(",", -1);
System.out.println(Arrays.toString(str2) + " " + str2.length);
The output is:
[a, b, c] 3
[a, b, c, , , ] 6
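Pay particular attention to this when working with CSV files.
3. If you need to split strings quickly, split() is not the most efficient approach. Internally it is implemented essentially as follows: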
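The effect of the limit argument can be sketched as follows. The class name SplitLimitDemo and the helper splitKeepingEmpty are hypothetical, introduced only for this illustration; the behaviour shown is that of the standard String.split(String, int).

```java
import java.util.Arrays;

public class SplitLimitDemo {
    // A negative limit keeps trailing empty fields.
    static String[] splitKeepingEmpty(String line, String regex) {
        return line.split(regex, -1);
    }

    public static void main(String[] args) {
        String line = "a,b,c,,,";
        // limit == 0 behaves like split(","): trailing empty strings are dropped.
        System.out.println(line.split(",", 0).length);               // 3
        // A positive limit caps the number of pieces; the rest stays unsplit.
        System.out.println(Arrays.toString(line.split(",", 2)));     // [a, b,c,,,]
        // A negative limit keeps every field, including trailing empty ones.
        System.out.println(splitKeepingEmpty(line, ",").length);     // 6
    }
}
```

For CSV-like rows where every column must be present, the negative limit is usually the variant to reach for.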
public String[] split(String regex, int limit) {
    return Pattern.compile(regex).split(this, limit);
}
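Calling split() frequently therefore keeps creating new Pattern objects. (JDK 7 and later add a fast path that skips the regex engine for simple single-character delimiters, but any other pattern is still compiled on every call.) You can reduce the number of Pattern objects created by compiling the pattern once: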
// Create the Pattern object once, outside the loop
Pattern pattern = Pattern.compile(" ");
List<String[]> list = new ArrayList<String[]>();

for (int i = 0; i < 1000000; i++) {
    String[] split = pattern.split("Hello World", 0);
    list.add(split);
}
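In addition, split() is often somewhat slower than splitting with a combination of indexOf() and substring(); see this post for details.
I ran a test on my own machine, and indexOf() + substring() appeared to be roughly twice as fast as split():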
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;
import java.util.regex.Pattern;

public class SplitBenchmark {
    public static void main(String[] args) {
        // Build a sample of 60 six-digit numbers, each followed by a space
        StringBuilder sb = new StringBuilder();
        for (int i = 100000; i < 100000 + 60; i++)
            sb.append(i).append(' ');
        String sample = sb.toString();

        int runs = 100000;
        for (int i = 0; i < 5; i++) {
            {
                // StringTokenizer
                long start = System.nanoTime();
                for (int r = 0; r < runs; r++) {
                    StringTokenizer st = new StringTokenizer(sample);
                    List<String> list = new ArrayList<String>();
                    while (st.hasMoreTokens())
                        list.add(st.nextToken());
                }
                long time = System.nanoTime() - start;
                System.out.printf("StringTokenizer took an average of %.1f us%n", time / runs / 1000.0);
            }
            {
                // Precompiled Pattern
                long start = System.nanoTime();
                Pattern spacePattern = Pattern.compile(" ");
                for (int r = 0; r < runs; r++) {
                    List<String> list = Arrays.asList(spacePattern.split(sample, 0));
                }
                long time = System.nanoTime() - start;
                System.out.printf("Pattern.split took an average of %.1f us%n", time / runs / 1000.0);
            }
            {
                // indexOf() + substring(); the sample ends with a space, so no token is lost
                long start = System.nanoTime();
                for (int r = 0; r < runs; r++) {
                    List<String> list = new ArrayList<String>();
                    int pos = 0, end;
                    while ((end = sample.indexOf(' ', pos)) >= 0) {
                        list.add(sample.substring(pos, end));
                        pos = end + 1;
                    }
                }
                long time = System.nanoTime() - start;
                System.out.printf("indexOf loop took an average of %.1f us%n", time / runs / 1000.0);
            }
        }
    }
}
Results on JDK 1.7:
StringTokenizer took an average of 7.2 us
Pattern.split took an average of 7.9 us
indexOf loop took an average of 3.5 us
------------------------------------------
StringTokenizer took an average of 6.8 us
Pattern.split took an average of 5.4 us
indexOf loop took an average of 3.1 us
------------------------------------------
StringTokenizer took an average of 6.0 us
Pattern.split took an average of 5.5 us
indexOf loop took an average of 3.1 us
------------------------------------------
StringTokenizer took an average of 5.9 us
Pattern.split took an average of 5.5 us
indexOf loop took an average of 3.1 us
------------------------------------------
StringTokenizer took an average of 6.4 us
Pattern.split took an average of 5.5 us
indexOf loop took an average of 3.2 us
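The indexOf() loop from the benchmark can be wrapped into a small reusable helper. This is a minimal sketch, not the code that was measured: the class name IndexOfSplitter and its split method are hypothetical, and unlike the benchmark loop it also adds the text after the last delimiter, so it keeps trailing empty fields in the same way as split(regex, -1).

```java
import java.util.ArrayList;
import java.util.List;

public class IndexOfSplitter {
    // Splits text on a single-character delimiter without using regular expressions.
    static List<String> split(String text, char delimiter) {
        List<String> parts = new ArrayList<String>();
        int pos = 0;
        int end;
        while ((end = text.indexOf(delimiter, pos)) >= 0) {
            parts.add(text.substring(pos, end));
            pos = end + 1;
        }
        // Add whatever follows the last delimiter (possibly an empty string).
        parts.add(text.substring(pos));
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("Hello World", ' '));    // [Hello, World]
        System.out.println(split("a,b,c,,,", ',').size()); // 6
    }
}
```

Whether the speed difference matters depends on the workload; for occasional splits, the readability of split() usually wins.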
End of article.