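When writing Java programs, you don't have to allocate and free memory by hand as in C. The JVM manages it entirely, which improves development efficiency. However, if you ignore certain details when writing code, you can end up wasting memory and getting poor performance. Let's take strings as an example, since they are the most frequently used data type and Java strings are immutable. The excerpt below is from the JDK 8 source of java.lang.String; since JDK 9 the backing field is a byte[] to support compact strings: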
public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence {
    /** The value is used for character storage. */
    private final char value[];

    ...
}
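The advantage of this immutability is that strings are inherently thread-safe in multithreaded code. But it also causes problems. Operations such as concatenation and taking substrings cannot share the underlying char array, so they create additional, redundant String instances. More instances mean more memory in use and a heavier load on the JVM garbage collector. Next, we use JMH (the Java Microbenchmark Harness) to compare the performance of various string operations.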
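The sketch below illustrates this point. It shows that concat and substring leave the original string untouched and return new String objects instead. The class and the helper concatThenSlice are hypothetical and exist only for this example.

```java
// Strings are immutable: concat() and substring() never modify the receiver;
// they return newly created String objects instead.
class ImmutabilityDemo {
    // Hypothetical helper: returns {original, concatenated, sliced} for inspection.
    static String[] concatThenSlice(String base, String suffix, int from, int to) {
        String joined = base.concat(suffix);       // new String when suffix is non-empty
        String part = joined.substring(from, to);  // another new String
        return new String[] {base, joined, part};
    }

    public static void main(String[] args) {
        String base = "abc";
        String[] r = concatThenSlice(base, "def", 1, 4);
        System.out.println(r[0]);          // abc (unchanged)
        System.out.println(r[1]);          // abcdef
        System.out.println(r[2]);          // bcd
        System.out.println(r[1] != base);  // true: a different object
    }
}
```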
Test code for string concatenation:
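import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@BenchmarkMode(Mode.Throughput)
@Warmup(iterations = 3)
@Measurement(iterations = 10, time = 5, timeUnit = TimeUnit.SECONDS)
@Threads(8)
@Fork(2)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class StringBuilderBenchmark {

    @Benchmark
    public void testStringAdd() {
        String a = "";
        for (int i = 0; i < 10; i++) {
            a += i;          // produces a new String on every iteration
        }
        print(a);
    }

    @Benchmark
    public void testStringBuilderAdd() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10; i++) {
            sb.append(i);    // appends into a single growing buffer
        }
        print(sb.toString());
    }

    // No-op sink. JMH recommends returning the result or consuming it with a
    // Blackhole so the JIT cannot eliminate the computation as dead code.
    private void print(String a) {
    }

    public static void main(String[] args) throws RunnerException {
        Options options = new OptionsBuilder()
                .include(StringBuilderBenchmark.class.getSimpleName())
                .output("./StringBuilderBenchmark.log")
                .build();
        new Runner(options).run();
    }
}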
Test results:
Benchmark                                    Mode  Cnt      Score      Error   Units
StringBuilderBenchmark.testStringAdd        thrpt   20  22163.429 ±  537.729  ops/ms
StringBuilderBenchmark.testStringBuilderAdd thrpt   20  43400.877 ± 2447.492  ops/ms
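The results above show that StringBuilder does perform better than direct string concatenation. It achieves about twice the throughput in this run (43400.877 vs 22163.429 ops/ms).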
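This minimal sketch explains why += loses, assuming javac 8. Each `a += i` in the loop is compiled roughly into `a = new StringBuilder().append(a).append(i).toString()`, so every iteration allocates a fresh builder and a fresh String. javac 9 and later emit an invokedynamic call to StringConcatFactory instead (JEP 280), but a new String per iteration remains. The class and method names below are hypothetical.

```java
// Compares `+=` in a loop with its approximate javac 8 translation and with a
// single reused StringBuilder. All three build the same string.
class ConcatDesugar {
    static String plusEquals(int n) {
        String a = "";
        for (int i = 0; i < n; i++) {
            a += i;
        }
        return a;
    }

    // Approximation of what javac 8 emits for `a += i`:
    // a new StringBuilder and a new String on every iteration.
    static String desugared(int n) {
        String a = "";
        for (int i = 0; i < n; i++) {
            a = new StringBuilder().append(a).append(i).toString();
        }
        return a;
    }

    // One builder for the whole loop and one final String.
    static String singleBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(plusEquals(10));     // 0123456789
        System.out.println(desugared(10));      // 0123456789
        System.out.println(singleBuilder(10));  // 0123456789
    }
}
```

All three methods return the same string. They differ only in how many temporary objects they create along the way.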
Test code for string splitting:
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.regex.Pattern;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@BenchmarkMode(Mode.Throughput)
@Warmup(iterations = 3)
@Measurement(iterations = 10, time = 5, timeUnit = TimeUnit.SECONDS)
@Threads(8)
@Fork(2)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Benchmark)
public class StringSplitBenchmark {
    private static final String regex = "\\.";
    private static final char CHAR = '.';
    private static final Pattern pattern = Pattern.compile(regex); // compiled once, reused
    private String[] strings;

    @Setup
    public void prepare() {
        strings = new String[20];
        for (int i = 0; i < strings.length; i++) {
            // "<millis>.aaa.bbb.ccc.ddd<random double>": 6 parts, since the double adds one more '.'
            strings[i] = System.currentTimeMillis() + ".aaa.bbb.ccc.ddd" + Math.random();
        }
    }

    // Note: results are discarded; returning them or consuming them with a JMH
    // Blackhole is the safer way to prevent dead-code elimination.
    @Benchmark
    public void testStringSplit() {
        for (int i = 0; i < strings.length; i++) {
            strings[i].split(regex);
        }
    }

    @Benchmark
    public void testPatternSplit() {
        for (int i = 0; i < strings.length; i++) {
            pattern.split(strings[i]);
        }
    }

    @Benchmark
    public void testCharSplit() {
        for (int i = 0; i < strings.length; i++) {
            split(strings[i], CHAR, 6);
        }
    }

    /**
     * Splits str on a single character without regular expressions.
     * Unlike String.split, empty tokens (leading, trailing, or between
     * consecutive separators) are skipped.
     */
    public static List<String> split(final String str, final char separatorChar, int expectParts) {
        if (null == str) {
            return null;
        }
        final int len = str.length();
        if (len == 0) {
            return Collections.emptyList();
        }
        final List<String> list = new ArrayList<String>(expectParts);
        int i = 0;
        int start = 0;
        boolean match = false;
        while (i < len) {
            if (str.charAt(i) == separatorChar) {
                if (match) {
                    list.add(str.substring(start, i));
                    match = false;
                }
                start = ++i;
                continue;
            }
            match = true;
            i++;
        }
        if (match) {
            list.add(str.substring(start, i));
        }
        return list;
    }

    public static void main(String[] args) throws RunnerException {
        Options options = new OptionsBuilder()
                .include(StringSplitBenchmark.class.getSimpleName())
                .output("./StringSplitBenchmark.log")
                .build();
        new Runner(options).run();
    }
}
Test results:
Benchmark                              Mode  Cnt    Score     Error   Units
StringSplitBenchmark.testCharSplit    thrpt   20  872.048 ±  63.872  ops/ms
StringSplitBenchmark.testPatternSplit thrpt   20  534.371 ±  28.275  ops/ms
StringSplitBenchmark.testStringSplit  thrpt   20  814.661 ± 115.653  ops/ms
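The results show that testCharSplit and testStringSplit perform about the same, which is not what we expected. String.split takes a regular expression, and a precompiled Pattern is normally faster than compiling a regex on the fly. Yet here testPatternSplit is the slowest of the three. To find out why, let's look at the implementation of String.split (JDK 8):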
public String[] split(String regex) {
    return split(regex, 0);
}

public String[] split(String regex, int limit) {
    /* fastpath if the regex is a
     (1)one-char String and this character is not one of the
        RegEx's meta characters ".$|()[{^?*+\\", or
     (2)two-char String and the first char is the backslash and
        the second is not the ascii digit or ascii letter.
     */
    char ch = 0;
    if (((regex.value.length == 1 &&
         ".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
         (regex.length() == 2 &&
          regex.charAt(0) == '\\' &&
          (((ch = regex.charAt(1))-'0')|('9'-ch)) < 0 &&
          ((ch-'a')|('z'-ch)) < 0 &&
          ((ch-'A')|('Z'-ch)) < 0)) &&
        (ch < Character.MIN_HIGH_SURROGATE ||
         ch > Character.MAX_LOW_SURROGATE))
    {
        int off = 0;
        int next = 0;
        boolean limited = limit > 0;
        ArrayList<String> list = new ArrayList<>();
        while ((next = indexOf(ch, off)) != -1) {
            if (!limited || list.size() < limit - 1) {
                list.add(substring(off, next));
                off = next + 1;
            } else {    // last one
                //assert (list.size() == limit - 1);
                list.add(substring(off, value.length));
                off = value.length;
                break;
            }
        }
        // If no match was found, return this
        if (off == 0)
            return new String[]{this};

        // Add remaining segment
        if (!limited || list.size() < limit)
            list.add(substring(off, value.length));

        // Construct result
        int resultSize = list.size();
        if (limit == 0) {
            while (resultSize > 0 && list.get(resultSize - 1).length() == 0) {
                resultSize--;
            }
        }
        String[] result = new String[resultSize];
        return list.subList(0, resultSize).toArray(result);
    }
    return Pattern.compile(regex).split(this, limit);
}
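So String.split is already optimized, and it does not use the regex engine in every case. The regex may be a single character that is not a metacharacter. It may also be a backslash followed by a character that is not an ASCII letter or digit. In either case, split cuts the string with indexOf and substring instead. The benchmark's regex "\\." (a backslash followed by a dot) takes case (2), which is why testCharSplit and testStringSplit perform about the same.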
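The hypothetical helper below makes the two fast-path cases easier to read. It restates the JDK 8 condition shown above without the bit tricks; `((ch-'0')|('9'-ch)) < 0` simply means ch is outside '0'..'9'. It is a sketch for illustration, not a JDK API, and later JDK versions may change the condition.

```java
// Readable restatement of the JDK 8 String.split fast-path test.
// `usesFastPath` is a hypothetical helper, not part of the JDK.
class SplitFastPath {
    private static final String META = ".$|()[{^?*+\\";

    static boolean usesFastPath(String regex) {
        char ch = 0;
        if (regex.length() == 1) {
            ch = regex.charAt(0);
            if (META.indexOf(ch) != -1) {
                return false;            // a lone metacharacter needs the regex engine
            }
        } else if (regex.length() == 2 && regex.charAt(0) == '\\') {
            ch = regex.charAt(1);
            boolean asciiDigit = ch >= '0' && ch <= '9';
            boolean asciiLetter = (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z');
            if (asciiDigit || asciiLetter) {
                return false;            // escapes such as \d or \s are regex constructs
            }
        } else {
            return false;                // everything else goes through Pattern.compile
        }
        // Surrogate halves are excluded from the fast path.
        return ch < Character.MIN_HIGH_SURROGATE || ch > Character.MAX_LOW_SURROGATE;
    }

    public static void main(String[] args) {
        System.out.println(usesFastPath("\\.")); // true: the regex used in the benchmark
        System.out.println(usesFastPath(","));   // true
        System.out.println(usesFastPath("."));   // false: metacharacter
        System.out.println(usesFastPath("\\d")); // false: backslash + ASCII letter
        System.out.println(usesFastPath("ab"));  // false: two chars, no leading backslash
    }
}
```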
Test code for replaceAll:
import java.util.concurrent.TimeUnit;
import java.util.regex.Pattern;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@BenchmarkMode(Mode.Throughput)
@Warmup(iterations = 3)
@Measurement(iterations = 10, time = 5, timeUnit = TimeUnit.SECONDS)
@Threads(8)
@Fork(2)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Benchmark)
public class StringReplaceAllBenchmark {
    private static final String EMPTY = "";
    private static final String regex = "\\.";
    private static final String CHAR = ".";
    private static final Pattern pattern = Pattern.compile(regex);
    private String[] strings;

    @Setup
    public void prepare() {
        strings = new String[20];
        for (int i = 0; i < strings.length; i++) {
            strings[i] = System.currentTimeMillis() + ".aaa.bbb.ccc.ddd." + Math.random();
        }
    }

    @Benchmark
    public void testStringReplaceAll() {
        for (int i = 0; i < strings.length; i++) {
            strings[i].replaceAll(regex, EMPTY);            // compiles the regex on every call
        }
    }

    @Benchmark
    public void testPatternReplaceAll() {
        for (int i = 0; i < strings.length; i++) {
            pattern.matcher(strings[i]).replaceAll(EMPTY);  // reuses the compiled Pattern
        }
    }

    @Benchmark
    public void testCustomReplaceAll() {
        for (int i = 0; i < strings.length; i++) {
            replaceAll(strings[i], CHAR, EMPTY);
        }
    }

    /** Replaces every literal occurrence of remove with replacement, without regex. */
    public static String replaceAll(final String str, final String remove, final String replacement) {
        if (null == str) {
            return null;
        }
        final int len = str.length();
        // Guard: indexOf("") matches at every offset, so an empty remove would loop forever.
        if (len == 0 || remove == null || remove.isEmpty()) {
            return str;
        }
        final StringBuilder res = new StringBuilder(len);
        int offset = 0;
        int index;
        while (true) {
            index = str.indexOf(remove, offset);  // plain substring search
            if (index == -1) {
                break;
            }
            res.append(str, offset, index);
            if (null != replacement && replacement.length() > 0) {
                res.append(replacement);
            }
            offset = index + remove.length();
        }
        if (offset < len) {
            res.append(str, offset, len);
        }
        return res.toString();
    }

    public static void main(String[] args) throws RunnerException {
        // Sanity check: the three approaches must produce the same string.
        String str = System.currentTimeMillis() + ".aaa.bbb.ccc.ddd." + Math.random();
        String str1 = str.replaceAll(regex, EMPTY);
        String str2 = pattern.matcher(str).replaceAll(EMPTY);
        String str3 = replaceAll(str, CHAR, EMPTY);
        System.out.println(str1);
        System.out.println(str2);
        System.out.println(str3);

        Options options = new OptionsBuilder()
                .include(StringReplaceAllBenchmark.class.getSimpleName())
                .output("./StringReplaceAllBenchmark.log")
                .build();
        new Runner(options).run();
    }
}
Test results:
Benchmark                                         Mode  Cnt     Score    Error   Units
StringReplaceAllBenchmark.testCustomReplaceAll   thrpt   20  1167.891 ± 39.699  ops/ms
StringReplaceAllBenchmark.testPatternReplaceAll  thrpt   20   438.079 ±  1.859  ops/ms
StringReplaceAllBenchmark.testStringReplaceAll   thrpt   20   353.060 ± 11.177  ops/ms
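testPatternReplaceAll and testStringReplaceAll both replace via regular expressions, so their performance is in the same range. Unlike split, String.replaceAll has no non-regex fast path. In JDK 8 it is simply Pattern.compile(regex).matcher(this).replaceAll(replacement), so it also pays for compiling the pattern on every call, which is why it trails the precompiled version. Both reach only about 30 to 40% of the hand-written version's throughput. Regular expressions are very convenient for complex cases, but for performance, avoid them where a plain string operation will do.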
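This sketch shows the alternatives, assuming the same goal as the benchmark: removing every '.'. One option is to reuse a single precompiled Pattern instead of calling String.replaceAll repeatedly. When the target is a fixed string, another is String.replace(CharSequence, CharSequence), which treats its arguments literally. In JDK 8, String.replace is itself implemented with a literal-mode Pattern internally, so measure it on your own JDK before assuming it is faster. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Three ways to strip every '.' from a string; all return the same result.
class DotRemoval {
    private static final Pattern DOT = Pattern.compile("\\.");  // compiled once

    static String viaStringReplaceAll(String s) {
        return s.replaceAll("\\.", "");        // compiles the regex on each call
    }

    static String viaPrecompiledPattern(String s) {
        return DOT.matcher(s).replaceAll("");  // reuses the compiled Pattern
    }

    static String viaLiteralReplace(String s) {
        return s.replace(".", "");             // literal target, no regex syntax
    }

    public static void main(String[] args) {
        String input = "1700000000000.aaa.bbb.ccc.ddd.0.5";
        System.out.println(viaStringReplaceAll(input));    // 1700000000000aaabbbcccddd05
        System.out.println(viaPrecompiledPattern(input));  // 1700000000000aaabbbcccddd05
        System.out.println(viaLiteralReplace(input));      // 1700000000000aaabbbcccddd05
    }
}
```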
Below is a data-masking (desensitization) utility before optimization:
import org.apache.commons.lang3.StringUtils;

public class DesensitizeUtils {

    /**
     * Masks a value according to its length (by slicing).
     * @param value
     * @return
     */
    public static String desensitizeByLengthOld(String value) {
        if (value.length() == 2) {
            value = value.substring(0, 1) + "*";
        } else if (value.length() == 3) {
            value = value.substring(0, 1) + "*" + value.substring(value.length() - 1);
        } else if (value.length() > 3 && value.length() <= 5) {
            value = value.substring(0, 1) + "**" + value.substring(value.length() - 2);
        } else if (value.length() > 5 && value.length() <= 7) {
            value = value.substring(0, 2) + "***" + value.substring(value.length() - 2);
        } else if (value.length() > 7) {
            String str = "";
            for (int i = 0; i < value.length() - 6; i++) {
                str += "*";
            }
            value = value.substring(0, 3) + str + value.substring(value.length() - 3);
        }
        return value;
    }

    /**
     * Masking rule for Chinese names:
     * 0. one character or fewer: return as-is
     * 1. two characters: hide the surname
     * 2. three or more: keep only the first and last characters, replace the rest with asterisks
     * @param fullName
     * @return
     */
    public static String desensitizeChineseNameOld(final String fullName) {
        if (StringUtils.isBlank(fullName)) {
            return "";
        }
        if (fullName.length() <= 1) {
            return fullName;
        } else if (fullName.length() == 2) {
            final String name = StringUtils.right(fullName, 1);
            return StringUtils.leftPad(name, StringUtils.length(fullName), "*");
        } else {
            return StringUtils.left(fullName, 1).concat(StringUtils.removeStart(
                    StringUtils.leftPad(StringUtils.right(fullName, 1), StringUtils.length(fullName), "*"), "*"));
        }
    }
}
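Next, let's optimize the code above. First, use constants: a char constant for the mask character, and StringUtils.EMPTY instead of the "" literal: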
public class DesensitizeUtils {
    private static final char DESENSITIZE_CODE = '*';
}
if (StringUtils.isBlank(fullName)) {
    return StringUtils.EMPTY;
}
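Using constants keeps the mask character and the empty string defined in one place. However, string literals such as "" and "*" are interned by the JVM, so the old code did not create new instances of them on each call either. The measurable gains in this refactoring come from avoiding intermediate strings, which the char constant makes easy via StringBuilder.append(char).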
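The interning point is easy to check. The Java Language Specification (§3.10.5) requires string literals to be interned, so every evaluation of "" yields the same object, while new String("") always allocates. The class below is hypothetical; EMPTY plays the role of StringUtils.EMPTY, which commons-lang3 also defines as "".

```java
// String literals are interned: each evaluation of "" refers to the same
// String object, so returning "" does not allocate per call.
class LiteralInterning {
    static final String EMPTY = "";   // a named constant, like StringUtils.EMPTY

    static String emptyLiteral() {
        return "";                    // same interned instance every time
    }

    static String emptyConstant() {
        return EMPTY;
    }

    public static void main(String[] args) {
        System.out.println(emptyLiteral() == emptyLiteral());   // true
        System.out.println(emptyLiteral() == emptyConstant());  // true
        System.out.println(new String("") == emptyLiteral());   // false: explicit new allocates
    }
}
```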
Next, hoist the length lookup out of the conditions so it isn't called repeatedly. Before:
if (value.length() == 2) {
    // ...
} else if (value.length() == 3) {
    // ...
} else if (value.length() > 3 && value.length() <= 5) {
    // ...
} else if (value.length() > 5 && value.length() <= 7) {
    // ...
} else if (value.length() > 7) {
    // ...
}
After:
int length = value.length();
if (length == 2) {
    // ...
} else if (length == 3) {
    // ...
} else if (length > 3 && length <= 5) {
    // ...
} else if (length > 5 && length <= 7) {
    // ...
} else if (length > 7) {
    // ...
}
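The optimized code is more concise. If value.length() were an expensive operation, calling it repeatedly would multiply the cost. For String, length() only returns the length of the internal array, so the gain here is mainly readability. The habit pays off with methods that do real work.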
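To reuse code and save effort, we all rely to some degree on libraries written by others. Before using one, though, we should understand how it works and choose an approach that fits our own situation, so as to avoid pitfalls.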
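String.substring makes extracting part of a string very convenient. But because strings are immutable, it generally returns a new String, so the following line creates several string instances: the two substrings, the concatenated result, and (when compiled by javac 8) a temporary StringBuilder: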
value = value.substring(0, 2) + "***" + value.substring(length - 2);
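We can optimize this with StringBuilder's append(CharSequence s, int start, int end), which appends a range of characters directly, without creating a substring first. Its JDK 8 implementation in AbstractStringBuilder is: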
public AbstractStringBuilder append(CharSequence s, int start, int end) {
    if (s == null)
        s = "null";
    if ((start < 0) || (start > end) || (end > s.length()))
        throw new IndexOutOfBoundsException(
            "start " + start + ", end " + end + ", s.length() "
            + s.length());
    int len = end - start;
    ensureCapacityInternal(count + len);
    for (int i = start, j = count; i < end; i++, j++)
        value[j] = s.charAt(i);
    count += len;
    return this;
}
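This method copies the characters one by one in a for loop, which is still not the best approach. It would be better if the JDK optimized it further, for example with a String-specific overload like the following (a proposal, not part of JDK 8):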
public AbstractStringBuilder append(String s, int start, int end) {
    if (s == null)
        s = "null";
    if ((start < 0) || (start > end) || (end > s.length()))
        throw new IndexOutOfBoundsException(
            "start " + start + ", end " + end + ", s.length() "
            + s.length());
    int len = end - start;
    ensureCapacityInternal(count + len);
    s.getChars(start, end, value, count); // bulk copy that replaces the for loop above
    count += len;
    return this;
}
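The difference between the two copy strategies can be sketched with public APIs. One is a charAt loop like the JDK 8 method above. The other is String.getChars(srcBegin, srcEnd, dst, dstBegin), which copies a whole range into a char[] in one call; JDK 8 implements it with System.arraycopy. The helper names are hypothetical, and both produce identical arrays.

```java
import java.util.Arrays;

// Copies s[start, end) into a new char[] in two ways.
class RangeCopy {
    // Per-character copy, as in JDK 8's append(CharSequence, int, int).
    static char[] copyWithCharAt(String s, int start, int end) {
        char[] dst = new char[end - start];
        for (int i = start, j = 0; i < end; i++, j++) {
            dst[j] = s.charAt(i);            // one charAt call per character
        }
        return dst;
    }

    // Bulk copy, as in the proposed append(String, int, int).
    static char[] copyWithGetChars(String s, int start, int end) {
        char[] dst = new char[end - start];
        s.getChars(start, end, dst, 0);      // copies the whole range at once
        return dst;
    }

    public static void main(String[] args) {
        String s = "13812345678";
        System.out.println(new String(copyWithGetChars(s, 0, 3)));  // 138
        System.out.println(Arrays.equals(copyWithCharAt(s, 3, 9),
                                         copyWithGetChars(s, 3, 9)));  // true
    }
}
```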
After the change:
StringBuilder str = new StringBuilder(length);
str.append(value, 0, 2)                    // first two characters, no substring
   .append(DESENSITIZE_CODE).append(DESENSITIZE_CODE).append(DESENSITIZE_CODE)
   .append(value, length - 2, length);     // last two characters
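For a self-contained comparison, here is a sketch of the six/seven-character branch written both ways. The class and method names are hypothetical. The capacity of 7 assumes this branch always emits 2 + 3 + 2 characters.

```java
// The 6-7 character masking branch, written with substring/concatenation
// and with StringBuilder range appends. Both return the same result.
class MaskMiddle {
    private static final char DESENSITIZE_CODE = '*';

    // Original style: two substrings plus string concatenation.
    static String withSubstring(String value) {
        int length = value.length();
        return value.substring(0, 2) + "***" + value.substring(length - 2);
    }

    // StringBuilder style: copies the ranges directly, no intermediate substrings.
    static String withAppendRange(String value) {
        int length = value.length();
        StringBuilder str = new StringBuilder(7);   // output is always 2 + 3 + 2 chars
        str.append(value, 0, 2)
           .append(DESENSITIZE_CODE).append(DESENSITIZE_CODE).append(DESENSITIZE_CODE)
           .append(value, length - 2, length);
        return str.toString();
    }

    public static void main(String[] args) {
        System.out.println(withSubstring("abcdef"));     // ab***ef
        System.out.println(withAppendRange("abcdef"));   // ab***ef
        System.out.println(withAppendRange("abcdefg"));  // ab***fg
    }
}
```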
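The old desensitizeChineseNameOld relies on Apache Commons Lang's StringUtils. For reference, here is StringUtils.leftPad(String, int, String) from commons-lang3 (an excerpt that uses other StringUtils members such as SPACE, PAD_LIMIT and isEmpty). Each path either delegates to the char overload or builds a new String with concat, substring or new String. Combined with the right, removeStart and concat calls, the old method therefore creates several intermediate strings per call:
public static String leftPad(final String str, final int size, String padStr) {
    if (str == null) {
        return null;
    }
    if (isEmpty(padStr)) {
        padStr = SPACE;
    }
    final int padLen = padStr.length();
    final int strLen = str.length();
    final int pads = size - strLen;
    if (pads <= 0) {
        return str; // returns original String when possible
    }
    if (padLen == 1 && pads <= PAD_LIMIT) {
        return leftPad(str, size, padStr.charAt(0));
    }

    if (pads == padLen) {
        return padStr.concat(str);
    } else if (pads < padLen) {
        return padStr.substring(0, pads).concat(str);
    } else {
        final char[] padding = new char[pads];
        final char[] padChars = padStr.toCharArray();
        for (int i = 0; i < pads; i++) {
            padding[i] = padChars[i % padLen];
        }
        return new String(padding).concat(str);
    }
}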
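When the final length is predictable, give the StringBuilder an initial capacity. If the string is shorter than the default capacity (16 characters), this avoids allocating unused space. If it is longer, it avoids the cost of growing the internal char array, since each growth allocates a larger array and copies the existing contents.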
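The effect of presizing can be observed through StringBuilder.capacity(). The sketch below counts how often the capacity changes while appending characters, which is how often the internal array is reallocated. The helper countGrowths is hypothetical. The exact growth sequence is a JDK implementation detail, so the example relies only on the documented default capacity of 16.

```java
// Counts how many times a StringBuilder's capacity changes (i.e. its
// backing array is reallocated and copied) while appending n characters.
class CapacityDemo {
    static int countGrowths(StringBuilder sb, int n) {
        int growths = 0;
        int capacity = sb.capacity();
        for (int i = 0; i < n; i++) {
            sb.append('*');
            if (sb.capacity() != capacity) {  // capacity changed => array was reallocated
                growths++;
                capacity = sb.capacity();
            }
        }
        return growths;
    }

    public static void main(String[] args) {
        System.out.println(new StringBuilder().capacity());              // 16 (documented default)
        System.out.println(countGrowths(new StringBuilder(), 100) > 0);  // true
        System.out.println(countGrowths(new StringBuilder(100), 100));   // 0
    }
}
```

A presized builder never reallocates as long as the final length stays within the initial capacity.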
import org.apache.commons.lang3.StringUtils;

public class DesensitizeUtils {

    private static final char DESENSITIZE_CODE = '*';

    /**
     * Masks a value according to its length (by slicing).
     *
     * @param value
     * @return the masked value: same length as the input, except that inputs of
     *         length 4 and 6 come back one character longer (as in the old version)
     */
    public static String desensitizeByLength(String value) {
        if (StringUtils.isBlank(value)) {
            return StringUtils.EMPTY;
        }
        int length = value.length();
        if (length == 1) {
            return value;
        }
        // length + 1 covers inputs of length 4 and 6, whose output is one char longer
        StringBuilder str = new StringBuilder(length + 1);
        switch (length) {
            case 2:
                str.append(value, 0, 1).append(DESENSITIZE_CODE);
                break;
            case 3:
                str.append(value, 0, 1).append(DESENSITIZE_CODE).append(value, length - 1, length);
                break;
            case 4:
            case 5:
                str.append(value, 0, 1).append(DESENSITIZE_CODE).append(DESENSITIZE_CODE)
                        .append(value, length - 2, length);
                break;
            case 6:
            case 7:
                str.append(value, 0, 2).append(DESENSITIZE_CODE).append(DESENSITIZE_CODE)
                        .append(DESENSITIZE_CODE).append(value, length - 2, length);
                break;
            default:
                str.append(value, 0, 3);
                for (int i = 0; i < length - 6; i++) {
                    str.append(DESENSITIZE_CODE);
                }
                str.append(value, length - 3, length);
                break;
        }
        return str.toString();
    }

    /**
     * Masking rule for Chinese names:
     * 0. one character or fewer: return as-is
     * 1. two characters: hide the surname
     * 2. three or more: keep only the first and last characters, replace the rest with asterisks
     *
     * @param fullName
     * @return
     */
    public static String desensitizeChineseName(final String fullName) {
        if (StringUtils.isBlank(fullName)) {
            return StringUtils.EMPTY;
        }
        int length = fullName.length();
        switch (length) {
            case 1:
                return fullName;
            case 2: {
                // "*" + last character: hides the surname
                StringBuilder str = new StringBuilder(2);
                return str.append(DESENSITIZE_CODE).append(fullName, length - 1, length).toString();
            }
            default: {
                StringBuilder str = new StringBuilder(length);
                str.append(fullName, 0, 1);
                for (int i = 0; i < length - 2; i++) {
                    str.append(DESENSITIZE_CODE);
                }
                str.append(fullName, length - 1, length);
                return str.toString();
            }
        }
    }
}
Test code:
// Members of DesensitizeUtilsBenchmark; the class declaration and its JMH
// annotations are not shown. The four desensitize* methods are assumed to be
// statically imported from DesensitizeUtils.
private static final String testString = "akkadmmajkkakkajjk"; // 18 chars: the length > 7 branch

@Benchmark
public void testDesensitizeByLengthOld() {
    desensitizeByLengthOld(testString);
}

@Benchmark
public void testDesensitizeChineseNameOld() {
    desensitizeChineseNameOld(testString);
}

@Benchmark
public void testDesensitizeByLength() {
    desensitizeByLength(testString);
}

@Benchmark
public void testDesensitizeChineseName() {
    desensitizeChineseName(testString);
}

public static void main(String[] args) throws RunnerException {
    Options options = new OptionsBuilder()
            .include(DesensitizeUtilsBenchmark.class.getSimpleName())
            .output("./DesensitizeUtilsBenchmark.log")
            .build();
    new Runner(options).run();
}
Test results:
Benchmark                                                 Mode  Cnt       Score      Error   Units
DesensitizeUtilsBenchmark.testDesensitizeByLength        thrpt   20   61460.601 ± 7262.830  ops/ms
DesensitizeUtilsBenchmark.testDesensitizeByLengthOld     thrpt   20   11700.417 ± 1402.169  ops/ms
DesensitizeUtilsBenchmark.testDesensitizeChineseName     thrpt   20  117560.449 ±  731.851  ops/ms
DesensitizeUtilsBenchmark.testDesensitizeChineseNameOld  thrpt   20   39682.513 ±  463.306  ops/ms
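The test cases above are few and do not cover every input. In this run, the optimized desensitizeByLength reached about 5× the throughput of the old version, and desensitizeChineseName about 3×. Throughput alone does not show how the changes affect GC; JMH's GC profiler (-prof gc) can report allocation per operation. The results here are meant only as ideas for reference.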