lang3 的 split 方法誤用

lang3 的 split 方法誤用-原文連接
apache 的 lang3 是咱們開發經常使用到的三方工具包,然而對這個包不甚瞭解的話,會產生莫名其秒的 bug ,在這裏作下記錄。java

誤用示例

public class TestDemo {

    @Test
    public void test() throws IOException {
        String sendMsg = "{\"expiredTime\":\"20190726135831\",\"drives\":\"androidgetui\",\"msgBody\":\"{\\\"serialNumber\\\":\\\"wow22019072611349502\\\",\\\"push_key\\\":\\\"appactive#549110277\\\",\\\"title\\\":\\\"\\xe6\\x9c\\x89\\xe4\\xba\\xba@\\xe4\\xbd\\xa0\\\",\\\"message\\\":\\\"\\xe4\\xbb\\x8a\\xe5\\xa4\\xa9\\xe5\\x87\\xa0\\xe7\\x82\\xb9\\xe5\\x87\\xba\\xe5\\x8f\\x91\\xef\\xbc\\x9f\\\",\\\"link\\\":\\\"chelaile://homeTab/home?select=3\\\",\\\"open_type\\\":0,\\\"expireDays\\\":\\\"30\\\",\\\"type\\\":14}\",\"clients\":[\"13065ffa4e25c4a7c68\"]}CHELAILE_PUSH{\"cityId\":\"007\",\"gpsTime\":\"2019-07-24 21:33:06\",\"lat\":\"30.605916\",\"lng\":\"103.980439\",\"s\":\"android\",\"sourceUdid\":\"a4419b93-fb0e-43c7-98fa-5b7c18255660\",\"token\":\"13065ffa4e25c4a7c68\",\"tokenType\":\"3\",\"udid\":\"UDID2TOKEN#a4419b93-fb0e-43c7-98fa-5b7c18255660\",\"userCreateTime\":\"2018-04-20 08:13:32\",\"userLastActiveTime\":\"2019-07-24 21:33:06\",\"vc\":\"150\"}";
        String[] dataArr = StringUtils.split(sendMsg,"CHELAILE_PUSH");
        Assert.assertEquals(dataArr.length,2);
    }
}
複製代碼

分析緣由

經過分析字符串的拆分結果,發現該方法並非將分隔符去截取字符串,而是將分隔符的每個字符都當成分隔符去截取字符串,當咱們的分隔符是一個字符的時候通常不會出現上面示例中出現的問題,若是分隔符是多個字符的時候這個問題就顯現出來了。android

查看 StringUtils 源碼

/** * * <pre> * StringUtils.split(null, *) = null * StringUtils.split("", *) = [] * StringUtils.split("abc def", null) = ["abc", "def"] * StringUtils.split("abc def", " ") = ["abc", "def"] * StringUtils.split("abc def", " ") = ["abc", "def"] * StringUtils.split("ab:cd:ef", ":") = ["ab", "cd", "ef"] * </pre> * * @param str 要解析的字符串,可能爲空 * @param separatorChars 用作分割字符的字符們(注意是字符串們哦!),當 separatorChars 傳入的值爲空的時候則用空格來作分隔符 */
    public static String[] split(final String str, final String separatorChars) {
        return splitWorker(str, separatorChars, -1, false);
    }
    
    /** * Performs the logic for the {@code split} and * {@code splitPreserveAllTokens} methods that return a maximum array * length. * * @param str the String to parse, may be {@code null} * @param separatorChars the separate character * @param max the maximum number of elements to include in the * array. A zero or negative value implies no limit. * @param preserveAllTokens if {@code true}, adjacent separators are * treated as empty token separators; if {@code false}, adjacent * separators are treated as one separator. * @return an array of parsed Strings, {@code null} if null String input */
    private static String[] splitWorker(final String str, final String separatorChars, final int max, final boolean preserveAllTokens) {
        // Performance tuned for 2.0 (JDK1.4)
        // Direct code is quicker than StringTokenizer.
        // Also, StringTokenizer uses isSpace() not isWhitespace()

        if (str == null) {
            return null;
        }
        final int len = str.length();
        if (len == 0) {
            return ArrayUtils.EMPTY_STRING_ARRAY;
        }
        final List<String> list = new ArrayList<>();
        int sizePlus1 = 1;
        int i = 0, start = 0;
        boolean match = false;
        boolean lastMatch = false;
        if (separatorChars == null) {
            
            // 用空格做爲分隔符切割字符串
            while (i < len) {
                if (Character.isWhitespace(str.charAt(i))) {
                    if (match || preserveAllTokens) {
                        lastMatch = true;
                        if (sizePlus1++ == max) {
                            i = len;
                            lastMatch = false;
                        }
                        list.add(str.substring(start, i));
                        match = false;
                    }
                    start = ++i;
                    continue;
                }
                lastMatch = false;
                match = true;
                i++;
            }
        } else if (separatorChars.length() == 1) {
            // 分隔符的字符數爲 1 的時候,切割字符串的邏輯
            final char sep = separatorChars.charAt(0);
            while (i < len) {
                if (str.charAt(i) == sep) {
                    if (match || preserveAllTokens) {
                        lastMatch = true;
                        if (sizePlus1++ == max) {
                            i = len;
                            lastMatch = false;
                        }
                        list.add(str.substring(start, i));
                        match = false;
                    }
                    start = ++i;
                    continue;
                }
                lastMatch = false;
                match = true;
                i++;
            }
        } else {
            // 當分隔符的字符數爲多個的時候,分割字符串的邏輯
            // 示例:分隔字符串 abc,分割字符串的分隔符能夠是 a,ab,abc
            while (i < len) {
                if (separatorChars.indexOf(str.charAt(i)) >= 0) {
                    if (match || preserveAllTokens) {
                        lastMatch = true;
                        if (sizePlus1++ == max) {
                            i = len;
                            lastMatch = false;
                        }
                        list.add(str.substring(start, i));
                        match = false;
                    }
                    start = ++i;
                    continue;
                }
                lastMatch = false;
                match = true;
                i++;
            }
        }
        if (match || preserveAllTokens && lastMatch) {
            list.add(str.substring(start, i));
        }
        return list.toArray(new String[list.size()]);
    }
複製代碼

小結

平時只知道調用api,在使用三方包的時候,沒有認真查看api文檔,對於三方包的方法,使用處於想固然的狀態,這裏應該作好檢討。apache

相關文章
相關標籤/搜索