Java Notes: java.lang.String#trim

 

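String's trim() method is one of the most frequently used methods. Until recently I was not sure how trim treats newline characters when it strips whitespace from both ends, so I opened the source to look at the implementation. It turned out that String#trim works quite differently from what I had imagined, and that I had misunderstood this method all along.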

I imagined trim to work roughly like this:

package cc11001100.trimStudy;

/**
 * @author CC11001100
 */
public class CustomString {

	private char[] values;

	public CustomString(char[] values) {
		this.values = values;
	}

	// ...

	public CustomString trim() {
		char[] localValues = values;
		int left = 0, right = localValues.length;
		while (left < right && isBlankChar(localValues[left])) {
			left++;
		}
		while (right > left && isBlankChar(localValues[right - 1])) {
			right--;
		}
		if (left != 0 || right != localValues.length) {
			char[] newValue = new char[right - left];
			System.arraycopy(localValues, left, newValue, 0, newValue.length);
			return new CustomString(newValue);
		} else {
			return this;
		}
	}

	private boolean isBlankChar(char c) {
		return c == ' ' || c == '\t' || c == '\r' || c == '\n';
	}

	@Override
	public String toString() {
		return new java.lang.String(values);
	}

	// ...

}

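That is, it would remove spaces, tabs, carriage returns, line feeds and the like from both ends of the string. However, the actual implementation of String#trim looks like this: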

/**
 * Returns a string whose value is this string, with any leading and trailing
 * whitespace removed.
 * <p>
 * If this {@code String} object represents an empty character
 * sequence, or the first and last characters of character sequence
 * represented by this {@code String} object both have codes
 * greater than {@code '\u005Cu0020'} (the space character), then a
 * reference to this {@code String} object is returned.
 * <p>
 * Otherwise, if there is no character with a code greater than
 * {@code '\u005Cu0020'} in the string, then a
 * {@code String} object representing an empty string is
 * returned.
 * <p>
 * Otherwise, let <i>k</i> be the index of the first character in the
 * string whose code is greater than {@code '\u005Cu0020'}, and let
 * <i>m</i> be the index of the last character in the string whose code
 * is greater than {@code '\u005Cu0020'}. A {@code String}
 * object is returned, representing the substring of this string that
 * begins with the character at index <i>k</i> and ends with the
 * character at index <i>m</i>-that is, the result of
 * {@code this.substring(k, m + 1)}.
 * <p>
 * This method may be used to trim whitespace (as defined above) from
 * the beginning and end of a string.
 *
 * @return  A string whose value is this string, with any leading and trailing white
 *          space removed, or this string if it has no leading or
 *          trailing white space.
 */
public String trim() {
    int len = value.length;
    int st = 0;
    char[] val = value;    /* avoid getfield opcode */

    while ((st < len) && (val[st] <= ' ')) {
        st++;
    }
    while ((st < len) && (val[len - 1] <= ' ')) {
        len--;
    }
    return ((st > 0) || (len < value.length)) ? substring(st, len) : this;
}

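It removes every character on either side of the string whose code is less than or equal to that of the space character. Here \u005Cu0020 can simply be read as \u0020 (\u005C is just an escaped backslash), i.e. ASCII 0x20, or 32 in decimal. Every leading or trailing character whose ASCII code is less than or equal to 32 is removed: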

[Image: ASCII code table]
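As a quick check of the `<= ' '` rule, here is a minimal sketch. The class name `TrimBoundaryDemo` and the helper `trimWrapped` are made up for illustration. It suggests that control characters such as U+0000 and U+001F are stripped from both ends, while characters above U+0020 that merely look blank, such as the no-break space U+00A0 and the ideographic space U+3000, are left alone.

```java
public class TrimBoundaryDemo {

    // Hypothetical helper: surrounds "abc" with the given character, then trims.
    public static String trimWrapped(char c) {
        return (c + "abc" + c).trim();
    }

    public static void main(String[] args) {
        // Code points <= U+0020 are removed, whether or not they are whitespace.
        System.out.println(trimWrapped('\u0000').length()); // 3
        System.out.println(trimWrapped('\u001F').length()); // 3
        System.out.println(trimWrapped(' ').length());      // 3
        // Code points > U+0020 are kept, even if they render as blanks.
        System.out.println(trimWrapped('\u00A0').length()); // 5
        System.out.println(trimWrapped('\u3000').length()); // 5
    }
}
```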

First, look at the characters that trim definitely has to remove:

\t is 9

\r is 13

\n is 10

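These are all indeed less than the space character. Moreover, codes 0 through 31 are all non-printing control characters and 32 is the space, so this approach does not look too unreasonable. Just remember, the next time you use trim, to ask whether your data might start or end with characters below 32 that are not spaces, tabs or line breaks and that need to be preserved.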
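If such control characters must survive, one option is to trim against an explicit character set instead of `<= ' '`. The sketch below is only an illustration: `NarrowTrim`, `trimBlank` and `isBlankChar` are hypothetical names, and treating exactly four characters as blanks is an assumption that mirrors the `isBlankChar` method of the `CustomString` example above.

```java
public class NarrowTrim {

    // Assumption: only these four characters count as trimmable blanks.
    static boolean isBlankChar(char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    }

    // Removes leading and trailing blanks as defined by isBlankChar;
    // any other control character at either end is kept.
    public static String trimBlank(String s) {
        int left = 0;
        int right = s.length();
        while (left < right && isBlankChar(s.charAt(left))) {
            left++;
        }
        while (right > left && isBlankChar(s.charAt(right - 1))) {
            right--;
        }
        return s.substring(left, right);
    }

    public static void main(String[] args) {
        String record = " \t\u0001payload\u0001\r\n";
        System.out.println(record.trim().length());     // 7: the U+0001 markers are gone too
        System.out.println(trimBlank(record).length()); // 9: the U+0001 markers survive
    }
}
```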

 

Below is a simple test of String#trim:

package cc11001100.trimStudy;

/**
 * @author CC11001100
 */
public class TrimStudy {

	public static void main(String[] args) {

		StringBuilder sb = new StringBuilder();
		for (int i = 0; i < 128; i++) {
			sb.append((char) i);
		}
		String s = sb.toString().trim();
		// The trimmed result
		System.out.println("-" + s + "-");
		// ASCII code of the first character after trimming
		System.out.println((int) s.charAt(0));
		// The DEL (delete) character
		System.out.println((char) 127);
		// See how the other blank characters print
		System.out.println(sb.toString());

	}

}

Output:
[Image: screenshot of the program output]

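Note that ASCII 127, the delete character (DEL), could arguably also be counted as an invisible blank character, yet trim does not remove it because its code is greater than 32.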
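The DEL observation is easy to confirm in isolation. In this small sketch (the class name `DelTrimDemo` and the method `trimmedLength` are illustrative only), a string wrapped in U+007F keeps its length after `trim()`.

```java
public class DelTrimDemo {

    // Illustrative helper: length of the input after String#trim.
    public static int trimmedLength(String s) {
        return s.trim().length();
    }

    public static void main(String[] args) {
        // DEL (127) is above the space character (32), so trim keeps it.
        System.out.println(trimmedLength("\u007Fabc\u007F"));     // 5
        // The outer spaces are removed, but trimming stops at DEL.
        System.out.println(trimmedLength(" \u007F abc \u007F ")); // 7
    }
}
```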

 

Still not convinced, I went to look at the implementation of StringUtils#trim in Apache commons-lang, a library that a huge number of projects depend on:

/**
 * <p>Removes control characters (char &lt;= 32) from both
 * ends of this String, handling <code>null</code> by returning
 * <code>null</code>.</p>
 *
 * <p>The String is trimmed using {@link String#trim()}.
 * Trim removes start and end characters &lt;= 32.
 * To strip whitespace use {@link #strip(String)}.</p>
 *
 * <p>To trim your choice of characters, use the
 * {@link #strip(String, String)} methods.</p>
 *
 * <pre>
 * StringUtils.trim(null)          = null
 * StringUtils.trim("")            = ""
 * StringUtils.trim("     ")       = ""
 * StringUtils.trim("abc")         = "abc"
 * StringUtils.trim("    abc    ") = "abc"
 * </pre>
 *
 * @param str  the String to be trimmed, may be null
 * @return the trimmed string, <code>null</code> if null String input
 */
public static String trim(String str) {
    return str == null ? null : str.trim();
}

However, it simply delegates to String#trim, so it is not what I had imagined either.
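The Javadoc above points to `StringUtils#strip(String)` for whitespace-based stripping. To avoid a library dependency, the sketch below approximates that idea with the JDK's `Character.isWhitespace`. It is not the commons-lang implementation, and `WhitespaceStrip` and `stripWhitespace` are hypothetical names.

```java
public class WhitespaceStrip {

    // Strips leading and trailing characters for which Character.isWhitespace
    // returns true. Returns null for null input, like StringUtils#trim.
    public static String stripWhitespace(String s) {
        if (s == null) {
            return null;
        }
        int start = 0;
        int end = s.length();
        while (start < end && Character.isWhitespace(s.charAt(start))) {
            start++;
        }
        while (end > start && Character.isWhitespace(s.charAt(end - 1))) {
            end--;
        }
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        // U+3000 (ideographic space) is whitespace for Character.isWhitespace, but trim leaves it.
        System.out.println("\u3000abc\u3000".trim().length());           // 5
        System.out.println(stripWhitespace("\u3000abc\u3000").length()); // 3
        // U+0001 is a control character, not whitespace: trim drops it, this helper keeps it.
        System.out.println("\u0001abc".trim().length());                 // 3
        System.out.println(stripWhitespace("\u0001abc").length());       // 4
    }
}
```

The two approaches disagree in both directions: a whitespace test removes Unicode spaces that `trim()` keeps, and keeps control characters that `trim()` removes.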

 

It seems I have misunderstood trim all along. Trimming is a fairly general concept in string handling, and I do not know how other languages actually implement it.

 
