該方法用於返回指定字符在此字符串中最後一次出現處的索引,有多種方法參數。可傳入 int 類型,也可傳入 String 類型,另外還能傳入開始位置。根據編碼的不一樣分別用 Latin1 和 UTF16 兩種方式處理。mysql
public int lastIndexOf(int ch) {
return lastIndexOf(ch, length() - 1);
}
public int lastIndexOf(int ch, int fromIndex) {
return isLatin1() ? StringLatin1.lastIndexOf(value, ch, fromIndex)
: StringUTF16.lastIndexOf(value, ch, fromIndex);
}
public int lastIndexOf(String str) {
return lastIndexOf(str, length());
}
public int lastIndexOf(String str, int fromIndex) {
return lastIndexOf(value, coder(), length(), str, fromIndex);
}
static int lastIndexOf(byte[] src, byte srcCoder, int srcCount,
String tgtStr, int fromIndex) {
byte[] tgt = tgtStr.value;
byte tgtCoder = tgtStr.coder();
int tgtCount = tgtStr.length();
int rightIndex = srcCount - tgtCount;
if (fromIndex > rightIndex) {
fromIndex = rightIndex;
}
if (fromIndex < 0) {
return -1;
}
if (tgtCount == 0) {
return fromIndex;
}
if (srcCoder == tgtCoder) {
return srcCoder == LATIN1
? StringLatin1.lastIndexOf(src, srcCount, tgt, tgtCount, fromIndex)
: StringUTF16.lastIndexOf(src, srcCount, tgt, tgtCount, fromIndex);
}
if (srcCoder == LATIN1) {
return -1;
}
return StringUTF16.lastIndexOfLatin1(src, srcCount, tgt, tgtCount, fromIndex);
}
複製代碼
Latin1 編碼的邏輯爲,sql
Math.min(fromIndex, value.length - 1)
取偏移值。public static int lastIndexOf(final byte[] value, int ch, int fromIndex) {
if (!canEncode(ch)) {
return -1;
}
int off = Math.min(fromIndex, value.length - 1);
for (; off >= 0; off--) {
if (value[off] == (byte)ch) {
return off;
}
}
return -1;
}
複製代碼
相似地,對於 UTF16 編碼也作相似處理,但由於 unicode 包含了基本多語言平面(Basic Multilingual Plane,BMP)外,還存在補充平面。而傳入的值爲 int 類型(4字節),因此若是超出 BMP 平面,此時須要4個字節,分別用來保存 High-surrogate 和 Low-surrogate,此時就須要對比4個字節。數組
另外,若是查找子字符串則是從子字符串第一個字符開始匹配直到子字符串徹底被匹配成功。bash
該方法用於獲取字符串的指定子字符串。有兩個方法,一個是隻傳入開始索引,第二個是出傳入開始索引和結束索引。邏輯經過源碼已經很清晰了,先計算截取的子字符串長度,而後分別根據 Latin1 和 UTF16 兩種方式生成新的 String 對象。網絡
public String substring(int beginIndex) {
if (beginIndex < 0) {
throw new StringIndexOutOfBoundsException(beginIndex);
}
int subLen = length() - beginIndex;
if (subLen < 0) {
throw new StringIndexOutOfBoundsException(subLen);
}
if (beginIndex == 0) {
return this;
}
return isLatin1() ? StringLatin1.newString(value, beginIndex, subLen)
: StringUTF16.newString(value, beginIndex, subLen);
}
public String substring(int beginIndex, int endIndex) {
int length = length();
checkBoundsBeginEnd(beginIndex, endIndex, length);
int subLen = endIndex - beginIndex;
if (beginIndex == 0 && endIndex == length) {
return this;
}
return isLatin1() ? StringLatin1.newString(value, beginIndex, subLen)
: StringUTF16.newString(value, beginIndex, subLen);
}
public static String newString(byte[] val, int index, int len) {
return new String(Arrays.copyOfRange(val, index, index + len),
LATIN1);
}
public static String newString(byte[] val, int index, int len) {
if (String.COMPACT_STRINGS) {
byte[] buf = compress(val, index, len);
if (buf != null) {
return new String(buf, LATIN1);
}
}
int last = index + len;
return new String(Arrays.copyOfRange(val, index << 1, last << 1), UTF16);
}
複製代碼
等同substring
方法。併發
public CharSequence subSequence(int beginIndex, int endIndex) {
return this.substring(beginIndex, endIndex);
}
複製代碼
該方法用於將指定的字符串參數鏈接到字符串上。邏輯爲,app
System.arraycopy
進行拷貝並返回新的 String 對象。public String concat(String str) {
int olen = str.length();
if (olen == 0) {
return this;
}
if (coder() == str.coder()) {
byte[] val = this.value;
byte[] oval = str.value;
int len = val.length + oval.length;
byte[] buf = Arrays.copyOf(val, len);
System.arraycopy(oval, 0, buf, val.length, oval.length);
return new String(buf, coder);
}
int len = length();
byte[] buf = StringUTF16.newBytesFor(len + olen);
getBytes(buf, 0, UTF16);
str.getBytes(buf, len, UTF16);
return new String(buf, UTF16);
}
複製代碼
該方法用於替換字符串中指定的字符,分兩種編碼處理。機器學習
public String replace(char oldChar, char newChar) {
if (oldChar != newChar) {
String ret = isLatin1() ? StringLatin1.replace(value, oldChar, newChar)
: StringUTF16.replace(value, oldChar, newChar);
if (ret != null) {
return ret;
}
}
return this;
}
複製代碼
public static String replace(byte[] value, char oldChar, char newChar) {
if (canEncode(oldChar)) {
int len = value.length;
int i = -1;
while (++i < len) {
if (value[i] == (byte)oldChar) {
break;
}
}
if (i < len) {
if (canEncode(newChar)) {
byte buf[] = new byte[len];
for (int j = 0; j < i; j++) {
buf[j] = value[j];
}
while (i < len) {
byte c = value[i];
buf[i] = (c == (byte)oldChar) ? (byte)newChar : c;
i++;
}
return new String(buf, LATIN1);
} else {
byte[] buf = StringUTF16.newBytesFor(len);
inflate(value, 0, buf, 0, i);
while (i < len) {
char c = (char)(value[i] & 0xff);
StringUTF16.putChar(buf, i, (c == oldChar) ? newChar : c);
i++;
}
return new String(buf, UTF16);
}
}
}
return null;
}
複製代碼
public String replace(CharSequence target, CharSequence replacement) {
String tgtStr = target.toString();
String replStr = replacement.toString();
int j = indexOf(tgtStr);
if (j < 0) {
return this;
}
int tgtLen = tgtStr.length();
int tgtLen1 = Math.max(tgtLen, 1);
int thisLen = length();
int newLenHint = thisLen - tgtLen + replStr.length();
if (newLenHint < 0) {
throw new OutOfMemoryError();
}
StringBuilder sb = new StringBuilder(newLenHint);
int i = 0;
do {
sb.append(this, i, j).append(replStr);
i = j + tgtLen;
} while (j < thisLen && (j = indexOf(tgtStr, j + tgtLen1)) > 0);
return sb.append(this, i, thisLen).toString();
}
複製代碼
都是用正則去實現。佈局
public String replaceFirst(String regex, String replacement) {
return Pattern.compile(regex).matcher(this).replaceFirst(replacement);
}
public String replaceAll(String regex, String replacement) {
return Pattern.compile(regex).matcher(this).replaceAll(replacement);
}
複製代碼
該方式用於切分字符串,實現中首先會判斷能不能不用正則引擎,若是能夠則直接切分,不然採用正則引擎切分。學習
public String[] split(String regex) {
return split(regex, 0);
}
public String[] split(String regex, int limit) {
char ch = 0;
if (((regex.length() == 1 &&
".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
(regex.length() == 2 &&
regex.charAt(0) == '\\' &&
(((ch = regex.charAt(1))-'0')|('9'-ch)) < 0 &&
((ch-'a')|('z'-ch)) < 0 &&
((ch-'A')|('Z'-ch)) < 0)) &&
(ch < Character.MIN_HIGH_SURROGATE ||
ch > Character.MAX_LOW_SURROGATE))
{
int off = 0;
int next = 0;
boolean limited = limit > 0;
ArrayList<String> list = new ArrayList<>();
while ((next = indexOf(ch, off)) != -1) {
if (!limited || list.size() < limit - 1) {
list.add(substring(off, next));
off = next + 1;
} else {
int last = length();
list.add(substring(off, last));
off = last;
break;
}
}
if (off == 0)
return new String[]{this};
if (!limited || list.size() < limit)
list.add(substring(off, length()));
int resultSize = list.size();
if (limit == 0) {
while (resultSize > 0 && list.get(resultSize - 1).length() == 0) {
resultSize--;
}
}
String[] result = new String[resultSize];
return list.subList(0, resultSize).toArray(result);
}
return Pattern.compile(regex).split(this, limit);
}
複製代碼
用某個分隔符將字符串數組鏈接起來。主要是經過 StringJoiner 類來實現。
public static String join(CharSequence delimiter, CharSequence... elements) {
Objects.requireNonNull(delimiter);
Objects.requireNonNull(elements);
StringJoiner joiner = new StringJoiner(delimiter);
for (CharSequence cs: elements) {
joiner.add(cs);
}
return joiner.toString();
}
public static String join(CharSequence delimiter,
Iterable<? extends CharSequence> elements) {
Objects.requireNonNull(delimiter);
Objects.requireNonNull(elements);
StringJoiner joiner = new StringJoiner(delimiter);
for (CharSequence cs: elements) {
joiner.add(cs);
}
return joiner.toString();
}
複製代碼
StringJoiner 的 add 方法和 toString 方法以下。add 方法主要邏輯就是將每一個字符串賦值到字符串數組,而且要將分隔符的長度累加起來。toString 方法主要是將字符串數組和分隔符鏈接起來並返回最終的字符串。
public StringJoiner add(CharSequence newElement) {
final String elt = String.valueOf(newElement);
if (elts == null) {
elts = new String[8];
} else {
if (size == elts.length)
elts = Arrays.copyOf(elts, 2 * size);
len += delimiter.length();
}
len += elt.length();
elts[size++] = elt;
return this;
}
複製代碼
public String toString() {
final String[] elts = this.elts;
if (elts == null && emptyValue != null) {
return emptyValue;
}
final int size = this.size;
final int addLen = prefix.length() + suffix.length();
if (addLen == 0) {
compactElts();
return size == 0 ? "" : elts[0];
}
final String delimiter = this.delimiter;
final char[] chars = new char[len + addLen];
int k = getChars(prefix, chars, 0);
if (size > 0) {
k += getChars(elts[0], chars, k);
for (int i = 1; i < size; i++) {
k += getChars(delimiter, chars, k);
k += getChars(elts[i], chars, k);
}
}
k += getChars(suffix, chars, k);
return jla.newStringUnsafe(chars);
}
private void compactElts() {
if (size > 1) {
final char[] chars = new char[len];
int i = 1, k = getChars(elts[0], chars, 0);
do {
k += getChars(delimiter, chars, k);
k += getChars(elts[i], chars, k);
elts[i] = null;
} while (++i < size);
size = 1;
elts[0] = jla.newStringUnsafe(chars);
}
}
複製代碼
用於轉成小寫,須要分兩種編碼處理,下面看 Latin1 編碼的處理邏輯,
public String toLowerCase(Locale locale) {
return isLatin1() ? StringLatin1.toLowerCase(this, value, locale)
: StringUTF16.toLowerCase(this, value, locale);
}
public String toLowerCase() {
return toLowerCase(Locale.getDefault());
}
複製代碼
Character.toLowerCase
檢查是否所有爲小寫,若是是則直接返回字符串自己。(lang == "tr" || lang == "az" || lang == "lt")
三者語言,則額外處理,由於不在 Latin1 編碼中。System.arraycopy
賦值一個新數組,而後經過遍歷源數組,將一個個字符轉成小寫再賦給新數組。public static String toLowerCase(String str, byte[] value, Locale locale) {
if (locale == null) {
throw new NullPointerException();
}
int first;
final int len = value.length;
for (first = 0 ; first < len; first++) {
int cp = value[first] & 0xff;
if (cp != Character.toLowerCase(cp)) {
break;
}
}
if (first == len)
return str;
String lang = locale.getLanguage();
if (lang == "tr" || lang == "az" || lang == "lt") {
return toLowerCaseEx(str, value, first, locale, true);
}
byte[] result = new byte[len];
System.arraycopy(value, 0, result, 0, first);
for (int i = first; i < len; i++) {
int cp = value[i] & 0xff;
cp = Character.toLowerCase(cp);
if (!canEncode(cp)) {
return toLowerCaseEx(str, value, first, locale, false);
}
result[i] = (byte)cp;
}
return new String(result, LATIN1);
}
複製代碼
用於將字符串成轉成大寫,實現邏輯與上面的轉小寫同樣。
public String toUpperCase(Locale locale) {
return isLatin1() ? StringLatin1.toUpperCase(this, value, locale)
: StringUTF16.toUpperCase(this, value, locale);
}
public String toUpperCase() {
return toUpperCase(Locale.getDefault());
}
複製代碼
用於刪除字符串的頭尾空白符,分兩種編碼處理,以 Latin1 爲例,
public String trim() {
String ret = isLatin1() ? StringLatin1.trim(value)
: StringUTF16.trim(value);
return ret == null ? this : ret;
}
複製代碼
public static String trim(byte[] value) {
int len = value.length;
int st = 0;
while ((st < len) && ((value[st] & 0xff) <= ' ')) {
st++;
}
while ((st < len) && ((value[len - 1] & 0xff) <= ' ')) {
len--;
}
return ((st > 0) || (len < value.length)) ?
newString(value, st, len - st) : null;
}
複製代碼
直接返回 this。
public String toString() {
return this;
}
複製代碼
將字符串轉成 char 數組,分兩種編碼處理,以 Latin1 爲例,核心就在(char)(src[srcOff++] & 0xff)
。
public char[] toCharArray() {
return isLatin1() ? StringLatin1.toChars(value)
: StringUTF16.toChars(value);
}
複製代碼
public static char[] toChars(byte[] value) {
char[] dst = new char[value.length];
inflate(value, 0, dst, 0, value.length);
return dst;
}
public static void inflate(byte[] src, int srcOff, char[] dst, int dstOff, int len) {
for (int i = 0; i < len; i++) {
dst[dstOff++] = (char)(src[srcOff++] & 0xff);
}
}
複製代碼
格式化字符串,經過 Formatter 來實現。
public static String format(String format, Object... args) {
return new Formatter().format(format, args).toString();
}
public static String format(Locale l, String format, Object... args) {
return new Formatter(l).format(format, args).toString();
}
複製代碼
用於將傳入的對象轉成 String 對象,可傳入多種類型參數。
obj.toString()
。Integer.toString(i)
。Long.toString(l)
。Float.toString(f)
。Double.toString(d)
。public static String valueOf(Object obj) {
return (obj == null) ? "null" : obj.toString();
}
public static String valueOf(char data[]) {
return new String(data);
}
public static String valueOf(char data[], int offset, int count) {
return new String(data, offset, count);
}
public static String valueOf(boolean b) {
return b ? "true" : "false";
}
public static String valueOf(char c) {
if (COMPACT_STRINGS && StringLatin1.canEncode(c)) {
return new String(StringLatin1.toBytes(c), LATIN1);
}
return new String(StringUTF16.toBytes(c), UTF16);
}
public static String valueOf(int i) {
return Integer.toString(i);
}
public static String valueOf(long l) {
return Long.toString(l);
}
public static String valueOf(float f) {
return Float.toString(f);
}
public static String valueOf(double d) {
return Double.toString(d);
}
複製代碼
一個 native 方法,具體實現可看前面的文章《深刻談談String.intern()在JVM的實現》
public native String intern();
複製代碼
用於複製指定的字節數組,主要是經過System.arraycopy
來實現。但若是目標數組爲 UTF16 編碼,則需將高位和低位都賦值到字節數組。
void getBytes(byte dst[], int dstBegin, byte coder) {
if (coder() == coder) {
System.arraycopy(value, 0, dst, dstBegin << coder, value.length);
} else {
StringLatin1.inflate(value, 0, dst, dstBegin, value.length);
}
}
public static void inflate(byte[] src, int srcOff, byte[] dst, int dstOff, int len) {
checkBoundsOffCount(dstOff, len, dst);
for (int i = 0; i < len; i++) {
putChar(dst, dstOff++, src[srcOff++] & 0xff);
}
}
static void putChar(byte[] val, int index, int c) {
index <<= 1;
val[index++] = (byte)(c >> HI_BYTE_SHIFT);
val[index] = (byte)(c >> LO_BYTE_SHIFT);
}
複製代碼
獲取字符串的編碼,若是使用非緊湊佈局則必定爲 UTF16,不然可能爲 Latin1 或 UTF16。
byte coder() {
return COMPACT_STRINGS ? coder : UTF16;
}
複製代碼
判斷是否爲 Latin1 編碼。必須是緊湊佈局且爲 LATIN1 才屬於 Latin1 編碼。
private boolean isLatin1() {
return COMPACT_STRINGS && coder == LATIN1;
}
複製代碼
-------------推薦閱讀------------
個人開源項目彙總(機器&深度學習、NLP、網絡IO、AIML、mysql協議、chatbot)
跟我交流,向我提問:
歡迎關注: