Memory Leaks in Java 6's String.substring() Method

substring(start, end) is used all the time in Java programming, yet if it is used carelessly it can cause a memory leak.

 

The best way to understand substring() is to read its source code (JDK 6):

/**
 * <blockquote><pre>
 * "hamburger".substring(4, 8) returns "urge"
 * "smiles".substring(1, 5) returns "mile"
 * </pre></blockquote>
 *
 * @param      beginIndex   the beginning index, inclusive.
 * @param      endIndex     the ending index, exclusive.
 * @return     the specified substring.
 * @exception  IndexOutOfBoundsException  if the
 *             <code>beginIndex</code> is negative, or
 *             <code>endIndex</code> is larger than the length of
 *             this <code>String</code> object, or
 *             <code>beginIndex</code> is larger than
 *             <code>endIndex</code>.
 */
public String substring(int beginIndex, int endIndex) {
    if (beginIndex < 0) {
        throw new StringIndexOutOfBoundsException(beginIndex);
    }
    if (endIndex > count) {
        throw new StringIndexOutOfBoundsException(endIndex);
    }
    if (beginIndex > endIndex) {
        throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
    }
    return ((beginIndex == 0) && (endIndex == count)) ? this :
        new String(offset + beginIndex, endIndex - beginIndex, value);
}

 

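As an aside, this substring() source is a nice example of how to write an API. It reminded me of one of Lao Zhao's articles. The way it validates arguments and handles exceptions follows a similar line of thinking.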

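Note that calling substring(i, i) (that is, beginIndex == endIndex) or substring(stringLength) (that is, beginIndex equal to the string's length) does not throw an exception. Instead it returns an empty string, because the result is new String(offset + beginIndex, 0, value).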
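The same behavior holds on current JDKs, so a quick check is easy to run. The class name SubstringEdgeCases is only illustrative. The last call is there for contrast: when beginIndex > endIndex, the third check in the code above does throw.

```java
public final class SubstringEdgeCases {
    public static void main(String[] args) {
        String s = "hamburger";

        // beginIndex == endIndex: no exception, just an empty string
        String a = s.substring(4, 4);
        // beginIndex == length(): also an empty string
        String b = s.substring(s.length());
        System.out.println("[" + a + "] [" + b + "]"); // prints "[] []"

        // For contrast, beginIndex > endIndex does throw
        try {
            s.substring(5, 4);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("substring(5, 4) threw StringIndexOutOfBoundsException");
        }
    }
}
```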

 

Back to the main topic. The string is actually created by the String(int, int, char[]) constructor, whose source is as follows:

// Package private constructor which shares value array for speed.
String(int offset, int count, char value[]) {
    this.value = value;
    this.offset = offset;
    this.count = count;
}

 

In Java 6, a string is essentially defined by three private fields (the cached hash field is omitted here):

public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence
{
    /** The value is used for character storage. */
    private final char value[];

    /** The offset is the first index of the storage that is used. */
    private final int offset;

    /** The count is the number of characters in the String. */
    private final int count;
}

 

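When a string is allocated, the char array stores its characters, offset is 0, and count is the string's length. The problem is how substring(start, end) builds its result through the String(int, int, char[]) constructor. It gets the substring only by adjusting offset and count, so the substring's value[] array still points to the original string's array. Suppose the original string s is 1 GB and we only need a short piece such as s.substring(1, 10). The substring's value[] still references the original 1 GB array, so the GC cannot reclaim that array while the substring is reachable, even after s itself is gone. This is the memory leak.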
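The sharing is easy to model outside the JDK. The class below, Jdk6StyleString, is a hypothetical, simplified stand-in for the JDK 6 layout. It is not java.lang.String, and modern JDKs no longer behave this way. It mirrors the value/offset/count fields and the three checks in substring() shown above. The demo shows that a 9-character substring still references the entire backing array:

```java
import java.util.Arrays;

/**
 * Hypothetical, simplified model of the JDK 6 String layout.
 * This is NOT java.lang.String; it only mimics the value/offset/count sharing.
 */
public final class Jdk6StyleString {
    private final char[] value; // backing storage, possibly shared with other instances
    private final int offset;   // index of the first character this string uses
    private final int count;    // number of characters in this string

    public Jdk6StyleString(String s) {
        this(s.toCharArray(), 0, s.length());
    }

    // Like the JDK 6 package-private constructor: stores the array reference, copies nothing.
    private Jdk6StyleString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    public Jdk6StyleString substring(int beginIndex, int endIndex) {
        if (beginIndex < 0) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        if (endIndex > count) {
            throw new StringIndexOutOfBoundsException(endIndex);
        }
        if (beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
        }
        // Constant time: only offset and count change, value[] is reused as-is.
        return (beginIndex == 0 && endIndex == count) ? this
                : new Jdk6StyleString(value, offset + beginIndex, endIndex - beginIndex);
    }

    public int length() {
        return count;
    }

    // Exposed only to make the retention visible in this demo.
    public int backingArrayLength() {
        return value.length;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        char[] big = new char[1_000_000];
        Arrays.fill(big, 'x');
        Jdk6StyleString s = new Jdk6StyleString(new String(big));
        Jdk6StyleString small = s.substring(1, 10);
        System.out.println(small.length());             // prints 9
        System.out.println(small.backingArrayLength()); // prints 1000000: the whole array is still referenced
    }
}
```

Because small holds a reference to the whole value[] array, that array stays reachable as long as small does, even if s is discarded.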

 

So why was it designed this way? String is immutable, so letting several strings share one character array brings the following benefits:

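substring() copies nothing and simply reuses the value[] array. It therefore runs in constant rather than linear time, which improves performance. (This is what the comment on the package-private constructor above means by "shares value array for speed".)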

The drawback is the potential memory leak. (In fact, someone reported this to Oracle as a bug long ago: http://bugs.sun.com/view_bug.do?bug_id=4513622.)

 

How can this be avoided? One workaround is to go through a constructor that copies just the needed part of the array:

/**
 * Initializes a newly created {@code String} object so that it represents
 * the same sequence of characters as the argument; in other words, the
 * newly created string is a copy of the argument string. Unless an
 * explicit copy of {@code original} is needed, use of this constructor is
 * unnecessary since Strings are immutable.
 *
 * @param  original
 *         A {@code String}
 */
public String(String original) {
    int size = original.count;
    char[] originalValue = original.value;
    char[] v;
    if (originalValue.length > size) {
        // The array representing the String is bigger than the new
        // String itself.  Perhaps this constructor is being called
        // in order to trim the baggage, so make a copy of the array.
        int off = original.offset;
        v = Arrays.copyOfRange(originalValue, off, off+size);
    } else {
        // The array representing the String is the same
        // size as the String, so no point in making a copy.
        v = originalValue;
    }
    this.offset = 0;
    this.count = size;
    this.value = v;
}

// smallStr no longer holds the 1 GB value[] array
String smallStr = new String(s.substring(1,10));

 

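The constructor above copies only the used range of the array into v and then makes v the new string's array. The small string therefore no longer keeps the large array alive, and the leak is avoided.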
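The decision inside String(String original) comes down to one branch. The sketch below pulls it out into a helper that works on plain char[] arrays. The name trimmedValue and its signature are hypothetical and are not JDK API:

```java
import java.util.Arrays;

public final class TrimCopy {
    /**
     * Hypothetical helper that mirrors the branch in JDK 6's String(String original):
     * copy only the used range when the backing array is larger than the string,
     * otherwise reuse the array unchanged.
     */
    public static char[] trimmedValue(char[] value, int offset, int count) {
        if (value.length > count) {
            // Extra "baggage" in the array: copy just [offset, offset + count).
            return Arrays.copyOfRange(value, offset, offset + count);
        }
        // The array is exactly the string's size, so copying would gain nothing.
        return value;
    }

    public static void main(String[] args) {
        char[] big = "hamburger".toCharArray();
        char[] trimmed = trimmedValue(big, 4, 4);                        // the "urge" slice
        System.out.println(new String(trimmed) + " " + trimmed.length); // prints "urge 4"

        char[] exact = "urge".toCharArray();
        System.out.println(trimmedValue(exact, 0, 4) == exact);         // prints "true"
    }
}
```

After the copy, only the 4-character array is referenced, and the 9-character one can be collected once nothing else points to it.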

 

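In Java 7 (from update 6 onward), the implementation of String changed. substring() moved from sharing the array to a conventional copy. This eliminates the leak, but the running time goes from constant to linear in the length of the substring: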

    public String substring(int beginIndex, int endIndex) {
        if (beginIndex < 0) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        if (endIndex > value.length) {
            throw new StringIndexOutOfBoundsException(endIndex);
        }
        int subLen = endIndex - beginIndex;
        if (subLen < 0) {
            throw new StringIndexOutOfBoundsException(subLen);
        }
        return ((beginIndex == 0) && (endIndex == value.length)) ? this
                : new String(value, beginIndex, subLen);
    }

    /**
     * Allocates a new {@code String} that contains characters from a subarray
     * of the character array argument. The {@code offset} argument is the
     * index of the first character of the subarray and the {@code count}
     * argument specifies the length of the subarray. The contents of the
     * subarray are copied; subsequent modification of the character array does
     * not affect the newly created string.
     *
     * @param  value
     *         Array that is the source of characters
     *
     * @param  offset
     *         The initial offset
     *
     * @param  count
     *         The length
     *
     * @throws  IndexOutOfBoundsException
     *          If the {@code offset} and {@code count} arguments index
     *          characters outside the bounds of the {@code value} array
     */
    public String(char value[], int offset, int count) {
        if (offset < 0) {
            throw new StringIndexOutOfBoundsException(offset);
        }
        if (count < 0) {
            throw new StringIndexOutOfBoundsException(count);
        }
        // Note: offset or count might be near -1>>>1.
        if (offset > value.length - count) {
            throw new StringIndexOutOfBoundsException(offset + count);
        }
        this.value = Arrays.copyOfRange(value, offset, offset+count);
    }

 

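This constructor copies the array every time, which differs from the Java 6 implementation. Which approach is better is genuinely hard to say. Java 6 makes substring() O(1) but risks keeping a large array alive. Java 7 always pays an O(n) copy, where n is the substring length, but never retains more memory than the substring needs.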
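The copy promised by the Javadoc above can be observed directly. If you modify the source array after construction, the string does not change. The class and method names below are illustrative. The public String(char[], int, int) constructor already copied in Java 6. What changed in Java 7 is that substring() now goes through it.

```java
public final class CopySemantics {
    // Builds a string from part of an array, then mutates the array afterwards.
    static String buildThenMutate() {
        char[] chars = {'s', 'm', 'i', 'l', 'e', 's'};
        // String(char[] value, int offset, int count) copies chars[1..4] into the new string.
        String mile = new String(chars, 1, 4);
        chars[1] = 'X'; // mutate the source array after construction
        return mile;
    }

    public static void main(String[] args) {
        System.out.println(buildThenMutate()); // prints "mile": the string kept its own copy
    }
}
```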

There is reportedly a data structure called a Rope that can handle strings more efficiently. It is worth a closer look.
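To give a rough idea of what a rope is: it is a binary tree whose leaves hold string fragments and whose inner nodes record the combined length. Concatenation then allocates one node instead of copying characters. The sketch below is a minimal, hypothetical Rope class with no balancing and no substring support. It is not any particular library's implementation.

```java
/** A minimal, hypothetical rope: a binary tree of string fragments (no balancing). */
public abstract class Rope {
    public abstract int length();

    public abstract char charAt(int index);

    public abstract void appendTo(StringBuilder sb);

    public static Rope of(String s) {
        return new Leaf(s);
    }

    /** O(1) concatenation: no characters are copied, a new node just points at both halves. */
    public Rope concat(Rope other) {
        return new Concat(this, other);
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder(length());
        appendTo(sb);
        return sb.toString();
    }

    private static final class Leaf extends Rope {
        private final String text;

        Leaf(String text) { this.text = text; }

        public int length() { return text.length(); }

        public char charAt(int i) { return text.charAt(i); }

        public void appendTo(StringBuilder sb) { sb.append(text); }
    }

    private static final class Concat extends Rope {
        private final Rope left;
        private final Rope right;
        private final int length; // cached total length of both subtrees

        Concat(Rope left, Rope right) {
            this.left = left;
            this.right = right;
            this.length = left.length() + right.length();
        }

        public int length() { return length; }

        // Walks down the tree: cost grows with tree depth, not with text length.
        public char charAt(int i) {
            int leftLen = left.length();
            return i < leftLen ? left.charAt(i) : right.charAt(i - leftLen);
        }

        public void appendTo(StringBuilder sb) {
            left.appendTo(sb);
            right.appendTo(sb);
        }
    }

    public static void main(String[] args) {
        Rope r = Rope.of("ham").concat(Rope.of("bur")).concat(Rope.of("ger"));
        System.out.println(r + " " + r.length() + " " + r.charAt(4)); // prints "hamburger 9 u"
    }
}
```

A production rope would also rebalance the tree to keep charAt fast. It would implement substring by sharing subtrees rather than copying characters.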

 

References:

http://javarevisited.blogspot.hk/2011/10/how-substring-in-java-works.html

http://eyalsch.wordpress.com/2009/10/27/stringleaks/

http://blog.zhaojie.me/2013/03/string-and-rope-1-string-in-dotnet-and-java.html

http://www.transylvania-jug.org/archives/5530
