Programming exercise: a function that truncates a string

Date: 2019-12-23


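Write a function that truncates a string. The input is a string and a byte count, and the output is the string truncated to that many bytes. A Chinese character must never be cut in half: for the input "我ABC" and 4 the result should be "我AB", and for "我ABC漢DEF" and 6 the result should be "我ABC", not "我ABC" plus half of "漢".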

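There are two key points to this problem:

1. Chinese characters are counted as 2 bytes and English letters as 1 byte (so we need to find an encoding that behaves this way).

2. How to tell which characters are Chinese and which are English letters (so we need a way to distinguish the two).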

import java.io.UnsupportedEncodingException;

public class EncodeTest {
    /**
     * Prints the byte count of a string in the given encoding,
     * followed by the encoding name, to the console.
     *
     * @param s
     *            the string
     * @param encodingName
     *            the encoding name
     */
    public static void printByteLength(String s, String encodingName) {
        System.out.print("Bytes: ");
        try {
            System.out.print(s.getBytes(encodingName).length);
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
        System.out.println("; encoding: " + encodingName);
    }

    public static void main(String[] args) {
        String en = "A";
        String ch = "人";

        // Byte count of one English letter in various encodings
        System.out.println("English letter: " + en);
        EncodeTest.printByteLength(en, "GB2312");
        EncodeTest.printByteLength(en, "GBK");
        EncodeTest.printByteLength(en, "GB18030");
        EncodeTest.printByteLength(en, "ISO-8859-1");
        EncodeTest.printByteLength(en, "UTF-8");
        EncodeTest.printByteLength(en, "UTF-16");
        EncodeTest.printByteLength(en, "UTF-16BE");
        EncodeTest.printByteLength(en, "UTF-16LE");

        System.out.println();

        // Byte count of one Chinese character in various encodings
        System.out.println("Chinese character: " + ch);
        EncodeTest.printByteLength(ch, "GB2312");
        EncodeTest.printByteLength(ch, "GBK");
        EncodeTest.printByteLength(ch, "GB18030");
        EncodeTest.printByteLength(ch, "ISO-8859-1");
        EncodeTest.printByteLength(ch, "UTF-8");
        EncodeTest.printByteLength(ch, "UTF-16");
        EncodeTest.printByteLength(ch, "UTF-16BE");
        EncodeTest.printByteLength(ch, "UTF-16LE");
    }
}

The output is as follows:

English letter: A
Bytes: 1; encoding: GB2312
Bytes: 1; encoding: GBK
Bytes: 1; encoding: GB18030
Bytes: 1; encoding: ISO-8859-1
Bytes: 1; encoding: UTF-8
Bytes: 4; encoding: UTF-16
Bytes: 2; encoding: UTF-16BE
Bytes: 2; encoding: UTF-16LE

Chinese character: 人
Bytes: 2; encoding: GB2312
Bytes: 2; encoding: GBK
Bytes: 2; encoding: GB18030
Bytes: 1; encoding: ISO-8859-1
Bytes: 3; encoding: UTF-8
Bytes: 4; encoding: UTF-16
Bytes: 2; encoding: UTF-16BE
Bytes: 2; encoding: UTF-16LE

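So in this test the three encodings GB2312, GBK and GB18030 all meet the requirement of the problem: one byte for the English letter and two bytes for the Chinese character.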
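To connect this back to the problem statement, here is a minimal sketch that counts the GBK bytes of the two example strings. The class name GbkByteCount and the method gbkLength are made up for illustration and are not part of the original code. It assumes the running JDK ships the GBK charset. GBK is used rather than GB2312 because the traditional character 漢 in the example is, as far as I know, not part of GB2312.

```java
import java.nio.charset.Charset;

// Hypothetical helper class, not part of the original article.
final class GbkByteCount {
    // Assumption: the GBK charset is available in the running JDK.
    private static final Charset GBK = Charset.forName("GBK");

    /** Returns the number of bytes the string occupies when encoded as GBK. */
    static int gbkLength(String s) {
        return s.getBytes(GBK).length;
    }

    public static void main(String[] args) {
        // U+6211 and U+6F22 are the two Chinese characters of the examples;
        // the escapes keep this source file ASCII-only.
        String first = "\u6211ABC";
        String second = "\u6211ABC\u6F22DEF";
        System.out.println(gbkLength(first));   // 2 + 3 bytes
        System.out.println(gbkLength(second));  // 2 + 3 + 2 + 3 bytes
    }
}
```

The first string needs 5 bytes, so a limit of 4 has to drop the final letter. The second needs 10 bytes, and a limit of 6 would fall in the middle of the second Chinese character.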
 
 
 
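There are probably many ways to tell which characters are Chinese and which are letters, and which one is best is a matter of taste.

One way is to convert the string to a character array and check the byte length of each character in GBK.

Another way is to take a substring whose character count equals the requested byte count, then compare the byte length of that substring with the requested byte count. If they are equal, the substring contains no Chinese characters; if they differ, it does.

Here is the relevant code: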
 


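Java code
/**
 * Checks whether a character is a Chinese character.
 *
 * @param c
 *            the character
 * @return true for a Chinese character, false for an English letter
 * @throws UnsupportedEncodingException
 *             if the encoding is not supported by Java
 */
public static boolean isChineseChar(char c)
        throws UnsupportedEncodingException {
    // If the character takes more than 1 byte, treat it as a Chinese character.
    // This is not a very rigorous way to tell English letters from Chinese
    // characters, but it is good enough for this exercise.
    return String.valueOf(c).getBytes("GBK").length > 1;
}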
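As the comment says, the byte-length test is only an approximation: other characters, such as full-width punctuation, also take two bytes in GBK without being Chinese characters. If a stricter check were wanted, one possible sketch (not the approach used in this article) is to ask the JDK for the Unicode script of the character. The class name HanCheck and the method isHan are hypothetical.

```java
// Hypothetical helper class, not part of the original article.
final class HanCheck {
    /** Returns true if the code point belongs to the Unicode Han script. */
    static boolean isHan(int codePoint) {
        return Character.UnicodeScript.of(codePoint) == Character.UnicodeScript.HAN;
    }

    public static void main(String[] args) {
        System.out.println(isHan(0x4EBA)); // U+4EBA, the Chinese character used in EncodeTest
        System.out.println(isHan('A'));    // a plain English letter
    }
}
```

This answers "is it a Chinese character?" independently of any encoding, so the byte counting for the truncation itself would still be done with GBK.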
 
 
 


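Java code
/**
 * Truncates the given string to at most the given number of bytes (GBK).
 * <br>
 * Note that one Chinese character takes 2 bytes.
 * @param str        the string to truncate
 * @param subSLength the maximum number of bytes to keep
 * @return the truncated string
 * @throws UnsupportedEncodingException if GBK is not supported
 */
public static String subStr(String str, int subSLength)
        throws UnsupportedEncodingException
{
    if (str == null)
        return null;
    else
    {
        int tempSubLength = Math.max(subSLength, 0); // byte budget

        // Every character takes at least 1 byte, so at most tempSubLength
        // characters can fit. Clamp to the string length so that
        // substring() cannot throw when the budget exceeds it.
        int charCount = Math.min(tempSubLength, str.length());

        String subStr = str.substring(0, charCount); // candidate substring

        int subStrBytesL = subStr.getBytes("GBK").length; // its byte length

        // More bytes than the budget means the candidate contains Chinese
        // characters, so drop one character at a time from the end.
        while (subStrBytesL > tempSubLength)
        {
            subStr = str.substring(0, --charCount);
            subStrBytesL = subStr.getBytes("GBK").length;
        }
        return subStr;
    }
}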

Method 2:

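// needs: import java.nio.charset.Charset;
public static String testSub(String input, int desiredLength, Charset cs) {
	if (input == null || desiredLength < 0) {
		throw new IllegalArgumentException("Input string is not valid");
	}
	char[] chars = input.toCharArray();

	String retval = null;
	// At most one character per byte, and never more characters than the input has.
	int actualLength = Math.min(desiredLength, chars.length);
	while (retval == null && actualLength > 0) {
		// Copy the first actualLength characters into a candidate string.
		char[] subChar = new char[actualLength];
		System.arraycopy(chars, 0, subChar, 0, actualLength);
		String temp = String.valueOf(subChar);

		if (temp.getBytes(cs).length > desiredLength) {
			// Too many bytes: try again with one character fewer.
			--actualLength;
		} else {
			retval = temp;
		}
	}
	// Nothing fits (for example desiredLength == 0): return an empty string.
	return retval == null ? "" : retval;
}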

Method 3:

public static String substr(String content, int length) throws UnsupportedEncodingException {
	if (content == null) return null;
	StringBuilder buf = new StringBuilder();
	int i = 0;
	for (char ch : content.toCharArray()) {
		i += String.valueOf(ch).getBytes("GBK").length;
		if (i > length) break;
		buf.append(ch);
	}
	return buf.toString();
}
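To check the examples from the problem statement, here is a small self-contained harness. It is only a sketch: the class name TruncateDemo and the method truncate are hypothetical, and truncate re-implements the per-character counting idea of Method 3 with a Charset object so that no checked exception has to be declared. It assumes the GBK charset is available in the running JDK.

```java
import java.nio.charset.Charset;

// Hypothetical demo class, not part of the original article.
final class TruncateDemo {
    // Assumption: the GBK charset is available in the running JDK.
    private static final Charset GBK = Charset.forName("GBK");

    /** Keeps leading characters while their total GBK byte count stays within maxBytes. */
    static String truncate(String s, int maxBytes) {
        StringBuilder out = new StringBuilder();
        int used = 0;
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            used += String.valueOf(ch).getBytes(GBK).length;
            if (used > maxBytes) {
                break; // this character would exceed the budget, so stop before it
            }
            out.append(ch);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+6211 and U+6F22 are the two Chinese characters of the examples.
        System.out.println(truncate("\u6211ABC", 4));           // first example, limit 4
        System.out.println(truncate("\u6211ABC\u6F22DEF", 6));  // second example, limit 6
    }
}
```

With a limit of 4 the first example keeps the Chinese character plus "AB", and with a limit of 6 the second example stops before the second Chinese character, which matches the expected results stated in the problem.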

Method 4: ......