Learning the commonly used Ruby string methods

Date: 2019-11-09


Source: http://www.blogjava.net/nkjava/archive/2010/01/03/308088.html

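1. Slicing: slice, [ ] ([ ] is an alias for slice, so the two are identical)
Operation 1: test whether a string contains a substring or a pattern
string[substring]
string[/pattern/]
string[/pattern/, n] # n is a capture-group number (not a start position); returns the text matched by group n
Returns the matching substring if there is one, otherwise nil
"hello world"["hello"]  #=> "hello"
"hello world"[/.*lo/]   #=> "hello"
"hello world"[/en/]     #=> nil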

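Operation 2: extract a substring by index
string[position] # note: Ruby 1.8 returns the character code (an Integer) rather than the character; Ruby 1.9 and later return a one-character string
string[start, length]
string[start..end] # end is included
string[start...end] # end is excluded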
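
The slicing forms above can be sketched as follows. This is a minimal example that assumes Ruby 1.9 or later; slice_demo is a hypothetical helper name used only here.

```ruby
# Hypothetical helper, named only for this example.
def slice_demo(text)
  {
    substring: text["hello"],              # the substring itself, or nil
    pattern:   text[/.*lo/],               # the matched text, or nil
    capture:   text[/(h\w+) (w\w+)/, 2],   # the text of capture group 2
    missing:   text[/en/],                 # nil, the pattern does not match
    by_length: text[0, 5],                 # start index and length
    inclusive: text[0..4],                 # range including index 4
    exclusive: text[0...4],                # range excluding index 4
    single:    text[0]                     # "h" on Ruby 1.9+, 104 on Ruby 1.8
  }
end

p slice_demo("hello world")
```

With "hello world" the capture entry is "world" and the exclusive entry is "hell".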

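2. Comparison:
== # tests whether two strings have the same content
eql? # like ==, but never converts its argument: true only when the other object is also a String with the same content
<=> # orders two strings: returns 1 if greater, -1 if less, otherwise 0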

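3. String operations
downcase # converts the string to all lowercase
upcase # converts the string to all uppercase
swapcase # swaps the case of every letter
capitalize # uppercases the first letter and lowercases the rest
* # repeats the string
insert num, string # inserts string at position num (there is no insert!, because insert already modifies the receiver)
delete(!) string1 (,string2) # deletes the characters in the intersection of string1 and string2
gsub find, replace # replaces every occurrence of find with replace. find may be a regular expression; replace is a plain string (it may contain backreferences such as \1). Every match is replaced, like s/pattern/string/g in sed
replace string # replaces the content with string; the object stays the same, only its content changes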

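Modifying a string through slices (slice!, [ ]=)
The comment on each line shows the content of the string after the assignment; slice! removes the selected part and returns it.
"hello"["ello"]= "w" # "hw"
"hello"[1]="wan" # "hwanllo"
"hello"[1..3]= "wrd" # "hwrdo"
"hello"[1...3]= "wr" # "hwrlo"
"hello"[1,3]="wrd" # "hwrdo"
"hello"[/el/]= "wr" # "hwrlo"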

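chomp(!) removes a trailing line terminator (\n, \r or \r\n). If there is none, chomp returns the string unchanged and chomp! returns nil. Only line terminators are removed, not trailing spaces
chop(!) removes the last character of the string
reverse(!) reverses the string
split(/pattern/) splits the string into an array using /pattern/ as the separator (a ! method cannot change the class of its receiver, so there is no split!)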
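
A small sketch of how chomp and chop differ; trim_demo is a hypothetical helper name, and the values in the comments are what I expect on current Ruby versions.

```ruby
# Hypothetical helper for illustration.
def trim_demo
  line  = "hello\n"
  plain = "hello"
  {
    chomp_newline: line.chomp,        # "hello"
    chomp_plain:   plain.chomp,       # "hello", nothing to remove
    chomp_bang:    plain.dup.chomp!,  # nil, because nothing was removed
    chop_plain:    plain.chop,        # "hell", the last character always goes
    chop_crlf:     "hello\r\n".chop   # "hello", \r\n is removed as one unit
  }
end

p trim_demo
```

The nil from chomp! is the usual convention of the ! methods: they return nil when they made no change.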

String length
string.length
string.size

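String alignment
string.ljust num, char # pads string with char up to a width of num. Left-justifying means padding on the right. If the string is already num characters or longer, it is returned unchanged
string.rjust num, char
string.center num, char

string.lstrip # trims leading whitespace
string.rstrip # trims trailing whitespace
string.strip # trims whitespace at both ends
To remove every space, including those in the middle, substitute them away with gsub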
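
A minimal sketch of removing all whitespace with gsub; strip_all_spaces is a hypothetical helper name. When only literal spaces matter, delete(" ") is a simpler alternative.

```ruby
# Hypothetical helper: removes every whitespace character, not just the ends.
def strip_all_spaces(text)
  text.gsub(/\s+/, "")
end

p strip_all_spaces("  a b\tc  ")   # whitespace in the middle is removed too
p "  a b  ".delete(" ")            # delete removes only the listed characters
p "  a b  ".strip                  # strip keeps the inner space
```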

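string.next/succ # the successor of string, which is more than a plain +1: "a".next == "b", but "az".next == "ba" and "zz".next == "aaa"
string1.upto(stringn) # iterates over string1, string2, ..., stringn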
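
The carry behaviour of succ is easiest to see in a few cases. A minimal sketch; succ_demo is a hypothetical helper name.

```ruby
# Hypothetical helper for illustration.
def succ_demo
  {
    simple: "a".succ,           # "b"
    carry:  "az".succ,          # "ba": z wraps to a and the carry moves left
    grow:   "zz".succ,          # "aaa": a carry past the left end adds a character
    digits: "a9".succ,          # "b0": digits carry into letters
    range:  "a".upto("e").to_a  # ["a", "b", "c", "d", "e"]
  }
end

p succ_demo
```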

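Iterating over a string:
string.each # Ruby 1.8 only (removed in 1.9; use each_line). Items are separated by \n, and each item keeps its trailing \n, so chomp it first
"hello\nworld".each {|e| puts e.chomp << ","}
prints
hello,
world,
"hello world".each {|e| puts e << ","}
prints
hello world,
string.each_byte # iterates byte by byte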

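Finding the index of a substring
string.index substring # a regular expression also works

Regular expressions only
string.grep /pattern/ # Ruby 1.8 only (in 1.9 and later use string.lines.grep). A plain string argument matches only a line that is exactly equal to it; with a matching regular expression the result is an array of the matching lines (the whole string when it has a single line)
string =~ /pattern/ # position of the first match of pattern, or nil
string !~ /pattern/ # returns true if /pattern/ is NOT found (note the negation)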
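
A minimal sketch of the three matching forms, assuming Ruby 1.9 or later, where lines.grep takes the place of the old String#grep; match_demo is a hypothetical helper name.

```ruby
# Hypothetical helper for illustration.
def match_demo(text)
  {
    position: text =~ /o/,            # index of the first match, or nil
    absent:   text !~ /z/,            # true when the pattern is not found
    lines:    text.lines.grep(/wor/)  # array of the lines that match
  }
end

p match_demo("hello\nworld")
```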

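Ruby is powerful, but the available material is sparse and not very detailed. This article summarizes my own study. The test environment is Windows XP SP3 + NetBeans 6.7.1 (JRuby 1.2.0), and the main conclusions come from the Internet, "Programming Ruby" 2e, analysis of the source code, and code I actually ran.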

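Double-quoted and single-quoted strings

Both denote String objects; the difference is that double-quoted strings support more escape sequences. The code below puts a ' character inside a single-quoted string.
str='he\'lo'
puts str
The output is he'lo.

Single quotes support only \\ => \ and \' => '

Double-quoted strings support many more escape sequences, such as \n, \t, \", \\ and #{...} interpolation.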

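Delimiters

Any single-byte character that is neither a letter nor a digit can serve as a String delimiter. Bracket-like delimiters come in pairs, such as < and > or { and }; other characters, such as !, are used at both ends.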

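Constructing string literals

Method 1:
The simplest form is a string enclosed in single or double quotes, such as "hello".

Method 2:
Use %q with a delimiter; %q behaves like single quotes
str=%q!he\lo!

Method 3:
Use %Q with a delimiter; %Q behaves like double quotes
str=%Q{he\lo}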
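
The difference between %q and %Q shows up as soon as the literal contains an escape or an interpolation. A minimal sketch; LANG_NAME, SINGLE and DOUBLE are hypothetical names used only for this example.

```ruby
LANG_NAME = "Ruby"  # hypothetical value used only in this example

# %q acts like single quotes: the backslash stays and #{...} is not interpolated.
SINGLE = %q!he\lo #{LANG_NAME}!

# %Q acts like double quotes: \t becomes a tab and #{...} is interpolated.
DOUBLE = %Q{he\tlo #{LANG_NAME}}

p SINGLE
p DOUBLE
```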

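Method 4:
Build the string with a here document, which suits multi-line strings. It starts with << followed by a terminator string and ends at a line containing only that terminator, as in the following code:
str = <<END_OF_STRING1
We are here now,
where are you?
END_OF_STRING1
puts str
The output is:
We are here now,
where are you?

A more complex form allows several here documents on one line. Here str is assigned an array of two strings, and puts prints each of them.
str = <<END_OF_STRING1,<<END_OF_STRING2
We are here now,
where are you?
END_OF_STRING1
I will leave now,
would you like to go with me?
END_OF_STRING2

puts str
The output is:
We are here now,
where are you?
I will leave now,
would you like to go with me?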

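Literals and copy-on-write

In Java, if two String objects a and b both have the value "abcdef", as follows:
String a="abcdef";
String b="abcdef";
then the JVM creates only one constant object "abcdef" and makes both a and b refer to it. Ruby instead uses copy-on-write, an advanced technique that C++ programmers will know from smart pointers: the two objects initially share the same character data, but as soon as one of them (say b) is modified, a copy of "abcdef" is made and the modification is applied to that copy.
For a more detailed discussion see http://developer.51cto.com/art/200811/98630.htm.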

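Some other differences from Java

A Java String never changes when a modifying operation is applied; a new String object is created instead. A Ruby String is mutable: methods such as <<, insert, replace and the ! variants modify the receiver itself, while methods without ! (upcase, sub, +, ...) return a new String.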
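
A minimal sketch of what in-place modification means for two variables that refer to the same String; mutation_demo is a hypothetical helper name.

```ruby
# Hypothetical helper for illustration.
def mutation_demo
  a = "abcdef".dup
  b = a                 # b refers to the same object as a
  upper = a.upcase      # non-destructive: returns a new String
  a << "g"              # destructive: changes the object shared by a and b
  c = a.dup             # an independent copy
  c.upcase!             # changes only the copy
  { a: a, b: b, upper: upper, c: c, same: a.equal?(b) }
end

p mutation_demo
```

After the << call both a and b read "abcdefg", while upper and c are separate objects.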

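Computing the length

puts "hello".length
This prints 5, the number of characters. Do not confuse this with C, where strings are terminated by a 0 byte and therefore occupy one byte more than the number of characters; Ruby has no such convention.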

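Searching

Finding the first match from left to right

    The index method has three forms:
str.index(substring [, offset]) => fixnum or nil
str.index(fixnum [, offset]) => fixnum or nil
str.index(regexp [, offset]) => fixnum or nil
    The second argument, offset, is optional; without it the search starts at the character with index 0.
puts "hello".index("el") prints 1; 'el' works too. In Ruby 1.8 you can also search for a single character by its code: puts "hello".index(101) prints 1, where 101 is the character code of 'e' (Ruby 1.9 and later no longer accept an integer here).
A regular expression can be used as well: "hello".index(/[az]/) gives nil because "hello" contains neither a nor z. [] is the character-class operator of regular expressions: it matches if any one of the characters inside, here a or z, is found.
puts "hello".index(/lo/) has no [], so it searches for the substring lo; the result is 3.
    I personally think that becoming fluent with regular-expression searches is the best choice: they handle simple searches as well as difficult ones, although learning them takes considerable effort.
    The second argument may be negative, in which case it counts from the end of the string (-1 is the last character). puts "hello".index('o', -1) therefore starts at the final o and prints 4, whereas "hello".index('l', -1) is nil.
    If nothing is found, nil is returned.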
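
A short sketch of how the offset argument changes the result of index; index_demo is a hypothetical helper name.

```ruby
# Hypothetical helper for illustration.
def index_demo(text = "hello")
  {
    first_l:      text.index("l"),           # 2
    from_three:   text.index("l", 3),        # 3: the search starts at index 3
    from_last:    text.index("l", -1),       # nil: only the final "o" is searched
    o_from_last:  text.index("o", -1),       # 4
    vowel_after2: text.index(/[aeiou]/, 2),  # 4
    missing:      text.index("z")            # nil
  }
end

p index_demo
```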

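Reverse search (does it find the last match scanning left to right, or the first match scanning right to left?)

str.rindex(substring [, fixnum]) => fixnum or nil
str.rindex(fixnum [, fixnum]) => fixnum or nil
str.rindex(regexp [, fixnum]) => fixnum or nil
    The first argument is the same as for index. The second is optional and defaults to the end of the string. It marks the rightmost position at which a match may begin. If it is 0, only a match beginning at the first character can be found. If it is negative, a match can still be found: the C implementation shows that when fixnum<0 it executes fixnum += str.length before searching. In other words, -1 stands for the rightmost character of the string, and a negative fixnum moves the starting point abs(fixnum)-1 positions to the left of it.
The following two lines both give nil:
puts "hlloe".rindex('e', -2)
puts "hlloe".rindex('e', 3)

The following two lines both give 1:
puts "hello".rindex('e', -2)
puts "hello".rindex('e', 3)

    Note that this reading is my own guess from studying the code; I do not yet know how to debug Ruby's C code, so it may not be correct. The code is excerpted below. (It is the C code published on the Ruby site, but my platform is actually NetBeans 6.7.1, so the code really running should be JRuby 1.2.0, implemented in Java; the C code here is for reference only.)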
static VALUE
rb_str_rindex_m(argc, argv, str)
    int argc;
    VALUE *argv;
    VALUE str;
{
    VALUE sub;
    VALUE position;
    long pos;

    if (rb_scan_args(argc, argv, "11", &sub, &position) == 2) {
        pos = NUM2LONG(position);
        if (pos < 0) {
            pos += RSTRING(str)->len;
            if (pos < 0) {
                if (TYPE(sub) == T_REGEXP) {
                    rb_backref_set(Qnil);
                }
                return Qnil;
            }
        }
        if (pos > RSTRING(str)->len) pos = RSTRING(str)->len;
    }
    else {
        pos = RSTRING(str)->len;
    }

    switch (TYPE(sub)) {
      case T_REGEXP:
        if (RREGEXP(sub)->len) {
            pos = rb_reg_adjust_startpos(sub, str, pos, 1);
            pos = rb_reg_search(sub, str, pos, 1);
        }
        if (pos >= 0) return LONG2NUM(pos);
        break;

      case T_STRING:
        pos = rb_str_rindex(str, sub, pos);
        if (pos >= 0) return LONG2NUM(pos);
        break;

      case T_FIXNUM:
      {
          int c = FIX2INT(sub);
          unsigned char *p = (unsigned char*)RSTRING(str)->ptr + pos;
          unsigned char *pbeg = (unsigned char*)RSTRING(str)->ptr;

          if (pos == RSTRING(str)->len) {
              if (pos == 0) return Qnil;
              --p;
          }
          while (pbeg <= p) {
              if (*p == c) return LONG2NUM((char*)p - RSTRING(str)->ptr);
              p--;
          }
          return Qnil;
      }

We normally think of this as searching from the right, but the comment says that it searches from left to right and returns the position of the last match found. Only the code can tell what really happens.
static long
rb_str_rindex(str, sub, pos)
    VALUE str, sub;
    long pos;
{
    long len = RSTRING(sub)->len;
    char *s, *sbeg, *t;

    /* substring longer than string */
    if (RSTRING(str)->len < len) return -1;
    if (RSTRING(str)->len - pos < len) {
        pos = RSTRING(str)->len - len;
    }
    sbeg = RSTRING(str)->ptr;
    s = RSTRING(str)->ptr + pos;
    t = RSTRING(sub)->ptr;
    if (len) {
        while (sbeg <= s) {
            if (rb_memcmp(s, t, len) == 0) {
                return s - RSTRING(str)->ptr;
            }
            s--;
        }
        return -1;
    }
    else {
        return pos;
    }
}

    The code contains s--; so the match proceeds from right to left and returns at the first hit. The comment is badly misleading! The two descriptions give the same result, but the amount of work differs: a left-to-right scan for the last match always has to examine the whole string, O(n), whereas the right-to-left scan can stop at the first hit, and O(n) is only its worst case.
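
To check this reading of the C code, here is a rough Ruby sketch of the same right-to-left scan. rindex_sketch is a hypothetical re-implementation written for this article, not the actual Ruby source, and it assumes that sub is a String.

```ruby
# Hypothetical re-implementation for illustration; not the actual Ruby source.
def rindex_sketch(str, sub, pos = str.length)
  pos += str.length if pos < 0              # a negative position counts from the end
  return nil if pos < 0                     # still negative: out of range
  return nil if sub.length > str.length     # substring longer than string
  pos = [pos, str.length - sub.length].min  # last index where sub still fits
  pos.downto(0) do |i|                      # scan from right to left
    return i if str[i, sub.length] == sub   # the first hit is the answer
  end
  nil
end

p rindex_sketch("hlloe", "e", -2)
p rindex_sketch("hello", "e", 3)
p rindex_sketch("hello", "l")
```

For the inputs I tried, the sketch returns the same values as the built-in rindex, which supports the right-to-left interpretation.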

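Case-insensitive search

puts "hello".upcase.index("H"): convert everything to lowercase or uppercase with downcase or upcase, then search.

Searching with a regular-expression match

The =~ operator returns the position where the match begins, or nil if nothing is found.
puts "abcde789" =~ /\d/
This prints 5.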

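Extracting substrings

str="hello"
puts str[0,2]
The first argument is the index of the first character of the substring; the second is the length (it must not be negative).
The result is he.
The first argument may be negative: the rightmost character is -1, and the start position is found by counting down from there toward the left, for example:
str="hello"
puts str[-2,2]
This prints lo; we already saw this convention with the rindex method.

A regular expression can also be used for extraction, which is really powerful.
str="hello"
puts str[/h..l/]
This prints hell.

The . symbol stands for any one character, so two dots stand for two characters. The text between the two / is the regular expression. .* stands for any number of characters, for example
str="hello"
puts str[/h.*o/]
This prints hello.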

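Counting characters

String#count counts the total number of occurrences of the characters in the character set given as argument. The simplest case:
str = "hello,world"
puts str.count "w"
The argument "w" denotes a character set containing the single character w; count finds that w occurs once in "hello,world", so the output is 1.
Next, the argument contains three characters:
str = "hello,world"
puts str.count "wld"
The output is 5: w occurs once, l three times and d once, 5 in total.

Several arguments can also be passed, each denoting a character set; the intersection of these sets is then what count uses:
str = "hello,world"
puts str.count "lo","o"
The output is 2.
str = "hello,world"
puts str.count "lo","o"," "
The output is 0, because the intersection of the three sets is empty.

Note that an argument of ^o means every character except o is counted.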
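
A small sketch of negated sets, ranges and intersections with count; count_demo is a hypothetical helper name.

```ruby
# Hypothetical helper for illustration.
def count_demo(text = "hello,world")
  {
    not_o:     text.count("^o"),       # 9: all 11 characters except the two o's
    range:     text.count("a-l"),      # 6: h, e, l, l, l, d
    intersect: text.count("lo", "o"),  # 2: only o is in both sets
    vowels:    text.count("aeiou")     # 3: e, o, o
  }
end

p count_demo
```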

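Removing a trailing separator

String#chomp takes an optional string argument that specifies the substring to remove from the end. Without the argument, it removes a trailing \n, \r or \r\n (if present).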

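Squeezing repeated characters

Without an argument, String#squeeze turns every run of a repeated character in the string into a single character, as follows:
str = "helllloo"
puts str.squeeze
Output: helo.
A string argument has the same meaning as for count: it denotes a character set, and only runs that satisfy both conditions (1. the character is in the set; 2. it is repeated consecutively in the string) are squeezed to a single character.
Example code:
str = "helllloo"
puts str.squeeze('l')
puts str.squeeze('a-l')
puts str.squeeze('lo')
The output is:
heloo
heloo
helo

As the a-l example shows, the argument can also specify a range of characters.

A very common use is squeeze(" ") to collapse runs of spaces inside a string.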

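Deleting from a string

The delete method

It accepts several arguments, each denoting a character set, just like count. With several arguments the intersection is taken, and every character of the string that is in the intersection is deleted.
"hello".delete "l","lo" #=> "heo"
"hello".delete "lo" #=> "he"
"hello".delete "aeiou", "^e" #=> "hell"
"hello".delete "ej-m" #=> "ho"

Using sub and gsub

See the section on sub later on; simply replace with ''.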

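Splitting strings

String#split takes two arguments. The first is always used as the separator for splitting the string and does not appear in the result.
If the first argument is an empty regular expression, every character is split off and an array of characters is returned. Example code:
str = "hello"
puts str.split(//)
The output is:
h
e
l
l
o

If the regular expression is not empty, the string is split wherever it matches. Example code:
str = "hello"
puts str.split(/h/)
The result is:

ello

The string is split on h into an array of two elements: the first is "" and the second is ello.
The other form of the first argument is simply a string used as the separator, so no example is given. I prefer the more powerful regular expressions.

The second argument is an integer that limits the number of elements in the resulting array. I have not yet found much use for it; usually it can simply be omitted.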
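
One place where the limit argument does help is splitting on the first separator only, or keeping trailing empty fields. A minimal sketch; split_demo is a hypothetical helper name.

```ruby
# Hypothetical helper for illustration.
def split_demo
  {
    limited: "key=value=more".split("=", 2),  # ["key", "value=more"]
    dropped: "a,b,,".split(","),              # ["a", "b"]: trailing empty fields are dropped
    kept:    "a,b,,".split(",", -1)           # ["a", "b", "", ""]: a negative limit keeps them
  }
end

p split_demo
```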

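Case conversion

As shown earlier, use the downcase or upcase methods.

Array-style access

Use [] with an index inside to get the character at that index.

Converting to and from numeric types

Getting the character code of a single-byte character
puts ?e
In Ruby 1.8 this prints 101, and the ? operator cannot be applied to Chinese characters. In Ruby 1.9 and later ?e is the one-character string "e"; use "e".ord to get the code.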
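
On Ruby 1.9 and later the conversions between characters, codes and numbers can be sketched like this; conversion_demo is a hypothetical helper name, and "\u4e2d" is just one Chinese character chosen for the example.

```ruby
# Hypothetical helper for illustration.
def conversion_demo
  {
    code:      "e".ord,         # 101
    char:      101.chr,         # "e"
    codepoint: "\u4e2d".ord,    # 20013, the code point of a Chinese character
    bytes:     "\u4e2d".bytes,  # [228, 184, 173], its UTF-8 bytes
    to_int:    "42".to_i,       # 42
    to_float:  "3.5".to_f,      # 3.5
    to_string: 42.to_s          # "42"
  }
end

p conversion_demo
```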

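Iterating over strings

The design of Ruby iterators is not discussed here; I will describe it in a separate article.

each_char

Iterates over each character. Example code:
require 'jcode' # needed on NetBeans 6.7.1 with JRuby 1.2.0 (Ruby 1.8), otherwise the method below is not found
"hello".each_char(){ |c| print c,' ' } # the () may be omitted

|c| stands for the current character of the string.

each

Iterates over each substring. If no separator argument is passed, \n is used as the separator. (each exists only in Ruby 1.8; later versions provide each_line with the same behavior.)
"hello\nworld".each { |c| puts c }
The output is:
hello
world

If a valid string is passed as the separator argument, the substrings are delimited by that separator instead of \n:
"hello\nworld".each('l') { |s| p s }
The output is:
"hel"
"l"
"o\nworl"
"d"

each_byte

Used like each_char, except that it iterates over bytes, so the output consists of numeric byte values.
"hello\nworld".each_byte { |s| print s," " }
The output:
104 101 108 108 111 10 119 111 114 108 100

each_line

Used like the methods above; it iterates over the substrings delimited by line breaks:
"hello\nworld".each_line do |s|
print s
end
Note that this is the other block syntax, with do/end in place of the { } pair.
The output is:
hello
world
The output spans two lines because the first substring, "hello\n", ends with a line break of its own.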
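
On Ruby 1.9 and later none of this needs jcode, and each iterator returns an enumerator when called without a block. A minimal sketch; iterate_demo is a hypothetical helper name.

```ruby
# Hypothetical helper for illustration.
def iterate_demo(text)
  {
    lines:      text.each_line.map(&:chomp),  # lines without their trailing \n
    chars:      text.each_char.to_a,          # one-character strings
    bytes:      text.each_byte.to_a,          # integer byte values
    split_on_l: text.each_line("l").to_a      # custom separator, kept on each piece
  }
end

p iterate_demo("hello\nworld")
```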

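String concatenation

Using operator +

str1="hello,"
str2="world"
str3=str1+str2
puts str3
The output is hello,world

Using operator <<

str1="hello,"
str2="world"
str1<<str2
puts str1
The output is hello,world

The concat method

concat can append a single character given by its code in the range [0,255]. Usage:
str1="hello,world"
str1.concat(33) # 33 is the character code of !
puts str1
The output is hello,world!

concat also accepts an object, such as another String.

Testing for emptiness

String#empty? returns true if the string is empty, otherwise false.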

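String comparison

operator <=>

str1<=>str2
returns -1 if str1 is less than str2;
returns 0 if str1 equals str2;
returns 1 if str1 is greater than str2.

The official documentation comment states this the wrong way round.

operator ==

Both operands must be Strings, otherwise the result is false;
if both are String objects, operator <=> is used for the comparison: true is returned when its result is 0, otherwise false.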

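String replacement

The replace method

Replaces the entire content of the string. Unlike operator =, which makes the variable refer to a different object, replace changes the existing object in place, so every reference to it sees the new content.

The sub method

str.sub(pattern, replacement) => new_str
str.sub(pattern) {|match| block } => new_str

Works on a copy of str: replaces the first match with replacement and returns the result. For example:
puts "abcde789".sub(/\d/, "000")
The output is: abcde00089

The second form runs a block of code, for example:
puts "abcde789".sub(/\d/){|c| 'a'}
The matched text is passed as |c|, and the value of the block, here a, replaces it.
The output is: abcdea89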
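
A short sketch comparing sub with gsub, which replaces every match instead of only the first; replace_demo is a hypothetical helper name.

```ruby
# Hypothetical helper for illustration.
def replace_demo(text = "abcde789")
  {
    first_only: text.sub(/\d/, "000"),                      # "abcde00089"
    all:        text.gsub(/\d/, "0"),                       # "abcde000"
    block:      text.gsub(/\d/) { |m| (m.to_i * 2).to_s },  # "abcde141618"
    deleted:    text.gsub(/\d/, ""),                        # "abcde"
    original:   text                                        # "abcde789", unchanged
  }
end

p replace_demo
```

Replacing with an empty string, as in the deleted entry, is the deletion trick mentioned in the section on deleting from a string.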