newLISP You Can Do It --- Strings

Date: 2019-11-12

#############################################################################
# Name:         newLISP You Can Do It --- Flow
# Author:       黃登(winger)
# Project:      http://code.google.com/p/newlisp-you-can-do
# Gtalk:        free.winger@gmail.com
# Gtalk-Group: zen0code@appspot.com
# Blog:         http://my.opera.com/freewinger/blog/
# QQ-Group:     31138659
# The great Way is simple -- newLISP
#
# Copyright 2012 黃登(winger) All rights reserved.
# Permission is granted to copy, distribute and/or
# modify this document under the terms of the GNU Free Documentation License,
# Version 1.2 or any later version published by the Free Software Foundation;
# with no Invariant Sections, no Front-Cover Texts,and no Back-Cover Texts.
#############################################################################

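        Freedom cannot be bought with money, but it can be sold for money.        --- Lu Xun

    In practice, most of the interaction between people and computers goes
through strings, so much so that most input data is treated as a string first.
    If lists are heaven and earth, then strings are the rivers that flow between
them.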

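I. Strings in newLISP code

    newLISP is undeniably strong at string handling. Every handy blade is laid
out for you, and each one is a must-have for homebodies and code slayers alike.

    End of commercial, back to business. ~_~

    newLISP (nL for short) gives you three ways to write a string:

    Enclose it in double quotes ; pro: fewer keystrokes, and escape sequences such as "\n" work
    (set 's "this is a string")

    Enclose it in braces ; pro: no escape processing at all
    (set 's {this is a string})

    Enclose it in the dedicated tags ; same as braces, and it can also hold strings longer than 2048 bytes
    (set 's [text]this is a string[/text])

    Strings written the first or the second way cannot exceed 2048 bytes.
    Many people wonder why the first form is still needed when the second one
exists. Try the following code: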

> {\{}

ERR: string token too long : "\\{}"

> "\""
"\""

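    See? Braces switch off every escape sequence; an escape inside them does
nothing. (The flip side, shown by the first example, is that braces inside a
braced string must balance, because \{ is not an escape.)
    Now print a string: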

> (print {\n road to freedom})
\n road to freedom"\\n road to freedom"
> (print "\n road to freedom")

road to freedom"\n road to freedom"

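    The escape inside the braces had no effect: no line break was printed. Of
the three forms, only the first lets you embed its own delimiter, the double
quote, by escaping it.

    I strongly encourage the second form, braces. Why? Convenience: there is no
need to double every backslash, which is especially handy when writing regular
expressions.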

> (println "\t45")
        45
"\t45"
> (println "\\t45")
\t45
"\\t45"
> (println {\t45})
\t45
"\\t45"

> (regex "\\d" "a9b6c4")
("9" 1 1)

> (regex {\d} "a9b6c4")
("9" 1 1)

    Double-quoted strings support the following escape sequences:

character   description
\"          for a double quote inside a quoted string
\n          for a line-feed character (ASCII 10)
\r          for a return character (ASCII 13)
\t          for a TAB character (ASCII 9)
\nnn        for a three-digit ASCII number (nnn format between 000 and 255)
\xnn        for a two-digit-hex ASCII number (xnn format between x00 and xff)

(set 's "this is a string \n with two lines")
(println s)

this is a string
 with two lines

(println "\110\101\119\076\073\083\080") ; decimal ASCII
newLISP

(println "\x6e\x65\x77\x4c\x49\x53\x50") ; hexadecimal ASCII
newLISP

    What about the reverse: turning a string into one of the numeric escape
strings above?
    Hint: use format and unpack.
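
    One possible answer to that exercise, as a minimal sketch. The helper name
to-escapes and its template parameter are made up for illustration: unpack with
one "b" per byte yields the byte values, and format renders each of them.

```newlisp
; Hypothetical helper (not part of the original text): render every byte
; of str with the given format template, e.g. "\\%03d" or "\\x%02x".
(define (to-escapes str template)
  (join (map (fn (n) (format template n))
             (unpack (dup "b" (length str)) str))))

(println (to-escapes "newLISP" "\\%03d"))   ; decimal escapes
(println (to-escapes "newLISP" "\\x%02x"))  ; hexadecimal escapes
```

    For "newLISP" the two lines printed should be the same escape strings used
in the println examples above.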

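    The third form, [text] ... [/text], is normally used for very long string
data (more than 2048 bytes), such as a web page. nL also switches to this
format automatically when it prints a long string.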

(set 'novel (read-file {my-latest-novel.txt}))
;->
[text]
It was a dark and "stormy" night...
...
The End.
[/text]

    length returns the length of a string:

(length novel)
;-> 575196

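    newLISP handles strings of millions of characters efficiently.
    length counts bytes. To count the characters of a Unicode string, you need
the UTF-8 version of newLISP and utf8len: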

(utf8len (char 955))
;-> 1
(length (char 955))
;-> 2
> (utf8len "個")
4
> (length "個")
2

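    The last two results came from cmd.exe, which is why they look wrong: it
causes many nearly unsolvable problems with non-ASCII characters. On a UTF-8
console, "個" is a single character stored in three bytes, so utf8len would
return 1 and length 3. Consoles outside Win32 do not have this problem.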

II. Making strings

    There are N ways to build a string. Strings here, strings there, strings
everywhere...
    To build one character by character, use char:

(char 33)
;-> "!"

> (char "a")
97

> (char 0x61)
"a"

> (char 97)
"a"

    char handles one character at a time: it turns a character into its number,
or a number into its character.

(join (map char (sequence (char "a") (char "z"))))
;-> "abcdefghijklmnopqrstuvwxyz"

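    char fetches the ASCII codes of "a" and "z", sequence produces the run of
numbers between them, and map applies char to every number to produce the
matching character. Finally join glues the whole list into one string.

    join also accepts an extra argument to use as a separator.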

(join (map char (sequence (char "a") (char "z"))) "-")
;-> "a-b-c-d-e-f-g-h-i-j-k-l-m-n-o-p-q-r-s-t-u-v-w-x-y-z"

    Like join, append can also concatenate strings. (Most list functions work
on strings too.)

(append "con" "cat" "e" "nation")
;-> "concatenation"

    Lists are built with list; strings are built with string.
    string converts each of its arguments to text and combines them into one
string.

(define x 42)
(string {the value of } 'x { is } x)
;-> "the value of x is 42"
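
    As a small illustration of string accepting mixed argument types, here is a
sketch; the names total and parts are invented for the example.

```newlisp
; string renders numbers, lists and symbols as text and joins the pieces.
(set 'total (+ 40 2))
(set 'parts '(1 2 3))
(set 'message (string "total: " total ", parts: " parts ", symbol: " 'total))
(println message)
```

    This should print: total: 42, parts: (1 2 3), symbol: total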

    For finer control over the output use format, which shows up later.
    dup repeats a string:

> (dup "帥鍋" 5)
"帥鍋帥鍋帥鍋帥鍋帥鍋"

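    date returns a string with the current date and time; given a number of
seconds, it formats that moment instead (the result depends on the local time
zone).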

> (date)
"Mon May 14 15:50:34 2012"

> (date 1234567890)
"Sat Feb 14 07:31:30 2009"

III. String surgery

    "Surgery" sounds scary, but it simply means changing a string permanently.

    Many functions operate on strings, and some of them are destructive (the
manual marks these functions with a ! sign).

(set 't "a hypothetical one-dimensional subatomic particle")
(reverse t)
;-> "elcitrap cimotabus lanoisnemid-eno lacitehtopyh a"
t
;-> "elcitrap cimotabus lanoisnemid-eno lacitehtopyh a"

    As mentioned before, to use such a function without destroying the original
data, work on a copy.

(reverse (copy t))
;-> "elcitrap cimotabus lanoisnemid-eno lacitehtopyh a"
t
;-> "a hypothetical one-dimensional subatomic particle"

    The reverse above changed t permanently. The case-conversion functions
below, however, leave the original string alone.

(set 't "a hypothetical one-dimensional subatomic particle")
(upper-case t)
;-> "A HYPOTHETICAL ONE-DIMENSIONAL SUBATOMIC PARTICLE"
(lower-case t)
;-> "a hypothetical one-dimensional subatomic particle"
(title-case t)
;-> "A hypothetical one-dimensional subatomic particle"
t
;-> "a hypothetical one-dimensional subatomic particle"

IV. Substrings

    To extract part of a string, use one of the following:

(set 't "a hypothetical one-dimensional subatomic particle")
(first t)
;-> "a"
(rest t)
;-> " hypothetical one-dimensional subatomic particle"
(last t)
;-> "e"
(t 2)
;-> "h"

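    This looks a lot like the list operations from the previous chapter. In nL
most list functions also work on strings, and that includes the selection
functions.

1: String slices

    slice cuts a new string out of an existing one.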

(set 't "a hypothetical one-dimensional subatomic particle")
(slice t 15 13) ; start at index 15, take 13 characters
;-> "one-dimension"
(slice t -8 8) ; start 8 characters from the end, take 8 characters
;-> "particle"
(slice t 2 -9) ; start at index 2, stop 9 characters before the end
;-> "hypothetical one-dimensional subatomic"
(slice "schwarzwalderkirschtorte" 19 -1) ; same idea: the last character is left out
;-> "tort"

    Strings also accept implicit slicing, of course.

(15 13 t)
;-> "one-dimension"
(0 14 t)
;-> "a hypothetical"

    Everything extracted so far was contiguous. To pick out scattered
characters, use select:

(set 't "a hypothetical one-dimensional subatomic particle")
(select t 3 5 24 48 21 10 44 8)
;-> "yosemite"
(select t (sequence 1 49 12)) ; every 12th character, starting at index 1
;-> " lime"

> (help select)
syntax: (select <string> <list-selection>)
syntax: (select <string> [<int-index_i> ... ])

     <list-selection> is a list holding the positions of the characters to
extract.
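
    The two syntax forms of select can be compared in a short sketch; the
variable name word is only an example, and the negative indices assume the
usual newLISP convention of counting from the end.

```newlisp
(set 'word "newLISP")
(set 'from-list (select word '(0 3 6)))   ; indices passed as one list
(set 'from-args (select word 0 3 6))      ; the same indices as separate arguments
(set 'from-end  (select word '(-1 -2)))   ; negative indices count from the end
(println from-list " " from-args " " from-end)
```

    This should print: nLP nLP PS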

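2: Changing the ends of strings

    chop and trim work on the two ends of a string. Neither is destructive:
each returns a new string and leaves the original untouched.
    Snip, snip, snip...

    chop removes characters from the end of the string...

(chop t) ; by default, the last character
;-> "a hypothetical one-dimensional subatomic particl"
(chop t 9) ; remove the last 9 characters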
;-> "a hypothetical one-dimensional subatomic"

    trim strips the given character from both ends of a string.

(set 's " centred ")
(trim s) ; defaults to removing spaces
;-> "centred"

(set 's "------centred------")
(trim s "-")
;-> "centred"

(set 's "------centred********")
(trim s "-" "*") ; the leading and trailing characters can be given separately
;-> "centred"
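
    A quick check, sketched with a throwaway variable s, that chop and trim
return new strings and leave their argument as it was:

```newlisp
(set 's "--centred--")
(set 'trimmed (trim s "-"))   ; strip "-" from both ends
(set 'chopped (chop s 2))     ; drop the last two characters
(println trimmed)
(println chopped)
(println s)                   ; s itself is not modified
```

    The three lines printed should be centred, --centred and --centred--.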

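3: push and pop work on strings too

    push inserts a string into the target string at a given position; pop does
the opposite.
    If no position is given, the front of the string is used.

(set 't "some ")
(push "this is " t)
(push "text " t -1)
;-> t is now "this is some text "

    pop returns the piece it removed rather than the target string, which is
quicker when you are working on large strings; otherwise you would have to use
silent to suppress the output. (The source says the same of push; what push
returns depends on the newLISP version, so check the manual for yours.)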

>(help pop)
syntax: (pop <str> [<int-index> [<int-length>]])

    The optional [<int-length>] sets how many characters pop removes.

(set 'version-string (string (sys-info -2)))
; e.g. version-string is "10402"
(set 'dev-version (pop version-string -2 2)) ; always two digits
; dev-version is "02", version-string is now "104"
(set 'point-version (pop version-string -1)) ; always one digit
; point-version is "4", version-string is now "10"
(set 'version version-string) ; one or two digits
(println version "." point-version "." dev-version " on " ostype)
10.4.02 on Win32
"Win32"

    ostype holds the operating system type.
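
    The effect of pop on a string can be followed step by step in this sketch;
the variable names are illustrative only.

```newlisp
(set 's "hello world")
(set 'tail (pop s -5 5))   ; remove 5 characters, starting 5 from the end
(println tail)             ; the removed piece
(println "[" s "]")        ; what is left in s
(set 'head (pop s))        ; no index given: remove the first character
(println head)
(println "[" s "]")
```

    This should print world, [hello ], h and [ello ] in that order.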

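V. Modifying strings

    There are two ways to modify a string: by naming an exact position, or by
naming the content to change.

1: Using index numbers in strings

    Long ago there were nth-set and set-nth, but given the confusion over what
they set and what they returned, they have vanished from current versions.
Implicit indexing together with setf does the job instead.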
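
    A minimal sketch of index-based modification with setf; the variable s is
just an example, and the negative index assumes the usual counting from the
end.

```newlisp
(set 's "newLISP")
(setf (s 0) "N")    ; replace the first character
(setf (s -1) "p")   ; replace the last character
(println s)
```

    This should print NewLISp.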

> (set 'str "thinking newLISP !")
"thinking newLISP !"
> (setf (str 0) "I t")
"I t"
> str
"I thinking newLISP !"

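2: Changing substrings

    Often you cannot know the exact index of the characters you need to change,
or finding it out costs too much.
    That is when replace comes in: it replaces every part of the string that
matches what you ask for...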

> (help replace)
syntax: (replace <str-key> <str-data> <exp-replacement>)
syntax: (replace <str-pattern> <str-data> <exp-replacement> <int-regex-option>)

(replace old-string source-string replacement)
So:
(set 't "a hypothetical one-dimensional subatomic particle")
(replace "hypoth" t "theor") ; replace every "hypoth" in the string with "theor"
;-> "a theoretical one-dimensional subatomic particle"

    replace is destructive. If you do not want to change the original string,
use copy or string:

(set 't "a hypothetical one-dimensional subatomic particle")
(replace "hypoth" (string t) "theor")
;-> "a theoretical one-dimensional subatomic particle"
t
;-> "a hypothetical one-dimensional subatomic particle"

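3: Replacing with regular expressions

    If you have browsed the manual, you will have noticed that many syntax
lines carry an optional <int-regex-option> parameter. It is the numeric regular
expression option; for the meaning of each number, search the manual for
"PCRE name". The most common values are 0 (case-sensitive) and 1
(case-insensitive).

    nL uses Perl-Compatible Regular Expressions (PCRE). Besides replace, the
functions directory, find, find-all, parse, search, starts-with and ends-with
all accept regular expressions.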

(set 't "a hypothetical one-dimensional subatomic particle")
(replace {h.*?l(?# h followed by l but not too greedy)} t {} 0)
;-> "a  one-dimensional subatomic particle"

    When writing a regular expression you can use double quotes or braces; the
difference was covered earlier. Personally I still recommend braces...

(set 'str "\s")
(replace str "this is a phrase" "|" 0) ; does not search for \s (whitespace)
;-> thi| i| a phra|e ; only the letter s was replaced

(set 'str "\\s")
(replace str "this is a phrase" "|" 0)
;-> this|is|a|phrase ; replaced successfully!

(set 'str {\s})
(replace str "this is a phrase" "|" 0)
;-> this|is|a|phrase ; better!
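
    The two most common option values, 0 (case-sensitive) and 1
(case-insensitive), can be compared side by side. This is only a sketch; the
variable names are invented, and copy keeps the original string intact.

```newlisp
(set 'text "LISP lisp Lisp")
(set 'sensitive   (replace "lisp" (copy text) "X" 0))  ; option 0
(set 'insensitive (replace "lisp" (copy text) "X" 1))  ; option 1
(println sensitive)
(println insensitive)
(println text)   ; the original is untouched because of copy
```

    This should print LISP X Lisp, then X X X, then the unchanged text.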

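VI. System variables: $0, $1 ...

    Every function that uses a regex binds the matches to the system variables
$0, $1 ... $15. You can use them directly or refer to them through the $
function.
    If you are new to regular expressions, look for a PCRE tutorial, and do not
worry if the code below is confusing. There is also the manual, the Code
Patterns document and, failing that, Google: more than one road leads to nL.
    My view has always been that good enough is enough, so skip ahead if this
is hard to follow. It comes naturally with use. Skill is honed by diligence and
dulled by play.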

(set 'quotation {"I cannot explain." She spoke in a low, eager voice,
with a curious lisp in her utterance. "But for God's sake do what I ask you. Go
back
and never set foot upon the moor again."})

(replace {(.*?),.*?curious\s*(l.*p\W)(.*?)(moor)(.*)}
    quotation
    (println { $1 } $1 { $2 } $2 { $3 } $3 { $4 } $4 { $5 } $5)
    4) ; line wrapping left extra \n line breaks in the string, so option 4
       ; sets PCRE_DOTALL and makes . match newlines as well

$1 "I cannot explain." She spoke in a low $2 lisp $3 in her utterance. "But for God's sake do what I ask you. Go
back
and never set foot upon the $4 moor $5 again."

    The match of each parenthesized group was bound to a system variable, $1
through $5, while $0 holds the part of the string matched by the whole regular
expression. A mouthful? Then just read the code.

(set 'str "http://newlisp.org:80")
(find "http://(.*):(.*)" str 0) → 0

$0 → "http://newlisp.org:80"
$1 → "newlisp.org"
$2 → "80"
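
    The same variables can be read through the $ function. Below is a small
sketch on an invented input string; regex also returns the positions and
lengths that the system variables leave out.

```newlisp
(set 'result (regex {(\d+)-(\d+)} "tel 555-1234"))
(println result)                      ; match, offset, length for $0, $1, $2
(set 'whole $0)
(set 'area  $1)
(set 'line  ($ 2))                    ; ($ 2) is the same value as $2
(println whole " | " area " | " line)
```

    The second line printed should be 555-1234 | 555 | 1234.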

1: The replacement expression

> (help replace)
syntax: (replace <str-key> <str-data> <exp-replacement>)
syntax: (replace <str-pattern> <str-data> <exp-replacement> <int-regex-option>)

    <exp-replacement> is the replacement part: any match you find can be
replaced by the value of this expression. The expression is unrestricted; it
may even do something that has nothing to do with the replacement.

(set 't "a hypothetical one-dimensional subatomic particle")
(replace {t[h]|t[aeiou]} t (println $0) 0)
th
ti
to
ti
t
;-> "a hypothetical one-dimensional subatomic particle"

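    The goal of this replace expression is to print every pair of characters in
the string that starts with t and ends with h or a vowel. <exp-replacement> is
(println $0), which does two jobs. First, it prints the match, which some call
a "side effect". Second, its return value, $0, replaces the match in the
original string; since the two are identical, the string looks unchanged.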

(replace "a|e|c" "This is a sentence" (upper-case $0) 0)
;-> "This is A sEntEnCE"

    The code below uses a more elaborate <exp-replacement>.

(set 't "a hypothetical one-dimensional subatomic particle")
(set 'counter 0)
(replace "o" t
    (begin
        (inc 'counter)
        (println {replacing "} $0 {" number } counter)
        (string counter)) ; the replacement must be a string; this is the value of <exp-replacement>
    0)
replacing "o" number 1
replacing "o" number 2
replacing "o" number 3
replacing "o" number 4
"a hyp1thetical 2ne-dimensi3nal subat4mic particle"

    begin groups several expressions into one and evaluates them in order; the
value of the last expression is the value of the whole group.
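
    Because the replacement is evaluated once per match, it can also compute a
new value from $0. A minimal sketch on an invented sentence, doubling every
number in it:

```newlisp
(set 'order "3 apples and 12 pears")
; int converts the matched digits, string turns the product back into text
(set 'doubled (replace {\d+} (copy order) (string (* 2 (int $0))) 0))
(println doubled)
```

    This should print 6 apples and 24 pears.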
    Now for a practical use of replace.
    Suppose a text file, "zhuzhu.txt", contains the following:

1 a = 15
2 another_variable = "strings"
4 x2 = "another string"
5 c = 25
3x=9

    We want to turn it into the following form so that it looks tidier.

10 a                   = 15
20 another_variable    = "strings"
30 x2                  = "another string"
40 c                   = 25
50 x                   = 9

    Save the code below as ft.lsp, then run: newlisp ft.lsp zhuzhu.txt

(set 'file (open ((main-args) 2) "read"))
;(set 'file (open "ni.txt" "read"))
(set 'counter 0)
(while (read-line file)
    (set 'temp
        (replace {^(\d*)(\s*)(.*)} ; renumber the start of the line
            (current-line)
            (string (inc 'counter 10) " " $3)
            0))
    (println
        (replace {(\S*)(\s*)(=)(\s*)(.*)} ; pick out the useful parts
            temp
            (string $1 (dup " " (- 20 (length $1))) $3 " " $5)
            0)))

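    The while loop reads the file line by line, and (current-line) returns the
line just read. The first replace rebuilds the leading number:
{^(\d*)(\s*)(.*)} splits the source line into the leading digits, the
whitespace that follows, and the rest. (string (inc 'counter 10) " " $3) then
drops the first two parts and joins the counter value with the third part into
a new string. counter grows by 10 for every line processed. The result is
assigned to the temporary variable temp.
    The second replace splits temp into five groups with
{(\S*)(\s*)(=)(\s*)(.*)}.
    \S stands for any character that \s does not match.
    $1, $3 and $5 are taken from the match and combined into a new string:
    (string $1 (dup " " (- 20 (length $1))) $3 " " $5)
    For alignment, $3 (the equals sign) is placed 20 columns after the start of
$1; if $1 is shorter than 20 bytes, dup supplies the missing spaces.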
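
    The padding built with dup could also be left to format. The sketch below
compares the two on invented sample values; %-20s left-aligns the name in a
field 20 characters wide.

```newlisp
(set 'name "x2")
(set 'value "\"another string\"")
; padding by hand, as in the script above
(set 'by-dup (string name (dup " " (- 20 (length name))) "= " value))
; the same layout through a format template
(set 'by-format (format "%-20s= %s" name value))
(println by-dup)
(println by-format)
```

    Both lines should come out identical, so which one to use is a matter of
taste.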

    Regular expressions aren't very easy for the newcomer, but they're very
powerful, particularly with newLISP's replace function, so they're worth
learning and worth practising regularly.

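VII. Testing and comparing strings

    All sorts of tests can be applied to strings. The comparison operators
compare the strings piece by piece, in order.

(> {Higgs Boson} {Higgs boson}) ; nil ; B sorts before b
(> {Higgs Boson} {Higgs}) ; true
(< {dollar} {euro}) ; true
(> {newLISP} {LISP}) ; true
(= {fred} {Fred}) ; nil ; f and F differ
(= {fred} {fred}) ; true

    The comparison starts at the first character and continues until the result
is decided.
    Comparing several strings is no problem either. Thanks to newLISP's flexible
argument handling, you do not have to write the iteration yourself.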

(< "a" "c" "d" "f" "h")
;-> true

    What if only one argument is supplied?
    nL fills in a default: a number is compared with 0, and a string is
compared with the empty string ""...

(> 1) ; true - assumes > 0
(> "fred") ; true - assumes > ""

    The following functions make it easy to examine a string and pull out
specific content:
    member, regex, find-all, starts-with, ends-with.

(starts-with "newLISP" "new")
;-> true
(ends-with "newLISP" "LISP")
;-> true

    They also accept a regular expression option (usually 0 or 1).

(starts-with {newLISP} {[a-z][aeiou](?#lc followed by lc vowel)} 0)
;-> true
(ends-with {newLISP} {[aeiou][A-Z](?# lc vowel followed by UCase)} 0)
;-> nil

    In PCRE terms 0 means case-sensitive and 1 means case-insensitive.
    find, find-all, member and regex search the whole string.
    find returns the position of the first match.

(set 't "a hypothetical one-dimensional subatomic particle")
(find "atom" t)
;-> 34
(find "l" t)
;-> 13
(find "L" t)
;-> nil ; case-sensitive

    member checks whether one string is part of another; if it is, member
returns the substring together with everything after it.

(member "rest" "a good restaurant")
;-> "restaurant"

    find and member both accept the regular expression option.

(set 'quotation {"I cannot explain." She spoke in a low,
eager voice, with a curious lisp in her utterance. "But for
Gods sake do what I ask you. Go back and never set foot upon
the moor again."})

(find "lisp" quotation) ; without regex
;-> 69 ; character 69, the position of the l

(find {i} quotation 0) ; with regex
;-> 15 ; character 15

(find {s} quotation 1) ; case-insensitive
;-> 20 ; character 20

(println "character "
    (find {(l.*?p)} quotation 0) ": " $0) ; find an l followed, later on, by a p
;-> character 13: lain." She sp

    A reminder: when entering a multi-line statement at the console, first
press Enter on an empty line and then paste the whole statement, or put [cmd]
and [/cmd] on lines of their own before and after it.

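    find-all works like find, but instead of returning only the first match it
returns all matching substrings as a list. On strings it uses regular
expressions by default, so the regex option need not be given explicitly.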

> (help find-all)
syntax: (find-all <str-regex-pattern> <str-text> [<exp> [<int-regex-option>]])

(set 'quotation {"I cannot explain." She spoke in a low,
eager voice, with a curious lisp in her utterance. "But for
Gods sake do what I ask you. Go back and never set foot upon
the moor again."})

(find-all "[aeiou]{2,}" quotation $0) ; substrings of two or more vowels
;-> ("ai" "ea" "oi" "iou" "ou" "oo" "oo" "ai")

    find-all returns the matching content. If you also want positions and
lengths, use regex.
    regex returns the content, start position and length of the whole match and
of each parenthesized group. It looks a little complicated at first sight.

(set 'quotation
    {She spoke in a low, eager voice, with a curious lisp in her utterance.})

(println (regex {(.*)(l.*)(l.*p)(.*)} quotation 0))
;-->
("She spoke in a low, eager voice, with a curious lisp in her utterance." 0 70
 "She spoke in a " 0 15
 "low, eager voice, with a curious " 15 33
 "lisp" 48 4
 " in her utterance." 52 18)

    The first item is the string matched by the whole regular expression. It is
also the longest: it starts at 0 and is 70 bytes long. Next comes the match of
the first pair of parentheses, starting at position 0 and 15 bytes long, then
the data of the second group, starting at position 15 and 33 bytes long...

    All of these groups are also stored in the system variables.

(for (x 1 4)
    (println {$} x ": " ($ x)))
$1: She spoke in a
$2: low, eager voice, with a curious
$3: lisp
$4: in her utterance.

VIII. Strings to lists

    First meet the "world-famous" explode: it blows a string into pieces of a
given size and returns all the pieces as a list.

(set 't "a hypothetical one-dimensional subatomic particle")
(explode t)

;-> ("a" " " "h" "y" "p" "o" "t" "h" "e" "t" "i" "c" "a" "l"
" " "o" "n" "e" "-" "d" "i" "m" "e" "n" "s" "i" "o" "n" "a"
"l" " " "s" "u" "b" "a" "t" "o" "m" "i" "c" " " "p" "a" "r"
"t" "i" "c" "l" "e")

> (help explode)
syntax: (explode <str> [<int-chunk> [<bool>]])
syntax: (explode <list> [<int-chunk> [<bool>]])

(explode (replace " " t "") 5)
;-> ("ahypo" "theti" "calon" "e-dim" "ensio" "nalsu" "batom" "icpar"
"ticle")

    int-chunk is the chunk size; bool decides whether a final piece shorter
than int-chunk is dropped.
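
    The effect of the bool flag shows up when the string length is not a
multiple of the chunk size. A short sketch, with illustrative variable names:

```newlisp
(set 'all-chunks  (explode "newLISP" 3))        ; keeps the short last piece
(set 'full-chunks (explode "newLISP" 3 true))   ; drops it
(println all-chunks)
(println full-chunks)
```

    This should print ("new" "LIS" "P") and then ("new" "LIS").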
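    You have the axe that split the sky; I have the stone that mends it.
    join does the exact opposite of explode: it assembles a list whose elements
are all strings into one new string.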

>(help join)
syntax: (join list-of-strings [str-joint [bool-trail-joint]])

(set 'lst '("this" "is" "a" "sentence"))

(join lst " ") → "this is a sentence"

(join (map string (slice (now) 0 3)) "-") → "2012-5-16" ; the numbers are converted to strings first

(join (explode "keep it together")) → "keep it together"

(join '("A" "B" "C") "-")         → "A-B-C"
(join '("A" "B" "C") "-" true)    → "A-B-C-"

    find-all can split strings as well.

(find-all ".{3}" t) ; uses a regular expression by default: groups of three characters
;-> ("a h" "ypo" "the" "tic" "al " "one" "-di" "men" "sio"
"nal" " su" "bat" "omi" "c p" "art" "icl")

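IX. Parsing strings

    The next function may well move you to tears.
    If you regularly process large amounts of text, parse will be your
treasure.
    It takes the pain out of data statistics and analysis. (nL also has many
dedicated statistics functions built in.)

> (help parse)
syntax: (parse <str-data> [<str-break> [<int-option>]])

    parse splits a string at <str-break>. The <str-break> occurrences are eaten,
and the remaining pieces are returned as a list of substrings.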

(parse t) ; the default separator is the space...
;-> ("a" "hypothetical" "one-dimensional" "subatomic" "particle")

    <str-break> can be a single separator character or a longer string.

(set 'pathname {/System/Library/Fonts/Courier.dfont})
(parse pathname {/})
;-> ("" "System" "Library" "Fonts" "Courier.dfont")

(set 't {spamspamspamspamspamspamspamspam})
;-> "spamspamspamspamspamspamspamspam"
(parse t {am}) ; break on "am"
;-> ("sp" "sp" "sp" "sp" "sp" "sp" "sp" "sp" "")

    filter can remove the empty strings from the resulting list.

(filter (fn (s) (not (empty? s))) (parse pathname {/}))
;-> ("System" "Library" "Fonts" "Courier.dfont")
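
    The same clean-up can be written more briefly with clean, which removes the
elements for which the predicate is true. A sketch that repeats the pathname
from above:

```newlisp
(set 'pathname {/System/Library/Fonts/Courier.dfont})
; clean is the counterpart of filter: it drops every empty string
(set 'parts (clean empty? (parse pathname {/})))
(println parts)
```

    This should print ("System" "Library" "Fonts" "Courier.dfont").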

    Stripping HTML tags:

(set 'html (read-file "/Users/Sites/index.html"))
(println (parse html {<.*?>} 4)) ; option 4: dot matches newline

    nL also provides a dedicated XML parsing tool, xml-parse. A later chapter
covers it in full.

    When no <str-break> is given, parse tokenizes according to newLISP's
internal parsing rules, which is a different algorithm from the one used when a
break string is supplied.

(set 't {Eats, shoots, and leaves ; a book by Lynn Truss})
(parse t)
;-> ("Eats" "," "shoots" "," "and" "leaves") ; she's gone!

    Because no delimiter was given, everything after the ";" was taken for a
comment.
    To make parse split the data by your own rules, supply an explicit
delimiter or a regular expression.

(set 't {Eats, shoots, and leaves ; a book by Lynn Truss})
(parse t " ")
;-> ("Eats," "shoots," "and" "leaves" ";" "a" "book" "by" "Lynn" "Truss")

    Or:

(parse t "\\s" 0) ; {\s} is a whitespace character
;-> ("Eats," "shoots," "and" "leaves" ";" "a" "book" "by" "Lynn" "Truss")

    Another way to split a string is find-all.

(set 'a "1212374192387562311")
(println (find-all {\d{3}|\d{2}$|\d$} a))
;-> ("121" "237" "419" "238" "756" "231" "1")

; alternatively

(explode a 3)
;-> ("121" "237" "419" "238" "756" "231" "1")

    parse eats the delimiters, while find-all keeps what it matches.

(find-all {\w+} t ) ; \w matches a letter, digit or underscore; same as [0-9a-zA-Z_]
;-> ("Eats" "shoots" "and" "leaves" "a" "book" "by" "Lynn" "Truss")

(parse t {\w+} 0 ) ; eats the delimiters
;-> ("" ", " ", " " " " ; " " " " " " " " " "")

(parse t {[^\w]+} 0 )
;->("Eats" "shoots" "and" "leaves" "a" "book" "by" "Lynn" "Truss")

(append '("") (find-all {[^\w]+} t ) '(""))
;-> ("" ", " ", " " " " ; " " " " " " " " " "")

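X. Other string functions

    search looks in a file for a matching string and returns the position of
the first match. By default it then leaves the file pointer at the start of
that string; when <bool-flag> is true, the pointer is moved to the end of the
string instead. The next search continues from the current file pointer.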

> (help search )
syntax: (search <int-file> <str-search> [<bool-flag> [<int-options>]])

(set 'f (open {/private/var/log/system.log} {read}))
(search f {kernel})
(seek f (- (seek f) 64)) ; rewind file pointer
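(dotimes (n 3)
    (println (read-line f)))
(close f)

    The code above searches the system log for the string kernel, moves back 64
bytes from where it was found, then reads three lines of the log and prints
them.
    For more string functions, search the manual for "String and conversion
functions".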

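XI. Formatting strings

    Like other languages, nL offers elegant string output, through the format
function.
    Suppose we need to print the following:

folder: Library
file: mach

    We need these string templates:

"folder: %s" ; or
" file: %s"

    Give format a text template, followed in order by all the arguments the
template needs.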

(format "folder: %s" f) ; or
(format " file: %s" f)

> (help format)
syntax: (format <str-format> [<exp-data-1> <exp-data-2> ... ])
syntax: (format <str-format> <list-data>)

    <str-format> is the string template, and there is only one. The arguments
after it are the data for the format codes in the template. Take
(format " file: %s" f): f is a string here, so the template must contain %s;
if f were a number, the template would need %d. Eleven data types are currently
supported.

format description
s    text string
c    character (value 1 - 255)
d    decimal (32-bit)
u    unsigned decimal (32-bit)
x    hexadecimal lowercase
X    hexadecimal uppercase
o    octal (32-bits) (not supported on all compilers)
f    floating point
e    scientific floating point
E    scientific floating point
g    general floating point

    The type must match, or an error is raised. % works like an escape
character: its position marks where the following data goes in the string.

(set 'f "OneLisp")
(format "folder: %s" f)
;-->"folder: OneLisp"

(format "%s folder: " f)
;-->"OneLisp folder: "

(format "%d" "abc")
;-->ERR: data type and format don't match in function format : "abc"

    The code below uses the directory function to print every file and folder
in the current directory.

(dolist (f (directory))
    (if (directory? f)
        (println (format "folder: %s" f))
        (println (format " file: %s" f))))

; output

folder: .
folder: ..
folder: api
file: cd.dll
file: cmd-lisp.bat
folder: code
file: CodePatterns-cn.html
file: CodePatterns-CN.html.bak
file: CodePatterns.html
file: COPYING
file: demo-stdin.lsp
file: drag.bat
folder: examples
file: freetype6.dll
file: gs.bat
folder: guiserver
file: guiserver-keyword.txt
...

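    The template given to format also allows finer control over the output.

"%w.pf"

    f is the data type flag introduced above; it is required.
    w is the width the value occupies in the output.
    p is the precision of the value in the output.
    w may be preceded by a minus sign (left-align), a plus sign (always show
the sign) or a 0 (pad the empty positions with zeros). Right alignment is the
default.
    Zero padding only has an effect with right alignment.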
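
    Width, precision and the flags can be tried out together in a few lines.
This is only a sketch; the square brackets are there to make the padding
visible and the variable names are invented.

```newlisp
(set 'fixed   (format "[%10.2f]" 3.14159))  ; width 10, two decimals, right-aligned
(set 'left    (format "[%-10s]" "abc"))     ; minus sign: left-aligned in 10 columns
(set 'signed  (format "[%+d]" 42))          ; plus sign: always show the sign
(set 'hex     (format "[%x]" 255))          ; lowercase hexadecimal
(println fixed)
(println left)
(println signed)
(println hex)
```

    This should print [      3.14], [abc       ], [+42] and [ff].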

>(format "Result = %05d" 2)
"Result = 00002"

> (format "Result = %+05d" 2)
"Result = +0002"
> (format "Result = %+05d" -2)
"Result = -0002"
> (format "Result = %-05d" -2)
"Result = -2   "
> (format "Result = %05d" -2)
"Result = -0002"

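    Now a more elaborate example: print every character from 32 through 0x01a0
(416), together with its decimal, hexadecimal and binary value.
    format cannot output binary, so a binary conversion function was written
for the purpose. Nowadays the built-in bits does the conversion.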

(define (binary x , results)
    (until (<= x 0)
        (push (string (% x 2)) results) ; % gives the remainder, i.e. the next binary digit
        (set 'x (/ x 2))) ; shrink x
    results)

(for (x 32 0x01a0)
    (println (char x) ; first turn the number into a character with char
        (format "%4d\t%4x\t%10s" ; decimal \t hexadecimal \t binary string
            (list x x (join (binary x))))))

x 120     78       1111000
y 121     79       1111001
z 122     7a       1111010
{ 123     7b       1111011
| 124     7c       1111100
} 125     7d       1111101
~ 126     7e       1111110
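
    With the built-in bits, the hand-written binary function is no longer
needed. A minimal sketch, including the way back through int with base 2; the
variable names are only examples.

```newlisp
(set 'as-bits (bits 120))          ; number to binary string
(set 'back (int as-bits 0 2))      ; binary string to number (default 0, base 2)
(println as-bits)
(println back)
```

    This should print 1111000 and 120, matching the row for x in the table
above.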

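XII. Strings that make newLISP think

    Why this title? Hehe, there is a fun example at the end. You could even
write a code scrambler and see what comes out.

    The last two functions in this chapter are eval and eval-string.
    Both exist to run nL code.
    As long as the code you supply is valid, they give you back the result.

    eval takes an expression:

(set 'expr '(+ 1 2))
(eval expr)
;-> 3

    eval-string takes only a string: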

(set 'expr "(+ 1 2)")
(eval-string expr)
;-> 3
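
    The difference between the two functions, and the error fallback of
eval-string, can be sketched as follows. The fallback text is invented; the
third argument is evaluated and returned when the string cannot be evaluated.

```newlisp
(set 'expr '(* 6 7))               ; a quoted list: data, not yet evaluated
(println (first expr))             ; the list can be inspected like any other
(set 'from-list (eval expr))
(set 'from-text (eval-string "(* 6 7)"))
(println from-list " " from-text)
; broken code: the fallback expression is used instead of raising an error
(println (eval-string "(* 6 7" MAIN "not valid code"))
```

    The first two lines printed should be * and 42 42.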

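    With these two functions you can run any nL code. Every expression we
evaluate implicitly involves them. They run by default, and you must not forget
that they were there, or your head may well turn to mush. When the nature of
symbols, macros and other expressions, or the way they are evaluated, confuses
you, come back and reread this paragraph and things will clear up.
    Why does eval matter? Because it stands for free choice: you can run the
code you need at whatever time and place you need it. You will feel this even
more strongly when working with macros.

    Below is an amusing piece of code: it keeps reshuffling a list and calls
eval-string on the result until one of the expressions actually evaluates.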

(set 'code '(")" "set" "'valid" "true" "("))
(set 'valid nil)
(until valid
    (set 'code (randomize code)) ; randomize shuffles the code sequence
    (println (join code " "))
    (eval-string (join code " ") MAIN nil))

; output

) true 'valid ( set
'valid ) ( set true
true set ( 'valid )
'valid true ( set )
'valid ( true set )
) true ( set 'valid
) ( set 'valid true
'valid ) set true (
...
true set ) ( 'valid
true ( 'valid ) set
true 'valid ( set )
true ) 'valid ( set
( set 'valid true )
true

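    That more or less covers the basics of newLISP. What follows goes a little
deeper: contexts and macros.
    In nL, though, they are simple and clear in how they look, how they are
used and how they work.
Good Luck !!!

A colour version can be downloaded from http://code.google.com/p/newlisp-you-can-do and viewed with scite4newlisp.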

2012-05-14 - 2012-05-17 15:10:29
