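lilac-parser is a library I wrote in ClojureScript that can do some of what regular expressions do.
Going by the name, the library was designed more along the lines of a parser,
but in use it also works quite smoothly as a regex replacement, though it is not as short and clear as a regex.
The main drawback of regular expressions is that they are written as strings, so they need escaping, and long rules become hard to maintain.
lilac-parser rules, by contrast, are easy to compose. Here are some examples.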
First is the is+ rule, which does exact matching:
(parse-lilac "x" (is+ "x")) ; {:ok? true, :rest nil}
(parse-lilac "xyz" (is+ "xyz")) ; {:ok? true, :rest nil}
(parse-lilac "xy" (is+ "x")) ; {:ok? true, :rest ("y")}
(parse-lilac "y" (is+ "x")) ; {:ok? false}
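As you can see, every expression that matches at the head of the input returns :ok? true.
Whether anything is left after the match has to be checked separately through the :rest field.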
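Since a successful head match may still leave input behind, checking for a full-string match needs both fields. Here is a minimal sketch, assuming results are maps shaped like the ones above (`:ok?` plus a `:rest` that is nil or a list of leftover characters). `full-match?` is a hypothetical helper name, not part of lilac-parser:

```clojure
;; Hypothetical helper, not part of lilac-parser.
;; Assumes a result map shaped like the ones shown above:
;; {:ok? true, :rest nil}, {:ok? true, :rest ("y")} or {:ok? false}.
(defn full-match?
  "True only when the rule matched and no input is left over."
  [result]
  (boolean (and (:ok? result) (empty? (:rest result)))))

(full-match? {:ok? true, :rest nil})    ; => true
(full-match? {:ok? true, :rest '("y")}) ; => false
(full-match? {:ok? false})              ; => false
```

In real use the argument would be the return value of parse-lilac, for example (full-match? (parse-lilac "xy" (is+ "x"))).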
Exact matching is simple, of course. Next is matching one of several choices:
(parse-lilac "x" (one-of+ "xyz")) ; {:ok? true}
(parse-lilac "y" (one-of+ "xyz")) ; {:ok? true}
(parse-lilac "z" (one-of+ "xyz")) ; {:ok? true}
(parse-lilac "w" (one-of+ "xyz")) ; {:ok? false}
(parse-lilac "xy" (one-of+ "xyz")) ; {:ok? true, :rest ("y")}
Conversely, there is a rule for exclusion:
(parse-lilac "x" (other-than+ "abc")) ; {:ok? true, :rest nil}
(parse-lilac "xy" (other-than+ "abc")) ; {:ok? true, :rest ("y")}
(parse-lilac "a" (other-than+ "abc")) ; {:ok? false}
Building on that, we can add some logic to say that a rule is allowed to be absent.
Since absence is allowed, the result can always fall back to true:
(parse-lilac "x" (optional+ (is+ "x"))) ; {:ok? true, :rest nil}
(parse-lilac "" (optional+ (is+ "x"))) ; {:ok? true, :rest nil}
(parse-lilac "x" (optional+ (is+ "y"))) ; {:ok? true, :rest ("x")}
A rule can also be matched repeatedly, that is, one or more times (for now there is no way to control the exact count):
(parse-lilac "x" (many+ (is+ "x")))
(parse-lilac "xx" (many+ (is+ "x")))
(parse-lilac "xxx" (many+ (is+ "x")))
(parse-lilac "xxxy" (many+ (is+ "x")))
If zero occurrences are allowed, the rule is some rather than many:
(parse-lilac "" (some+ (is+ "x")))
(parse-lilac "x" (some+ (is+ "x")))
(parse-lilac "xx" (some+ (is+ "x")))
(parse-lilac "xxy" (some+ (is+ "x")))
(parse-lilac "y" (some+ (is+ "x")))
Likewise, an or rule can be written:
(parse-lilac "x" (or+ [(is+ "x") (is+ "y")]))
(parse-lilac "y" (or+ [(is+ "x") (is+ "y")]))
(parse-lilac "z" (or+ [(is+ "x") (is+ "y")]))
combine is used to chain several rules in sequence:
(parse-lilac "xy" (combine+ [(is+ "x") (is+ "y")])) ; {:ok? true, :rest nil}
(parse-lilac "xyz" (combine+ [(is+ "x") (is+ "y")])) ; {:ok? true, :rest ("z")}
(parse-lilac "xy" (combine+ [(is+ "y") (is+ "x")])) ; {:ok? false}
interleave takes two rules and repeats them in alternation.
This comes up a lot when handling comma-separated expressions:
(parse-lilac "xy" (interleave+ (is+ "x") (is+ "y")))
(parse-lilac "xyx" (interleave+ (is+ "x") (is+ "y")))
(parse-lilac "xyxy" (interleave+ (is+ "x") (is+ "y")))
(parse-lilac "yxy" (interleave+ (is+ "x") (is+ "y")))
The current code also provides a few built-in rules for matching letters, digits, and Chinese characters:
(parse-lilac "a" lilac-alphabet)
(parse-lilac "A" lilac-alphabet)
(parse-lilac "." lilac-alphabet) ; {:ok? false}
(parse-lilac "1" lilac-digit)
(parse-lilac "a" lilac-digit) ; {:ok? false}
(parse-lilac "漢" lilac-chinese-char)
(parse-lilac "E" lilac-chinese-char) ; {:ok? false}
(parse-lilac "," lilac-chinese-char) ; {:ok? false}
For other specific characters, for now the only option is to specify a Unicode range.
(parse-lilac "a" (unicode-range+ 97 122))
(parse-lilac "z" (unicode-range+ 97 122))
(parse-lilac "A" (unicode-range+ 97 122))
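The numbers 97 and 122 above are the code points of "a" and "z". If you would rather not look code points up by hand, a small helper can compute them. This is only a sketch: `code-point` is a hypothetical name, and it relies on the host's `codePointAt` string method (present on both JavaScript and JVM strings), not on anything in lilac-parser:

```clojure
;; Hypothetical helper, not part of lilac-parser.
;; Calls the host string method codePointAt through interop.
(defn code-point
  "Code point of the first character of string s."
  [s]
  (.codePointAt s 0))

(code-point "a") ; => 97
(code-point "z") ; => 122
(code-point "A") ; => 65, which lies outside 97..122
```

With it, the rule above could be written as (unicode-range+ (code-point "a") (code-point "z")), assuming unicode-range+ takes a lower and an upper bound as the examples suggest.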
With these rules in hand, you can combine them to imitate what regexes do, for example finding how many matches there are:
(find-lilac "write cumulo and respo" (or+ [(is+ "cumulo") (is+ "respo")])) ; find 2
(find-lilac "write cumulo and phlox" (or+ [(is+ "cumulo") (is+ "respo")])) ; find 1
(find-lilac "write cumulo and phlox" (or+ [(is+ "cirru") (is+ "respo")])) ; find 0
Or you can do string replacement directly, which is much like a regex.
(replace-lilac "cumulo project" (or+ [(is+ "cumulo") (is+ "respo")]) (fn [x] "my")) ; "my project"
(replace-lilac "respo project" (or+ [(is+ "cumulo") (is+ "respo")]) (fn [x] "my")) ; "my project"
(replace-lilac "phlox project" (or+ [(is+ "cumulo") (is+ "respo")]) (fn [x] "my")) ; "phlox project"
As you can see, this style is all about composition. It is longer to write than a regex, but you can define variables and build abstractions.
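To make the point about variables and abstraction concrete, here is a rough sketch that names small rules and builds larger ones from them with an ordinary function. It uses only the rule constructors shown earlier and assumes parse-lilac accepts a string as in the examples above. The namespace in the require, and the names `project-name`, `spaces` and `command-rule`, are my assumptions for illustration:

```clojure
(ns example.rules
  ;; Assumption: these names are exported from lilac-parser.core;
  ;; adjust the require to match the library version you use.
  (:require [lilac-parser.core :refer [parse-lilac is+ or+ many+ combine+]]))

;; Small named rules that can be reused in several places.
(def project-name (or+ [(is+ "cumulo") (is+ "respo")]))
(def spaces (many+ (is+ " ")))

;; Hypothetical function that builds a rule from an argument:
;; a verb, at least one space, then a known project name.
(defn command-rule [verb]
  (combine+ [(is+ verb) spaces project-name]))

(def write-rule (command-rule "write"))
(def read-rule (command-rule "read"))

(:ok? (parse-lilac "write respo" write-rule))
(:ok? (parse-lilac "read cumulo" read-rule))
```

A regex would need string concatenation and escaping to get the same reuse. Here the pieces are plain values, so they can be passed around like any other data.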
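Simple examples may not show what this is good for. It may just look like more typing, with worse performance on top.
My project contains a simple JSON parsing example, which is something a regex can't really handle...
Here is the code, copied over directly: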
; Assumes clojure.string is required as string in the ns form

; Matches either true or false and returns a boolean
(def boolean-parser
  (label+ "boolean"
    (or+ [(is+ "true") (is+ "false")]
         (fn [x] (if (= x "true") true false)))))

(def space-parser (label+ "space" (some+ (is+ " ") (fn [x] nil))))

; A parser combining whitespace and a comma; label is only an annotation and can be ignored
(def comma-parser
  (label+ "comma"
    (combine+ [space-parser (is+ ",") space-parser] (fn [x] nil))))

(def digits-parser (many+ (one-of+ "0123456789") (fn [xs] (string/join "" xs))))

; For simplicity, null and undefined both return nil
(def nil-parser
  (label+ "nil" (or+ [(is+ "null") (is+ "undefined")] (fn [x] nil))))

; For numbers, there may be a minus sign in front and a decimal point after
; I was lazy here and did not handle scientific notation...
(def number-parser
  (label+ "number"
    (combine+
      ; minus sign, optional
      [(optional+ (is+ "-"))
       digits-parser
       ; the fractional part, also optional
       (optional+ (combine+ [(is+ ".") digits-parser] (fn [xs] (string/join "" xs))))]
      (fn [xs] (js/Number (string/join "" xs))))))

(def string-parser
  (label+ "string"
    (combine+
      ; a string starts with a quote and ends with a quote
      [(is+ "\"")
       ; in between are non-quote characters or escape sequences
       (some+ (or+ [(other-than+ "\"\\") (is+ "\\\"") (is+ "\\\\") (is+ "\\n")]))
       (is+ "\"")]
      (fn [xs] (string/join "" (nth xs 1))))))

(defparser value-parser+ () identity
  (or+ [number-parser string-parser nil-parser boolean-parser (array-parser+) (object-parser+)]))

(defparser object-parser+ () identity
  (combine+
    [(is+ "{")
     (optional+
       ; objects are more involved; focus on the interleave part, the outside only handles the braces
       (interleave+
         (combine+ [string-parser space-parser (is+ ":") space-parser (value-parser+)]
                   (fn [xs] [(nth xs 0) (nth xs 4)]))
         comma-parser
         (fn [xs] (take-nth 2 xs))))
     (is+ "}")]
    (fn [xs] (into {} (nth xs 1)))))

(defparser array-parser+ () (fn [x] (vec (first (nth x 1))))
  (combine+
    [(is+ "[")
     ; arrays are likewise an interleave case
     (some+ (interleave+ (value-parser+) comma-parser (fn [xs] (take-nth 2 xs))))
     (is+ "]")]))
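The callbacks passed to interleave+ and combine+ above are plain Clojure functions, so their data shaping can be tried out without the parser. The sketch below assumes, as the (take-nth 2 xs) calls suggest, that interleave+ hands its callback the values and separators in alternating order, with comma-parser contributing nil. The sample vectors are made up for illustration:

```clojure
;; Made-up sample of what the interleave+ callback might receive for "[1, 2, 3]":
;; values at even positions, comma-parser results (nil) at odd positions.
(def array-items [1 nil 2 nil 3])

(vec (take-nth 2 array-items)) ; => [1 2 3]

;; Made-up sample for an object body like {"a": 1, "b": true}:
;; each pair was already reduced to [key value] by the inner combine+ callback.
(def object-items [["a" 1] nil ["b" true]])

(into {} (take-nth 2 object-items)) ; => {"a" 1, "b" true}
```

Dropping every second element discards the separators, and into {} turns the remaining key/value pairs into a map. That is all the object and array callbacks do.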
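As you can see, building rules with lilac-parser makes it fairly easy to end up with a JSON parser. The supported rules are fairly simple and the performance is not great, but compared with a regex this code is much more readable. I believe it can serve as an approach for many text-processing scenarios. In the future, perhaps a simplified version could be offered for direct use in JavaScript in place of regexes.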