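Without further ado: the official RulerZ repository is at https://github.com/K-Phoen/ru...php
Note: this walkthrough only uses plain PHP arrays as the data source.
RulerZ is a Composer package written in PHP that implements a rule engine for filtering data. Besides arrays, it supports common ORMs such as Eloquent and Doctrine, as well as the Solr search engine.
The package has no official Chinese documentation; given its modest star count, the author presumably saw no need for one.
Run the following in the directory that contains your project's composer.json: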
composer require 'kphoen/rulerz'
Suppose we have the following array:
$players = [
    ['pseudo' => 'Joe', 'fullname' => 'Joe la frite', 'gender' => 'M', 'points' => 2500],
    ['pseudo' => 'Moe', 'fullname' => 'Moe, from the bar!', 'gender' => 'M', 'points' => 1230],
    ['pseudo' => 'Alice', 'fullname' => 'Alice, from... you know.', 'gender' => 'F', 'points' => 9001],
];
Initialize the engine:
use RulerZ\Compiler\Compiler;
use RulerZ\Target;
use RulerZ\RulerZ;

// compiler
$compiler = Compiler::create();

// RulerZ engine
$rulerz = new RulerZ(
    $compiler, [
        // Register the compilation target here; for array data sources it is Native.
        new Target\Native\Native([
            'length' => 'strlen'
        ]),
    ]
);
Create a rule:
$rule = 'gender = :gender and points > :min_points';
Hand the rule and its parameters to the engine:
$parameters = [
    'min_points' => 30,
    'gender' => 'F',
];

$result = iterator_to_array(
    $rulerz->filter($players, $rule, $parameters) // the parameters can be omitted if empty
);

// $result is the filtered array:
// array:1 [▼
//   0 => array:4 [▼
//     "pseudo" => "Alice"
//     "fullname" => "Alice, from... you know."
//     "gender" => "F"
//     "points" => 9001
//   ]
// ]

// $player is a single row, e.g. one element of $players.
$rulerz->satisfies($player, $rule, $parameters); // returns a boolean; true means the row satisfies the rule
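For intuition, here is a hand-written plain PHP equivalent of the rule above. It does not use RulerZ at all; it only shows what the rule is expected to mean for array rows, assuming the same `$players` data and parameters.

```php
<?php
declare(strict_types=1);

$players = [
    ['pseudo' => 'Joe', 'fullname' => 'Joe la frite', 'gender' => 'M', 'points' => 2500],
    ['pseudo' => 'Moe', 'fullname' => 'Moe, from the bar!', 'gender' => 'M', 'points' => 1230],
    ['pseudo' => 'Alice', 'fullname' => 'Alice, from... you know.', 'gender' => 'F', 'points' => 9001],
];

$parameters = ['min_points' => 30, 'gender' => 'F'];

// Hand-written counterpart of "gender = :gender and points > :min_points".
$result = array_values(array_filter(
    $players,
    static function (array $row) use ($parameters): bool {
        return $row['gender'] == $parameters['gender']
            && $row['points'] > $parameters['min_points'];
    }
));
```

The point of the engine is that this predicate no longer has to be written by hand: it is generated from the rule string.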
Now let's look at what happens between creating the compiler and getting the result.
1.Compiler::create();
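This step instantiates a FileEvaluator. By default it uses the system temporary directory as the place where the temporary class file is written and read in the next step. It relies on a filesystem class that provides a has() method and a write() method: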
<?php

declare(strict_types=1);

namespace RulerZ\Compiler;

class NativeFilesystem implements Filesystem
{
    public function has(string $filePath): bool
    {
        return file_exists($filePath);
    }

    public function write(string $filePath, string $content): void
    {
        file_put_contents($filePath, $content, LOCK_EX);
    }
}
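To make the role of has() and write() concrete, here is a minimal sketch of a "write once, then load" evaluator. The function `evaluateOnce()` is hypothetical and is not the library's FileEvaluator; the `rulerz_executor_` file prefix and the prepended `<?php` tag are assumptions inferred from the generated file shown later in this article.

```php
<?php
declare(strict_types=1);

// Hypothetical helper, not part of RulerZ: generate the source only when the
// file is missing, then load it from disk.
function evaluateOnce(string $identifier, callable $compiler, string $directory): string
{
    $filePath = $directory . DIRECTORY_SEPARATOR . 'rulerz_executor_' . $identifier;

    if (!file_exists($filePath)) { // the has() check
        // The compiler callback is only invoked on a cache miss (the write() step).
        file_put_contents($filePath, '<?php' . "\n" . $compiler(), LOCK_EX);
    }

    require_once $filePath;

    return $filePath;
}
```

Passing the compiler as a callback means the comparatively expensive parse and code generation are skipped whenever the file already exists.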
2. Initialize the RulerZ engine: new RulerZ()
First, look at RulerZ's constructor:
public function __construct(Compiler $compiler, array $compilationTargets = [])
{
    $this->compiler = $compiler;

    foreach ($compilationTargets as $targetCompiler) {
        $this->registerCompilationTarget($targetCompiler);
    }
}
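The first argument is the compiler created above. The second is the list of compilation targets (the classes that actually deal with the data source); because we are working with arrays, the target here is Native. The engine stores each compilation target in its own $compilationTargets property.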
public function registerCompilationTarget(CompilationTarget $compilationTarget): void
{
    $this->compilationTargets[] = $compilationTarget;
}
3. Call filter() or satisfies()
This is the core of the library.
Take filter() as an example:
public function filter($target, string $rule, array $parameters = [], array $executionContext = [])
{
    $targetCompiler = $this->findTargetCompiler($target, CompilationTarget::MODE_FILTER);
    $compilationContext = $targetCompiler->createCompilationContext($target);
    $executor = $this->compiler->compile($rule, $targetCompiler, $compilationContext);

    return $executor->filter($target, $parameters, $targetCompiler->getOperators()->getOperators(), new ExecutionContext($executionContext));
}
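The first step checks whether a compilation target supports filter mode.
The second step creates the compilation context, which is generally just a Context instance:
public function createCompilationContext($target): Context
{
    return new Context();
}
The third step calls the compiler's compile() method: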
public function compile(string $rule, CompilationTarget $target, Context $context): Executor
{
    $context['rule_identifier'] = $this->getRuleIdentifier($target, $context, $rule);
    $context['executor_classname'] = 'Executor_'.$context['rule_identifier'];
    $context['executor_fqcn'] = '\RulerZ\Compiled\Executor\\Executor_'.$context['rule_identifier'];

    if (!class_exists($context['executor_fqcn'], false)) {
        $compiler = function () use ($rule, $target, $context) {
            return $this->compileToSource($rule, $target, $context);
        };

        $this->evaluator->evaluate($context['rule_identifier'], $compiler);
    }

    return new $context['executor_fqcn']();
}

protected function getRuleIdentifier(CompilationTarget $compilationTarget, Context $context, string $rule): string
{
    return hash('crc32b', get_class($compilationTarget).$rule.$compilationTarget->getRuleIdentifierHint($rule, $context));
}

protected function compileToSource(string $rule, CompilationTarget $compilationTarget, Context $context): string
{
    $ast = $this->parser->parse($rule);
    $executorModel = $compilationTarget->compile($ast, $context);

    $flattenedTraits = implode(PHP_EOL, array_map(function ($trait) {
        return "\t".'use \\'.ltrim($trait, '\\').';';
    }, $executorModel->getTraits()));

    $extraCode = '';
    foreach ($executorModel->getCompiledData() as $key => $value) {
        $extraCode .= sprintf('private $%s = %s;'.PHP_EOL, $key, var_export($value, true));
    }

    $commentedRule = str_replace(PHP_EOL, PHP_EOL.' // ', $rule);

    return <<<EXECUTOR
namespace RulerZ\Compiled\Executor;

use RulerZ\Executor\Executor;

class {$context['executor_classname']} implements Executor
{
    $flattenedTraits

    $extraCode

    // $commentedRule
    protected function execute(\$target, array \$operators, array \$parameters)
    {
        return {$executorModel->getCompiledRule()};
    }
}
EXECUTOR;
}
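This code uses the crc32b algorithm to produce a hash, appends it to "Executor_" to form the name of the temporary executor class, and writes the executor's source into the temporary directory mentioned above. The generated code looks like this (the sample below was produced from a rule in which the points condition appears twice):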
// /private/var/folders/w_/sh4r42wn4_b650l3pc__fh7h0000gp/T/rulerz_executor_ff2800e8
<?php
namespace RulerZ\Compiled\Executor;

use RulerZ\Executor\Executor;

class Executor_ff2800e8 implements Executor
{
    use \RulerZ\Executor\ArrayTarget\FilterTrait;
    use \RulerZ\Executor\ArrayTarget\SatisfiesTrait;
    use \RulerZ\Executor\ArrayTarget\ArgumentUnwrappingTrait;

    // gender = :gender and points > :min_points and points > :min_points
    protected function execute($target, array $operators, array $parameters)
    {
        return ($this->unwrapArgument($target["gender"]) == $parameters["gender"] && ($this->unwrapArgument($target["points"]) > $parameters["min_points"] && $this->unwrapArgument($target["points"]) > $parameters["min_points"]));
    }
}
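This temporary class file is the class that ultimately performs the filtering.
FilterTrait's filter() method runs first; based on the boolean returned by execute(), it decides whether to yield each matching row through an iterator.
execute() checks, using the given parameters and operators, whether the relevant cells of each row satisfy the rule, and returns true or false.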
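The naming scheme used by compile() can be reproduced in isolation. This is a sketch under assumptions: `ruleIdentifier()` is a hypothetical stand-in for getRuleIdentifier(), the target class is passed as a plain string, and the hint defaults to an empty string.

```php
<?php
declare(strict_types=1);

// Hypothetical helper mirroring getRuleIdentifier(); $targetClass and $hint
// are stand-ins for get_class($compilationTarget) and getRuleIdentifierHint().
function ruleIdentifier(string $targetClass, string $rule, string $hint = ''): string
{
    return hash('crc32b', $targetClass . $rule . $hint);
}

$rule = 'gender = :gender and points > :min_points';

$identifier = ruleIdentifier('RulerZ\Target\Native\Native', $rule);
$className = 'Executor_' . $identifier;
$fqcn = '\RulerZ\Compiled\Executor\\' . $className;
```

crc32b always yields eight hexadecimal characters, so the same rule compiled for the same target maps to the same class name, which is what lets compile() skip recompilation when the class already exists.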
public function filter($target, array $parameters, array $operators, ExecutionContext $context)
{
    return IteratorTools::fromGenerator(function () use ($target, $parameters, $operators) {
        foreach ($target as $row) {
            $targetRow = is_array($row) ? $row : new ObjectContext($row);

            if ($this->execute($targetRow, $operators, $parameters)) {
                yield $row;
            }
        }
    });
}
satisfies() follows essentially the same logic as filter(), except that it evaluates a single row.
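The relationship between the two methods can be sketched without the library. Here `filterRows()` and `satisfiesRow()` are hypothetical functions, and the `$execute` closure stands in for the generated execute() method.

```php
<?php
declare(strict_types=1);

// Lazily yields the rows for which the predicate holds.
function filterRows(iterable $target, callable $execute, array $parameters): Generator
{
    foreach ($target as $row) {
        if ($execute($row, $parameters)) {
            yield $row;
        }
    }
}

// Evaluates the same predicate against a single row.
function satisfiesRow(array $row, callable $execute, array $parameters): bool
{
    return (bool) $execute($row, $parameters);
}

// Stand-in for the generated execute() method.
$execute = static function (array $row, array $parameters): bool {
    return $row['gender'] == $parameters['gender']
        && $row['points'] > $parameters['min_points'];
};

$rows = [
    ['pseudo' => 'Joe', 'gender' => 'M', 'points' => 2500],
    ['pseudo' => 'Alice', 'gender' => 'F', 'points' => 9001],
];
$parameters = ['min_points' => 30, 'gender' => 'F'];

$matches = iterator_to_array(filterRows($rows, $execute, $parameters), false);
$joeMatches = satisfiesRow($rows[0], $execute, $parameters);
```

Because the filter is a generator, rows are only tested as the caller iterates, which is why the earlier usage wraps the result in iterator_to_array().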
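One question remains: how does the compiler know what our rule string $rule means, and how is it parsed?
This brings us to another topic: the abstract syntax tree (AST).
As we know, when PHP's Zend engine reads code, it goes through lexical and syntactic analysis (parsing), during which the code is turned into an abstract syntax tree. This is the key step that lets the engine understand the code.
Likewise, when we write a rule string, how does the code understand what we wrote? Again, through an abstract syntax tree.
Take the rule above as an example: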
gender = :gender and points > :min_points
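Here =, and, and > are all operators, but the machine does not know that, nor does it know what the other fields mean.
So RulerZ uses its own grammar.
First, it defines a few operators by default.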
<?php

declare(strict_types=1);

namespace RulerZ\Target\Native;

use RulerZ\Target\Operators\Definitions;

class NativeOperators
{
    public static function create(Definitions $customOperators): Definitions
    {
        $defaultInlineOperators = [
            'and' => function ($a, $b) { return sprintf('(%s && %s)', $a, $b); },
            'or' => function ($a, $b) { return sprintf('(%s || %s)', $a, $b); },
            'not' => function ($a) { return sprintf('!(%s)', $a); },
            '=' => function ($a, $b) { return sprintf('%s == %s', $a, $b); },
            'is' => function ($a, $b) { return sprintf('%s === %s', $a, $b); },
            '!=' => function ($a, $b) { return sprintf('%s != %s', $a, $b); },
            '>' => function ($a, $b) { return sprintf('%s > %s', $a, $b); },
            '>=' => function ($a, $b) { return sprintf('%s >= %s', $a, $b); },
            '<' => function ($a, $b) { return sprintf('%s < %s', $a, $b); },
            '<=' => function ($a, $b) { return sprintf('%s <= %s', $a, $b); },
            'in' => function ($a, $b) { return sprintf('in_array(%s, %s)', $a, $b); },
        ];

        $defaultOperators = [
            'sum' => function () { return array_sum(func_get_args()); },
        ];

        $definitions = new Definitions($defaultOperators, $defaultInlineOperators);

        return $definitions->mergeWith($customOperators);
    }
}
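Note that the inline operators do not compare anything themselves: they return fragments of PHP source. A small sketch, copying three of the closures above into a local `$inline` array (a name chosen for this illustration), shows how they compose into the compiled expression.

```php
<?php
declare(strict_types=1);

// Three of the default inline operators, copied from NativeOperators.
$inline = [
    'and' => function ($a, $b) { return sprintf('(%s && %s)', $a, $b); },
    '=' => function ($a, $b) { return sprintf('%s == %s', $a, $b); },
    '>' => function ($a, $b) { return sprintf('%s > %s', $a, $b); },
];

// Each call receives source fragments and returns a larger source fragment.
$left = $inline['=']('$target["gender"]', '$parameters["gender"]');
$right = $inline['>']('$target["points"]', '$parameters["min_points"]');

$compiled = $inline['and']($left, $right);
// ($target["gender"] == $parameters["gender"] && $target["points"] > $parameters["min_points"])
```

The non-inline operators such as sum are different: they are real callables that the generated code invokes at run time.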
RulerZ\Parser\Parser contains the following method:
public function parse($rule)
{
    if ($this->parser === null) {
        $this->parser = Compiler\Llk::load(
            new File\Read(__DIR__.'/../Grammar.pp')
        );
    }

    $this->nextParameterIndex = 0;

    return $this->visit($this->parser->parse($rule));
}
The key piece to understand here is the grammar file Grammar.pp, written in Hoa's PP grammar description language (not Pascal, despite the .pp extension):
//
// Hoa
//
//
// @license
//
// New BSD License
//
// Copyright © 2007-2015, Ivan Enderlin. All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are met:
//     * Redistributions of source code must retain the above copyright
//       notice, this list of conditions and the following disclaimer.
//     * Redistributions in binary form must reproduce the above copyright
//       notice, this list of conditions and the following disclaimer in the
//       documentation and/or other materials provided with the distribution.
//     * Neither the name of the Hoa nor the names of its contributors may be
//       used to endorse or promote products derived from this software without
//       specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS AND CONTRIBUTORS BE
// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
// POSSIBILITY OF SUCH DAMAGE.
//
// Inspired from \Hoa\Ruler\Grammar.
//
// @author Stéphane Py <stephane.py@hoa-project.net>
// @author Ivan Enderlin <ivan.enderlin@hoa-project.net>
// @author Kévin Gomez <contact@kevingomez.fr>
// @copyright Copyright © 2007-2015 Stéphane Py, Ivan Enderlin, Kévin Gomez.
// @license New BSD License

%skip space \s

// Scalars.
%token true (?i)true
%token false (?i)false
%token null (?i)null

// Logical operators
%token not (?i)not\b
%token and (?i)and\b
%token or (?i)or\b
%token xor (?i)xor\b

// Value
%token string ("|')(.*?)(?<!\\)\1
%token float -?\d+\.\d+
%token integer -?\d+
%token parenthesis_ \(
%token _parenthesis \)
%token bracket_ \[
%token _bracket \]
%token comma ,
%token dot \.
%token positional_parameter \?
%token named_parameter :[a-z-A-Z0-9_]+
%token identifier [^\s\(\)\[\],\.]+

#expression:
    logical_operation()

logical_operation:
    operation()
    ( ( ::and:: #and | ::or:: #or | ::xor:: #xor ) logical_operation() )?

operation:
    operand() ( <identifier> logical_operation() #operation )?

operand:
    ::parenthesis_:: logical_operation() ::_parenthesis::
  | value()

parameter:
    <positional_parameter>
  | <named_parameter>

value:
    ::not:: logical_operation() #not
  | <true> | <false> | <null> | <float> | <integer> | <string>
  | parameter()
  | variable()
  | array_declaration()
  | function_call()

variable:
    <identifier> ( object_access() #variable_access )*

object_access:
    ::dot:: <identifier> #attribute_access

#array_declaration:
    ::bracket_:: value() ( ::comma:: value() )* ::_bracket::

#function_call:
    <identifier> ::parenthesis_::
    ( logical_operation() ( ::comma:: logical_operation() )* )?
    ::_parenthesis::
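The Llk::load() call above loads this grammar and extracts its tokens. Token extraction uses regular expressions to pick out the operators and basic identifiers we need, along with the regular expression attached to each: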
array:1 [▼
  "default" => array:20 [▼
    "skip" => "\s"
    "true" => "(?i)true"
    "false" => "(?i)false"
    "null" => "(?i)null"
    "not" => "(?i)not\b"
    "and" => "(?i)and\b"
    "or" => "(?i)or\b"
    "xor" => "(?i)xor\b"
    "string" => "("|')(.*?)(?<!\\)\1"
    "float" => "-?\d+\.\d+"
    "integer" => "-?\d+"
    "parenthesis_" => "\("
    "_parenthesis" => "\)"
    "bracket_" => "\["
    "_bracket" => "\]"
    "comma" => ","
    "dot" => "\."
    "positional_parameter" => "\?"
    "named_parameter" => ":[a-z-A-Z0-9_]+"
    "identifier" => "[^\s\(\)\[\],\.]+"
  ]
]
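To see how such a token table turns a rule into a token sequence, here is a deliberately simplified lexer. It is not Hoa's Lexer: the `tokenize()` function is hypothetical, it uses only a subset of the tokens with a slightly simplified `named_parameter` pattern, and it assumes that the first matching definition wins, in declaration order.

```php
<?php
declare(strict_types=1);

// A subset of the token table; order matters because the first match wins here.
$tokens = [
    'skip' => '\s',
    'and' => '(?i)and\b',
    'named_parameter' => ':[a-zA-Z0-9_]+',
    'integer' => '-?\d+',
    'identifier' => '[^\s\(\)\[\],\.]+',
];

function tokenize(string $text, array $tokens): array
{
    $result = [];
    $offset = 0;
    $length = strlen($text);

    while ($offset < $length) {
        foreach ($tokens as $name => $regex) {
            // \G anchors the match at the current offset.
            if (preg_match('#\G(?:' . $regex . ')#', $text, $match, 0, $offset) === 1 && $match[0] !== '') {
                if ($name !== 'skip') {
                    $result[] = [$name, $match[0]];
                }
                $offset += strlen($match[0]);
                continue 2; // move on to the next piece of input
            }
        }

        throw new RuntimeException(sprintf('Unrecognized input at offset %d', $offset));
    }

    return $result;
}

$sequence = tokenize('gender = :gender and points > :min_points', $tokens);
```

In this sketch = and > come out as identifier tokens, which matches the grammar's operation rule: an operator is simply an identifier sitting between two operands.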
This step also produces a rawRules array:
array:10 [▼
  "#expression" => " logical_operation()"
  "logical_operation" => " operation() ( ( ::and:: #and | ::or:: #or | ::xor:: #xor ) logical_operation() )?"
  "operation" => " operand() ( <identifier> logical_operation() #operation )?"
  "operand" => " ::parenthesis_:: logical_operation() ::_parenthesis:: | value()"
  "parameter" => " <positional_parameter> | <named_parameter>"
  "value" => " ::not:: logical_operation() #not | <true> | <false> | <null> | <float> | <integer> | <string> | parameter() | variable() | array_declaration() | function_call( ▶"
  "variable" => " <identifier> ( object_access() #variable_access )*"
  "object_access" => " ::dot:: <identifier> #attribute_access"
  "#array_declaration" => " ::bracket_:: value() ( ::comma:: value() )* ::_bracket::"
  "#function_call" => " <identifier> ::parenthesis_:: ( logical_operation() ( ::comma:: logical_operation() )* )? ::_parenthesis::"
]
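The analyzeRules() method of the Analyzer class then parses rawRules and resolves the :: placeholders in it. Based on the value of the $_ppLexemes property, the Compiler\Llk\Lexer lexer parses each element of the rawRules array and pushes the pieces onto a stack backed by a doubly linked list (SplStack). Pushes and pops on that stack then produce $rules, an array that holds every operator and token instance: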
array:54 [▼
  0 => Concatenation {#64 ▶}
  "expression" => Concatenation {#65 ▼
    #_name: "expression"
    #_children: array:1 [▼
      0 => 0
    ]
    #_nodeId: "#expression"
    #_nodeOptions: []
    #_defaultId: "#expression"
    #_defaultOptions: []
    #_pp: " logical_operation()"
    #_transitional: false
  }
  2 => Token {#62 ▶}
  3 => Concatenation {#63 ▼
    #_name: 3
    #_children: array:1 [▼
      0 => 2
    ]
    #_nodeId: "#and"
    #_nodeOptions: []
    #_defaultId: null
    #_defaultOptions: []
    #_pp: null
    #_transitional: true
  }
  4 => Token {#68 ▶}
  5 => Concatenation {#69 ▶}
  6 => Token {#70 ▶}
  7 => Concatenation {#71 ▶}
  8 => Choice {#72 ▶}
  9 => Concatenation {#73 ▶}
  10 => Repetition {#74 ▶}
  "logical_operation" => Concatenation {#75 ▶}
  12 => Token {#66 ▶}
  13 => Concatenation {#67 ▶}
  14 => Repetition {#78 ▶}
  "operation" => Concatenation {#79 ▶}
  16 => Token {#76 ▶}
  17 => Token {#77 ▶}
  18 => Concatenation {#82 ▶}
  "operand" => Choice {#83 ▶}
  20 => Token {#80 ▶}
  21 => Token {#81 ▼
    #_tokenName: "named_parameter"
    #_namespace: null
    #_regex: null
    #_ast: null
    #_value: null
    #_kept: true
    #_unification: -1
    #_name: 21
    #_children: null
    #_nodeId: null
    #_nodeOptions: []
    #_defaultId: null
    #_defaultOptions: []
    #_pp: null
    #_transitional: true
  }
  "parameter" => Choice {#86 ▶}
  23 => Token {#84 ▶}
  24 => Concatenation {#85 ▶}
  25 => Token {#89 ▶}
  26 => Token {#90 ▶}
  27 => Token {#91 ▶}
  28 => Token {#92 ▶}
  29 => Token {#93 ▶}
  30 => Token {#94 ▶}
  "value" => Choice {#95 ▶}
  32 => Token {#87 ▶}
  33 => Concatenation {#88 ▶}
  34 => Repetition {#98 ▶}
  "variable" => Concatenation {#99 ▶}
  36 => Token {#96 ▶}
  37 => Token {#97 ▶}
  "object_access" => Concatenation {#102 ▶}
  39 => Token {#100 ▶}
  40 => Token {#101 ▶}
  41 => Concatenation {#105 ▶}
  42 => Repetition {#106 ▶}
  43 => Token {#107 ▶}
  "array_declaration" => Concatenation {#108 ▶}
  45 => Token {#103 ▶}
  46 => Token {#104 ▶}
  47 => Token {#111 ▶}
  48 => Concatenation {#112 ▶}
  49 => Repetition {#113 ▶}
  50 => Concatenation {#114 ▶}
  51 => Repetition {#115 ▶}
  52 => Token {#116 ▶}
  "function_call" => Concatenation {#117 ▶}
]
Llk::load() then returns a Hoa\Compiler\Llk\Parser instance. Its parse() method is what builds the syntax tree.
public function parse($text, $rule = null, $tree = true)
{
    $k = 1024;

    if (isset($this->_pragmas['parser.lookahead'])) {
        $k = max(0, intval($this->_pragmas['parser.lookahead']));
    }

    $lexer = new Lexer($this->_pragmas);
    $this->_tokenSequence = new Iterator\Buffer(
        $lexer->lexMe($text, $this->_tokens),
        $k
    );
    $this->_tokenSequence->rewind();

    $this->_errorToken = null;
    $this->_trace = [];
    $this->_todo = [];

    if (false === array_key_exists($rule, $this->_rules)) {
        $rule = $this->getRootRule();
    }

    $closeRule = new Rule\Ekzit($rule, 0);
    $openRule = new Rule\Entry($rule, 0, [$closeRule]);
    $this->_todo = [$closeRule, $openRule];

    do {
        $out = $this->unfold();

        if (null !== $out &&
            'EOF' === $this->_tokenSequence->current()['token']) {
            break;
        }

        if (false === $this->backtrack()) {
            $token = $this->_errorToken;

            if (null === $this->_errorToken) {
                $token = $this->_tokenSequence->current();
            }

            $offset = $token['offset'];
            $line = 1;
            $column = 1;

            if (!empty($text)) {
                if (0 === $offset) {
                    $leftnl = 0;
                } else {
                    $leftnl = strrpos($text, "\n", -(strlen($text) - $offset) - 1) ?: 0;
                }

                $rightnl = strpos($text, "\n", $offset);
                $line = substr_count($text, "\n", 0, $leftnl + 1) + 1;
                $column = $offset - $leftnl + (0 === $leftnl);

                if (false !== $rightnl) {
                    $text = trim(substr($text, $leftnl, $rightnl - $leftnl), "\n");
                }
            }

            throw new Compiler\Exception\UnexpectedToken(
                'Unexpected token "%s" (%s) at line %d and column %d:' .
                "\n" . '%s' . "\n" .
                str_repeat(' ', $column - 1) . '↑',
                0,
                [
                    $token['value'],
                    $token['token'],
                    $line,
                    $column,
                    $text
                ],
                $line,
                $column
            );
        }
    } while (true);

    if (false === $tree) {
        return true;
    }

    $tree = $this->_buildTree();

    if (!($tree instanceof TreeNode)) {
        throw new Compiler\Exception(
            'Parsing error: cannot build AST, the trace is corrupted.',
            1
        );
    }

    return $this->_tree = $tree;
}
The complete syntax tree we end up with looks like this:
Rule {#120 ▼
  #_root: Operator {#414 ▼
    #_name: "and"
    #_arguments: array:2 [▼
      0 => Operator {#398 ▼
        #_name: "="
        #_arguments: array:2 [▼
          0 => Context {#396 ▼
            #_id: "gender"
            #_dimensions: []
          }
          1 => Parameter {#397 ▼
            -name: "gender"
          }
        ]
        #_function: false
        #_laziness: false
        #_id: null
        #_dimensions: []
      }
      1 => Operator {#413 ▼
        #_name: "and"
        #_arguments: array:2 [▼
          0 => Operator {#401 ▼
            #_name: ">"
            #_arguments: array:2 [▼
              0 => Context {#399 ▶}
              1 => Parameter {#400 ▶}
            ]
            #_function: false
            #_laziness: false
            #_id: null
            #_dimensions: []
          }
          1 => Operator {#412 ▶}
        ]
        #_function: false
        #_laziness: true
        #_id: null
        #_dimensions: []
      }
    ]
    #_function: false
    #_laziness: true
    #_id: null
    #_dimensions: []
  }
}
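It contains the root node, child nodes, operator arguments, and Hoa\Ruler\Model\Operator instances.
At this point, $executorModel = $compilationTarget->compile($ast, $context); can visit and analyze the syntax tree through NativeVisitor's visit method.
This step goes through visitOperator():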
/**
 * {@inheritdoc}
 */
public function visitOperator(AST\Operator $element, &$handle = null, $eldnah = null)
{
    $operatorName = $element->getName();

    // the operator does not exist at all, throw an error before doing anything else.
    if (!$this->operators->hasInlineOperator($operatorName) && !$this->operators->hasOperator($operatorName)) {
        throw new OperatorNotFoundException($operatorName, sprintf('Operator "%s" does not exist.', $operatorName));
    }

    // expand the arguments
    $arguments = array_map(function ($argument) use (&$handle, $eldnah) {
        return $argument->accept($this, $handle, $eldnah);
    }, $element->getArguments());

    // and either inline the operator call
    if ($this->operators->hasInlineOperator($operatorName)) {
        $callable = $this->operators->getInlineOperator($operatorName);

        return call_user_func_array($callable, $arguments);
    }

    $inlinedArguments = empty($arguments) ? '' : ', '.implode(', ', $arguments);

    // or defer it.
    return sprintf('call_user_func($operators["%s"]%s)', $operatorName, $inlinedArguments);
}
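The same inline-or-defer decision can be sketched over a toy tree. Everything here is an assumption made for illustration: the AST is a nested array with hypothetical `op` and `args` keys rather than Hoa's model classes, leaves are already PHP source fragments, and `compileNode()` is not a RulerZ function.

```php
<?php
declare(strict_types=1);

// Recursively compiles a toy AST into a PHP expression string.
function compileNode($node, array $inlineOperators, array $operators): string
{
    if (is_string($node)) {
        return $node; // leaves are already source fragments in this sketch
    }

    $name = $node['op'];

    if (!isset($inlineOperators[$name]) && !isset($operators[$name])) {
        throw new InvalidArgumentException(sprintf('Operator "%s" does not exist.', $name));
    }

    // Compile the arguments first (depth-first traversal).
    $arguments = array_map(function ($argument) use ($inlineOperators, $operators) {
        return compileNode($argument, $inlineOperators, $operators);
    }, $node['args']);

    // Inline operators are expanded into source code right away.
    if (isset($inlineOperators[$name])) {
        return call_user_func_array($inlineOperators[$name], $arguments);
    }

    // Other operators are deferred to a run-time call.
    $inlined = empty($arguments) ? '' : ', ' . implode(', ', $arguments);

    return sprintf('call_user_func($operators["%s"]%s)', $name, $inlined);
}

$inline = ['>' => function ($a, $b) { return sprintf('%s > %s', $a, $b); }];
$deferred = ['length' => 'strlen'];

// Toy tree for a rule shaped like "length(pseudo) > :min_length".
$ast = ['op' => '>', 'args' => [
    ['op' => 'length', 'args' => ['$target["pseudo"]']],
    '$parameters["min_length"]',
]];

$compiled = compileNode($ast, $inline, $deferred);
```

The deferred branch explains why filter() passes an `$operators` array to the executor: the generated code looks the callable up by name at run time.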
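The compiled rule can then be obtained as follows:
$executorModel->getCompiledRule()
// The compiled rule is:
// $this->unwrapArgument($target["gender"]) == $parameters["gender"] && ($this->unwrapArgument($target["points"]) > $parameters["min_points"] && $this->unwrapArgument($target["points"]) > $parameters["min_points"])
Because the official documentation is outdated and no longer updated, defining custom operators by following it is an exercise in frustration. Here is a working example: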
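// Assumes the same imports as the initialization example (use RulerZ\Target;).
$compiler = Compiler::create();
$rulerz = new RulerZ($compiler, [
    new Target\Native\Native([
        'length' => 'strlen'
    ], [
        'contains' => function ($a, $b) {
            return sprintf('strstr(%s, %s)', $a, $b);
        }
    ])
]);
In the example above, contains uses the built-in function strstr() to test whether $a contains the string $b. Because the compiled code is generated as a string, the anonymous function must express its logic as a string of PHP code, which is one of the library's drawbacks.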
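One caveat with a strstr()-based contains is worth a sketch: strstr() returns the matched tail of the string rather than a boolean, and that tail can itself be falsy. The alternative closure below is only a suggestion, not something taken from the RulerZ documentation, and the `needle` parameter name is made up for the example.

```php
<?php
declare(strict_types=1);

// strstr('10', '0') returns the tail "0", and the string "0" casts to false.
$viaStrstr = (bool) strstr('10', '0');

// strpos() returns the position 1 here, so the explicit comparison is true.
$viaStrpos = strpos('10', '0') !== false;

// Hypothetical alternative inline operator that yields an explicit boolean.
$contains = function ($a, $b) {
    return sprintf('(strpos(%s, %s) !== false)', $a, $b);
};

$compiled = $contains('$target["fullname"]', '$parameters["needle"]');
// (strpos($target["fullname"], $parameters["needle"]) !== false)
```

On PHP 8 and later, generating a call to str_contains() would express the same check more directly.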