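Back to the ProxySQL article series: http://www.cnblogs.com/f-ck-need-u/p/7586194.html
After ProxySQL receives a SQL statement from a frontend client, it matches the statement against the configured rules. When a rule matches, ProxySQL can also rewrite the statement before routing it to a backend.
When is SQL rewriting needed?
Simple read/write splitting like the following has no need for SQL rewriting.
Read/write splitting of this kind is very easy to implement, as shown below: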
mysql_replication_hostgroups:
+------------------+------------------+----------+
| writer_hostgroup | reader_hostgroup | comment  |
+------------------+------------------+----------+
| 10               | 20               | cluster1 |
+------------------+------------------+----------+

mysql_servers:
+--------------+----------+------+--------+--------+
| hostgroup_id | hostname | port | status | weight |
+--------------+----------+------+--------+--------+
| 10           | master   | 3306 | ONLINE | 1      |
| 20           | slave1   | 3306 | ONLINE | 1      |
| 20           | slave2   | 3306 | ONLINE | 1      |
+--------------+----------+------+--------+--------+

mysql_query_rules:
+---------+-----------------------+----------------------+
| rule_id | destination_hostgroup | match_digest         |
+---------+-----------------------+----------------------+
| 1       | 10                    | ^SELECT.*FOR UPDATE$ |
| 2       | 20                    | ^SELECT              |
+---------+-----------------------+----------------------+
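As a rough illustration of how these two rules split traffic, the Python sketch below evaluates them in rule_id order and returns the hostgroup of the first rule whose match_digest hits. It is a simplified model, not ProxySQL's implementation: the route function, the case-insensitive matching, the assumption that both rules have apply=1, and the fallback to hostgroup 10 for statements that hit no rule are all choices made for this example.

```python
import re

# (rule_id, destination_hostgroup, match_digest), copied from the table above.
RULES = [
    (1, 10, r"^SELECT.*FOR UPDATE$"),
    (2, 20, r"^SELECT"),
]

# Assumption: statements that hit no rule go to the writer hostgroup.
DEFAULT_HOSTGROUP = 10


def route(digest_text: str) -> int:
    """Return the hostgroup of the first matching rule, in rule_id order."""
    for _rule_id, hostgroup, pattern in sorted(RULES):
        if re.search(pattern, digest_text, flags=re.IGNORECASE):
            return hostgroup
    return DEFAULT_HOSTGROUP


if __name__ == "__main__":
    for sql in (
        "SELECT * FROM t1 WHERE id=? FOR UPDATE",
        "select * from t1 where id=?",
        "UPDATE t1 SET c=? WHERE id=?",
    ):
        print(route(sql), sql)
```

The order matters: if rule 2 were checked first, SELECT ... FOR UPDATE would be sent to the readers, which is why the more specific pattern carries the smaller rule_id.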
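Things get more complicated when, for example, ProxySQL is used for sharding: the select_1 statement for database db1 is routed to the group hg=10, the select_2 statement for db2 to hg=20, and the select_3 statement for db3 to hg=30.
When sharding with ProxySQL, the SQL statements almost always have to be rewritten. A simple example shows how splitting by database works.
Suppose the computer science school uses a single database, it_db, which contains a student table stu, and stu has a column zhuanye that holds each student's major (the example is arbitrary, so please ignore whether it is realistic).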
it_db database: stu table
+---------+----------+---------+
| stu_id  | stu_name | zhuanye |
+---------+----------+---------+
| 1-99    | ...      | Linux   |
+---------+----------+---------+
| 100-150 | ...      | MySQL   |
+---------+----------+---------+
| 151-250 | ...      | JAVA    |
+---------+----------+---------+
| 251-550 | ...      | Python  |
+---------+----------+---------+
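When sharding, one database can be created per major. So four databases are created; each still has a stu table, but keeps only the rows for the students whose major matches the database name: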
Linux database: stu table
+---------+----------+---------+
| stu_id  | stu_name | zhuanye |
+---------+----------+---------+
| 1-99    | ...      | Linux   |
+---------+----------+---------+

MySQL database: stu table
+---------+----------+---------+
| stu_id  | stu_name | zhuanye |
+---------+----------+---------+
| 100-150 | ...      | MySQL   |
+---------+----------+---------+

JAVA database: stu table
+---------+----------+---------+
| stu_id  | stu_name | zhuanye |
+---------+----------+---------+
| 151-250 | ...      | JAVA    |
+---------+----------+---------+

Python database: stu table
+---------+----------+---------+
| stu_id  | stu_name | zhuanye |
+---------+----------+---------+
| 251-550 | ...      | Python  |
+---------+----------+---------+
So the original SQL statement that queries the students majoring in MySQL:
select * from it_db.stu where zhuanye='MySQL' and xxx;
has to be rewritten after sharding as:
select * from MySQL.stu where 1=1 and xxx;
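A reference rule that achieves this is given at the end of this article.
Sharding is only one case that calls for rewriting; many complex ProxySQL routing rules may need to rewrite SQL statements as well. The rest of this article briefly introduces statement rewriting in ProxySQL as groundwork; the later articles on ProxySQL + sharding show more concrete usage.
The mysql_query_rules table has a match_pattern field and a replace_pattern field. The former is a regular expression that is matched against the SQL statement. The latter describes how the original statement is rewritten once the match succeeds (the rule is hit); the rewritten statement is then routed to the backend.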
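The hit-then-rewrite behavior can be pictured with a small Python sketch. It is only a model under stated assumptions: QueryRule and apply_rule are names invented for this example, Python's re module stands in for the regex engine ProxySQL uses, matching is assumed to be case-insensitive, and old_db/new_db are hypothetical database names.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class QueryRule:
    match_pattern: str
    replace_pattern: Optional[str]  # None models a rule that routes without rewriting
    destination_hostgroup: int


def apply_rule(rule: QueryRule, sql: str) -> Optional[Tuple[str, int]]:
    """Return (statement to send, hostgroup) on a hit, or None on a miss."""
    regex = re.compile(rule.match_pattern, flags=re.IGNORECASE)
    if regex.search(sql) is None:
        return None  # rule not hit; the statement is left for other rules
    if rule.replace_pattern is None:
        return sql, rule.destination_hostgroup
    rewritten = regex.sub(rule.replace_pattern, sql, count=1)
    return rewritten, rule.destination_hostgroup


if __name__ == "__main__":
    # Hypothetical rule: move queries from old_db to new_db and send them to hostgroup 20.
    rule = QueryRule(r"^(select.*from )old_db\.(.*)", r"\1new_db.\2", 20)
    print(apply_rule(rule, "select id from old_db.t1 where id=1"))
    print(apply_rule(rule, "delete from old_db.t1"))
```

The point of the sketch is the ordering: the pattern decides whether the rule is hit at all, and only a hit statement is rewritten and routed to the rule's hostgroup.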
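A few points to note:
The replacement rules in this article are meant as an introduction and are very simple, so basic regex knowledge is enough to follow them. Using rewriting flexibly, however, requires a good command of PCRE regular expressions. If you already have a regex background, see my summary article on common PCRE and regular expression pitfalls: pcre和正則表達式的誤點.
For example, suppose the first statement below is to be rewritten as the second.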
select * from test1.t1;
select * from test1.t2;
Insert the following rule:
delete from mysql_query_rules;
select * from stats_mysql_query_digest_reset where 1=0;

insert into mysql_query_rules(rule_id,active,match_pattern,replace_pattern,destination_hostgroup,apply)
values (1,1,"^(select.*from )test1.t1(.*)","\1test1.t2\2",20,1);

load mysql query rules to runtime;
save mysql query rules to disk;

select rule_id,destination_hostgroup,match_pattern,replace_pattern from mysql_query_rules;
+---------+-----------------------+------------------------------+-----------------+
| rule_id | destination_hostgroup | match_pattern                | replace_pattern |
+---------+-----------------------+------------------------------+-----------------+
| 1       | 20                    | ^(select.*from )test1.t1(.*) | \1test1.t2\2    |
+---------+-----------------------+------------------------------+-----------------+
Then run:
$ proc="mysql -uroot -pP@ssword1! -h127.0.0.1 -P6033 -e"
$ $proc "select * from test1.t1;"
+------------------+
| name             |
+------------------+
| test1_t2_malong1 |
| test1_t2_malong2 |
| test1_t2_malong3 |
+------------------+
The returned rows come from test1.t2, so the statement was rewritten successfully.
Now check the rule statistics.
Admin> select rule_id,hits from stats_mysql_query_rules;
+---------+------+
| rule_id | hits |
+---------+------+
| 1       | 1    |
| 2       | 0    |
+---------+------+

Admin> select hostgroup,count_star,digest_text from stats_mysql_query_digest;
+-----------+------------+------------------------+
| hostgroup | count_star | digest_text            |
+-----------+------------+------------------------+
| 20        | 1          | select * from test1.t2 |  <-- replaced
+-----------+------------+------------------------+
Even simpler, a single word can be replaced directly. For example:
delete from mysql_query_rules;
select * from stats_mysql_query_digest_reset where 1=0;

insert into mysql_query_rules(rule_id,active,match_pattern,replace_pattern,destination_hostgroup,apply)
values (1,1,"test1.t1","test1.t2",20,1);

load mysql query rules to runtime;
save mysql query rules to disk;

select rule_id,destination_hostgroup,match_pattern,replace_pattern from mysql_query_rules;
+---------+-----------------------+---------------+-----------------+
| rule_id | destination_hostgroup | match_pattern | replace_pattern |
+---------+-----------------------+---------------+-----------------+
| 1       | 20                    | test1.t1      | test1.t2        |
+---------+-----------------------+---------------+-----------------+
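One caveat with a bare word as match_pattern: it is still a regular expression, so the unescaped dot matches any character and nothing anchors the end of the table name. The sketch below shows the effect, using Python's re module as a stand-in for the regex engine; the STRICT pattern and the rewrite helper are suggestions of this example, not part of the rule above.

```python
import re

LOOSE = r"test1.t1"         # the pattern as written in the rule above
STRICT = r"\btest1\.t1\b"   # hypothetical tighter variant: literal dot, word boundaries


def rewrite(pattern: str, sql: str) -> str:
    """Replace the first match of pattern with test1.t2; leave other statements alone."""
    return re.sub(pattern, "test1.t2", sql, count=1, flags=re.IGNORECASE)


if __name__ == "__main__":
    for sql in (
        "select * from test1.t1",
        "select * from test1_t1",   # a different name: '.' also matches '_'
        "select * from test1.t10",  # a different table that merely starts with t1
    ):
        print(rewrite(LOOSE, sql), "|", rewrite(STRICT, sql))
```

With the loose pattern all three statements are rewritten; with the strict one only the first is, which is usually what a table rename intends.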
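Taking the statement from the sharding example earlier in this article, here is a brief demonstration of how a statement is rewritten for a database shard. A more complete sharding implementation is covered in later articles.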
# Original statement that queries the students majoring in MySQL:
select * from it_db.stu where zhuanye='MySQL' and xxx;
                    |
                    |
                   \|/
# Rewritten to query the MySQL shard database:
select * from MySQL.stu where 1=1 and xxx;
The complete statements follow; the regular expression in this rule is explained afterwards.
delete from mysql_query_rules;
select * from stats_mysql_query_digest_reset where 1=0;

insert into mysql_query_rules(rule_id,active,apply,destination_hostgroup,match_pattern,replace_pattern)
values (1,1,1,20,"^(select.*?from) it_db\.(.*?) where zhuanye=['""](.*?)['""] (.*)$","\1 \3.\2 where 1=1 \4");

load mysql query rules to runtime;
save mysql query rules to disk;

select rule_id,destination_hostgroup dest_hg,match_pattern,replace_pattern from mysql_query_rules;
+---------+---------+-----------------------------------------------------------------+-----------------------+
| rule_id | dest_hg | match_pattern                                                   | replace_pattern       |
+---------+---------+-----------------------------------------------------------------+-----------------------+
| 1       | 20      | ^(select.*?from) it_db\.(.*?) where zhuanye=['"](.*?)['"] (.*)$ | \1 \3.\2 where 1=1 \4 |
+---------+---------+-----------------------------------------------------------------+-----------------------+
Then run the query against the shard:
proc="mysql -uroot -pP@ssword1! -h127.0.0.1 -P6033 -e"
$proc "select * from it_db.stu where zhuanye='MySQL' and 1=1;"
Check whether the rule was hit and the SQL statement was rewritten:
Admin> select rule_id,hits from stats_mysql_query_rules;
+---------+------+
| rule_id | hits |
+---------+------+
| 1       | 1    |
+---------+------+

Admin> select hostgroup,count_star,digest_text from stats_mysql_query_digest;
+-----------+------------+-------------------------------------------+
| hostgroup | count_star | digest_text                               |
+-----------+------------+-------------------------------------------+
| 20        | 1          | select * from MySQL.stu where ?=? and ?=? |
| 10        | 1          | select @@version_comment limit ?          |
+-----------+------------+-------------------------------------------+
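The same pattern pair can be tried outside ProxySQL before loading it as a rule. The sketch below uses Python's re module as a stand-in for PCRE (the two agree on the constructs used here), assumes case-insensitive matching, and uses the pattern as it is stored in the table, with the doubled quotes already collapsed. The rewrite helper is a name made up for this example.

```python
import re

# Patterns as stored in mysql_query_rules (see the select output above).
MATCH_PATTERN = r"""^(select.*?from) it_db\.(.*?) where zhuanye=['"](.*?)['"] (.*)$"""
REPLACE_PATTERN = r"\1 \3.\2 where 1=1 \4"


def rewrite(sql: str) -> str:
    """Rewrite a matching statement; return non-matching statements unchanged."""
    return re.sub(MATCH_PATTERN, REPLACE_PATTERN, sql, count=1, flags=re.IGNORECASE)


if __name__ == "__main__":
    for sql in (
        "select * from it_db.stu where zhuanye='MySQL' and 1=1;",
        'select stu_name from it_db.stu where zhuanye="Linux" and stu_id=5',
        "select * from other_db.stu where zhuanye='MySQL' and 1=1;",  # no hit
    ):
        print(rewrite(sql))
```

The first two statements are redirected to the database named after the major, while the third does not reference it_db and passes through untouched.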
Here is an explanation of the rule above:
match_pattern:
- "^(select.*?from) it_db\.(.*?) where zhuanye=['""](.*?)['""] (.*)$"
replace_pattern:
- "\1 \3.\2 where 1=1 \4"
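^(select.*?from): matches non-greedily up to the from keyword. The match is non-greedy so that it stops at the first from when a subquery or join clause introduces more than one.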
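it_db\.(.*?): it_db is the part that will be replaced with "MySQL", and the table name that follows it_db will be appended after "MySQL", so the table name is captured in a group.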
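zhuanye=['""](.*?)['""]:
- The zhuanye condition will be dropped, but its value "MySQL" has to be kept because it becomes the shard database name, so the value is captured in a group. The value may be wrapped in either single or double quotes, so both cases are covered.
- ['""]: the pattern is written inside a double-quoted string, so a literal double quote has to be escaped by doubling it. The doubled quote collapses to a single one, so what is actually stored in the table is ['"].
- This pattern is not robust, because a value with mixed quotes such as zhuanye='MySQL" is matched as well. Avoiding that requires a PCRE backreference, for example zhuanye=(['""])(.*?)\g[N], where [N] is replaced with the group number of (['""]), e.g. \g3. Capturing the quote adds a group, so the groups after it shift by one and replace_pattern has to be renumbered accordingly (here \1 \4.\2 where 1=1 \5).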
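(.*)$: matches through to the end of the statement. The test statement is simple, with no join or subquery, so the rest is matched as a whole.
"\1 \3.\2 where 1=1 \4": the 1=1 is added so that a leading and/or operator still has an expression in front of it. For example, if (.*)$ captures and xxx=1, then without 1=1 the statement would be rewritten to where and xxx=1, which is invalid. So 1=1 is a placeholder expression.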
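Two of the points above, the mixed-quote weakness and the role of 1=1, can be checked with a short sketch. It uses Python's re module as a stand-in for PCRE, so the backreference is written \3 (Python's pattern syntax has no \g3 form); the constant names are made up for this example, and the renumbered replacement follows from the extra capture group.

```python
import re

LOOSE = r"""^(select.*?from) it_db\.(.*?) where zhuanye=['"](.*?)['"] (.*)$"""
# The opening quote is captured as group 3 and must reappear as the closing quote.
# The value therefore moves to group 4 and the tail of the statement to group 5.
STRICT = r"""^(select.*?from) it_db\.(.*?) where zhuanye=(['"])(.*?)\3 (.*)$"""

LOOSE_REPLACE = r"\1 \3.\2 where 1=1 \4"
STRICT_REPLACE = r"\1 \4.\2 where 1=1 \5"
NO_PLACEHOLDER = r"\1 \3.\2 where \4"  # same as LOOSE_REPLACE but without 1=1

mixed = "select * from it_db.stu where zhuanye='MySQL\" and id=1"
clean = "select * from it_db.stu where zhuanye='MySQL' and id=1"

if __name__ == "__main__":
    # Mixed quotes: accepted by the loose pattern, rejected by the strict one.
    print(bool(re.search(LOOSE, mixed)), bool(re.search(STRICT, mixed)))
    # Both patterns produce the same rewrite for a well-formed statement.
    print(re.sub(LOOSE, LOOSE_REPLACE, clean))
    print(re.sub(STRICT, STRICT_REPLACE, clean))
    # Dropping the placeholder leaves "where and ...", which is not valid SQL.
    print(re.sub(LOOSE, NO_PLACEHOLDER, clean))
```

Note that a backreference only works with an engine that supports it, so the strict form should be tested against the engine actually configured in ProxySQL before it is relied on.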
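As this shows, the regular expressions needed for more complex matching goals quickly become intricate, so it is well worth mastering PCRE regular expressions.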