Using sharding-jdbc with non-standard SQL

Date: 2019-12-01


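In "Sharding databases and tables? Choose your tools and plan the process carefully, or things will get out of control", we talked about sharding-jdbc, which works at the driver layer. For an open-source project, reaching this level is already great, unlike TDDL, which was left half-finished. But there are still pitfalls.

That said, you cannot blame the framework: some SQL can only be understood by the program and by ghosts, for example: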

<select id="getCodes" resultMap="BaseResultMap" parameterType="java.util.Map">
    <foreach collection="orderCodes" index="index" item="item" open="" separator="union all" close="">
        select
            <include refid="Base_Column_List"/>
        from order
        where orderCode = #{item}
    </foreach>
</select>

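Unsupported operations

After sharding, what you have is a crippled database. Many SQL features are not supported and have to be worked around by other means. The following applies to version 3.0.0.

distinct

sharding-jdbc does not support DISTINCT. For a single table, use GROUP BY instead; for multi-table joins, use EXISTS instead.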

select DISTINCT
        a, b, c, d
        from  table
        where df=0

Rewrite it as:

select a, b, c, d
        from  table
        where df=0
        group by a, b, c, d

having

sharding-jdbc does not support HAVING; use a nested subquery instead.
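The article's own note on subqueries says aggregates inside a subquery are not supported either, so the nested-subquery rewrite may not always pass. A fallback, in the same spirit as the UNION and CASE WHEN advice, is to keep the GROUP BY in SQL and apply the HAVING predicate in Java. The sketch below is a toy model with hypothetical names (HavingInApp, mergeCounts, havingCountGreaterThan); in a real setup sharding-jdbc performs the cross-shard GROUP BY merge itself, and only the final filter would be your code.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of "GROUP BY col HAVING COUNT(*) > n" evaluated after the cross-shard merge. */
public class HavingInApp {

    /** Sums per-shard GROUP BY counts into global counts (sharding-jdbc's GROUP BY merge does this for you). */
    public static Map<String, Long> mergeCounts(List<Map<String, Long>> perShard) {
        Map<String, Long> merged = new LinkedHashMap<>();
        for (Map<String, Long> shard : perShard) {
            for (Map.Entry<String, Long> e : shard.entrySet()) {
                merged.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return merged;
    }

    /** The HAVING predicate, applied in Java to groups that have already been merged. */
    public static Map<String, Long> havingCountGreaterThan(Map<String, Long> groups, long threshold) {
        Map<String, Long> out = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : groups.entrySet()) {
            if (e.getValue() > threshold) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }
}
```

With shard 0 returning {a=1, b=3} and shard 1 returning {a=2}, the merged groups are {a=3, b=3} and both pass COUNT(*) > 2. Filtering each shard before merging would lose group a, whose rows are split 1 + 2 across shards; that is why the predicate has to run on the merged result.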

union

sharding-jdbc does not support UNION (ALL); split the statement into several queries and combine the results in the application.
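A minimal sketch of doing UNION / UNION ALL in the application, assuming each branch of the original statement becomes its own query (for the getCodes mapper above, presumably one lookup per orderCode). UnionInApp and its methods are hypothetical names, and each Supplier stands in for one mapper call.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.function.Supplier;

/** Replaces UNION / UNION ALL by running each branch as its own query and combining rows in Java. */
public class UnionInApp {

    /** UNION ALL: concatenate the branch results in order, keeping duplicates. */
    public static <T> List<T> unionAll(List<Supplier<List<T>>> branches) {
        List<T> rows = new ArrayList<>();
        for (Supplier<List<T>> branch : branches) {
            rows.addAll(branch.get()); // one separate, routable query per branch
        }
        return rows;
    }

    /** UNION: like UNION ALL, but drops duplicate rows (T needs equals/hashCode), keeping first-seen order. */
    public static <T> List<T> union(List<Supplier<List<T>>> branches) {
        return new ArrayList<>(new LinkedHashSet<>(unionAll(branches)));
    }
}
```

UNION ALL keeps every row in branch order; UNION additionally removes duplicates, which only works if the row type implements equals and hashCode.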

About subqueries

sharding-jdbc does not support the same table appearing again inside a subquery. For example, this works:

SELECT COUNT(*) FROM (SELECT * FROM t_order o)

But this fails:

SELECT COUNT(*) FROM (SELECT * FROM t_order o WHERE o.id IN (SELECT id FROM t_order WHERE status = ?))

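Because of merge limitations, subqueries that contain aggregate functions are currently not supported.

MyBatis comments

sharding-jdbc does not support <!-- --> comments inside the SQL. If you must use them, put them before the SQL, or use /* */ instead.

TEXT columns are not supported

Change them to VARCHAR. This bug has been around for years and still has not been fixed.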

case when

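Some forms of CASE WHEN are not supported, for example a CASE WHEN that is not inside an aggregate function. That part of the SQL logic has to be moved into the application code.

Shouldn't CASE WHEN be on the DBA's list of banned functions anyway? Here we are, filling in the pit.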
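For example, a hypothetical non-aggregate CASE WHEN that turns a status code into a label can be dropped from the SQL, which then selects the raw status column, with the mapping done in Java. The column, its values and the StatusLabel class are illustrative only.

```java
/**
 * Replaces a hypothetical
 *   SELECT id, CASE status WHEN 0 THEN 'NEW' WHEN 1 THEN 'PAID' ELSE 'OTHER' END AS status_label FROM t_order
 * with "SELECT id, status FROM t_order" plus this mapping in application code.
 */
public class StatusLabel {

    public static String of(int status) {
        switch (status) {
            case 0:
                return "NEW";
            case 1:
                return "PAID";
            default:
                return "OTHER"; // the ELSE branch
        }
    }
}
```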

Some odd behaviors

This works:

select  a-b from dual  

But this does not...

select (a-b)c from dual  

sharding-jdbc also cannot handle conditions of the following form; the parser gets confused:

and (1=1 or 1=1)

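About pagination

Never run deep pagination without a shard key! sharding-jdbc rewrites the SQL as follows and then does the paging in memory: every database returns up to its first 1010 rows, and all but 10 of them are thrown away after the merge.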

select *  from a limit 10 offset 1000

=======>

Actual SQL:db0 ::: select *  from a limit 1010 offset 0
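Why the rewrite is needed: without a shard key, the 10 rows of the requested page could all sit in one database, so each database has to return everything up to offset + limit, and the final paging happens after the merge. The toy model below (DeepPagingDemo is a hypothetical name) shows the row counts involved; it sorts a merged copy for simplicity, whereas sharding-jdbc's real merge logic differs in detail.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a LIMIT/OFFSET query without a shard key, executed across several shards. */
public class DeepPagingDemo {

    /** Per-shard rewrite: each shard must return its first offset + limit rows. */
    public static String rewrite(int limit, int offset) {
        return "limit " + (offset + limit) + " offset 0";
    }

    /** Total rows pulled from the databases to serve a single page. */
    public static long rowsFetched(int shardCount, int limit, int offset) {
        return (long) shardCount * (offset + limit);
    }

    /** Merges per-shard results (each already sorted), then skips offset rows and keeps limit rows. */
    public static List<Integer> page(List<List<Integer>> shardResults, int limit, int offset) {
        List<Integer> merged = new ArrayList<>();
        for (List<Integer> rows : shardResults) {
            merged.addAll(rows.subList(0, Math.min(rows.size(), offset + limit)));
        }
        merged.sort(null); // natural order stands in for the query's ORDER BY
        int from = Math.min(offset, merged.size());
        int to = Math.min(offset + limit, merged.size());
        return new ArrayList<>(merged.subList(from, to));
    }
}
```

With two databases, a page of 10 rows at offset 1000 pulls 2 × 1010 = 2020 rows, so the cost grows with both the offset and the number of shards.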

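About table names

Table names in the SQL must match the sharding-jdbc configuration exactly; lowercase everywhere is recommended. The routing rules are kept in a HashMap and the lookup does not normalize letter case, so if the case in your SQL differs from the configuration, the rule will not be found.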
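Since a HashMap lookup is case-sensitive, a quick check during the verification phase can catch table names whose case differs from the configuration. This is a sketch for your own tooling, not part of sharding-jdbc; TableNameCheck is a hypothetical name, and extracting the table names from your mapper SQL is left as an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Flags table names used in SQL that match a configured table only when case is ignored. */
public class TableNameCheck {

    public static List<String> caseMismatches(Set<String> configuredTables, List<String> tablesInSql) {
        List<String> mismatches = new ArrayList<>();
        for (String used : tablesInSql) {
            if (configuredTables.contains(used)) {
                continue; // exact match: the routing lookup will find it
            }
            for (String configured : configuredTables) {
                if (configured.equalsIgnoreCase(used)) {
                    mismatches.add(used); // same table, different case: routing would miss it
                    break;
                }
            }
        }
        return mismatches;
    }
}
```

A name that is not configured in any case (t_item in a list like t_order, T_Order, t_item) is not reported; that is a different problem, a missing routing rule.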

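Configuration bloat

Every table needs its own routing configuration to be parsed correctly. If your database has many tables, the configuration file gets very large; a thousand lines or more is not unusual. You can therefore split it out of the main YAML file: put the sharding settings in application-sharding.yml and include that profile from application.yml: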

spring.profiles.include: sharding

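How to scan multiple databases

For example, some scheduled jobs need to go through every database.

Method 1: iterate over all databases

Use the following to get the real (physical) data sources: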

Map<String, DataSource> map = ShardingDataSource.class.cast(dataSource).getDataSourceMap();

Then run the scan logic on each database. In this case MyBatis cannot be used; you have to write plain JDBC.
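A minimal plain-JDBC sketch of method 1, assuming the map comes from the getDataSourceMap() call above. AllShardScanner and RowHandler are hypothetical names. Because these data sources bypass sharding-jdbc, no routing or table-name rewriting happens, so the SQL you pass must use the physical table names that actually exist in each database.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Map;
import javax.sql.DataSource;

/** Runs the same query on every physical data source with plain JDBC (no sharding routing). */
public class AllShardScanner {

    /** Hypothetical per-row callback: receives the data source name and the current row. */
    public interface RowHandler {
        void handle(String dataSourceName, ResultSet row) throws SQLException;
    }

    /** Returns the total number of rows handled across all data sources. */
    public static int scanAll(Map<String, DataSource> dataSources, String sql, RowHandler handler)
            throws SQLException {
        int rows = 0;
        for (Map.Entry<String, DataSource> e : dataSources.entrySet()) {
            // try-with-resources closes the result set, statement and connection of each database
            try (Connection conn = e.getValue().getConnection();
                 PreparedStatement ps = conn.prepareStatement(sql);
                 ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    handler.handle(e.getKey(), rs);
                    rows++;
                }
            }
        }
        return rows;
    }
}
```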

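Method 2: iterate over the shard key

This method first builds a list of shard-key values, such as dates, and then runs the business logic once for each value. It becomes slow when the list is very large.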
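A sketch of method 2 with dates as the shard key, assuming a day-based sharding strategy (an assumption; use whatever your shard key is). ShardKeyScan is a hypothetical name, and the task would typically call a mapper method with the date as a parameter. Because every query carries a shard-key value, sharding-jdbc can route it normally, which is presumably why this method, unlike method 1, still works with MyBatis.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Method 2: enumerate shard-key values (here, one per day) and process each one. */
public class ShardKeyScan {

    /** All days from start to endInclusive; each day is assumed to be one shard-key value. */
    public static List<LocalDate> dayKeys(LocalDate start, LocalDate endInclusive) {
        List<LocalDate> keys = new ArrayList<>();
        for (LocalDate d = start; !d.isAfter(endInclusive); d = d.plusDays(1)) {
            keys.add(d);
        }
        return keys;
    }

    /** Runs the task once per key, sequentially; returns how many keys were processed. */
    public static int forEachKey(List<LocalDate> keys, Consumer<LocalDate> task) {
        for (LocalDate key : keys) {
            task.accept(key); // the task's query includes this key, so it routes to one shard
        }
        return keys.size();
    }
}
```

The loop is strictly sequential, which is where the slowness for long key lists comes from.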

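How to verify

Sharding is dangerous: once data is written to the wrong database, repairing it later is painful. So at first you can point all routing rules at the original tables, which means you are only verifying that the SQL routing is correct. Once the routing of every SQL statement has been verified, switch to the real sharded databases or tables.

Make sure SQL is printed: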

sharding.jdbc.config.sharding.props.sql.show: true

Write the SQL log to a separate file (logback):

<appender name="SQL" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>${LOG_HOME}/sharding.log</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
        <fileNamePattern>${LOG_HOME}/backup/sharding.log.%d{yyyy-MM-dd}</fileNamePattern>
        <maxHistory>100</maxHistory>
    </rollingPolicy>
    <encoder class="ch.qos.logback.classic.encoder.PatternLayoutEncoder">
        <pattern>${ENCODER_PATTERN}</pattern>
    </encoder>
</appender>

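Then write some scripts to check the SQL in that log file. Here is a generic one you can adapt: it collects each logged "Actual SQL" entry together with its parameter list and reports statements in which the shard-key column (the placeholder splitKey below; replace it with your own column name) does not appear. Run it with -b to process a whole file, or with -f to follow a growing log like tail -f.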

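import sys
import re
import time
import getopt


def process(sql_lines):
    # Join the multi-line log entry into one line (keep a space between the parts)
    one = " ".join(line.strip() for line in sql_lines)
    # For every "?" placeholder, record the column it is compared with ("col = ?"),
    # or just "?" when no column name precedes it (e.g. inside IN (...))
    place = [m.group(1) if m.group(1) else m.group(2)
             for m in re.finditer(r"[ ]+(\w+)[ ]*=[ ]*\?|(\?)", one)]

    if place:
        # The parameters are logged at the end as "::: [[v1, v2, ...]]"
        mat = re.search(r"::: \[\[(.*)\]\]", one)
        if mat is not None:
            vals = [v.strip() for v in mat.group(1).split(',')]
            if "splitKey" in place:  # replace splitKey with your shard-key column
                for i, part in enumerate(place):
                    value = vals[i] if i < len(vals) else None
                    # Put your own logic here, e.g. check that the value of the
                    # shard-key column matches the target database/table.
            else:
                print("no splitKey", one)


SQL = []


def process_line(line):
    global SQL
    if "Actual SQL" in line:
        SQL = [line]          # a new routed statement starts here
    elif SQL:
        SQL.append(line)      # continuation of a multi-line statement
    else:
        return                # unrelated log line
    if line.strip().endswith("]]"):   # the parameter list closes the entry
        process(SQL)
        SQL = []


opts, args = getopt.getopt(sys.argv[1:], "bf")

for op, value in opts:
    if op == "-b":
        print("batch mode, e.g. 'python x.py -b sharding.log > result'")
        with open(args[0], encoding="utf-8", errors="replace") as f:
            for line in f:
                process_line(line)
    elif op == "-f":
        print("follow mode, e.g. 'python x.py -f sharding.log'")
        with open(args[0], encoding="utf-8", errors="replace") as f:
            f.seek(0, 2)  # start at the end of the file, like tail -f
            while True:
                line = f.readline()
                if line:
                    process_line(line)
                else:
                    time.sleep(0.5)  # wait for new log lines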

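Other

You may need to switch routing frequently, so at some point the routing configuration may have to live in a remote configuration center where it can be changed dynamically.

Oh, and here is some test code I use during development to quickly check whether a SQL statement is parsed correctly: put the SQL in /tmp/a.sql and run the test.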

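import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import javax.sql.DataSource;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.junit4.SpringRunner;

@RunWith(SpringRunner.class)
@SpringBootTest(classes = App.class)
public class ShardingTest {
    @Autowired
    DataSource dataSource;

    @Test
    public void testGet() throws Exception {
        // The SQL under test is read from /tmp/a.sql
        String sql = new String(Files.readAllBytes(Paths.get("/tmp/a.sql")), StandardCharsets.UTF_8);

        // try-with-resources closes everything; exceptions propagate,
        // so a parsing or routing error fails the test instead of being swallowed
        try (Connection conn = dataSource.getConnection();
             PreparedStatement stmt = conn.prepareStatement(sql);
             ResultSet rs = stmt.executeQuery()) {
            printRS(rs);
        }
    }

    public static void printRS(ResultSet rs) throws Exception {
        ResultSetMetaData rsmd = rs.getMetaData();
        int columnsNumber = rsmd.getColumnCount();
        while (rs.next()) {
            for (int i = 1; i <= columnsNumber; i++) {
                if (i > 1) System.out.print(", ");
                // prints "<value> <column name>"
                System.out.print(rs.getString(i) + " " + rsmd.getColumnName(i));
            }
            System.out.println();
        }
    }
}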