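Yesterday a developer came to us DBAs and asked us to write a Hive SQL query.
Requirement:
There is a table t with two main columns: the airport name, airport, and the airport's position (latitude/longitude), distance (named distant in the mock table below). The goal is to list every pair of airports that are less than 100 apart.
The logic of this SQL is not hard; the tricky part is removing the duplicate pairs.
I mocked up the table in MySQL, since Hive's syntax is close to standard SQL, and inserted three rows, where a, b and c stand for three airport names. The structure is as follows: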
mysql> show create table t\G
*************************** 1. row ***************************
       Table: t
Create Table: CREATE TABLE `t` (
  `airport` varchar(10) DEFAULT NULL,
  `distant` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

mysql> select * from t;
+---------+---------+
| airport | distant |
+---------+---------+
| a       |     130 |
| b       |     140 |
| c       |     150 |
+---------+---------+
3 rows in set (0.00 sec)
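To reproduce the mock table, a setup script along these lines should work. It is a sketch reconstructed from the SHOW CREATE TABLE output and the three rows shown above, not the author's original statements; the display width in int(11) is left out because it does not affect the stored values.

```sql
-- Recreate the mock table from the SHOW CREATE TABLE output above.
CREATE TABLE t (
  airport VARCHAR(10) DEFAULT NULL,  -- airport name
  distant INT DEFAULT NULL           -- position value compared between airports
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- The three sample airports a, b and c.
INSERT INTO t (airport, distant) VALUES
  ('a', 130),
  ('b', 140),
  ('c', 150);
```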
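The != condition filters out each airport being compared with itself, and the abs function takes the absolute value of the difference, which gives the pairs of airports whose positions are less than 100 apart: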
mysql> select t1.airport, t2.airport from t t1,t t2 where t1.airport != t2.airport and abs(t1.distant-t2.distant) < 100;
+---------+---------+
| airport | airport |
+---------+---------+
| b       | a       |
| c       | a       |
| a       | b       |
| c       | b       |
| a       | c       |
| b       | c       |
+---------+---------+
6 rows in set (0.00 sec)
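But here is the problem: (b,a) and (a,b), (c,a) and (a,c), (c,b) and (b,c) count as duplicates for our purposes. One row from each pair is enough to tell which two airports are involved. So how do we get rid of the duplicates?
distinct and group by did not seem to help here. In the end I asked a seasoned SQL expert, who pointed me to the hex() function, which returns the hexadecimal representation of a string. Hive has a corresponding function. The effect is as follows: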
mysql> select t1.airport,hex(t1.airport), t2.airport,hex(t2.airport) from t t1,t t2 where t1.airport != t2.airport and abs(t1.distant-t2.distant) < 100;
+---------+-----------------+---------+-----------------+
| airport | hex(t1.airport) | airport | hex(t2.airport) |
+---------+-----------------+---------+-----------------+
| b       | 62              | a       | 61              |
| c       | 63              | a       | 61              |
| a       | 61              | b       | 62              |
| c       | 63              | b       | 62              |
| a       | 61              | c       | 63              |
| b       | 62              | c       | 63              |
+---------+-----------------+---------+-----------------+
6 rows in set (0.00 sec)
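Real airport names are longer than one character, so it may help to see how hex() behaves on multi-character strings. The short check below uses made-up literals ('ab' is not in the sample table) and suggests that comparing hex strings keeps the byte-by-byte order of the original names:

```sql
-- hex() encodes every byte of the string as two hexadecimal digits.
SELECT HEX('a')  AS hex_a,                 -- 61
       HEX('ab') AS hex_ab,                -- 6162
       HEX('b')  AS hex_b,                 -- 62
       HEX('ab') < HEX('b') AS ab_before_b; -- 1, because '6162' sorts before '62'
```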
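Now we can compare the hex values of airport 1 and airport 2 and keep only the rows where the first is smaller, which removes the duplicates: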
mysql> select t1.airport, t2.airport from t t1,t t2 where t1.airport != t2.airport and hex(t1.airport) < hex(t2.airport) and abs(t1.distant-t2.distant) < 100;
+---------+---------+
| airport | airport |
+---------+---------+
| a       | b       |
| a       | c       |
| b       | c       |
+---------+---------+
3 rows in set (0.00 sec)
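One last optimisation: the < comparison already excludes rows where both airport names are identical, so the != condition is redundant and can be dropped. The result: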
mysql> select t1.airport, t2.airport from t t1,t t2 where hex(t1.airport) < hex(t2.airport) and abs(t1.distant-t2.distant) < 100;
+---------+---------+
| airport | airport |
+---------+---------+
| a       | b       |
| a       | c       |
| b       | c       |
+---------+---------+
3 rows in set (0.00 sec)
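Since strings can be compared directly, the same idea can probably be written without hex() at all. The variant below is a sketch, not the query that was delivered; the aliases airport_1 and airport_2 and the ORDER BY are additions for readability. On the three sample rows it should return the same three pairs.

```sql
-- Keep each unordered pair once by requiring the first name to sort
-- before the second; this also rules out an airport paired with itself.
SELECT t1.airport AS airport_1,
       t2.airport AS airport_2
FROM t AS t1
JOIN t AS t2
  ON t1.airport < t2.airport
WHERE ABS(t1.distant - t2.distant) < 100
ORDER BY airport_1, airport_2;
```

One caveat: a direct comparison follows the column's collation. Under a case-insensitive collation, two names that differ only in letter case compare as equal and that pair would be dropped, whereas the hex() comparison works on the raw bytes and still tells them apart.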
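The SQL is not complicated, with few joins and no subqueries. Until now, though, distinct or group by had always been enough to remove duplicates, and this time they did not seem to apply, so I am writing it down. Criticism is welcome.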
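For completeness, distinct can be made to work if each pair is first put into a canonical order. The following is a hedged alternative using MySQL's LEAST() and GREATEST() functions; it was not part of the original solution, and the column aliases are illustrative.

```sql
-- Normalise every pair to (smaller name, larger name), then let
-- DISTINCT collapse the two orderings of the same pair into one row.
SELECT DISTINCT
       LEAST(t1.airport, t2.airport)    AS airport_1,
       GREATEST(t1.airport, t2.airport) AS airport_2
FROM t AS t1, t AS t2
WHERE t1.airport != t2.airport
  AND ABS(t1.distant - t2.distant) < 100;
```

This form generates both orderings and then de-duplicates them, so it is likely to do more work than filtering with < in the WHERE clause.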
References
https://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_hex