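Only today did I clearly realize that GROUP BY also deduplicates. It makes sense when you think about it: grouping by xx puts identical values into the same group, which amounts to deduplication. Let's look at it in detail.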
hive> select * from test;
OK
zhao 15 20170807
zhao 14 20170809
zhao 15 20170809
zhao 16 20170809
hive> select name from test;
OK
zhao
zhao
zhao
zhao
hive> select name from test group by name;
...
OK
zhao
Time taken: 40.273 seconds, Fetched: 1 row(s)
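Grouping by this column leaves a single row, so the result is deduplicated. Strictly speaking, deduplication only merges rows whose grouped values are all identical, so let's see what happens with two columns: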
hive> select name,age from test group by name,age;
...
OK
zhao 14
zhao 15
zhao 16
Time taken: 36.943 seconds, Fetched: 3 row(s)
hive> select name,age from test group by name;
FAILED: SemanticException [Error 10025]: Line 1:12 Expression not in GROUP BY key 'age'
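Grouping by name alone fails. If name is the only key, Hive has no way to decide which of the differing age values to return for the group, so the error is reasonable.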
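One way around the error is to wrap the non-key column in an aggregate, so that each group yields exactly one value. Below is a minimal sketch of that idea. It assumes an in-memory SQLite database as a stand-in for Hive so that it runs anywhere, and the column name `dt` for the third column is made up, since the original output does not show column names. Note that SQLite itself would accept the bare `age` column that Hive rejects, so the sketch only illustrates the aggregate form.

```python
import sqlite3

# In-memory SQLite database standing in for Hive (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
# "dt" is a hypothetical name for the date column.
conn.execute("CREATE TABLE test (name TEXT, age INTEGER, dt TEXT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [
        ("zhao", 15, "20170807"),
        ("zhao", 14, "20170809"),
        ("zhao", 15, "20170809"),
        ("zhao", 16, "20170809"),
    ],
)

# Every selected column is either a GROUP BY key or wrapped in an aggregate,
# so each group collapses to a single well-defined row.
agg_rows = conn.execute(
    "SELECT name, MIN(age), MAX(age), COUNT(*) FROM test GROUP BY name"
).fetchall()
print(agg_rows)  # [('zhao', 14, 16, 4)]
```

The same SELECT should also be valid HiveQL, since MIN, MAX and COUNT are standard aggregates.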
DISTINCT is simpler still; it just deduplicates:
hive> select distinct name from test;
...
OK
zhao
Time taken: 37.047 seconds, Fetched: 1 row(s)
hive> select distinct name,age from test;
OK
zhao 14
zhao 15
zhao 16
Time taken: 39.131 seconds, Fetched: 3 row(s)
hive> select distinct(name),age from test;
OK
zhao 14
zhao 15
zhao 16
Time taken: 37.739 seconds, Fetched: 3 row(s)
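The outputs above suggest that `distinct name,age`, `distinct(name),age` and `group by name,age` all return the same set of rows; the parentheses around `name` are just a parenthesized expression, and DISTINCT still applies to the whole row. The following sketch checks this on the same sample data. It again assumes SQLite in place of Hive and a hypothetical column name `dt`, so treat it as an illustration rather than a statement about how Hive executes these queries.

```python
import sqlite3

# In-memory SQLite database standing in for Hive (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
# "dt" is a hypothetical name for the date column.
conn.execute("CREATE TABLE test (name TEXT, age INTEGER, dt TEXT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [
        ("zhao", 15, "20170807"),
        ("zhao", 14, "20170809"),
        ("zhao", 15, "20170809"),
        ("zhao", 16, "20170809"),
    ],
)

def run(sql):
    """Run a query and return its rows sorted, so result order does not matter."""
    return sorted(conn.execute(sql).fetchall())

distinct_rows = run("SELECT DISTINCT name, age FROM test")
paren_rows = run("SELECT DISTINCT(name), age FROM test")
grouped_rows = run("SELECT name, age FROM test GROUP BY name, age")

# All three forms deduplicate on the (name, age) pair.
print(distinct_rows == paren_rows == grouped_rows)  # True
print(distinct_rows)  # [('zhao', 14), ('zhao', 15), ('zhao', 16)]
```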