Teradata: TOP n and SAMPLE n

Date: 2019-11-29

Tags: teradata, sample


Teradata offers two ways to fetch n sample rows from a table:

select top n * from table;
select * from table sample n;

So what is the difference between the two? An explanation follows.

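TOP N
If there is an ORDER BY clause, the rows must first be sorted, and then the first N rows (or N percent of the rows, with TOP N PERCENT) are returned.
If there is no ORDER BY clause, an all-AMPs STAT FUNCTION step is run, and then one or more AMPs are selected to supply the rows.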
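
As a hedged illustration of the TOP forms described above, the queries below reuse the table from the test later in this post. The column name `topic_id` is a hypothetical placeholder and is not taken from the original.

```sql
-- Assumption: topic_id is a hypothetical column used only for illustration.

-- TOP N with ORDER BY: rows are sorted first, then the first 10 are returned.
SELECT TOP 10 *
FROM PD_PORTAL.TOPIC_COMP_DETAIL
ORDER BY topic_id DESC;

-- TOP N PERCENT with ORDER BY: returns the first 10 percent of the sorted rows.
SELECT TOP 10 PERCENT *
FROM PD_PORTAL.TOPIC_COMP_DETAIL
ORDER BY topic_id DESC;

-- TOP N without ORDER BY: returns some 10 rows; which rows come back is not guaranteed.
SELECT TOP 10 *
FROM PD_PORTAL.TOPIC_COMP_DETAIL;
```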

SAMPLE N
The table is first scanned in full, and then N rows are returned.
The rows are chosen by sampling.
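
The SAMPLE clause also accepts a fraction in place of a row count. A minimal sketch, again using the table from the test later in this post; the fraction 0.1 is an arbitrary value chosen for illustration.

```sql
-- SAMPLE with a row count: returns 10 randomly chosen rows.
SELECT *
FROM PD_PORTAL.TOPIC_COMP_DETAIL
SAMPLE 10;

-- SAMPLE with a fraction between 0 and 1: returns roughly 10 percent of the rows.
SELECT *
FROM PD_PORTAL.TOPIC_COMP_DETAIL
SAMPLE 0.1;

-- RANDOMIZED ALLOCATION: rows are allocated across AMPs at random instead of
-- in proportion to the number of rows on each AMP (the default behaviour).
SELECT *
FROM PD_PORTAL.TOPIC_COMP_DETAIL
SAMPLE RANDOMIZED ALLOCATION 10;
```

Unlike TOP N with ORDER BY, repeated runs of these queries will generally return different rows.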


Performance comparison:
When the data volume is fairly small, TOP N is faster than SAMPLE N.
When the data volume is fairly large, SAMPLE N is faster than TOP N.

Test:

Explain select top 10 * from PD_PORTAL.TOPIC_COMP_DETAIL

1) First, we lock a distinct PD_PORTAL."pseudo table" for read on a
RowHash to prevent global deadlock for PD_PORTAL.TOPIC_COMP_DETAIL.
2) Next, we lock PD_PORTAL.TOPIC_COMP_DETAIL for read.
3) We do an all-AMPs STAT FUNCTION step from
PD_PORTAL.TOPIC_COMP_DETAIL by way of an all-rows scan with no
residual conditions into Spool 5, which is redistributed by hash
code to all AMPs. The result rows are put into Spool 1
(group_amps), which is built locally on the AMPs. This step is
used to retrieve the TOP 10 rows. One AMP is randomly selected to
retrieve 10 rows. If this step retrieves less than 10 rows, then
execute step 4. The size is estimated with low confidence to be
10 rows (27,460 bytes).
4) We do an all-AMPs STAT FUNCTION step from
PD_PORTAL.TOPIC_COMP_DETAIL by way of an all-rows scan with no
residual conditions into Spool 5 (Last Use), which is
redistributed by hash code to all AMPs. The result rows are put
into Spool 1 (group_amps), which is built locally on the AMPs.
This step is used to retrieve the TOP 10 rows. The size is
estimated with low confidence to be 10 rows (27,460 bytes).
5) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1.

Explain select * from PD_PORTAL.TOPIC_COMP_DETAIL sample 10
