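Symptom: the hourly data in a Hive table was missing one hour every few days. It eventually turned out that the cat step that merges the data was failing, which caused the missing hour.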
I modified the script as follows, which fixed the problem:
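## merge 5min data into hour data
cat $datapath/news_5min_$xhour* > $localpath/data/channelnews_$hour.txt

##### check: wait (up to 10 tries, 5 s apart) until the merged file reaches 1024 bytes
tmppath="${localpath}/data/channelnews_${hour}.txt"
i=0
while (( i < 10 ))
do
    # size of the merged file in bytes (the original read $path here; tmppath is the variable defined above)
    m=`du -b "$tmppath" | awk '{print int($1)}'`
    if [ $m -lt 1024 ]; then
        echo "${tmppath} is small, is $m"
        sleep 5
    else
        break
    fi
    let "i++"
done
echo "i is:$i"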
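The check above only waits and re-measures the file; it does not run cat again. If the merge itself failed, a variant that repeats the merge on every attempt may be more robust. Below is a minimal sketch of that idea, not the script actually used here. The function name merge_with_retry and its argument order are hypothetical, and wc -c is swapped in for du -b because it also works where du has no -b option.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original script):
#   merge_with_retry OUT MIN_BYTES MAX_TRIES SLEEP_SECS FILE...
# Concatenates FILE... into OUT and retries until OUT has at least MIN_BYTES bytes.
merge_with_retry() {
    # Need the four fixed arguments plus at least one input file.
    if (( $# < 5 )); then
        echo "usage: merge_with_retry OUT MIN_BYTES MAX_TRIES SLEEP_SECS FILE..." >&2
        return 2
    fi
    local out=$1 min_bytes=$2 max_tries=$3 sleep_secs=$4
    shift 4
    local i size
    for (( i = 1; i <= max_tries; i++ )); do
        # Re-run the merge on every attempt and check cat's exit status.
        if cat -- "$@" > "$out"; then
            size=$(wc -c < "$out")
            if (( size >= min_bytes )); then
                echo "merged ${size} bytes into ${out} on attempt ${i}"
                return 0
            fi
            echo "${out} is small, is ${size}" >&2
        else
            echo "cat failed on attempt ${i}" >&2
        fi
        # Pause before the next attempt, but not after the last one.
        if (( i < max_tries )); then
            sleep "$sleep_secs"
        fi
    done
    return 1
}
```

With the variables from the script above, a call could look like: merge_with_retry "$localpath/data/channelnews_$hour.txt" 1024 10 5 $datapath/news_5min_$xhour* || exit 1. Note that the glob is expanded once, when the function is called, so 5-minute files that arrive later are not picked up by the retries.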