Working around wget's inability to limit downloaded file size

wget cannot limit the size of a file it downloads, so if the URL list happens to contain a very large file the whole download run drags on. The workaround here is to fetch the file's headers with curl first, parse the Content-Length field to learn how large the file about to be downloaded is, and skip the download if it exceeds a preset threshold.
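For instance, checking a single URL against a threshold before downloading could look like the sketch below; the URL and the 10485760-byte (10 MB) limit are placeholder values, not taken from the script further down:

# Send a HEAD request (-I) quietly (-s) and pull out the Content-Length header.
url="http://example.com/big.iso"   # placeholder URL
limitsize=10485760                 # assumed threshold in bytes

len=$(curl -sI "$url" | grep -i '^Content-Length:' | awk '{print $2}' | tr -d '\r')

if [ -n "$len" ] && [ "$len" -gt "$limitsize" ]; then
    echo "skip: $url reports $len bytes (over $limitsize)"
else
    echo "ok to download: $url"
fi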
 
Of course, the current bash script does nothing special for URLs whose responses carry no Content-Length header, so large files from such URLs still cannot be avoided. One way to extend it:
While the download is in progress, keep querying the size of the file on disk, and kill the download process once it crosses the threshold.
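A minimal sketch of that extension, assuming a single URL written to a fixed output file and GNU stat for the size check (the file name, URL, and limit below are placeholders):

# Start wget in the background, then poll the partial file's size once a second.
url="http://example.com/big.iso"   # placeholder URL
outfile="download.part"            # assumed output file name
limitsize=10485760                 # assumed threshold in bytes

wget -q -O "$outfile" "$url" &
pid=$!

while kill -0 "$pid" 2>/dev/null; do
    size=$(stat -c %s "$outfile" 2>/dev/null || echo 0)
    if [ "$size" -gt "$limitsize" ]; then
        echo "partial file reached $size bytes (over $limitsize), killing download"
        kill "$pid"
        rm -f "$outfile"
        break
    fi
    sleep 1
done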
 
So the current version still leaves room for improvement:
#!/bin/bash
# Download every URL in a list, skipping files whose Content-Length exceeds a limit.

if [ $# -eq 4 ]
then
    echo "start downloading..."
    urllist=$1
    limitsize=$2
    outfolder=$3
    logfolder=$4
    echo "url list file:$urllist"
    echo "limited file size:$limitsize bytes"
    echo "output folder:$outfolder"
    echo "log folder:$logfolder"
else
    echo "usage: ./download.sh <url list> <limited file size> <output folder> <log folder>..."
    exit 1
fi

if [ -d "$outfolder" ]
then
    echo "$outfolder exists..."
else
    echo "make $outfolder..."
    mkdir -p "$outfolder"
fi

if [ -d "$logfolder" ]
then
    echo "$logfolder exists..."
else
    echo "make $logfolder..."
    mkdir -p "$logfolder"
fi

cat "$urllist" | while read url; do
    echo "downloading:$url"
    # HEAD request; strip the trailing carriage return from the header value.
    len=$(curl -I -s "$url" | grep -i '^Content-Length' | cut -d' ' -f2 | tr -d '\r')
    if [ -n "$len" ]
    then
        echo "length:$len bytes"
        if [ "$len" -gt "$limitsize" ]
        then
            echo "$url is greater than $limitsize bytes, can't be downloaded."
        else
            echo "$url is smaller than $limitsize bytes, can be downloaded."
            # Build a flat log file name by stripping characters not allowed in file names.
            filename=$(echo "$url" | tr -d ':/?\|*<>')
            wget -P "$outfolder" -x -t 3 --save-headers --connect-timeout=10 --read-timeout=10 --level=1 "$url" -o "$logfolder/$filename.txt"
        fi
    else
        echo "$url file size is unknown."
        filename=$(echo "$url" | tr -d ':/?\|*<>')
        wget -P "$outfolder" -x -t 3 --save-headers --connect-timeout=10 --read-timeout=10 --level=1 "$url" -o "$logfolder/$filename.txt"
    fi
done
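Assuming the script is saved as download.sh and urls.txt holds one URL per line, a typical invocation with a 10 MB limit would be (the file and folder names here are only examples):

chmod +x download.sh
./download.sh urls.txt 10485760 downloads logs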