I wrote three scripts in total. This was my first time writing shell scripts, and they are pretty clumsy. Looking back over them after finishing, the quality really isn't great.
Script one: addLinkFiles.sh
In the current directory there is a file named xmlfile.xml, which has to be edited by hand. Tags in the file describe two levels of directories, for example:
<class>Life-sciences</class>
<name>uniprot</name>
<location>http://www.example.com/example.nt.gz</location>
I didn't use a real XML tree structure, because parsing one in shell is just too much trouble. Relative position expresses the parent-child relationship instead, so strictly speaking this is not an XML file at all; it's neither fish nor fowl. The three lines above say that the class contains a dataset (the <name> tag) and the dataset has one download link. The corresponding directory structure is "Life-sciences/uniprot/link", where the link file stores the link. On startup, addLinkFiles.sh checks this file and tests under ./download/data/ whether linkpath = ${class}/${dataset}/link exists. If not, it creates the directory and the file, appends the download link to the link file, and appends ${linkpath} to ${modifiedLinkFile}. Every ${interval} seconds the script checks the modification time of ${xmlfile}; if it has changed, a new location was added, so the script compares the directory tree against the xml file's contents and creates directories and files for anything newly added.
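The core of the "parser" is nothing more than awk field splitting. A minimal sketch of the trick (the sample line and the name value are just for illustration; in the script the extracted value goes into currentClass, currentDataSet or newRecord):

line="<class>Life-sciences</class>"

#Splitting on "<", ">" or a space puts the tag name in field 2.
tag=$(echo $line | awk -F "<|>| " '{print $2}')

#Splitting on "<tag>" and "</tag>" puts the enclosed text in field 2.
value=$(echo $line | awk -F "<$tag>|</$tag>" '{print $2}')

echo "tag=$tag value=$value"    #prints: tag=class value=Life-sciences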
#!/bin/bash
#*********************************************************
#addLinkFiles.sh
#Keep checking the $xmlfile.
#The $xmlfile should have only 3 tags: class, name, location.
#
#last edited 2013.09.03 by Lyuxd.
#
#*********************************************************

#******************
#----init----------
#******************
interval=10
rootDir=${PWD}
dataDir=$rootDir"/data"
logDir=$rootDir"/log"
link="link"
log="add.log"
modifiedLinkFile="modifiedLinkFile"
xmlfile="xmlfile.xml"
level1="class"
level2="name"
level3="location"
currentClass=$rootDir
currentDataSet=$rootDir
xmlLastMT=0

cd $rootDir

#****************************************
#------Create Data, Log Directories------
#****************************************
if [ ! -d "$dataDir" ];then
    mkdir "$dataDir"
fi
if [ ! -d "$logDir" ];then
    mkdir "$logDir"
fi

#****************************************
#------Parsing the xmlfile---------------
#****************************************
if [ ! -f "$xmlfile" ];then
    echo "`date "+%Y.%m.%d-%H:%M:%S--ERROR: "`No xmlfile found. exit." >> "$logDir/$log"
    exit 1
fi

#Check the modification time of xmlfile every $interval sec.
#If it changed, parse xmlfile again.
while true
do
    xmlMT=$(stat -c %Y "$xmlfile")
    if [ "$xmlLastMT" -lt "$xmlMT" ];then
        xmlLastMT=$xmlMT
        echo "`date "+%Y.%m.%d-%H:%M:%S--INFO: "`parsing $xmlfile..." >> "$logDir/$log"
        while read line
        do
            #Skip empty lines.
            if [ "$line"x != x ];then
                #Split on "<", ">" or space: the tag name is field 2.
                tmp=$(echo $line | awk -F "<|>| " '{print $2}')
                #echo without quotes trims surrounding whitespace.
                tag=$(echo $tmp)
                #If the "class" directory does not exist, create it.
                if [ "$tag"x = "$level1"x ]; then
                    currentClass=$(echo $line | awk -F "<$tmp>|</$tmp>" '{print $2}')
                    currentClass=$(echo $currentClass)
                    currentDataSet=$rootDir
                    if [ ! -z "$currentClass" ] && [ ! -d "$dataDir/$currentClass" ]; then
                        mkdir "$dataDir/$currentClass"
                        echo "`date "+%Y.%m.%d-%H:%M:%S--INFO: "`mkdir $dataDir/$currentClass" >> "$logDir/$log"
                    fi
                #If the "name" (dataset) directory does not exist, create it.
                elif [ "$tag"x = "$level2"x ] && [ "$currentClass" != "$rootDir" ]; then
                    currentDataSet=$(echo $line | awk -F "<$tmp>|</$tmp>" '{print $2}')
                    currentDataSet=$(echo $currentDataSet)
                    if [ ! -z "$currentClass" ] && [ ! -z "$currentDataSet" ] && [ ! -d "$dataDir/$currentClass/$currentDataSet" ]; then
                        mkdir "$dataDir/$currentClass/$currentDataSet"
                        echo "`date "+%Y.%m.%d-%H:%M:%S--INFO: "`mkdir $dataDir/$currentClass/$currentDataSet" >> "$logDir/$log"
                    fi
                #If the "link" file does not exist, create it, then append the new location.
                elif [ "$tag"x = "$level3"x ] && [ ! -z "$currentClass" ] && [ ! -z "$currentDataSet" ] && [ "$currentDataSet" != "$rootDir" ] && [ -d "$dataDir/$currentClass/$currentDataSet" ]; then
                    if [ ! -f "$dataDir/$currentClass/$currentDataSet/$link" ]; then
                        touch "$dataDir/$currentClass/$currentDataSet/$link"
                        echo "`date "+%Y.%m.%d-%H:%M:%S--INFO: "`Create link file : $dataDir/$currentClass/$currentDataSet/$link" >> "$logDir/$log"
                    fi
                    newRecord=$(echo $line | awk -F "<$tmp>|</$tmp>" '{print $2}')
                    ifexist=$(grep "$newRecord" "$dataDir/$currentClass/$currentDataSet/$link")
                    #Only append if no identical record exists yet.
                    if [ ! -z "$newRecord" ] && [ -z "$ifexist" ]; then
                        echo "$newRecord" >> "$dataDir/$currentClass/$currentDataSet/$link"
                        echo "`date "+%Y.%m.%d-%H:%M:%S--INFO: "`Add new link $newRecord to $dataDir/$currentClass/$currentDataSet/$link" >> "$logDir/$log"
                        echo "$dataDir/$currentClass/$currentDataSet/$link" >> "$logDir/modifiedLinkFile.tmp"
                    fi
                else
                    echo "`date "+%Y.%m.%d-%H:%M:%S--ERROR: "`Failed to process $line" >> "$logDir/$log"
                fi
            fi
        done <$xmlfile
        #**********************************************************
        #modifiedLinkFile.tmp contains the paths modified in the last loop.
        #Deduplicate modifiedLinkFile.tmp --> modifiedLinkFile
        #**********************************************************
        if [ -f "$logDir/modifiedLinkFile.tmp" ]; then
            cat "$logDir/modifiedLinkFile.tmp" | awk '!a[$0]++{"date \"+%Y%m%d%H%M%S\""|getline time; print time,$0}' >> "$logDir/$modifiedLinkFile"
            rm "$logDir/modifiedLinkFile.tmp"
        else
            touch "$logDir/$modifiedLinkFile"
        fi
    fi
    #echo "`date "+%Y.%m.%d-%H:%M:%S--INFO: "`parsing end." >> "$logDir/$log"
    sleep $interval
done
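One line in there deserves a note: !a[$0]++ is the standard awk de-duplication idiom. The first time a line is seen, a[$0] is 0 (falsy, so negated it is true and the action runs) and is then incremented, so every later duplicate is skipped. Inside the action, getline reads one line of output from the date command into time; the pipe is never closed, so later getline calls return nothing and every record keeps the first timestamp. A throwaway demo (the paths are made up):

printf '%s\n' /d/a/link /d/a/link /d/b/link |
awk '!a[$0]++{"date \"+%Y%m%d%H%M%S\""|getline time; print time,$0}'
#prints something like:
#20130910093000 /d/a/link
#20130910093000 /d/b/link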
Script two: checkmodifiedLinkFiles.sh
The modifiedLinkFile is populated by script one above. Script two checks the modification time of modifiedLinkFile every interval seconds; if it has changed, the file was modified by script one, which means new download links were added to xmlfile and the corresponding directories were created. Script two then pulls the records out of modifiedLinkFile (each record holds the absolute path of a newly created link file) and calls script three, monitor.sh, to run the actual download task.
#!/bin/bash
#*************************************************
#This script reads in modifiedLinkFile,
#calling monitor.sh for every record, e.g.:
#monitor.sh /home/class/name "wget -c -i link -b"
#
#last edited 2013.09.10 by lyuxd.
#
#*************************************************
interval=10
rootDir=${PWD}
dataDir=$rootDir"/data"
logDir=$rootDir"/log"
failedqueue="$logDir/failedQueue"
runningTask="$logDir/runningTask"
modifiedLinkFile="$logDir/modifiedLinkFile"
modifiedLinkFileMT="$logDir/modifiedLinkFile.MT"
log=$logDir"/check.log"
maxWgetProcess=5

echo "`date "+%Y.%m.%d-%H:%M:%S--INFO: "`check is running...">>$log

#*****************************************
#-----------restart interrupted tasks-----
#*****************************************
if [ -f "$runningTask" ]; then
    while read line
    do
        #Throttle: wait while $maxWgetProcess monitor.sh processes are running.
        counterWgetProcess=$(ps -A|grep -c "monitor.sh")
        while [ $counterWgetProcess -ge $maxWgetProcess ]
        do
            sleep 20
            counterWgetProcess=$(ps -A|grep -c "monitor.sh")
        done
        echo "`date "+%Y.%m.%d-%H:%M:%S--INFO: "`Call ./monitor for $line." >> $log
        nohup "./monitor.sh" "$line" "wget -nd -c -i link -b" >> /dev/null &
        sleep 1
    done <$runningTask
fi

#*********************************
#------------failedQueue----------
#*********************************
#if [ -f "$failedqueue" ] && [ `ls -l "$failedqueue"|awk '{print $5}'` -gt "0" ];then
#    line=($(awk '{print $0}' $failedqueue))
#    echo ${line[1]}
#    :>"$failedqueue"
#    for ((i=0;i<${#line[@]};i++))
#    do
#        counterWgetProcess=$(ps -A|grep -c "monitor.sh")
#        while [ $counterWgetProcess -ge $maxWgetProcess ]
#        do
#            sleep 20
#            counterWgetProcess=$(ps -A|grep -c "monitor.sh")
#        done
#        echo "./monitor.sh" "${line[i]}" "wget -nd -c -i link -b"
#        "./monitor.sh" "${line[i]}" "wget -nd -c -i link -b" >> /dev/null &
#ex "$failedqueue" <<EOF
#1d
#wq
#EOF
#    done
#fi

#***************************************************
#------------check new task in modifiedLinkFile-----
#***************************************************
if [ ! -f "$modifiedLinkFile" ];then
    echo "`date "+%Y.%m.%d-%H:%M:%S--"`No modifiedLinkFile found. checkmodifiedLinkFiles.sh exit 1." >> $log
    exit 1
fi
if [ ! -f "$modifiedLinkFileMT" ];then
    echo "0" > "$modifiedLinkFileMT"
fi

while true
do
    newMT=$(stat -c %Y "$modifiedLinkFile")
    oldMT=$(cat "$modifiedLinkFileMT")
    if [ "$newMT" != "$oldMT" ]; then
        while read line
        do
            if [ ! -z "$line" ]; then
                #Throttle as above.
                counterWgetProcess=$(ps -A|grep -c "monitor.sh")
                while [ $counterWgetProcess -ge $maxWgetProcess ]
                do
                    sleep 20
                    counterWgetProcess=$(ps -A|grep -c "monitor.sh")
                done
                #A record is "<timestamp> <absolute path of link file>":
                #field 2 is the path; strip the file name to get the directory.
                newLink=$(echo $line |awk '{print $2}')
                linkfileName=$(echo $newLink |awk -F "/" '{print $NF}')
                downloadDir=$(echo $newLink|awk -F "$linkfileName" '{print $1}')
                echo "`date "+%Y.%m.%d-%H:%M:%S--INFO: "`Call ./monitor for $downloadDir." >> $log
                "./monitor.sh" "$downloadDir" "wget -nd -c -i $linkfileName -b" >> /dev/null &
                sleep 1
            fi
        done <$modifiedLinkFile
        #Empty the file and remember its new modification time.
        : > $modifiedLinkFile
        echo $(stat -c %Y "$modifiedLinkFile") > "$modifiedLinkFileMT"
    fi
    sleep $interval
done
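To make the hand-off concrete: each record that script one appends is a timestamp followed by the absolute path of a link file, and script two cuts that apart before calling monitor.sh. A worked example (the path is made up):

#Record appended by addLinkFiles.sh:
#    20130910093000 /home/lyuxd/download/data/Life-sciences/uniprot/link
#checkmodifiedLinkFiles.sh then derives
#    newLink      = /home/lyuxd/download/data/Life-sciences/uniprot/link
#    linkfileName = link
#    downloadDir  = /home/lyuxd/download/data/Life-sciences/uniprot/
#and finally calls
#    ./monitor.sh /home/lyuxd/download/data/Life-sciences/uniprot/ "wget -nd -c -i link -b"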
Script three: monitor.sh. This script is mainly called by script two and performs the actual download. Before downloading it creates a wgetlog directory under, e.g., Life-sciences/uniprot, which holds the timestamped wget log. While downloading, monitor.sh checks the size of the log file every interval seconds (30 s in the script). Once the size stays unchanged between two consecutive checks, it reads the last three lines of the log; if it finds a keyword such as FINISH or failed, it stops the download and sends a notification by mail. If no keyword is found in those three lines, it assumes the network is the problem, i.e. the download speed dropped to zero so the log stopped growing, and rechecks the log size after another interval, repeating up to maxtrytimes times in total. If the log still hasn't grown, it reports the error by mail.
#!/bin/bash
#*********************************************************
#monitor download directory.
#One monitor.sh process is started for one download task.
#If some url in $downloadDir/link can't be reached, monitor
#will log "WARNING". If the download failed, log "ERROR".
#If it finished, log "FINISH".
#mail to $mailAddress.
#
#Last edited 2013.09.04 by Lyuxd.
#
#*********************************************************

#every $interval sec check the size of wgetlog.
interval=30
#if the size of wgetlog stays the same, retry up to $maxtrytimes checks.
maxtrytimes=5
downloadDir=$1
command=$2
rootDir=${PWD}
dataDir=$rootDir"/data"
logDir=$rootDir"/log"
log=$logDir"/monitor.log"
wgetlogDir="$downloadDir/wgetlog"
wgetlogname="`date +%Y%m%d%H%M%S`-wgetlog"
wgetlog="$wgetlogDir/$wgetlogname"
failedqueue="$logDir/failedQueue"
runningTask="$logDir/runningTask"
mailAddress="15822834587@139.com"
lastERROR="e"
addtoBoolean=0

cd $downloadDir
sleep 1
counterMail=0
echo "`date "+%Y.%m.%d-%H:%M:%S--"`Monitor for directory: ${PWD}.">> $log
whereAmI=$(echo ${PWD} | awk -F "/" '{print $NF}')
if [ ! -d $wgetlogDir ]; then
    mkdir $wgetlogDir
fi

#Put the current task into runningTask in case of power off. When
#checkmodifiedLinkFiles.sh comes up again, runningTask is checked for
#interrupted tasks, and any interrupted task is restarted.
isexist=$(grep "$downloadDir" "$runningTask")
if [ -z "$isexist" ];then
    echo $downloadDir >> $runningTask
fi

#Begin downloading; wget -b detaches on its own and logs to $wgetlog.
$command -b -o "$wgetlog" &

#Check the size of the logfile every $interval sec. If the size is the
#same as in the last check, wait another $interval and try again (up to
#$maxtrytimes in total), then read the wgetlog to see whether something
#went wrong. Mail to $mailAddress.
trytimesRemain=$maxtrytimes
logoldsize=0
sleep 10
lognewsize=$(ls -l "$wgetlog" | awk '{print $5}')
while [ ! -z "$lognewsize" ] && [ "$trytimesRemain" -gt 0 ]
do
    #If the log size stayed unchanged, look for "FINISH" in the last lines.
    if [ "$lognewsize" -eq "$logoldsize" ];then
        message=$(tail -n3 "$wgetlog")
        level=$(echo $message|grep "FINISH")
        if [ -z "$level" ];then
            trytimesRemain=`expr $trytimesRemain - 1`
            echo "`date "+%Y.%m.%d-%H:%M:%S--"`WARNING: $downloadDir Download speed 0.0 KB/s. MaxTryTimes=$maxtrytimes. Try(`expr $maxtrytimes - $trytimesRemain`). ">> $log
        else
            break
        fi
    else
        trytimesRemain=$maxtrytimes
    fi
    #Mail at most 5 times, and only when the error text has changed.
    ERROR=$(tail -n250 "$wgetlog" | grep "ERROR\|failed")
    if [ ! -z "$ERROR" ] && [ "$ERROR" != "$lastERROR" ] && [ "$counterMail" -lt 5 ]
    then
        echo "`date "+%Y.%m.%d-%H:%M:%S--"`WARNING: $downloadDir $ERROR. mail to $mailAddress.">> $log
        echo -e "${PWD}\n$ERROR\n"|mutt -s "Wget Running State : WARNING in $whereAmI" $mailAddress
        counterMail=$((counterMail+1))
        lastERROR=$ERROR
        addtoBoolean=1
    fi
    logoldsize=$lognewsize
    sleep $interval
    lognewsize=$(ls -l "$wgetlog" | awk '{print $5}')
done

if [ ! -z "$level" ]
then
    echo "`date "+%Y.%m.%d-%H:%M:%S--"`FINISH: $message. mail to $mailAddress.">> $log
    echo -e "`date '+%Y-%m-%d +%H:%M:%S'`\n${PWD}\n$message\n"|mutt -s "Wget Report : FINISH $whereAmI--RUNNING $(ps -A|grep -c wget)" $mailAddress
    counterMail=$((counterMail+1))
else
    echo "`date "+%Y.%m.%d-%H:%M:%S--"`ERROR: $message. mail to $mailAddress.">> $log
    echo -e "`date '+%Y-%m-%d +%H:%M:%S'`\n${PWD}\n$message\n"|mutt -s "Wget Report : ERROR in $whereAmI" $mailAddress
    addtoBoolean=1
    counterMail=$((counterMail+1))
fi
if [ "$addtoBoolean" -eq "1" ];then
    echo "$downloadDir" >> "$failedqueue"
fi
#This task is done; remove it from runningTask.
sed -i "/$whereAmI/d" "$runningTask"
echo "`date "+%Y.%m.%d-%H:%M:%S--"`$downloadDir Monitor ending.">> $log
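For completeness, this is roughly how the pieces get started on the download machine (the paths are illustrative, not my actual setup; monitor.sh is spawned on demand by script two):

cd /home/lyuxd/download
nohup ./addLinkFiles.sh >/dev/null 2>&1 &
nohup ./checkmodifiedLinkFiles.sh >/dev/null 2>&1 &

#Adding a new dataset is then just appending three tag lines:
cat >> xmlfile.xml <<'EOF'
<class>Life-sciences</class>
<name>uniprot</name>
<location>http://www.example.com/example.nt.gz</location>
EOF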
Summary: this was my first time writing shell scripts, and nearly every change along the way produced a pile of new errors. The quality of the scripts is poor, but fortunately the coupling between the three is not too tight and the division of labor is fairly clear, which made things a lot easier. My work computer sits on the education network while the data is downloaded over a China Unicom PPPoE dial-up line, so ssh access is quite slow. Even though all the day-to-day work has been reduced to maintaining one xml file (well, strictly speaking it isn't an xml file at all, just tagged text), waiting three or four seconds for every character typed over ssh is unbearable. So the next step is to rewrite what script one does in Java and manage the xml file on the web.