For a project proposal, I redrew the task-event distribution figure from Qi Zhang, Mohamed Faten Zhani, Raouf Boutaba, Joseph L. Hellerstein,
"Dynamic Heterogeneity-Aware Resource Provisioning in the Cloud," IEEE Transactions on Cloud
Computing, vol. 2, no. 1, January-March 2014. The original figure and the redrawn figure are shown below:
Original figure:
Redrawn figure:
Data sources:
Introduction:
https://code.google.com/p/googleclusterdata/wiki/ClusterData2011_1
List of all files with their checksums:
https://commondatastorage.googleapis.com/clusterdata-2011-1/SHA256SUM
Format description:
https://commondatastorage.googleapis.com/clusterdata-2011-1/schema.csv
Example data file link:
https://commondatastorage.googleapis.com/clusterdata-2011-1/job_events/part-00017-of-00500.csv.gz
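The steps for redrawing the figure are as follows.
1 The data is hosted at https://commondatastorage.googleapis.com/clusterdata-2011-1/
which can only be reached by bypassing the firewall, so all data processing was done on an Azure server located in East Asia, outside the firewall. The first step is therefore to create a cloud server and set up its environment
(mainly installing Python).
2 Download the data files (the total volume is fairly large, 1.51 GB)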
import urllib2

url = 'https://commondatastorage.googleapis.com/clusterdata-2011-1/'

# SHA256SUM lists every file in the trace; use it to build the download links
f = open('C:\\SHA256SUM')
l = f.readlines()
f.close()

for i in l:
    if i.count('task_events') > 0:
        # second field is the relative path, preceded by one marker character
        fileAddr = i.split()[1][1:]
        fileName = fileAddr.split('/')[1]
        print 'downloading', fileName
        data = urllib2.urlopen(url + fileAddr).read()
        print 'saving', fileName
        fileDown = open('C:\\task_events\\' + fileName, 'wb')
        fileDown.write(data)
        fileDown.close()
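Notes:
(1) Before running the script, place the file list and checksum file SHA256SUM
(https://commondatastorage.googleapis.com/clusterdata-2011-1/SHA256SUM)
in the root of drive C; the script uses it to build the download links for the other files.
(2) Only task_events is downloaded here. To analyze other data, change the part of the script that selects the files to download, following the format description and introduction linked above.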
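Since SHA256SUM also carries a checksum for every file, the downloads can be verified before processing. The following is a minimal sketch, not part of the original workflow. It is written for Python 3 (the scripts in this post are Python 2) and assumes each SHA256SUM line has the form `<hex digest> *<relative path>`, which is what the `[1:]` slice in the download script suggests. The function names `parse_sha256sum`, `sha256_of_file` and `is_intact` are hypothetical.

```python
import hashlib


def parse_sha256sum(lines, keyword="task_events"):
    """Return {relative_path: hex_digest} for the lines that mention keyword.

    Assumes lines look like '<hex digest> *<relative path>'.
    """
    expected = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2 or keyword not in parts[1]:
            continue
        digest, path = parts
        expected[path.lstrip("*")] = digest.lower()
    return expected


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so a large download need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def is_intact(local_path, expected_digest):
    """True if the local file's SHA-256 matches the listed digest."""
    return sha256_of_file(local_path) == expected_digest.lower()
```

A file whose digest does not match can simply be downloaded again before the statistics step runs.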
3 Generate the list of file names to process
# Write the name of every task_events file, one per line
f = open('C:\\SHA256SUM')
l = f.readlines()
f.close()

fName = open('C:\\task_events_file_name.txt', 'w')
for i in l:
    if i.count('task_events') > 0:
        fileAddr = i.split()[1][1:]
        fileName = fileAddr.split('/')[1]
        # text mode already converts '\n' to the platform line ending
        fName.write(fileName + '\n')
fName.close()
4 Compute the statistics
import gzip

fName = open('C:\\task_events_file_name.txt')
fileNames = fName.readlines()
fName.close()

# "CPU request, memory request" -> number of task events, per priority class
cntMapGratis = {}
cntMapProduction = {}
cntMapOthers = {}

#fileNames = ['part-00000-of-00500.csv.gz']
for l in fileNames:
    print 'now at: ' + l.strip()
    f = gzip.open('C:\\task_events\\' + l.strip())
    for log in f:  # iterate line by line instead of loading the whole file
        log = log.split(',')
        # fields 9 and 10 are the CPU and memory requests; skip rows missing either
        if log[9] != '' and log[10] != '':
            index = log[9] + ' ' + log[10]
            priority = int(log[8])
            if priority <= 1:  # Gratis task
                cntMap = cntMapGratis
            elif priority >= 9 and priority <= 11:  # Production task
                cntMap = cntMapProduction
            else:  # everything else
                cntMap = cntMapOthers
            if not index in cntMap:
                cntMap[index] = 1
            else:
                cntMap[index] += 1
    f.close()

# Write one "CPU memory count" line per distinct request
fResult = open('C:\\CPUandMEMuseGratis.txt', 'w')
for i in cntMapGratis:
    fResult.write(i + ' ' + str(cntMapGratis[i]) + '\n')
fResult.close()

fResult = open('C:\\CPUandMEMuseProduction.txt', 'w')
for i in cntMapProduction:
    fResult.write(i + ' ' + str(cntMapProduction[i]) + '\n')
fResult.close()

fResult = open('C:\\CPUandMEMuseOthers.txt', 'w')
for i in cntMapOthers:
    fResult.write(i + ' ' + str(cntMapOthers[i]) + '\n')
fResult.close()
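The same counting logic can be sketched more compactly in Python 3 with `collections.Counter`. This is an illustrative variant under stated assumptions, not the script that produced the figure. It keeps the field positions used above (8 for priority, 9 and 10 for the CPU and memory requests) and the same priority ranges. The helper names `priority_class`, `count_requests` and `rows_from_gzip` are hypothetical.

```python
import csv
import gzip
from collections import Counter


def priority_class(priority):
    """Map a priority to the class names used in this post."""
    if priority <= 1:
        return "gratis"
    if 9 <= priority <= 11:
        return "production"
    return "other"


def count_requests(rows):
    """Count (cpu, mem) request pairs per priority class.

    rows is any iterable of CSV field lists laid out like task_events.
    """
    counts = {"gratis": Counter(), "production": Counter(), "other": Counter()}
    for row in rows:
        cpu, mem = row[9], row[10]
        if cpu == "" or mem == "":
            continue  # skip rows with a missing request, as the script above does
        counts[priority_class(int(row[8]))][(cpu, mem)] += 1
    return counts


def rows_from_gzip(path):
    """Stream CSV rows from a .csv.gz file without loading it all at once."""
    with gzip.open(path, "rt", newline="") as f:
        yield from csv.reader(f)
```

Calling `count_requests(rows_from_gzip(path))` for one downloaded file returns three counters. Summing the counters across files gives the same totals that the script above writes to its three output files.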
5 Plot with MATLAB
clear all
close all
%load('D:\\CPUandMEMuseGratis.txt')
%load('D:\\CPUandMEMuseProduction.txt')
% step 4 writes CPUandMEMuseOthers.txt on drive C; copy it here under the name below
load('D:\\CPUandMEMuseOther.txt')
%CPUandMEMuse = CPUandMEMuseGratis;
%CPUandMEMuse = CPUandMEMuseProduction;
CPUandMEMuse = CPUandMEMuseOther;
x=CPUandMEMuse(:,1);
y= CPUandMEMuse(:,2);
s = CPUandMEMuse(:,3)/10000000;
s = log(s);
%max_r = 0.002; %for production and gratis
max_r = 0.001; %for other only
s = s/max(s)*max_r;
for i=1:numel(x)
if x(i) == 0 || y(i) == 0
s(i)=0;
end
end
t= 0:pi/10:2*pi;
figure();
grid on
for i=1:numel(x)
if x(i)~=0 && y(i)~=0
pb=patch((s(i)*sin(t)*0.5+ x(i)),(s(i)*cos(t)+y(i)),'b','edgecolor','k');
alpha(pb,.3);
end
end
axis([0 0.5 0 1]);
xlabel('CPU size');
ylabel('Memory size');
set(gca,'FontSize',25);
set(get(gca,'XLabel'),'FontSize',30);
set(get(gca,'YLabel'),'FontSize',30);
%saveas(gcf,'D:\\CPUandMEMuseGratis.jpg')
%saveas(gcf,'D:\\CPUandMEMuseProduction.jpg')
saveas(gcf,'D:\\CPUandMEMDemandOther.jpg')
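Notes:
1. Tasks are classified by priority:
0-1 is Gratis
9-11 is Production
everything else (2-8) is Other
2. In the plot, the radius of each circle represents the logarithm (log) of the task count.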
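The radius scaling in note 2 can be illustrated with a small Python sketch that mirrors the arithmetic of the MATLAB script: divide each count by 10000000, take the natural logarithm, normalize by the maximum, and multiply by `max_r`. The sample counts are made up for illustration and are deliberately chosen above 10000000 so that every logarithm is positive. Counts below that divisor give negative logarithms, which this sketch does not attempt to handle. The function name `scaled_radii` is hypothetical.

```python
import math


def scaled_radii(counts, max_r=0.001, divisor=10_000_000):
    """Log-scale counts into circle radii, with the largest radius equal to max_r."""
    logs = [math.log(c / divisor) for c in counts]
    peak = max(logs)
    if peak <= 0:
        # the normalization below only makes sense for a positive maximum
        raise ValueError("sketch assumes at least one count above the divisor")
    return [value / peak * max_r for value in logs]


# Made-up counts, not taken from the trace
sample_counts = [20_000_000, 50_000_000, 100_000_000]
radii = scaled_radii(sample_counts)
```

With these sample counts the radii increase monotonically and the largest equals `max_r`, so a tenfold difference in count shows up as a modest difference in circle size.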