[Repost] Google cluster data analysis: clusterdata-2011-2

Original article:

https://www.twblogs.net/a/5c2dc304bd9eee35b21c418b/zh-cn

 

 

 

 

 

----------------------------------------------------------------------------------

 

 

 

 

 

This post walks through the clusterdata-2011-2 dataset

published at https://github.com/google/cluster-data

Dataset documentation: https://drive.google.com/file/d/0B5g07T_gRDg9Z0lsSTEtTWtpOW8/view

Dataset description: The clusterdata-2011-2 trace represents 29 day's worth of cell information from May 2011, on a cluster of about 12.5k machines.

After importing the CSV files into MySQL, the tables look as follows (the table structures are given at the end):

 

 

 

 

job events table:

rows: 1672923    286.86 MB (300,795,688 bytes)

index: jobid (btree); 35.56 MB (37,285,888 bytes)

 

machine events table:

rows: 37780    2.99 MB (3,138,540 bytes)

 

machine attributes table:

rows: 10748566    1.09 GB (1,175,642,124 bytes)

 

task constraints table:

rows: 28485619    2.95 GB (3,163,127,240 bytes)

 

task usage table:

rows: 1232799308    182.55 GB (196,015,089,972 bytes)

indexes: machineid (btree), jobid (btree); 69.61 GB (74,743,799,808 bytes)

 

task events table (the import ran into some problems, which are still being worked on):

rows: 144648292    12.76 GB (13,700,652,148 bytes)

indexes: machineid, jobid, username; 6.90 GB (7,414,187,008 bytes)

 

 

 

 

 

 

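Explanation, part 1: fields

Explanation, part 2: tables

 

Part 1. Fields

A job consists of one or more tasks. Each task represents a Linux program, which may consist of multiple processes.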

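timestamp: in microseconds, counted from 600 s before the start of the trace (so an event 20 s into the trace has a timestamp of 620 s).

A timestamp of 0 marks an event that happened before the trace began, since a job may have been submitted before logging started.

A timestamp of 2^63 - 1 marks an event that happened after the trace ended.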
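The timestamp rules above can be sketched in a few lines of Python. This is only an illustration: the constant and function names are made up for this example and are not part of the dataset or its tooling.

```python
from typing import Optional

# All names here are hypothetical helpers for illustration.
TRACE_OFFSET_US = 600 * 1_000_000  # timestamps start counting 600 s before the trace
BEFORE_TRACE = 0                   # sentinel: event happened before the trace began
AFTER_TRACE = 2**63 - 1            # sentinel: event happened after the trace ended


def seconds_into_trace(timestamp_us: int) -> Optional[float]:
    """Return seconds since the start of the trace, or None for a sentinel value."""
    if timestamp_us in (BEFORE_TRACE, AFTER_TRACE):
        return None
    return (timestamp_us - TRACE_OFFSET_US) / 1_000_000


print(seconds_into_trace(620_000_000))  # 20.0
```

Returning None for the two sentinel values keeps them from being mistaken for real offsets when computing durations.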

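Job and machine IDs are never reused, so they can serve as unique identifiers. (A machine ID may reappear because a machine was removed from the cluster and later added back; a job ID may reappear because a job was stopped, reconfigured, and restarted.)

User and job names are hashed for confidentiality; the hashed values can still be tested for equality.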

machine event type: 0 = ADD, 1 = REMOVE, 2 = UPDATE

job and task event type: 0 = SUBMIT, 1 = SCHEDULE, 2 = EVICT, 3 = FAIL, 4 = FINISH, 5 = KILL, 6 = LOST, 7 = UPDATE_PENDING, 8 = UPDATE_RUNNING
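When working with the imported rows, named constants are easier to read than the raw codes. The sketch below uses Python enums whose class names are chosen for this example. The numbering follows the trace documentation, in which FINISH is 4 and KILL is 5; this also agrees with the FINISH (4) used in the lifecycle description in part 2.

```python
from enum import IntEnum


class MachineEvent(IntEnum):
    """Event type codes in the machine events table."""
    ADD = 0
    REMOVE = 1
    UPDATE = 2


class TaskEvent(IntEnum):
    """Event type codes shared by the job events and task events tables."""
    SUBMIT = 0
    SCHEDULE = 1
    EVICT = 2
    FAIL = 3
    FINISH = 4
    KILL = 5
    LOST = 6
    UPDATE_PENDING = 7
    UPDATE_RUNNING = 8


print(TaskEvent(4).name)  # FINISH
```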

priority: 0 is the lowest.

infrastructure (11): this is the highest (most entitled to get resources) priority in the trace and accounts for most of the recorded disk I/O, so we speculate it includes some storage services;
monitoring (10);
normal production (9): this is the lowest (and most occupied) of the priorities labeled 'production'. The trace providers indicate that jobs at this priority and higher which are latency-sensitive should not be "evicted due to over-allocation of machine resources";
other (2-8): we speculate that these priorities are dominated by batch jobs;
gratis (free) (0-1): the trace providers indicate that resources used by tasks at these priorities are generally not charged.
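The priority bands listed above map directly onto a small lookup function. This is a minimal sketch; the function name is hypothetical, and the band labels are simply the ones quoted above.

```python
def priority_band(priority: int) -> str:
    """Map a raw priority (0-11) to the band names quoted above."""
    if not 0 <= priority <= 11:
        raise ValueError(f"priority out of range: {priority}")
    if priority <= 1:
        return "gratis"
    if priority <= 8:
        return "other"
    if priority == 9:
        return "normal production"
    if priority == 10:
        return "monitoring"
    return "infrastructure"


print(priority_band(4))  # other
```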
 

 

missing info: NULL for normal records; 0-2 for records that were synthesized to stand in for missing data.

0.SNAPSHOT_BUT_NO_TRANSITION: we did not find a record representing the given event, but a later snapshot of the job or task state indicated that the transition must have occurred. The timestamp of the synthesized event is the timestamp of the snapshot.

1.NO_SNAPSHOT_OR_TRANSITION: we did not find a record representing the given termination event, but the job or task disappeared from later snapshots of cluster states, so it must have been terminated. The timestamp of the synthesized event is a pessimistic upper bound on its actual termination time assuming it could have legitimately been missing from one snapshot.

2.EXISTS_BUT_NO_CREATION: we did not find a record representing the creation of the given task or job. In this case, we may be missing metadata (job name, resource requests, etc.) about the job or task and we may have placed SCHEDULE or SUBMIT events later than they actually are.

 
 
 

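scheduling class: roughly indicates how latency-sensitive a job is. It is a single number: 3 denotes a more latency-sensitive job, and 0 denotes a non-production task (for example, non-business-critical analysis).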

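comparison operator: ??

It is not clear to me how the comparison is carried out...

Less than (2) and greater than (3): the machine attribute is interpreted as an integer (or 0 if the attribute is absent) and then compared with the supplied attribute value. These comparisons are strictly less-than and strictly greater-than. Equal (0) and not equal (1): the machine attribute is interpreted as a string (or the empty string if it is absent) and then compared with the supplied attribute value. (Translated from the documentation.)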
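One way to read the rule above is as a small predicate over a machine's attribute value. The sketch below is an interpretation, not code from the dataset: the names are hypothetical, it assumes the machine attribute sits on the left-hand side of the comparison, and it assumes an empty constraint value counts as 0 in the integer case.

```python
from enum import IntEnum
from typing import Optional


class Comparison(IntEnum):
    EQUAL = 0
    NOT_EQUAL = 1
    LESS_THAN = 2
    GREATER_THAN = 3


def constraint_satisfied(machine_value: Optional[str], op: int, constraint_value: str) -> bool:
    """Check one task constraint against one machine attribute (None = attribute absent)."""
    op = Comparison(op)
    if op in (Comparison.EQUAL, Comparison.NOT_EQUAL):
        # String comparison; an absent attribute is treated as the empty string.
        attr = machine_value if machine_value is not None else ""
        equal = attr == constraint_value
        return equal if op is Comparison.EQUAL else not equal
    # Integer comparison; an absent attribute is treated as 0.
    attr_int = int(machine_value) if machine_value else 0
    value_int = int(constraint_value) if constraint_value else 0  # assumption: empty -> 0
    if op is Comparison.LESS_THAN:
        return attr_int < value_int   # strictly less than
    return attr_int > value_int       # strictly greater than


print(constraint_satisfied("5", Comparison.LESS_THAN, "10"))  # True
```

Because the integer comparisons are strict, a machine lacking the attribute (treated as 0) never satisfies "greater than 0".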

 

 

 

Part 2. Tables

 

1.Machine events
Each machine is described by one or more records in the machine event table. The majority of records describe machines that existed at the start of the trace.
1. timestamp
2. machine ID
3. event type
4. platform ID
5. capacity: CPU
6. capacity: memory

 

2.job events & task events

The two event tables describe jobs/tasks and their lifecycles. The constraints table describes task placement constraints that restrict the machines onto which tasks can schedule.

The simplest case is shown by the top path in the diagram above: a job is SUBMITted and gets put into a pending queue; soon afterwards, it is SCHEDULEd onto a machine and starts running; some time later it FINISHes successfully.

That is: first SUBMIT (0), then SCHEDULE (1), and finally FINISH (4).
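The simplest lifecycle can be checked for a single task by sorting its events by time and comparing the sequence of event types. This is a sketch with a hypothetical function name; it assumes the events for one task are already collected as (timestamp, event type) pairs.

```python
from typing import Iterable, Tuple

SUBMIT, SCHEDULE, FINISH = 0, 1, 4


def is_simple_lifecycle(events: Iterable[Tuple[int, int]]) -> bool:
    """True if the task went SUBMIT -> SCHEDULE -> FINISH and nothing else.

    `events` holds (timestamp, event_type) pairs for one task. Sorting the
    pairs orders them by timestamp, with ties broken by event type code.
    """
    ordered_types = [event_type for _, event_type in sorted(events)]
    return ordered_types == [SUBMIT, SCHEDULE, FINISH]


print(is_simple_lifecycle([(30, 4), (10, 0), (12, 1)]))  # True
```

A task that was evicted and rescheduled along the way produces a longer sequence and is therefore rejected.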

 

 

 

 

 

 

3.task usage

This blog post explains it in detail: http://www.javashuo.com/article/p-zuqsbjuq-nh.html

 

 

 

 

 

 

The intermediate tables generated are:

 

 

 

 

 

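These are, respectively: a table of the machine IDs contained in each platform, a table of all mid-priority tasks (priority 2-8), and a table of all tasks that were successfully scheduled (event type 1), each with corresponding indexes. (With the intermediate tables in place, query time dropped from several hours to under one minute.)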
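The idea of pre-filtered, indexed intermediate tables can be sketched as follows. The post uses MySQL; this example swaps in Python's built-in sqlite3 with an in-memory database so that it runs on its own. All table names, column names, and rows are hypothetical and are not the author's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Tiny stand-in for the task events table (hypothetical columns and rows).
CREATE TABLE task_events (
    job_id INTEGER, task_index INTEGER, event_type INTEGER, priority INTEGER
);
INSERT INTO task_events VALUES
    (1, 0, 0, 4), (1, 0, 1, 4), (2, 0, 0, 9), (2, 0, 1, 9), (3, 0, 0, 2);

-- Intermediate table: rows for mid-priority tasks (priority 2-8).
CREATE TABLE mid_priority_tasks AS
    SELECT * FROM task_events WHERE priority BETWEEN 2 AND 8;
CREATE INDEX idx_mid_job ON mid_priority_tasks (job_id);

-- Intermediate table: rows with event type 1 (SCHEDULE).
CREATE TABLE scheduled_tasks AS
    SELECT * FROM task_events WHERE event_type = 1;
CREATE INDEX idx_sched_job ON scheduled_tasks (job_id);
""")

mid_count = conn.execute("SELECT COUNT(*) FROM mid_priority_tasks").fetchone()[0]
sched_count = conn.execute("SELECT COUNT(*) FROM scheduled_tasks").fetchone()[0]
print(mid_count, sched_count)  # 3 2
```

Later queries then scan only the small filtered tables through their indexes instead of the full events table, which is the kind of saving the post describes.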

 

 

 

 
