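Kaggle is a data science competition platform founded in 2010 and acquired by Google in 2017. The platform offers a large number of open datasets and free computing resources. All you need is a registered account to write code and analyze data online.
Dataset home page: https://www.kaggle.com/bigque...
There are currently more than 700 Kernels for this dataset. The description says the data is updated continuously; at the moment it appears to run through September 2018.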
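The Bitcoin on-chain data is more than 100 GB in size. Here it is accessed through the Google BigQuery API rather than shipped as data files, so the dataset can only be used online and cannot be downloaded. The maintainers do provide the data extraction code (https://github.com/blockchain...), so you can choose to build this data locally yourself. According to the documentation, each account can query at most 5 TB of data per month.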
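Since every query counts against the monthly limit, it can help to check how much data a query would scan before running it. The following is a minimal sketch, not part of the original kernel: `quota_fraction` and `estimate_query_bytes` are hypothetical helper names, and reading "5 TB" as 5 * 10**12 bytes is an assumption. The BigQuery calls used (`QueryJobConfig(dry_run=True)` and `total_bytes_processed`) are part of the `google-cloud-bigquery` client and need valid credentials.

```python
# Assumption: the "5 TB" monthly limit is read as 5 * 10**12 bytes.
MONTHLY_LIMIT_BYTES = 5 * 10**12


def quota_fraction(bytes_processed, limit_bytes=MONTHLY_LIMIT_BYTES):
    """Return the share of the monthly limit that one query would use."""
    if bytes_processed < 0:
        raise ValueError("bytes_processed must be non-negative")
    return bytes_processed / limit_bytes


def estimate_query_bytes(sql):
    """Ask BigQuery how many bytes `sql` would scan, without running it."""
    # Imported here so quota_fraction() also works where the client
    # library is not installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    # dry_run=True validates the query and reports its size only;
    # no rows are read and nothing is billed.
    job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=job_config)
    return job.total_bytes_processed
```

In a notebook, something like `quota_fraction(estimate_query_bytes(query))` gives a rough idea of how much of the monthly allowance a single run would consume.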
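The data consists of 4 tables: blocks, inputs, outputs, and transactions.
The code below is adapted from another source, with modifications (the original no longer runs because of changes in library versions) and with some minor parts omitted.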
from google.cloud import bigquery
import pandas as pd

client = bigquery.Client()

# Query by Allen Day, Google Cloud Developer Advocate (https://medium.com/@allenday)
# Counts the distinct output addresses (recipients) seen on each day.
query = """
#standardSQL
SELECT
  o.day,
  COUNT(DISTINCT(o.output_key)) AS recipients
FROM (
  SELECT
    TIMESTAMP_MILLIS((timestamp - MOD(timestamp, 86400000))) AS day,
    output.output_pubkey_base58 AS output_key
  FROM
    `bigquery-public-data.bitcoin_blockchain.transactions`,
    UNNEST(outputs) AS output
) AS o
GROUP BY day
ORDER BY day
"""

query_job = client.query(query)
iterator = query_job.result(timeout=30)
rows = list(iterator)

# Transform the rows into a nice pandas dataframe
transactions = pd.DataFrame(data=[list(x.values()) for x in rows], columns=list(rows[0].keys()))

# Look at the first 10 rows
transactions.head(10)
Output:
transactions.tail(10)
Output:
import matplotlib
from matplotlib import pyplot as plt
%matplotlib inline

plt.plot(transactions['day'], transactions['recipients'])
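For readers less familiar with SQL, the per-day aggregation in the query can be sketched in plain Python. This is only an illustration on made-up rows, not the way BigQuery executes it: `floor_to_day_ms`, `daily_recipients`, and the sample addresses are hypothetical. It shows how `timestamp - MOD(timestamp, 86400000)` truncates a millisecond timestamp to the start of its UTC day, and how distinct output keys are then counted per day.

```python
from collections import defaultdict
from datetime import datetime, timezone

MS_PER_DAY = 86_400_000  # same constant as in the SQL query


def floor_to_day_ms(timestamp_ms):
    """Drop the time-of-day part: timestamp - MOD(timestamp, 86400000)."""
    return timestamp_ms - timestamp_ms % MS_PER_DAY


def daily_recipients(rows):
    """Count distinct output keys per UTC day.

    `rows` is an iterable of (timestamp_ms, output_key) pairs, one per
    transaction output, mirroring the UNNEST(outputs) step.
    """
    keys_by_day = defaultdict(set)
    for timestamp_ms, output_key in rows:
        if output_key is None:
            continue  # COUNT(DISTINCT ...) ignores NULL values
        keys_by_day[floor_to_day_ms(timestamp_ms)].add(output_key)
    # Sort by day (like ORDER BY day) and convert to datetimes for readability.
    return [
        (datetime.fromtimestamp(day_ms / 1000, tz=timezone.utc), len(keys))
        for day_ms, keys in sorted(keys_by_day.items())
    ]


# Made-up sample: three outputs on one day (two distinct keys), one on the next.
sample = [
    (1231006505000, "addr_A"),
    (1231010000000, "addr_A"),
    (1231020000000, "addr_B"),
    (1231100000000, "addr_C"),
]
result = daily_recipients(sample)
```

Each entry of `result` corresponds to one row of the `transactions` dataframe: a day and the number of distinct recipients seen on that day.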
Welcome to my blog: https://codeplot.top/
Bitcoin category on my blog