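A few days ago I used Python to scrape some rental listings from 我愛我家 (Woaiwojia), and I wondered whether I could run an analysis on the rents. So I looked through books and blog posts for relevant material and did an interval (binned) analysis of the rents. I have to say, doing interval analysis in Python is much simpler than my old approach of counting intervals with SQL keywords. Enough talk, here is the code.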
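    # coding=utf-8
    import pandas as pd
    import pymysql
    import matplotlib.pyplot as plt

    # Connect to the local MySQL database that holds the scraped listings.
    db = pymysql.connect(host="127.0.0.1", port=3306, user="root",
                         passwd="root", db="woaiwojia", charset='utf8')
    df = pd.read_sql("select * from zufang", db)
    db.close()

    # The commented-out lines below are experiments with inspecting the data after loading it.
    # First three rows
    # rows = df[0:3]
    # The price and lxrphone columns
    # cols = df[['price', 'lxrphone']]
    # aa = pd.DataFrame(df)
    # First rows of the price and lxrphone columns (.ix has been removed from pandas; use .loc)
    # print(df.loc[0:3, ['price', 'lxrphone']])
    # Summary of the loaded data
    # print(df.info())
    # Descriptive statistics of the table
    # print(df.describe())

    # Take the price column, check its max and min, and bin it.
    xse = df['price']
    # print(xse.max())
    # print(xse.min())
    # Bin edges every 1500, from 1500 up to (but not including) the maximum price.
    # Prices below 1500, or at or above the last edge, fall outside every bin and become NaN.
    fanwei = list(range(1500, int(xse.max()), 1500))
    fenzu = pd.cut(xse.values, fanwei, right=False)  # left-closed interval per row (length 91 in my data)
    # print(fenzu.codes)       # integer bin label for each row
    # print(fenzu.categories)  # the intervals themselves (length 8 in my data)
    pinshu = fenzu.value_counts()  # Series: interval -> count
    # print(pinshu)
    # print(pinshu.index)

    # Plot the counts as a bar chart.
    pinshu.plot(kind='bar')

    # Attach each row's interval to the DataFrame, then summarize price per interval.
    # Only the price column is aggregated, so non-numeric columns cannot raise an error.
    qujian = pd.cut(xse, fanwei, right=False)
    df['區間'] = qujian.values  # 區間 = interval
    median_by_bin = df.groupby('區間', observed=False)['price'].median()
    mean_by_bin = df.groupby('區間', observed=False)['price'].mean()

    # Build the frequency table.
    pinshu_df = pinshu.to_frame(name='頻數')  # 頻數 = count
    pinshu_df['頻率f'] = pinshu_df['頻數'] / pinshu_df['頻數'].sum()  # frequency as a fraction
    pinshu_df['頻率%'] = pinshu_df['頻率f'].map(lambda x: '%.2f%%' % (x * 100))  # frequency as a percentage
    pinshu_df['累計頻率f'] = pinshu_df['頻率f'].cumsum()  # cumulative frequency
    pinshu_df['累計頻率%'] = pinshu_df['累計頻率f'].map(lambda x: '%.4f%%' % (x * 100))
    print(pinshu_df)
    plt.show()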
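One thing worth checking in the binning step is which prices actually land in a bin. The sketch below uses made-up rents (not rows from the zufang table) and a hypothetical helper, `covering_edges`, to compare edges built the same way as in the script with edges that span the whole price range. It is only an illustration of how `pd.cut` with `right=False` treats the boundaries, under the assumption that a step of 1500 is still wanted.

```python
import pandas as pd

# Hypothetical rents for illustration only; not data from the zufang table.
prices = pd.Series([800, 1500, 2999, 3000, 4500, 7600])

# Edges built as in the script: start at 1500, stop before the maximum.
script_edges = list(range(1500, int(prices.max()), 1500))
unbinned = pd.cut(prices, script_edges, right=False).isna().sum()  # rows that got no bin


def covering_edges(values, step=1500):
    """Hypothetical helper: edges from 0 to the first multiple of `step` above the max."""
    upper = (int(values.max()) // step + 1) * step
    return list(range(0, upper + 1, step))


edges = covering_edges(prices)
binned = pd.cut(prices, edges, right=False)   # left-closed bins: [0, 1500), [1500, 3000), ...
counts = binned.value_counts(sort=False)      # keep the bins in ascending order

print("rows without a bin using the script's edges:", unbinned)
print(counts)
```

With `right=False` a rent of exactly 1500 belongs to [1500, 3000) rather than [0, 1500), which is usually what "from 1500 up to 3000" is meant to say.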
The printed result:
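For readers who want to see the shape of such a table without the database, here is a small sketch on made-up rents. It is not the output of the script above. It builds the count, frequency and cumulative-frequency columns using `value_counts(normalize=True)`, which is an alternative to dividing by the sum by hand; the English column names are my own choice for the example.

```python
import pandas as pd

# Hypothetical rents for illustration only; not data from the zufang table.
prices = pd.Series([1600, 2000, 3100, 3200, 3300, 4600, 6100, 7400])
edges = [1500, 3000, 4500, 6000, 7500]
binned = pd.cut(prices, edges, right=False)

table = pd.DataFrame({
    "count": binned.value_counts(sort=False),                  # rows per interval
    "freq": binned.value_counts(sort=False, normalize=True),   # share of all rows
})
table["freq_pct"] = table["freq"].map("{:.2%}".format)         # share formatted as a percentage
table["cum_freq"] = table["freq"].cumsum()                     # running total of the shares
print(table)
```

The last value of `cum_freq` should come out as 1.0 whenever every row falls inside some bin, which makes it a quick sanity check on the edges.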
The data as displayed by the show method of matplotlib.pyplot:
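If the chart needs axis labels, or has to be saved instead of shown in a window, something along these lines might work. Again this is a sketch on made-up rents: the labels, the title and the file name `rent_intervals.png` are assumptions for the example, and the Agg backend is selected only so that it also runs on a machine without a display.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; remove this line to use plt.show() interactively
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rents for illustration only; not data from the zufang table.
prices = pd.Series([1600, 2000, 3100, 3200, 3300, 4600, 6100, 7400])
edges = [1500, 3000, 4500, 6000, 7500]
counts = pd.cut(prices, edges, right=False).value_counts(sort=False)

fig, ax = plt.subplots(figsize=(8, 4))
counts.plot(kind="bar", ax=ax)           # one bar per rent interval
ax.set_xlabel("Monthly rent interval")
ax.set_ylabel("Number of listings")
ax.set_title("Listings per rent interval")
fig.tight_layout()                       # keep the rotated interval labels inside the figure
fig.savefig("rent_intervals.png")        # hypothetical output file name
```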
Reference blog post: pandas分區間,算頻率 (binning into intervals and computing frequencies with pandas). Reference book: 《Python3爬蟲、數據清洗與可視化實戰》.