Data Analysis

Date: 2019-11-19



The three musketeers of data analysis: NumPy, Pandas, Matplotlib

import numpy as np
np.array([1,2,3,4,5])                  # create a 1-D array
# array([1, 2, 3, 4, 5])
np.random.randint(0,100,size=(5,6))    # create a 2-D array: 5 rows x 6 columns of random integers in [0, 100)
# array([[73, 26, 72, 34, 26, 38],
#        [ 7, 10, 56, 19, 89, 22],
#        [90,  5, 58, 65, 68,  0],
#        [ 7, 55,  4, 82, 44, 89],
#        [14, 14, 27, 69, 85, 78]])

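Indexing a 2-D array:

attr=np.random.randint(0,100,size=(5,6))
attr[[1,2]]     # the rows at index 1 and 2 (the 2nd and 3rd rows)
attr[0:3]       # the 1st through 3rd rows (indices 0, 1, 2)
attr[:,2:4]     # the 3rd through 4th columns (indices 2, 3)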
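The shapes are easier to check with a deterministic array than with random numbers. This sketch assumes `np.arange(30).reshape(5, 6)` as a stand-in for `attr`:

```python
import numpy as np

# Deterministic stand-in for attr: values 0..29 laid out in 5 rows x 6 columns
attr = np.arange(30).reshape(5, 6)

rows_1_2 = attr[[1, 2]]      # fancy indexing: the rows at index 1 and 2
first_three = attr[0:3]      # slice: rows at index 0, 1, 2 (the end is exclusive)
cols_3_4 = attr[:, 2:4]      # all rows, columns at index 2 and 3

print(rows_1_2.shape)        # (2, 6)
print(first_three.shape)     # (3, 6)
print(cols_3_4.shape)        # (5, 2)
print(cols_3_4[0].tolist())  # [2, 3]
```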

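Reversing arrays:

For example, on a 3-D array such as an image (height x width x color channels):

import matplotlib.pyplot as plt
img=plt.imread('./mm.jpg')
plt.imshow(img[::-1,::-1,::-1])

The three ::-1 slices reverse, in order, the rows (upside down), the columns (mirrored left to right) and the color channels (e.g. RGB becomes BGR).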
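Each `::-1` reverses one axis, which is what `np.flip` does for that axis. A tiny synthetic array (2 rows x 3 columns x 3 channels, assumed here in place of the photo) makes the effect checkable without an image file:

```python
import numpy as np

# Synthetic "image": height 2, width 3, 3 color channels
img = np.arange(2 * 3 * 3).reshape(2, 3, 3)

upside_down = img[::-1]              # reverse axis 0: rows (top <-> bottom)
mirrored = img[:, ::-1]              # reverse axis 1: columns (left <-> right)
channels_swapped = img[:, :, ::-1]   # reverse axis 2: channel order (RGB -> BGR)

print(img[0, 0].tolist(), channels_swapped[0, 0].tolist())  # [0, 1, 2] [2, 1, 0]
# Reversing all three axes at once equals np.flip with no axis argument
print(np.array_equal(img[::-1, ::-1, ::-1], np.flip(img)))  # True
```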

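Reshaping arrays between dimensions (the explicit dimensions must divide the element count evenly; -1 lets NumPy infer the remaining one):

For example, 1-D to 2-D and 3-D:

a1=np.array([1,2,3,4,5,6])
a1.reshape(-1,2)        # 2-D, shape (3, 2)
a1.reshape(-1,3,2)      # 3-D, shape (1, 3, 2)

2-D to 3-D:

a2=np.random.randint(20,size=(4,5))   # 4 x 5 = 20 elements
a2.reshape(-1,2,2)                    # 3-D, shape (5, 2, 2)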
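A short sketch of how `-1` is inferred, and what happens when the sizes do not divide evenly (the `np.arange` array is an assumed deterministic stand-in for the random one):

```python
import numpy as np

a1 = np.array([1, 2, 3, 4, 5, 6])
print(a1.reshape(-1, 2).shape)     # (3, 2): -1 becomes 6 / 2 = 3
print(a1.reshape(-1, 3, 2).shape)  # (1, 3, 2)

a2 = np.arange(20).reshape(4, 5)   # deterministic 4 x 5 array
print(a2.reshape(-1, 2, 2).shape)  # (5, 2, 2): 20 / (2 * 2) = 5

try:
    a1.reshape(-1, 4)              # 6 elements cannot form rows of 4
except ValueError as err:
    print("ValueError:", err)
```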

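Array concatenation: joining two or more arrays into one.

Prerequisite: the arrays must have the same number of dimensions, and every axis except the one being joined must have the same length. Stacking vertically (axis=0) needs equal column counts; joining side by side (axis=1) needs equal row counts.

import numpy as np
a2=np.random.randint(20,size=(2,10))    # 2 rows, 10 columns
a3=np.random.randint(30,size=(3,10))    # 3 rows, 10 columns
np.concatenate((a2,a3),axis=0)          # column counts match, so stack the rows: result is 5 x 10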
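The sketch below contrasts the two axes on fixed arrays (the zeros/ones arrays and `b` are assumptions for illustration) and shows the error raised when the non-joined axis does not match:

```python
import numpy as np

a2 = np.zeros((2, 10), dtype=int)   # 2 rows, 10 columns
a3 = np.ones((3, 10), dtype=int)    # 3 rows, 10 columns
b = np.ones((2, 4), dtype=int)      # 2 rows, 4 columns

vertical = np.concatenate((a2, a3), axis=0)   # column counts match (10): stack rows
horizontal = np.concatenate((a2, b), axis=1)  # row counts match (2): join side by side
print(vertical.shape)    # (5, 10)
print(horizontal.shape)  # (2, 14)

try:
    np.concatenate((a2, a3), axis=1)          # row counts differ (2 vs 3)
except ValueError as err:
    print("ValueError:", err)
```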

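Small example: tiling an image into a 3 x 3 grid

import numpy as np
from matplotlib import pyplot as plt
cat=plt.imread('./cat.jpg')
c3=np.concatenate((cat,cat,cat),axis=1)   # three copies side by side
c9=np.concatenate((c3,c3,c3),axis=0)      # three of those rows stacked vertically
plt.imshow(c9)   # enough to display the image in Jupyter
plt.show()       # also needed to open the image window in PyCharm

Result:

[Figure: cat.jpg repeated in a 3 x 3 grid]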
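The two `concatenate` calls build the grid one row at a time; `np.tile` produces the same array in one call. The sketch checks this on a small synthetic image (an assumption, since `cat.jpg` is not available here). The trailing `1` in the repetition tuple leaves the color-channel axis untouched:

```python
import numpy as np

# Synthetic 4 x 5 RGB "image" standing in for cat.jpg
cat = np.random.randint(0, 256, size=(4, 5, 3), dtype=np.uint8)

c3 = np.concatenate((cat, cat, cat), axis=1)  # three copies side by side
c9 = np.concatenate((c3, c3, c3), axis=0)     # three rows of those
tiled = np.tile(cat, (3, 3, 1))               # 3x down, 3x across, channels once

print(c9.shape)                   # (12, 15, 3)
print(np.array_equal(c9, tiled))  # True
```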

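Cropping an image: slicing is the recommended approach.

Think of the crop as cutting along a 井 (#-shaped) grid: the first slice picks rows from top to bottom, the second picks columns from left to right.

plt.imshow(cat[50:330,100:400])   # rows 50 to 329, columns 100 to 399

This produces the following crop:

[Figure: the cropped region of cat.jpg]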
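A crop is just a slice, so its shape follows directly from the bounds. Note that slicing returns a view that shares memory with the original image, so call `.copy()` before modifying a crop if the original must stay unchanged. The 400 x 500 synthetic image below is an assumption; the real `cat.jpg` may be a different size:

```python
import numpy as np

# Synthetic 400 x 500 RGB image (all black)
cat = np.zeros((400, 500, 3), dtype=np.uint8)

crop = cat[50:330, 100:400]   # rows 50..329, columns 100..399, all channels
print(crop.shape)             # (280, 300, 3)

crop_copy = crop.copy()       # independent copy
crop_copy[:] = 255            # editing the copy leaves cat unchanged
print(cat.max())              # 0
```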

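On the environment: for plotting, PyCharm is not as simple or polished as Jupyter, but it can be configured to work.

Point PyCharm directly at the Python interpreter (python.exe) from Anaconda; importing it into PyCharm takes a little while.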

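pandas, for data cleaning:

Creating a Series with an index:

import pandas as pd
import numpy as np
from pandas import Series,DataFrame
s1=Series(np.random.randint(1,50,size=(4,)),index=['a','p','e','f'],name='kevin')   # 4 random values with custom index labels and a name
w=Series(data=[2,3,5,7,11,11,5,6,7])
print(w.unique())   # distinct values in order of first appearance: 2, 3, 5, 7, 11, 6
print(s1)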
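With fixed values instead of random ones (an assumption for reproducibility), label lookup and `unique` behave as follows:

```python
import pandas as pd

s1 = pd.Series([10, 20, 30, 40], index=['a', 'p', 'e', 'f'], name='kevin')
w = pd.Series([2, 3, 5, 7, 11, 11, 5, 6, 7])

print(s1['e'])                 # 30: lookup by index label
print(s1.name)                 # kevin
print(w.unique().tolist())     # [2, 3, 5, 7, 11, 6]: duplicates dropped, first-appearance order
print(w.nunique())             # 6
```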

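Adding two Series: values whose index labels match are added; a label that appears in only one of them gets NaN.

import pandas as pd
import numpy as np
from pandas import Series,DataFrame
s1=Series([1,2,3,4],index=['a','b','c','d'])
s2=Series([4,5,6,7],index=['a','c','f','g'])
print(s1)
print(s2)
print(s1+s2)

So only a (1 + 4 = 5) and c (3 + 5 = 8) get values; b, d, f and g are NaN. Because NaN is a floating-point value, the result is shown as float64 (5.0, 8.0).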
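If a missing label should count as 0 rather than produce NaN, `Series.add` accepts a `fill_value`. This is an alternative to the `+` operator used above, not part of the original example:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
s2 = pd.Series([4, 5, 6, 7], index=['a', 'c', 'f', 'g'])

plain = s1 + s2                    # unmatched labels become NaN
filled = s1.add(s2, fill_value=0)  # the missing side is treated as 0

print(plain)
print(filled)   # a 5.0, b 2.0, c 8.0, d 4.0, f 6.0, g 7.0
```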

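Using DataFrame:

from pandas import Series,DataFrame
dic={
    "kevin":[66,77,88,99],
    "lisa":[71,82,93,64],
    "jack":[88,77,108,11]
}
df=DataFrame(data=dic,index=['語文','數學','外語','綜合'])   # rows: Chinese, Math, Foreign language, Comprehensive
print(df)

The output is a 4 x 3 table with the four subjects as rows and kevin, lisa, jack as columns.

df.columns=['凱文','麗莎','傑克']   # rename the columns (Kevin, Lisa, Jack written in Chinese)
df

The column labels are replaced. The number of new labels must equal the number of existing columns; otherwise pandas raises a ValueError.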
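Assigning `df.columns` replaces every label at once. `DataFrame.rename` with a dict changes only the labels you name and returns a new DataFrame. The sketch below contrasts the two and shows that a failed assignment leaves the columns unchanged:

```python
import pandas as pd

dic = {"kevin": [66, 77, 88, 99], "lisa": [71, 82, 93, 64], "jack": [88, 77, 108, 11]}
df = pd.DataFrame(data=dic, index=['語文', '數學', '外語', '綜合'])

renamed = df.rename(columns={'kevin': '凱文'})  # only 'kevin' changes; df is untouched
print(list(renamed.columns))  # ['凱文', 'lisa', 'jack']

try:
    df.columns = ['凱文', '麗莎']                # 2 labels for 3 columns
except ValueError as err:
    print("ValueError:", err)
print(list(df.columns))       # ['kevin', 'lisa', 'jack']
```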

Indexing:

Select a row: df.loc['語文']

Select a column: df.麗莎 (attribute access; df['麗莎'] is equivalent)

Select a single value by row and column: df.loc['外語','麗莎']
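Put together, the three lookups return a row Series, a column Series and a scalar:

```python
import pandas as pd

dic = {"kevin": [66, 77, 88, 99], "lisa": [71, 82, 93, 64], "jack": [88, 77, 108, 11]}
df = pd.DataFrame(data=dic, index=['語文', '數學', '外語', '綜合'])
df.columns = ['凱文', '麗莎', '傑克']

row = df.loc['語文']           # one subject, all three students
col = df['麗莎']               # one student, all four subjects (same as df.麗莎)
cell = df.loc['外語', '麗莎']   # a single score

print(row.tolist())   # [66, 71, 88]
print(col.tolist())   # [71, 82, 93, 64]
print(cell)           # 93
```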

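Slicing:

df[a:b]  slices rows

df.iloc[:,a:b]  slices columns by position (plain df[:,a:b] raises an error)

loc: selects by the explicit index (labels)

iloc: selects by the implicit index (integer positions)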
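One difference worth seeing in practice: positional slices (`df[a:b]`, `iloc`) exclude the end, while label slices with `loc` include it:

```python
import pandas as pd

dic = {"kevin": [66, 77, 88, 99], "lisa": [71, 82, 93, 64], "jack": [88, 77, 108, 11]}
df = pd.DataFrame(data=dic, index=['語文', '數學', '外語', '綜合'])

print(df[0:2].index.tolist())                      # ['語文', '數學']: rows by position
print(df.iloc[:, 0:2].columns.tolist())            # ['kevin', 'lisa']: end excluded
print(df.loc['數學':'外語'].index.tolist())         # ['數學', '外語']: end included
print(df.loc[:, 'lisa':'jack'].columns.tolist())   # ['lisa', 'jack']
```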

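index_col (read_csv): use a column as the row index

parse_dates (read_csv): convert a column's values to datetimes, so that the index becomes a time series

resample: resample the data at a new frequency (e.g. monthly); the index must be a time series (DatetimeIndex)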
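A minimal sketch of the three options together, reading from an in-memory CSV (the contents are hypothetical, shaped like a file written by `df.to_csv()`, whose unnamed first column holds the old integer index). It uses the month-start alias `'MS'`; the month-end alias `'M'` still works but recent pandas versions prefer `'ME'`, and both group rows by calendar month:

```python
import io

import pandas as pd

# Hypothetical CSV contents
csv_text = """,date,open,close
0,2015-01-05,10.0,10.5
1,2015-01-06,10.6,10.2
2,2015-02-02,11.0,11.3
3,2015-02-03,11.4,11.1
"""

df = pd.read_csv(io.StringIO(csv_text), index_col='date', parse_dates=['date'])
df = df.drop(columns='Unnamed: 0')          # leftover integer index from to_csv

print(type(df.index).__name__)              # DatetimeIndex
monthly_first = df.resample('MS').first()   # first row of each calendar month
print(monthly_first['open'].tolist())       # [10.0, 11.0]
```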

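import tushare as ts
import pandas as pd

# Strategy: buy 100 shares at the open of the first trading day of each month,
# sell all shares at the open of the last trading day of each year;
# shares bought in the final, incomplete year (2019) are valued at the latest price.
df=ts.get_k_data(code='002460',start='2015',end='2019-06-06')      # daily quotes from 2015 to 2019-06-06
df.to_csv('./jlgf.csv')                                              # save the data to a local CSV file
df=pd.read_csv('./jlgf.csv',index_col='date',parse_dates=['date'])  # turn the date strings into a DatetimeIndex
df.drop(labels='Unnamed: 0',axis=1,inplace=True)                    # drop the old integer index written by to_csv
last_price=df['open'].iloc[-1]                                      # open price on the most recent trading day
df_mounths=df.resample('M').first()                                 # first trading day of each month (buy days); pandas >= 2.2 prefers 'ME'
df_years=df.resample('Y').last()[:-1]                               # last trading day of each year (sell days), minus 2019, which is not sold; pandas >= 2.2 prefers 'YE'
count_money=0                                                       # net profit
hold=0                                                              # shares currently held
for year in range(2015,2020):
    count_money-=df_mounths.loc[str(year)]['open'].sum()*100        # cost of 100 shares each month this year
    hold=len(df_mounths.loc[str(year)]['open'])*100                 # shares bought this year
    if year!=2019:
        count_money+=df_years.loc[str(year)]['open'].iloc[0]*hold   # sell all shares held
        hold=0                                                      # nothing held after selling
count_money+=hold*last_price                                        # value the 2019 shares at the latest price
print(count_money)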
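To check the strategy's arithmetic offline, without tushare, the same rules can be wrapped in a function and run on synthetic prices. The function name `backtest_monthly_buy` and the price data are hypothetical. Unlike the code above, it groups by calendar month instead of calling `resample`, which gives the same buy prices whenever every month has trading days:

```python
import pandas as pd


def backtest_monthly_buy(df, shares=100):
    """Hypothetical helper: buy `shares` at the first open of each month, sell all
    holdings at the last open of each year except the final one, and value the
    remaining shares at the last open in the data."""
    opens = df['open']
    last_price = opens.iloc[-1]
    years = sorted(set(opens.index.year))
    profit = 0.0
    hold = 0
    for year in years:
        in_year = opens[opens.index.year == year]
        monthly_first = in_year.groupby(in_year.index.month).first()  # buy prices
        profit -= monthly_first.sum() * shares
        hold += len(monthly_first) * shares
        if year != years[-1]:
            profit += in_year.iloc[-1] * hold   # sell at the year's last open
            hold = 0
    return profit + hold * last_price           # unsold shares at the last open


# Synthetic prices: open is 10.0 on every business day, except 15.0 on 2015-12-31
dates = pd.bdate_range('2015-01-01', '2016-12-31')
prices = pd.DataFrame({'open': 10.0}, index=dates)
prices.loc[pd.Timestamp('2015-12-31'), 'open'] = 15.0

print(backtest_monthly_buy(prices))  # 6000.0
```

In 2015 the 12 monthly buys cost 12,000 and the 1,200 shares sell for 18,000 at the year-end open of 15.0; in 2016 cost and final value cancel, leaving a profit of 6,000.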
