Day 18: the numpy, matplotlib, and pandas modules

Date: 2019-11-17

Tags: day, numpy, module, matplotlib, pandas


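The numpy module: import numpy as np

Purpose:

1. Unlike the built-in list, a numpy array provides array manipulation, array arithmetic, statistical distributions, and simple mathematical models.

2. It is fast, even faster than Python's built-in operations on lists, which is why modules such as pandas and sklearn depend on it. Higher-level frameworks such as TensorFlow and PyTorch expose array operations that closely resemble numpy's.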
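
To make the first point concrete, here is a minimal sketch contrasting a plain Python list with a numpy array. The variable names and data are illustrative and do not come from the original notes.

```python
import numpy as np

prices = [1.5, 2.0, 3.5]          # plain Python list (illustrative data)
prices_arr = np.array(prices)     # the same data as a numpy array

# A list needs an explicit loop or comprehension for element-wise math.
doubled_list = [p * 2 for p in prices]

# An array applies the operation to every element at once.
doubled_arr = prices_arr * 2

print(doubled_list)   # [3.0, 4.0, 7.0]
print(doubled_arr)    # [3. 4. 7.]
```

Note that `prices * 2` on the list would repeat the list rather than double its elements, which is one reason arrays are more convenient for numeric work.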

Creating numpy arrays

One-dimensional array:

arr = np.array([1, 2, 4])
print(type(arr), arr)
<class 'numpy.ndarray'> [1 2 4]

Two-dimensional array:

arr = np.array([
    [1, 2, 3],
    [4, 5, 6]
])
print(arr)
[[1 2 3]
 [4 5 6]]

Three-dimensional array:

arr3 = np.array([
    [[1, 2, 3],
     [4, 5, 6]],
    [[1, 2, 3],
     [4, 5, 6]],
])
print(arr3)

[[[1 2 3]
  [4 5 6]]

 [[1 2 3]
  [4 5 6]]]

Transposing an array: rows and columns are swapped

arr = np.array([
    [1, 2, 3],
    [4, 5, 6]
])

# T: transpose of the array (for arrays with two or more dimensions); rows and columns are swapped
print(arr, '\n', arr.T)

[[1 2 3]
 [4 5 6]]
 [[1 4]
 [2 5]
 [3 6]]

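dtype: the data type of the array elements

# The ndarray itself is an ordinary Python object, while int32/float64 are numpy types; 32 and 64 are the number of bits each element occupies
print(arr.dtype)   # int32 (the default integer type is platform dependent; int64 is common on Linux and macOS)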

size: the number of elements in the array

print(arr.size)  # 6

ndim: the number of dimensions of the array

print(arr.ndim)    # 2
print(arr3.ndim)   #  3

shape: the size of each dimension, as a tuple (rows, columns)

print(arr.shape)     # (2, 3)
print(arr.shape[0])  # number of rows: 2
print(arr.shape[1])  # number of columns: 3

astype: type conversion

arr = arr.astype(np.float64)
print(arr)

[[1. 2. 3.]
 [4. 5. 6.]]
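
Converting in the other direction, from float to integer, is worth a quick look because it discards information. The following is a small sketch with made-up values; `arr_f` and `arr_i` are illustrative names.

```python
import numpy as np

arr_f = np.array([1.9, -1.9, 2.5])
arr_i = arr_f.astype(np.int64)   # the fractional part is discarded (truncation toward zero)

print(arr_i)          # [ 1 -1  2]
print(arr_f.dtype)    # float64: astype returns a new array, the original is unchanged
```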

Slicing numpy arrays

arr = np.array([
    [1, 2, 3],
    [4, 5, 6]
])

print(arr[:, :])  # [rows, columns]; selects everything

print(arr[0, 0])  # first row, first column: 1

print(arr[0, :])  # every element of the first row: [1 2 3]

print(arr[:, -2:])  # last two columns:
# [[2 3]
#  [5 6]]

Boolean indexing

print(arr[arr > 2])  # select every element greater than 2: [3 4 5 6]
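
The comparison inside the brackets produces a boolean array, and that array acts as a mask. A short sketch of the intermediate step, and of combining two conditions, might look like this (the name `mask` is illustrative):

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

mask = arr > 2                 # boolean array with the same shape as arr
print(mask)
# [[False False  True]
#  [ True  True  True]]

selected = arr[mask]           # 1-D array of the elements where mask is True
print(selected)                # [3 4 5 6]

# Combine conditions with & (and) or | (or); the parentheses are required.
even_and_large = arr[(arr > 2) & (arr % 2 == 0)]
print(even_and_large)          # [4 6]
```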

Assignment

arr = np.array([
    [1, 2, 3],
    [4, 5, 6]
])

arr[0, 0] = 0  # set the first element of the first row to 0
print(arr)

arr[0, :] = 0  # set the whole first row to 0
print(arr)

arr[:, :] = 0  # set every element to 0
print(arr)
print(arr)

Merging arrays

arr1 = np.array([
    [1, 2, 3],
    [4, 5, 6]
])

arr2 = np.array([
    [7, 8, 9],
    ['a', 'b', 'c']
])

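# arr2 mixes numbers and strings, so numpy stores every element as a string,
# and merging with arr1 converts its numbers to strings as well.
print(np.hstack((arr1, arr2)))  # horizontal merge; the arrays are passed as one sequence (tuple or list)
# [['1' '2' '3' '7' '8' '9']
#  ['4' '5' '6' 'a' 'b' 'c']]

print(np.vstack((arr1, arr2)))  # vertical merge
# [['1' '2' '3']
#  ['4' '5' '6']
#  ['7' '8' '9']
#  ['a' 'b' 'c']]

print(np.concatenate((arr1, arr2), axis=1))  # axis defaults to 0 (stack vertically, like vstack); axis=1 joins horizontally, like hstack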
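
Since the axis argument is easy to mix up, here is a small sketch that checks the resulting shapes for both values. It uses purely numeric arrays `a` and `b`, which are illustrative and not the arrays from the notes.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])
b = np.array([[7, 8, 9],
              [10, 11, 12]])

rows = np.concatenate((a, b))            # axis=0 is the default
cols = np.concatenate((a, b), axis=1)

print(rows.shape)   # (4, 3), same result as np.vstack((a, b))
print(cols.shape)   # (2, 6), same result as np.hstack((a, b))
```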

Creating numpy arrays with functions

print(np.ones((2, 3)))  # a 2x3 array filled with 1
[[1. 1. 1.]
 [1. 1. 1.]]


print(np.zeros((2, 3)))  # a 2x3 array filled with 0
[[0. 0. 0.]
 [0. 0. 0.]]

print(np.eye(3, 3))  # the 3x3 identity matrix
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]

print(np.linspace(1, 100, 10))  # start, stop (inclusive), number of points
[  1.  12.  23.  34.  45.  56.  67.  78.  89. 100.]

print(np.arange(2, 10))  # arange only builds one-dimensional arrays
[2 3 4 5 6 7 8 9]

arr1 = np.zeros((1, 12))
print(arr1.reshape((3, 4)))  # reshape arr1; the total number of elements must stay the same
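
The constraint that the element count must not change can be seen in a short sketch. It also shows `-1`, which asks numpy to work out one dimension itself; the array here is illustrative.

```python
import numpy as np

arr = np.arange(12)            # 12 elements: 0 through 11

grid = arr.reshape((3, 4))     # 3 rows, 4 columns
auto = arr.reshape((2, -1))    # -1 lets numpy infer the missing dimension (6 here)

print(grid.shape)   # (3, 4)
print(auto.shape)   # (2, 6)

# A shape whose total size differs from 12 raises ValueError.
try:
    arr.reshape((5, 3))
except ValueError as exc:
    print('reshape failed:', exc)
```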

numpy array operations

# + - * /
arr1 = np.ones((3, 4)) * 4
print(arr1)

[[4. 4. 4. 4.]
 [4. 4. 4. 4.]
 [4. 4. 4. 4.]]

# numpy element-wise functions

print(np.sin(arr1))  # sine of every element
[[-0.7568025 -0.7568025 -0.7568025 -0.7568025]
 [-0.7568025 -0.7568025 -0.7568025 -0.7568025]
 [-0.7568025 -0.7568025 -0.7568025 -0.7568025]]

# Matrix operations: matrix product (dot)

arr1 = np.array([
    [1, 2, 3],
    [4, 5, 6]
])

arr2 = np.array([
    [1, 2],
    [4, 5],
    [6, 7]
])
# (2 x 3) times (3 x 2) gives a 2 x 2 result
print(np.dot(arr1, arr2))
[[27 33]
 [60 75]]
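
For two-dimensional arrays the `@` operator gives the same matrix product as `np.dot`, and it is easy to confuse with `*`. A brief sketch using the same values as above (renamed `a` and `b` for brevity):

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])
b = np.array([[1, 2],
              [4, 5],
              [6, 7]])

product = a @ b                            # matrix product, same as np.dot(a, b) for 2-D arrays
print(product)
# [[27 33]
#  [60 75]]

# By contrast, * multiplies element by element and needs compatible shapes.
squared = a * a
print(squared)
# [[ 1  4  9]
#  [16 25 36]]
```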

# Matrix inverse
arr = np.array([[1, 2, 3], [4, 5, 6], [9, 8, 9]])
print(np.linalg.inv(arr))
[[ 0.5        -1.          0.5       ]
 [-3.          3.         -1.        ]
 [ 2.16666667 -1.66666667  0.5       ]]
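
One way to sanity-check an inverse is to multiply it back with the original matrix. The sketch below does that for the matrix above and also shows what happens with a matrix that has no inverse; `singular` is an illustrative example.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [9, 8, 9]])
inv = np.linalg.inv(arr)

# Floating-point rounding means the product is only approximately the identity,
# so compare with np.allclose rather than ==.
is_identity = np.allclose(arr @ inv, np.eye(3))
print(is_identity)   # True

# A singular matrix (determinant 0) has no inverse; inv raises LinAlgError.
singular = np.array([[1, 2], [2, 4]])
try:
    np.linalg.inv(singular)
except np.linalg.LinAlgError as exc:
    print('not invertible:', exc)
```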

# Mathematical and statistical methods of numpy arrays
print(np.sum(arr[0, :]))  # sum of the first row
6

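# numpy.random: generating random numbers (important)
print(np.random.rand(3, 4))
# a 3x4 array of samples drawn uniformly from [0, 1); use np.random.randn for the standard normal distribution

print(np.random.random((3, 4)))
# also a 3x4 array of uniform samples from [0, 1); the shape is passed as a tuple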

# Random integers in a given range: from 1 (inclusive) to 100 (exclusive)
print(np.random.randint(1, 100, (3, 4)))
[[40 81 62 79]
 [26 59 94 90]
 [16 82 84 43]]

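np.random.seed(1)  # fix the seed so the numbers generated below are the same on every run
print(np.random.random((3, 4)))

s = np.random.RandomState(1)  # like seed(1), but the fixed state belongs to this generator object only
print(s.random((3, 4)))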
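
A quick way to see what seeding does is to draw twice with the same seed and compare. This is a minimal sketch; `first`, `second` and `rs` are illustrative names.

```python
import numpy as np

np.random.seed(1)
first = np.random.random(3)

np.random.seed(1)              # reset to the same seed
second = np.random.random(3)

print(np.array_equal(first, second))   # True: same seed, same sequence

# A separate generator object carries its own state and leaves the global one alone.
rs = np.random.RandomState(1)
third = rs.random_sample(3)
print(np.array_equal(first, third))    # True
```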

arr = np.array([[1, 2, 3], [4, 5, 6], [9, 8, 9]])
np.random.shuffle(arr)  # shuffle the rows in place
print(arr)
[[9 8 9]
 [1 2 3]
 [4 5 6]]

# Sampling from a one-dimensional sequence
print(np.random.choice([1, 2, 3], 1))  # pick one element at random; the result is a length-1 array
[3]

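Summary:

Attribute	Description
T	transpose of the array (for arrays with two or more dimensions)
dtype	data type of the array elements
size	number of elements in the array
ndim	number of dimensions of the array
shape	size of each dimension (as a tuple)
astype	type conversion (a method, called as arr.astype(...))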

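Function	Description
array()	converts a list to an array; dtype can be given explicitly
arange()	numpy version of range; supports floating-point steps
linspace()	similar to arange(), but the third argument is the number of points
zeros()	creates an all-0 array with the given shape and dtype
ones()	creates an all-1 array with the given shape and dtype
eye()	creates an identity matrix
empty()	creates an uninitialized array; its contents are arbitrary, not random
reshape()	changes the shape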

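Operator	Description
+	element-wise addition of two numpy arrays
-	element-wise subtraction of two numpy arrays
*	element-wise multiplication of two numpy arrays
/	element-wise true division; the result is floating point even for integer arrays (use // for the integer quotient)
%	element-wise remainder after division
**n	raises every element of an array to the power n, e.g. **2 squares every element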
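
The operators in the table can be tried on two small arrays. The sketch below uses illustrative values and shows in particular how / and // differ on integer arrays.

```python
import numpy as np

a = np.array([7, 8, 9])
b = np.array([2, 2, 2])

print(a + b)    # [ 9 10 11]
print(a / b)    # [3.5 4.  4.5]  true division, float result
print(a // b)   # [3 4 4]        floor division
print(a % b)    # [1 0 1]
print(a ** 2)   # [49 64 81]
```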

Method	Description
sum	sum
cumsum	cumulative sum
mean	mean
std	standard deviation
var	variance
min	minimum
max	maximum
argmin	index of the minimum
argmax	index of the maximum
sort	sort
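
Most of these methods accept an axis argument, which the notes above do not show. A short sketch on the 2x3 array used earlier, assuming the usual convention that axis=0 runs down the rows and axis=1 runs across the columns:

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

print(arr.sum())          # 21, over all elements
print(arr.sum(axis=0))    # [5 7 9], one sum per column
print(arr.sum(axis=1))    # [ 6 15], one sum per row
print(arr.cumsum())       # [ 1  3  6 10 15 21], running total of the flattened array
print(arr.mean())         # 3.5
print(arr.argmax())       # 5, index into the flattened array
```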

Plotting with the matplotlib module

Bar chart:

from matplotlib import pyplot as plt

from matplotlib.font_manager import FontProperties

font = FontProperties(fname=r'C:\Windows\Fonts\simsun.ttc')  # raw string so the backslashes are kept literally; Windows font path

plt.style.use('ggplot')  # set the plot style

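# The labels are Chinese text, which is why a CJK font is loaded above.
clas = ['3班', '4班', '5班', '6班']  # class 3 to class 6
students = [50, 55, 45, 60]
clas_index = range(len(clas))

# x positions [0, 1, 2, 3], bar heights [50, 55, 45, 60]
plt.bar(clas_index, students, color='darkblue')

plt.xlabel('學生', fontproperties=font)  # "students"
plt.ylabel('學生人數', fontproperties=font)  # "number of students"
plt.title('班級-學生人數', fontproperties=font, fontsize=20, fontweight=25)  # "class vs. number of students"
plt.xticks(clas_index, clas, fontproperties=font)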

plt.show()

Histogram:

import numpy as np
from matplotlib import pyplot as plt  # conventional alias
from matplotlib.font_manager import FontProperties  # for setting the font

font = FontProperties(fname=r'C:\Windows\Fonts\simsun.ttc')

plt.style.use('ggplot')

x1 = np.random.randn(10000)

x2 = np.random.randn(10000)

fig = plt.figure()  # create a figure (canvas)
ax1 = fig.add_subplot(1, 2, 1)  # 1 row, 2 columns, first subplot
ax2 = fig.add_subplot(1, 2, 2)

ax1.hist(x1, bins=50, color='darkblue')
ax2.hist(x2, bins=50, color='y')

fig.suptitle('兩個正態分佈', fontproperties=font, fontsize=20)  # "two normal distributions"
ax1.set_title('x1的正態分佈', fontproperties=font)  # subplot title: "normal distribution of x1"
ax2.set_title('x2的正態分佈', fontproperties=font)  # "normal distribution of x2"
plt.show()

Line chart:

import numpy as np
from matplotlib import pyplot as plt  # conventional alias
from matplotlib.font_manager import FontProperties  # for setting the font

font = FontProperties(fname=r'C:\Windows\Fonts\simsun.ttc')

plt.style.use('ggplot')

np.random.seed(10)
x1 = np.random.randn(40).cumsum()
x2 = np.random.randn(40).cumsum()
x3 = np.random.randn(40).cumsum()
x4 = np.random.randn(40).cumsum()

plt.plot(x1, c='r', linestyle='-', marker='o', label='紅圓線')  # "red line, circle markers"
plt.plot(x2, color='y', linestyle='--', marker='*', label='黃虛線')  # "yellow dashed line"
plt.plot(x3, color='b', linestyle='-.', marker='s', label='藍方線')  # "blue line, square markers"
plt.plot(x4, color='black', linestyle=':', marker='s', label='黑方線')  # "black line, square markers"
plt.legend(loc='best', prop=font)  # show the labels in a legend
plt.show()

Scatter plot + line plot:

import numpy as np
from matplotlib import pyplot as plt  # conventional alias
from matplotlib.font_manager import FontProperties  # for setting the font

font = FontProperties(fname=r'C:\Windows\Fonts\simsun.ttc')

plt.style.use('ggplot')

fig = plt.figure()
ax1 = fig.add_subplot(1, 2, 1)
ax2 = fig.add_subplot(1, 2, 2)

x = np.arange(20)
y = x ** 2

x2 = np.arange(20)
y2 = x2

ax1.scatter(x, y, c='r', label='紅')  # "red"
ax1.scatter(x2, y2, c='b', label='藍')  # "blue"

ax2.plot(x, y)
ax2.plot(x2, y2)

fig.suptitle('兩張圖', fontproperties=font, fontsize=15)  # "two plots"
ax1.set_title('散點圖', fontproperties=font)  # "scatter plot"
ax2.set_title('折線圖', fontproperties=font)  # "line plot"
ax1.legend(prop=font)
plt.show()

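The pandas module: import pandas as pd

Reads and writes data in formats such as Excel, JSON, SQL, and CSV. (INI configuration files are handled by the standard-library configparser module, not by pandas.)

import pandas as pd

df = pd.read_csv('test.csv', header=None)  # header=None: the file has no header row

df.to_excel('test.xlsx')  # writing .xlsx needs an engine such as openpyxl; recent pandas versions can no longer write legacy .xls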

Working with Excel files in pandas: writing a DataFrame to Excel and reading it back

import numpy as np
import pandas as pd

np.random.seed(10)

index = pd.date_range('2019-01-01', periods=6, freq='M')  # six month-end dates; newer pandas versions prefer the alias 'ME'
print(index)
columns = ['c1', 'c2', 'c3', 'c4']
print(columns)
val = np.random.randn(6, 4)  # 6 rows, 4 columns
print(val)

df = pd.DataFrame(index=index, columns=columns, data=val)
print(df)

# Save the DataFrame to a file
df.to_excel('date_c.xlsx')

# Read the file back
df = pd.read_excel('date_c.xlsx', index_col=[0])  # use the first column as the index
print(df)

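print(df.index)    # the index
print(df.columns)  # the column labels
print(df.values)   # the values

print(df[['c1', 'c2']])  # select columns c1 and c2

# Selecting by index label
# print(df['2019-01-31'])  # wrong: df[...] looks up a column, not a row
print(df.loc['2019-01-31'])
print(df.loc['2019-01-31':'2019-05-31'])

# Selecting by position
print(df)
print(df.iloc[0, 0])  # the value in the first row, first column

df.iloc[0, :] = 0  # modify values: set the whole first row to 0
print(df)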
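
The difference between loc and iloc is easiest to see side by side. The sketch below builds a small DataFrame in memory, so no Excel file is needed; the values and the names `by_label` and `by_position` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(12).reshape(3, 4),
    index=pd.to_datetime(['2019-01-31', '2019-02-28', '2019-03-31']),
    columns=['c1', 'c2', 'c3', 'c4'],
)

# loc selects by label; a label slice includes both endpoints.
by_label = df.loc['2019-01-31':'2019-02-28', ['c1', 'c2']]

# iloc selects by integer position; the stop position is excluded, as with lists.
by_position = df.iloc[0:2, 0:2]

print(by_label.equals(by_position))   # True
```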