Study notes on Python, pandas, and NumPy: key points and tips

Date: 2019-11-10

Tags: python, pandas, numpy, study notes


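When Python has to interact with Java, .NET, or PHP web platforms, it is best to communicate over web protocols rather than use Jython or IronPython. This keeps the program modular and loosely coupled.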

Python allows '''...''' for writing multi-line strings:

>>> print(r'''Hello,
... Lisa!''')
Hello,
Lisa!
>>>

>>> print('''line1
... line2
... line3''')
line1
line2
line3

You can also write r'xxx' (a raw string) so that nothing inside xxx is escaped, which is handy for printing content verbatim.

print(r'\\\t\\')
# Output: \\\t\\

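Data types Python can handle directly:

Integers, floats, strings, and booleans (True, False),

as well as list (similar to an array) and dict (similar to a JS object literal).

Constants: PI (by convention, constants are written in all caps)

Two kinds of division:

/ : always produces a float, e.g. 10/3 = 3.3333333333333335, 9/3 = 3.0

// : floor division, 10//3 = 3

% : remainder, 10%3 = 1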
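The three operators above can be checked directly. The following is a small sketch; the negative-operand cases are an addition to the notes and show that // floors toward negative infinity rather than truncating toward zero.

```python
# Division operators on positive and negative operands.
true_div = 10 / 3        # always a float
floor_div = 10 // 3      # floor division
remainder = 10 % 3

# With a negative operand, // rounds toward negative infinity,
# and % takes the sign of the divisor.
neg_floor = -10 // 3     # -4, not -3
neg_rem = -10 % 3        # 2, because -10 == 3 * (-4) + 2

# divmod returns the quotient and the remainder in one call.
quotient, rest = divmod(10, 3)

print(true_div, floor_div, remainder, neg_floor, neg_rem, quotient, rest)
```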

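Note:

Python supports many data types, and internally any piece of data can be regarded as an "object". A variable is a name in the program that points to such an object; assigning to a variable simply binds the name to the data.

Python integers have no size limit.

Commonly used functions for string encoding:

ord('x') returns the Unicode code point of the character x, and chr(code) returns the character for an integer code point: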

>>> ord('A')
65
>>> ord('中')
20013
>>> chr(66)
'B'
>>> chr(25991)
'文'

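Python's string type is str, which is represented as Unicode in memory, where one character may correspond to several bytes. To send a string over the network or save it to disk, the str must be converted to bytes.

Python writes bytes literals as single- or double-quoted strings with a b prefix: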

>>> 'ABC'.encode('ascii')
b'ABC'
>>> '中文'.encode('utf-8')
b'\xe4\xb8\xad\xe6\x96\x87'

Conversely, a UTF-8 byte stream read from the network or from disk must be decoded into a str before it can be used in code, using the decode method:

>>> b'ABC'.decode('ascii')
'ABC'
>>> b'\xe4\xb8\xad\xe6\x96\x87'.decode('utf-8')
'中文'

>>> len('abc')
3
>>> len('中')
1
>>> len('中文'.encode('utf-8'))
6
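Decoding can fail when the byte stream is damaged. The sketch below builds a deliberately broken byte string (the trailing b'\xff' is an invented example) and shows both the exception and the errors='ignore' option of decode.

```python
data = '中文'.encode('utf-8')      # 6 bytes, 3 per character
broken = data[:3] + b'\xff'        # valid bytes for '中' followed by an invalid byte

try:
    broken.decode('utf-8')
except UnicodeDecodeError as exc:
    print('decode failed:', exc.reason)

# errors='ignore' silently drops the bytes that cannot be decoded.
recovered = broken.decode('utf-8', errors='ignore')
print(recovered)
```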

When the Python interpreter reads source code, we usually put these two lines at the top of the file so that it is read as UTF-8:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

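In Python 2, strings that need to be displayed should be defined as u"this is a unicode string". In Python 3 every str is already Unicode, so the u prefix is accepted but no longer needed.

String formatting: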

>>> 'Hello, %s' % 'world'
'Hello, world'
>>> 'Hi, %s, you have $%d.' % ('Michael', 1000000)
'Hi, Michael, you have $1000000.'
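For comparison, here is the same message built with str.format and with an f-string. This is only a sketch of the alternatives; the notes themselves use the % operator.

```python
name, balance = 'Michael', 1000000

old_style = 'Hi, %s, you have $%d.' % (name, balance)
with_format = 'Hi, {}, you have ${}.'.format(name, balance)
with_fstring = f'Hi, {name}, you have ${balance}.'   # requires Python 3.6+

assert old_style == with_format == with_fstring

# str.format and f-strings also accept format specs such as a thousands separator.
grouped = f'{balance:,}'
print(with_fstring, grouped)
```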

The list type

A list is similar to a JS array: an ordered collection whose elements can be added and removed at any time.

>>> classmates = ['Michael', 'Bob', 'Tracy']
>>> classmates
['Michael', 'Bob', 'Tracy']
>>> len(classmates)
3
>>> classmates[0]
'Michael'
>>> classmates[1]
'Bob'
>>> classmates[2]
'Tracy'
>>> classmates[3]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: list index out of range
>>> classmates[-1]
'Tracy'
>>> classmates[-2]
'Bob'
>>> classmates[-3]
'Michael'
>>> classmates[-4]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: list index out of range

Commonly used list methods include append, insert, and pop.
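A short sketch of these three methods, reusing the classmates list from above (the extra names are made up for illustration):

```python
classmates = ['Michael', 'Bob', 'Tracy']

classmates.append('Adam')        # add to the end
classmates.insert(1, 'Jack')     # insert at index 1
last = classmates.pop()          # remove and return the last element
second = classmates.pop(1)       # remove and return the element at index 1
classmates[1] = 'Sarah'          # replace an element in place

print(classmates, last, second)
```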

List comprehensions

L = ['Hello', 'World', 18, 'Apple', None]
print([s.lower() if isinstance(s,str) else s for s in  L])
['hello', 'world', 18, 'apple', None]

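Generator expressions

In scientific computing, if a range has a million elements, there is no need to build them all in memory as a list first; they only need to be produced when used. That is what a generator does: it stores the algorithm rather than the values. A generator is also iterable, so it can be used in a for loop.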

>>> L = [x * x for x in range(10)]
>>> L
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
>>> g = (x * x for x in range(10))
>>> g
<generator object <genexpr> at 0x1022ef630>
>>> next(g)
0
>>> next(g)
1
>>> next(g)
4
>>> next(g)
9
>>> next(g)
16
>>> g = (x * x for x in range(10))
>>> for n in g:
...     print(n)
...
0
1
4
9

More complex generator logic can be defined with a function-like syntax.

Some more complex comprehensions and generators

gougu = {z: (x,y) for z in [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26] for y in range(1, z) for x in range(1, y) if x*x + y*y == z*z}

gougu
Out[17]: 
{5: (3, 4),
 10: (6, 8),
 13: (5, 12),
 15: (9, 12),
 17: (8, 15),
 20: (12, 16),
 25: (7, 24),
 26: (10, 24)}
gougu = [[x, y, z] for z in [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26] for y in range(1, z) for x in range(1, y) if x*x + y*y == z*z]
gougu
Out[19]: 
[[3, 4, 5],
 [6, 8, 10],
 [5, 12, 13],
 [9, 12, 15],
 [8, 15, 17],
 [12, 16, 20],
 [15, 20, 25],
 [7, 24, 25],
 [10, 24, 26]]

pyt = ((x, y, z) for z in [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26] for y in range(1, z) for x in range(1, y) if x*x + y*y == z*z)
# pyt is a generator here; note the outermost parentheses! It can then be consumed with a for loop
print([m for m in pyt])
[(3, 4, 5), (6, 8, 10), (5, 12, 13), (9, 12, 15), (8, 15, 17), (12, 16, 20), (15, 20, 25), (7, 24, 25), (10, 24, 26)]

import jieba
documents = [u'我來到北京清華大學',
             u'假如當前的單詞表有10個不一樣的單詞',
             u'我是中華人民共和國的公民，來自上海，老家是湖北襄陽']

documents_after = []
documents_after = [[w for w in jieba.cut(s)] for s in documents]
documents_after2 = [' '.join(s) for s in documents_after]
print(documents_after)
print(documents_after2)
[['我', '來到', '北京', '清華大學'], ['假如', '當前', '的', '單詞表', '有', '10', '個', '不一樣', '的', '單詞'], ['我', '是', '中華人民共和國', '的', '公民', '，', '來自', '上海', '，', '老家', '是', '湖北', '襄陽']]
['我 來到 北京 清華大學', '假如 當前 的 單詞表 有 10 個 不一樣 的 單詞', '我 是 中華人民共和國 的 公民 ， 來自 上海 ， 老家 是 湖北 襄陽']

Generator (yield) functions:

def fib(max):
    n,a,b = 0,0,1
    while n < max:
        yield b
        a,b = b,a+b
        n = n+1
    return  'done'
f = fib(6)
for n in fib(6):
    print(n)

1
1
2
3
5
8
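The for loop above never shows the 'done' value returned by fib. A minimal sketch of how to capture it: the return value of a generator travels in the value attribute of the StopIteration exception.

```python
def fib(max):
    n, a, b = 0, 0, 1
    while n < max:
        yield b
        a, b = b, a + b
        n = n + 1
    return 'done'

g = fib(6)
values = []
while True:
    try:
        values.append(next(g))
    except StopIteration as stop:
        result = stop.value      # the generator's return value
        break

print(values, result)
```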

Generator in-depth

A generator is a function that produces a sequence of results (not just a single value!)

def countdown(n):
    print("counting down from ",n)
    while n > 0:
        yield n
        n -=1
x = countdown(10)
print(x)
# Note that "counting down from 10" has not been printed yet; only the object is shown:
# <generator object countdown at 0x0000026385694468>

print(x.__next__())

# counting down from 10
# 10
print(x.__next__())
#Out[17]:
#9

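A generator behaves quite differently from an ordinary function. Calling a generator function creates a generator object, but note that the function body does not start running at that point!

When the generator returns, iteration stops.

Calling __next__() runs the function until it yields a value. The function is then suspended with its state preserved, and it resumes only on the next next() call.

Although a generator behaves much like other iterable objects, there is one difference: a generator is a one-time operation. You can iterate over it once; to iterate again you have to call the generator function again.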
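The one-time nature of a generator can be seen in a small sketch, using a shortened version of the countdown function from above:

```python
def countdown(n):
    while n > 0:
        yield n
        n -= 1

gen = countdown(3)
first_pass = list(gen)       # consumes the generator
second_pass = list(gen)      # already exhausted, so nothing is left

fresh = list(countdown(3))   # calling the function again gives a new generator
print(first_pass, second_pass, fresh)
```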

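Generators have another great advantage: a generator does not load the whole sequence into memory, process it, and return it all at once. It loads, processes, and returns items one round at a time, so it can handle files of any size.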
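One way to picture this is a pipeline of generators in which only one line is in flight at a time. In this sketch the function names, the 'ERROR' marker, and the sample text are all hypothetical, and io.StringIO stands in for a large file opened with open().

```python
import io

def read_lines(stream):
    """Yield stripped lines one at a time instead of loading them all."""
    for line in stream:
        yield line.rstrip('\n')

def only_errors(lines):
    """Pass through lines that contain the (hypothetical) marker 'ERROR'."""
    for line in lines:
        if 'ERROR' in line:
            yield line

# io.StringIO stands in for a large file object here.
log = io.StringIO('ok 1\nERROR disk full\nok 2\nERROR timeout\n')
errors = list(only_errors(read_lines(log)))
print(errors)
```

Because each stage pulls one item from the previous stage on demand, memory use stays roughly constant regardless of input size.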

generator expression

a = [1,2,3,4]
b = (2*x for x in a)
b
Out[19]: 
<generator object <genexpr> at 0x0000023EDA2C6CA8>
for i in b:
    print(i)
2
4
6
8

Generator expression syntax:

(expression for i in s if condition)
# is equivalent to
for i in s:
    if condition:
        yield expression

Note: when a generator expression is the sole argument of a function call, the surrounding () can be omitted.

a = [1,2,3,4]
sum(x*x for x in a)
Out[21]: 
30

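Iterators

Objects that can be iterated over in a for loop include collection types such as list, tuple, dict, set, and str, as well as generators (including generator functions that use yield). All of these are called iterable types (Iterable), and isinstance() can be used to check:

>>> from collections.abc import Iterable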
>>> isinstance([], Iterable)
True
>>> isinstance({}, Iterable)
True
>>> isinstance('abc', Iterable)
True
>>> isinstance((x for x in range(10)), Iterable)
True
>>> isinstance(100, Iterable)
False

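A generator can not only be used in a for loop but can also be passed to next(), which returns the next value each time until a StopIteration exception is raised.

Any object that can be passed to next() to keep returning the next value is called an iterator (Iterator).

isinstance() can likewise be used to check whether an object is an Iterator:

>>> from collections.abc import Iterator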
>>> isinstance((x for x in range(10)), Iterator)
True
>>> isinstance([], Iterator)
False
>>> isinstance({}, Iterator)
False
>>> isinstance('abc', Iterator)
False

As shown above, list, dict, set, and str are Iterable but not Iterator, whereas a generator is an Iterator.

However, iter() turns an iterable such as a dict or list into an iterator, for example:

>>> isinstance(iter([]), Iterator)
True
>>> isinstance(iter('abc'), Iterator)
True

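Iterable summary

Any object that can be used in a for loop is an Iterable;
any object that can be passed to next() is an Iterator, which represents a lazily evaluated sequence;
collection types such as list, dict, and str are Iterable but not Iterator, though iter() returns an Iterator for them.
Python's for loop is essentially implemented by calling next() repeatedly, for example: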

for x in [1, 2, 3, 4, 5]:
    pass
# is equivalent to:
# First obtain an Iterator object:
it = iter([1, 2, 3, 4, 5])
# Loop:
while True:
    try:
        # Get the next value:
        x = next(it)
    except StopIteration:
        # Exit the loop on StopIteration
        break

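tuple:

A tuple is like a list that cannot be changed once defined; it is written with ().

>>> classmates = ('Michael', 'Bob', 'Tracy')

A tuple with a single element must be written with a trailing comma to avoid ambiguity; otherwise the parentheses are treated as grouping and the result is the element itself, not a one-element tuple.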

>>> t = (1,)
>>> t
(1,)
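A brief sketch of both points: the trailing comma, and immutability. The last part is an addition to the notes: a tuple's own slots are fixed, but a mutable object stored in a tuple can still change.

```python
t = (1,)            # one-element tuple
not_a_tuple = (1)   # just the integer 1 in parentheses

classmates = ('Michael', 'Bob', 'Tracy')
try:
    classmates[0] = 'Adam'
except TypeError as exc:
    error_message = str(exc)

# The tuple itself is fixed, but a mutable element inside it can still change.
mixed = ('a', 'b', ['A', 'B'])
mixed[2][0] = 'X'

print(type(t), type(not_a_tuple), error_message, mixed)
```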

Python slicing

https://stackoverflow.com/questions/509211/understanding-pythons-slice-notation

a[start:end] # items start through end-1
a[start:]    # items start through the rest of the array
a[:end]      # items from the beginning through end-1
a[:]         # a copy of the whole array
a[start:end:step] # start through not past end, by step
a[-1]    # last item in the array
a[-2:]   # last two items in the array
a[:-2]   # everything except the last two items
a[::-1]    # all items in the array, reversed
a[1::-1]   # the first two items, reversed
a[:-3:-1]  # the last two items, reversed
a[-3::-1]  # everything except the last two items, reversed
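Applied to a concrete list, the slice forms listed above behave as follows (a small sketch using a ten-element list):

```python
a = list(range(10))      # [0, 1, 2, ..., 9]

print(a[2:5])      # items 2 through 4
print(a[:3])       # first three items
print(a[-2:])      # last two items
print(a[::2])      # every second item
print(a[::-1])     # reversed copy
print(a[1::-1])    # first two items, reversed
print(a[:-3:-1])   # last two items, reversed

# Slicing does not raise IndexError; out-of-range bounds are clipped.
print(a[5:100])
```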

numpy ndarray indexing/slice

https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html

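An ndarray can be indexed and sliced with the standard Python $x[obj]$ syntax, where $x$ is the array and $obj$ is the selection expression. ndarray supports three kinds of indexing: field access, basic slicing, and advanced indexing. Which one applies depends on $obj$.

Note:

$x[(exp1, exp2, ..., expN)]$ is equivalent to $x[exp1, exp2, ..., expN]$

basic slicing and indexing

Basic slicing extends Python's indexing and slicing, which apply to one-dimensional sequences, to N dimensions. Basic slicing occurs when obj in $x[obj]$ is a slice object (written $start:stop:step$ inside the brackets), an integer, or a tuple of slice objects and integers. The standard slicing rules are applied to each dimension separately.

All arrays produced by basic slicing are views of the original array; the data itself is not copied.

The basic slice rules, stated abstractly:

The basic slice syntax is $i:j:k$, where $i$ is the start, $j$ is the stop, and $k$ is the step. Negative $i$ and $j$ are interpreted as $n+i$ and $n+j$, where $n$ is the number of elements in the corresponding dimension. A negative $k$ steps toward smaller indices.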

>>> x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> x[1:7:2]
array([1, 3, 5])
>>> x[-2:10]
array([8, 9])
>>> x[-3:3:-1]
array([7, 6, 5, 4])
>>> x[5:]
array([5, 6, 7, 8, 9])
>>> x = np.array([[[1],[2],[3]], [[4],[5],[6]]])
>>> x.shape
(2, 3, 1)
>>> x[1:2]
array([[[4],
        [5],
        [6]]])
>>> x[...,0]
array([[1, 2, 3],
       [4, 5, 6]])
>>> x[:,np.newaxis,:,:].shape
(2, 1, 3, 1)

advanced indexing

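Advanced indexing is applied when the selection obj is a non-tuple sequence object, an ndarray whose values are int or bool, or a tuple containing at least one sequence object or int/bool ndarray. There are two modes: integer and boolean.

Advanced indexing always returns a copy of the data (basic slicing returns only a view and makes no copy!).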
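The view-versus-copy distinction can be checked with a short sketch, assuming NumPy is installed. np.shares_memory reports whether two arrays overlap in memory.

```python
import numpy as np

x = np.arange(10)

view = x[2:5]            # basic slicing: shares memory with x
view[0] = 100            # this also changes x[2]

copy = x[[2, 3, 4]]      # advanced (integer array) indexing: independent copy
copy[0] = -1             # x[2] is unaffected

print(x, np.shares_memory(x, view), np.shares_memory(x, copy))
```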

Note:

$x[(1,2,3),]$: advanced indexing
$x[(1,2,3)]$ is the same as $x[1,2,3]$: basic indexing

advanced integer array indexing

>>> x = np.array([[ 0,  1,  2],
...               [ 3,  4,  5],
...               [ 6,  7,  8],
...               [ 9, 10, 11]])
>>> rows = np.array([[0, 0],
...                  [3, 3]], dtype=np.intp)
>>> columns = np.array([[0, 2],
...                     [0, 2]], dtype=np.intp)
>>> x[rows, columns]
array([[ 0,  2],
       [ 9, 11]])

>>> x = np.array([[1, 2], [3, 4], [5, 6]])
>>> x[[0, 1, 2], [0, 1, 0]]
array([1, 4, 5])

Boolean array indexing

This mode is used when obj is an array of boolean values.

>>> x = np.array([[1., 2.], [np.nan, 3.], [np.nan, np.nan]])
>>> x[~np.isnan(x)]
array([ 1.,  2.,  3.])
>>> x = np.array([1., -1., -2., 3])
>>> x[x < 0] += 20
>>> x
array([  1.,  19.,  18.,   3.])
>>> x = np.array([[0, 1], [1, 1], [2, 2]])
>>> rowsum = x.sum(-1)
>>> x[rowsum <= 2, :]
array([[0, 1],
       [1, 1]])
>>> rowsum = x.sum(-1, keepdims=True)
>>> rowsum.shape
(3, 1)
>>> x[rowsum <= 2, :]    # fails
IndexError: too many indices
>>> x[rowsum <= 2]
array([0, 1])
>>> x = np.array([[ 0,  1,  2],
...               [ 3,  4,  5],
...               [ 6,  7,  8],
...               [ 9, 10, 11]])
>>> rows = (x.sum(-1) % 2) == 0
>>> rows
array([False,  True, False,  True])
>>> columns = [0, 2]
>>> x[np.ix_(rows, columns)]
array([[ 3,  5],
       [ 9, 11]])
>>> rows = rows.nonzero()[0]
>>> x[rows[:, np.newaxis], columns]
array([[ 3,  5],
       [ 9, 11]])

pandas indexing and slicing

https://pandas.pydata.org/pandas-docs/stable/indexing.html

Suppose we have the following dataset; let's practice selecting and slicing data with pandas:

# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)
# Print out country column as Pandas Series
print(cars['country'])
In [4]: cars['country']
Out[4]: 
US     United States
AUS        Australia
JAP            Japan
IN             India
RU            Russia
MOR          Morocco
EG             Egypt
Name: country, dtype: object
# Print out country column as Pandas DataFrame
print(cars[['country']])
In [5]: cars[['country']]
Out[5]: 
           country
US   United States
AUS      Australia
JAP          Japan
IN           India
RU          Russia
MOR        Morocco
EG           Egypt
# Print out DataFrame with country and drives_right columns
print(cars[['country','drives_right']])
In [6]: cars[['country','drives_right']]
Out[6]: 
           country  drives_right
US   United States          True
AUS      Australia         False
JAP          Japan         False
IN           India         False
RU          Russia          True
MOR        Morocco          True
EG           Egypt          True
# Print out first 3 observations
print(cars[0:3])

# Print out fourth, fifth and sixth observation
print(cars[3:6])
In [14]: cars
Out[14]: 
     cars_per_cap        country  drives_right
US            809  United States          True
AUS           731      Australia         False
JAP           588          Japan         False
IN             18          India         False
RU            200         Russia          True
MOR            70        Morocco          True
EG             45          Egypt          True

In [15]: cars.loc['RU']
Out[15]: 
cars_per_cap       200
country         Russia
drives_right      True
Name: RU, dtype: object

In [16]: cars.iloc[4]
Out[16]: 
cars_per_cap       200
country         Russia
drives_right      True
Name: RU, dtype: object

In [17]: cars.loc[['RU']]
Out[17]: 
    cars_per_cap country  drives_right
RU           200  Russia          True

In [18]: cars.iloc[[4]]
Out[18]: 
    cars_per_cap country  drives_right
RU           200  Russia          True

In [19]: cars.loc[['RU','AUS']]
Out[19]: 
     cars_per_cap    country  drives_right
RU            200     Russia          True
AUS           731  Australia         False

In [20]: cars.iloc[[4,1]]
Out[20]: 
     cars_per_cap    country  drives_right
RU            200     Russia          True
AUS           731  Australia         False
In [3]: cars.loc['IN','cars_per_cap']
Out[3]: 18

In [4]: cars.iloc[3,0]
Out[4]: 18

In [5]: cars.loc[['IN','RU'],'cars_per_cap']
Out[5]: 
IN     18
RU    200
Name: cars_per_cap, dtype: int64

In [6]: cars.iloc[[3,4],0]
Out[6]: 
IN     18
RU    200
Name: cars_per_cap, dtype: int64

In [7]: cars.loc[['IN','RU'],['cars_per_cap','country']]
Out[7]: 
    cars_per_cap country
IN            18   India
RU           200  Russia

In [8]: cars.iloc[[3,4],[0,1]]
Out[8]: 
    cars_per_cap country
IN            18   India
RU           200  Russia
print(cars.loc['MOR','drives_right'])
True
In [1]: cars.loc[:,'country']
Out[1]: 
US     United States
AUS        Australia
JAP            Japan
IN             India
RU            Russia
MOR          Morocco
EG             Egypt
Name: country, dtype: object

In [2]: cars.iloc[:,1]
Out[2]: 
US     United States
AUS        Australia
JAP            Japan
IN             India
RU            Russia
MOR          Morocco
EG             Egypt
Name: country, dtype: object

In [3]: cars.loc[:,['country','drives_right']]
Out[3]: 
           country  drives_right
US   United States          True
AUS      Australia         False
JAP          Japan         False
IN           India         False
RU          Russia          True
MOR        Morocco          True
EG           Egypt          True

In [4]: cars.iloc[:,[1,2]]
Out[4]: 
           country  drives_right
US   United States          True
AUS      Australia         False
JAP          Japan         False
IN           India         False
RU          Russia          True
MOR        Morocco          True
EG           Egypt          True

if statements:

age = 3
if age >= 18:
    print('adult')
elif age >= 6:
    print('teenager')
else:
    print('kid')

Loops:

names = ['Michael', 'Bob', 'Tracy']
for name in names:
    print(name)

>>> list(range(5))
[0, 1, 2, 3, 4]
total = 0   # named total to avoid shadowing the built-in sum()
for x in range(101):
    total = total + x
print(total)   # 5050

dict (dictionary)

A dict is similar to a JavaScript object: it is defined by key-value pairs.

>>> d = {'Michael': 95, 'Bob': 75, 'Tracy': 85}
>>> d['Michael']
95
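A few everyday dict operations, sketched with the same scores (the names 'Adam' and 'Thomas' are added only for illustration):

```python
d = {'Michael': 95, 'Bob': 75, 'Tracy': 85}

d['Adam'] = 67                     # add or overwrite a key
has_thomas = 'Thomas' in d         # membership test on keys
missing = d.get('Thomas')          # None instead of a KeyError
fallback = d.get('Thomas', -1)     # explicit default value
bob_score = d.pop('Bob')           # remove a key and return its value

for name, score in d.items():      # iterate over key/value pairs
    print(name, score)
```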

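set

A set is similar to a dict, but it stores only keys and no values, written like {1, 2, 3, 'a', 'b'}. It can be viewed as a set in the mathematical sense: unordered, with no duplicate elements, and supporting intersection, union, and other set operations. One way to create a set is to pass a list to set().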

>>> s = set([1, 2, 3])
>>> s
{1, 2, 3}

>>> s1 = set([1, 2, 3])
>>> s2 = set([2, 3, 4])
>>> s1 & s2
{2, 3}
>>> s1 | s2
{1, 2, 3, 4}

str, int, and None are immutable objects, while list and dict are mutable objects.
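The difference shows up in how methods behave. In this minimal sketch, a list method changes the object in place, while a str method returns a new object and leaves the original untouched.

```python
a = ['c', 'b', 'a']
a.sort()                   # list is mutable: sorted in place

s = 'abc'
t = s.replace('a', 'A')    # str is immutable: replace returns a new string

print(a, s, t)
```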

Reference documentation:

https://docs.python.org/3/library/functions.html#abs

Functions:

Functions are defined with def and can return multiple values.

import math
def move(x, y, step, angle=0):
    nx = x + step * math.cos(angle)
    ny = y - step * math.sin(angle)
    return nx, ny
>>> x, y = move(100, 100, 60, math.pi / 6)
>>> print(x, y)
151.96152422706632 70.0
>>> r = move(100, 100, 60, math.pi / 6)
>>> print(r)
# In essence the function returns a tuple, whose elements are assigned in order to the variables on the left
(151.96152422706632, 70.0)

Functions support default arguments:

def enroll(name, gender, age=6, city='Beijing'):
    print('name:', name)
    print('gender:', gender)
    print('age:', age)
    print('city:', city)
enroll('Bob', 'M', 7)
enroll('Adam', 'M', city='Tianjin')

Variable-length arguments:

def calc(*numbers):
    total = 0
    print(type(numbers))  # numbers is a tuple: <class 'tuple'>
    for n in numbers:
        total = total + n * n
    return total
>>> nums = [1, 2, 3]
>>> calc(*nums)  # the leading * unpacks the list (or tuple) so that all its elements are passed as variable arguments
<class 'tuple'>
14

Keyword arguments:

def person(name, age, **kw):
    # kw is a dict: type(kw) is <class 'dict'>
    print('name:', name, 'age:', age, 'other:', kw)
>>> person('Michael', 30)
name: Michael age: 30 other: {}
>>> person('Bob', 35, city='Beijing')
name: Bob age: 35 other: {'city': 'Beijing'}
>>> person('Adam', 45, gender='M', job='Engineer')
name: Adam age: 45 other: {'gender': 'M', 'job': 'Engineer'}

>>> extra = {'city': 'Beijing', 'job': 'Engineer'}
>>> person('Jack', 24, **extra)
name: Jack age: 24 other: {'city': 'Beijing', 'job': 'Engineer'}

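**extra passes all key-value pairs of the extra dict as keyword arguments into the function's **kw parameter. kw receives a dict that is a copy of extra, so changes to kw do not affect extra outside the function.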
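The copy behaviour can be verified with a small sketch. This variant of person is modified for illustration: it adds a made-up key 'seen' to kw and returns the dict.

```python
def person(name, age, **kw):
    kw['seen'] = True        # modify the dict received by the function
    return kw

extra = {'city': 'Beijing', 'job': 'Engineer'}
received = person('Jack', 24, **extra)

print(extra)      # unchanged by the call
print(received)   # the function's own dict, including the added key
```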

Keyword-only arguments:

def person(name, age, *, city='Beijing', job):  # keyword-only arguments: city defaults to 'Beijing', job is required
    print(name, age, city, job)
>>> person('Jack', 24, city='Beijing', job='Engineer')
Jack 24 Beijing Engineer
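The following sketch shows what the bare * enforces. This variant of person returns its arguments instead of printing them so the result is easy to inspect.

```python
def person(name, age, *, city='Beijing', job):
    return name, age, city, job

ok = person('Jack', 24, job='Engineer')         # city falls back to its default

try:
    person('Jack', 24, 'Beijing', 'Engineer')   # extra positional values are rejected
except TypeError as exc:
    positional_error = str(exc)

try:
    person('Jack', 24, city='Beijing')          # job has no default, so it is required
except TypeError as exc:
    missing_error = str(exc)

print(ok)
print(positional_error)
print(missing_error)
```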

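What are keyword arguments for? They extend what a function can do. In the person function with **kw, for example, we are guaranteed to receive name and age, but if the caller wants to provide more arguments, we can receive those too. Imagine you are building a user registration feature where only the user name and age are required and everything else is optional; defining the function with keyword arguments meets that need.

map

Many high-level languages provide a similar facility: it applies the same function to every element of a list and returns an iterator, which can then be passed to list() to build a new list.

from collections.abc import Iterator

def f(x):
    return x*x
r = map(f,[1,2,3,4,5])
print(r)
# <map object at 0x000000000072B9B0>

print(isinstance(r, Iterator)) # True

# The result is an Iterator, so list() is needed to produce a list
print(list(r))
# [1, 4, 9, 16, 25]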
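Because map returns an iterator, it is lazy and can be consumed only once. A short sketch of that behaviour, with an equivalent list comprehension for comparison:

```python
def f(x):
    return x * x

r = map(f, [1, 2, 3, 4, 5])
first = next(r)             # values are computed on demand
rest = list(r)              # the remaining values
again = list(r)             # exhausted: nothing is left

same_as_comprehension = [f(x) for x in [1, 2, 3, 4, 5]]
print(first, rest, again, same_as_comprehension)
```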

Modules:

https://pypi.python.org/pypi/mysql-connector-python/2.0.4

image module code example:

from PIL import  Image
im = Image.open(r'C:\Users\Administrator\Desktop\jj.png')
print(im.format,im.size,im.mode)
im.thumbnail((100,50))
im.save('thumb.png', 'PNG')  # keep the file extension consistent with the PNG format

Python network service programming

Server:

import  socket
import threading
import time
def tcplink(sock,addr):
    print(('Accept new connection from %s:%s...' % addr))
    sock.send(b'Welcome, client!')
    while True:
        data = sock.recv(1024)
        time.sleep(1)
        if not data or data.decode('utf-8') == 'exit':
            break
        # data is bytes, so %s renders it as b'...'; use data.decode('utf-8') here for plain text
        sock.send(('Hello, %s!' % data).encode('utf-8'))
    sock.close()
    print('Connection from %s:%s closed.' %addr)
s = socket.socket(socket.AF_INET,socket.SOCK_STREAM)
s.bind(('127.0.0.1',9999))
s.listen(5)
print('waiting for connection coming on server...')
while True:
    sock, addr = s.accept()
    t = threading.Thread(target=tcplink,args=(sock,addr))
    t.start()
# Server output:

waiting for connection coming on server...
Accept new connection from 127.0.0.1:64891...
Connection from 127.0.0.1:64891 closed.
Accept new connection from 127.0.0.1:65304...
Connection from 127.0.0.1:65304 closed.
Accept new connection from 127.0.0.1:65408...
Connection from 127.0.0.1:65408 closed.
Accept new connection from 127.0.0.1:65435...
Connection from 127.0.0.1:65435 closed.
Accept new connection from 127.0.0.1:65505...
Connection from 127.0.0.1:65505 closed.

Test client

import  socket
import threading
import time
s = socket.socket(socket.AF_INET,socket.SOCK_STREAM)
s.connect(('127.0.0.1',9999))
print((s.recv(1024).decode('utf-8')))
for data in [b'Michael',b'Tracy',b'Sarah']:
    s.send(data)
    print(s.recv(1024).decode('utf-8'))
s.send(b'exit')
s.close()
# Client output:

Welcome, client!
Hello, b'Michael'!
Hello, b'Tracy'!
Hello, b'Sarah'!

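Python vs. IPython vs. Jupyter notebooks: evolution and architecture

IPython notebook -> Jupyter notebooks

Broadly, there are two layers: the interface level and the kernel level. The interface layer can be a notebook, the IPython console, or the Qt console. It talks to the kernel level through a message queue over sockets; that channel carries the Python code to execute and returns the resulting data once the code has run.

Jupyter extends this notebook model to multiple languages such as R and bash: a kernel component for each language is added at the kernel layer and is responsible for executing that language and returning results.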

https://plot.ly/python/ipython-vs-python/

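How Jupyter notebooks work: architecture

What exactly is IPython?

IPython is a Python console environment with enhanced interactivity. It offers many useful features:

Compared with the standard Python console, it provides: tab completion; exploring your objects, e.g. object_name? lists details about the object; magic functions, e.g. the %timeit magic is often used to measure how fast code runs, and the %run magic runs any Python script and loads all of its data directly into the interactive environment; running system shell commands, e.g. !ping www.xxx.com, whose output can also be captured: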

files = !ls

!grep -rF $pattern ipython/*.

This passes the Python variable pattern (written $pattern) into the grep system command above.

http://ipython.org/ipython-doc/dev/interactive/tutorial.html#magic-functions

How do you run example code prefixed with >>> directly in IPython?

The answer is to run the following command in IPython:

%doctest_mode

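How do you use notebooks to learn and develop Python?

Jupyter notebook is especially handy in at least the following two scenarios:

1. You want to experiment further with an existing notebook, or simply learn from it;

2. You want to develop your own notebook to support teaching or to produce an academic article.

In both scenarios you will probably want to run Jupyter notebook in a specific directory:

cd into your directory and run the following command: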

jupyter notebook

This opens the notebook and lists all files in that directory: http://localhost:8888/tree

some python debug study tips:

1. dir(obj) lists all attributes and methods of an object

y=[x*x for x in range(1,11)]
print(dir(y))
# Output:
['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']

2. In the notebook/IPython environment, the %who magic command lists all global variables in the namespace

%who Series  # list all variables of type Series
s     temp_diffs     temps1     temps2     
%who
# list all globals
DataFrame     Series     dates     np     pd     plt     s     temp_diffs     temps1     
temps2     
%whos
# list all globals with their types and details:
Variable     Type             Data/Info
---------------------------------------
DataFrame    type             <class 'pandas.core.frame.DataFrame'>
Series       type             <class 'pandas.core.series.Series'>
dates        DatetimeIndex    DatetimeIndex(['2014-07-0<...>atetime64[ns]', freq='D')
my_func      function         <function my_func at 0x00000211211B7C80>
np           module           <module 'numpy' from 'C:\<...>ges\\numpy\\__init__.py'>
pd           module           <module 'pandas' from 'C:<...>es\\pandas\\__init__.py'>
plt          module           <module 'matplotlib.pyplo<...>\\matplotlib\\pyplot.py'>
s            Series           a    1\nb    2\nc    3\nd    4\ndtype: int64
temp_diffs   Series           2014-07-01    10\n2014-07<...>10\nFreq: D, dtype: int64
temps1       Series           2014-07-01    80\n2014-07<...>87\nFreq: D, dtype: int64
temps2       Series           2014-07-01    70\n2014-07<...>77\nFreq: D, dtype: int64

3. Inspect the names a module defines and the detailed usage of one of them

import pandas as pd
print(dir(pd))
help(pd.Series)  # help() prints the documentation itself and returns None
['Categorical', 'CategoricalIndex', 'DataFrame', 'DateOffset', 'DatetimeIndex', 'ExcelFile', 'ExcelWriter', 'Expr', 'Float64Index', 'Grouper', 'HDFStore', 'Index', 'IndexSlice', 'Int64Index', 'MultiIndex', 'NaT', 'Panel', 'Panel4D', 'Period', 'PeriodIndex', 'RangeIndex', 'Series', 'SparseArray', 'SparseDataFrame', 'SparseList', 'SparsePanel', 'SparseSeries', 'SparseTimeSeries', 'Term', 'TimeGrouper', 'TimeSeries', 'Timedelta', 'TimedeltaIndex', 'Timestamp', 'WidePanel', '__builtins__', '__cached__', '__doc__', '__docformat__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '__version__', '_np_version_under1p10', '_np_version_under1p11', '_np_version_under1p12', '_np_version_under1p8', '_np_version_under1p9', '_period', '_sparse', '_testing', '_version', 'algos', 'bdate_range', 'compat', 'computation', 'concat', 'core', 'crosstab', 'cut', 'date_range', 'datetime', 'datetools', 'dependency', 'describe_option', 'eval', 'ewma', 'ewmcorr', 'ewmcov', 'ewmstd', 'ewmvar', 'ewmvol', 'expanding_apply', 'expanding_corr', 'expanding_count', 'expanding_cov', 'expanding_kurt', 'expanding_max', 'expanding_mean', 'expanding_median', 'expanding_min', 'expanding_quantile', 'expanding_skew', 'expanding_std', 'expanding_sum', 'expanding_var', 'factorize', 'fama_macbeth', 'formats', 'get_dummies', 'get_option', 'get_store', 'groupby', 'hard_dependencies', 'hashtable', 'index', 'indexes', 'infer_freq', 'info', 'io', 'isnull', 'json', 'lib', 'lreshape', 'match', 'melt', 'merge', 'missing_dependencies', 'msgpack', 'notnull', 'np', 'offsets', 'ols', 'option_context', 'options', 'ordered_merge', 'pandas', 'parser', 'period_range', 'pivot', 'pivot_table', 'plot_params', 'pnow', 'qcut', 'read_clipboard', 'read_csv', 'read_excel', 'read_fwf', 'read_gbq', 'read_hdf', 'read_html', 'read_json', 'read_msgpack', 'read_pickle', 'read_sas', 'read_sql', 'read_sql_query', 'read_sql_table', 'read_stata', 'read_table', 'reset_option', 'rolling_apply', 'rolling_corr', 
'rolling_count', 'rolling_cov', 'rolling_kurt', 'rolling_max', 'rolling_mean', 'rolling_median', 'rolling_min', 'rolling_quantile', 'rolling_skew', 'rolling_std', 'rolling_sum', 'rolling_var', 'rolling_window', 'scatter_matrix', 'set_eng_float_format', 'set_option', 'show_versions', 'sparse', 'stats', 'test', 'timedelta_range', 'to_datetime', 'to_msgpack', 'to_numeric', 'to_pickle', 'to_timedelta', 'tools', 'tseries', 'tslib', 'types', 'unique', 'util', 'value_counts', 'wide_to_long']

Help on class Series in module pandas.core.series:

class Series(pandas.core.base.IndexOpsMixin, pandas.core.strings.StringAccessorMixin, pandas.core.generic.NDFrame)
 |  One-dimensional ndarray with axis labels (including time series).
 |  
 |  Labels need not be unique but must be any hashable type. The object

4. Command-mode and edit-mode commands in notebooks:

Numpy

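Why introduce NumPy?

A standard Python list stores pointers to objects, so reaching an element takes a second level of indirection. This is inefficient and wastes space.

In addition, neither the standard Python list nor array provides true two-dimensional arrays, nor the more complex functions suited to numerical computation on array data.

To improve performance and support complex operations on multi-dimensional arrays, NumPy implements its core in C and exposes it to Python as Python objects.

At its core it implements the following two things:

ndarray: a multi-dimensional array that stores a single data type, on which many complex operations are supported
ufunc: functions that operate on ndarrays; if the standard functions NumPy provides do not meet your needs, this mechanism lets you define your own
Numerical operations applied to the numbers in an ndarray are always element-wise, i.e. computed element by element!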
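One way to define your own element-wise function is np.frompyfunc, which wraps a scalar Python function as a ufunc. This is a sketch assuming NumPy is installed; clip_negative is a made-up helper, and a ufunc built this way is convenient rather than fast because it still calls Python once per element.

```python
import numpy as np

def clip_negative(x):
    """Scalar function: replace non-positive numbers with zero."""
    return x if x > 0 else 0.0

# frompyfunc(func, nin, nout) wraps a scalar Python function as a ufunc.
clip_ufunc = np.frompyfunc(clip_negative, 1, 1)

x = np.array([-1.5, 0.5, 2.0, -3.0])
result = clip_ufunc(x).astype(float)   # frompyfunc returns an object array
builtin = np.maximum(x, 0.0)           # the equivalent built-in ufunc

print(result, builtin)
```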

import numpy as np
from matplotlib import pyplot as plt
x = np.linspace(0,2 * np.pi,100)
y = np.sin(x)  # y is sin applied to every element of x
plt.plot(x,y,'r-',linewidth=3,label='sin function')
plt.xlabel('x')
plt.ylabel('sin(x)')
plt.show()

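The code above first creates an evenly spaced array from 0 to $2\pi$ and passes it to np.sin(), which computes the sine of each element. Because np.sin() is a ufunc, it loops internally over every element of x, computes each sine, and returns the results as a new array.

Advanced NumPy features (broadcasting, ufunc in detail)

https://www.jianshu.com/p/3c3f7da88516

See also "Python for Data Analysis, 2nd Edition".

Pandas

Why pandas is needed

A NumPy 2D array can imitate what pandas provides, but a native NumPy 2D array must use a single data type, whereas real-world data analysis often involves columns of different types.

On top of NumPy, pandas provides SQL-like data handling and two data types, Series and DataFrame. Each Series consists of an index and values: the index holds the index information passed in when the Series was created, and values is the ndarray holding the corresponding data. NumPy ufuncs operate on that values array.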
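A small sketch of the index/values split, assuming pandas and NumPy are installed (the labels and numbers are invented):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 4.0, 9.0], index=['a', 'b', 'c'])

print(s.index)       # the labels
print(s.values)      # the underlying ndarray

roots = np.sqrt(s)   # the ufunc is applied to the values; the index is kept
print(roots)
```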

pandas DataFrame illustrated

http://www.tetraph.com/blog/machine-learning/jupyter-notebook-keyboard-shortcut-command-mode-edit-mode/

dataframe.loc/iloc vs []index operator

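.loc/.iloc select rows first, while [] selects columns by default. A column always has a name, so column selection with [] is always label based.

df.loc[:,['Name','cost']]
# returns the Name and cost values for all stores

How do you copy a list rather than reference the same one?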

shoplist = ['apple','mango','carrot','banana']
mylist = shoplist
del shoplist[0]
print('shoplist is:',shoplist)
print('mylist is:',mylist)
# both lines print the same list, because mylist and shoplist refer to the same object
print('copy via slice and assignment')
mycopiedlist = shoplist[:] # make a copy by doing a full slice
del(mycopiedlist[0])
print('shoplist is: ',shoplist)
print('mycopiedlist is:',mycopiedlist)
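Note that a full slice makes a shallow copy: the outer list is new, but nested objects are still shared. A sketch of the difference, using copy.deepcopy from the standard library for a fully independent copy:

```python
import copy

nested = [['apple', 'mango'], ['carrot']]

shallow = nested[:]               # same effect as list(nested) or nested.copy()
deep = copy.deepcopy(nested)

nested[0].append('banana')        # mutate an inner list

print(shallow[0])   # the inner list is shared with nested
print(deep[0])      # fully independent
```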

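Creating a list of single letters directly from a string

list('ABCD')
# Output: ['A', 'B', 'C', 'D']

Python list vs. NumPy vs. pandas

How do you view the source code of a function already loaded into the namespace in the IPython shell?

Sometimes, while doing exploratory programming in the IPython shell, you have already defined and run some functions and later want to look at a function's code again before calling it. You then need a way to "reproduce" that function's code.

The way to do it: use the inspect module

import inspect
import pandas
source_DF = inspect.getsource(pandas.DataFrame)
print(type(source_DF))
print(source_DF[:200])  # print the beginning of the source code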
source_file_DF = inspect.getsourcefile(pandas.DataFrame)
print(source_file_DF)
# D:\Users\dengdong\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py

How do you get the address of a Python variable?

a = [0,1,2,3,4,5,6,7,8,9]
b = a[:]
print(id(a))
# 54749320
print(id(b))
# 54749340
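Strictly speaking, id() returns an object's identity; in CPython that happens to be its memory address. The sketch below relates id() to the is operator and contrasts identity with equality.

```python
a = [0, 1, 2]
b = a[:]      # equal contents, but a different object
c = a         # another name for the same object

print(a == b, a is b)            # equal but not identical
print(a is c, id(a) == id(c))    # same object, same id
```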
