Python Basics: Notes to Share

Date: 2019-11-30


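Introduction

I had skimmed the basics of Python before, but my grasp never felt solid, so I went back through the fundamentals carefully and wrote down what I considered important. At the very end of the article I share a PDF of the book 《Python爬蟲開發與項目實戰》; it is a high-resolution copy with a table of contents, for anyone who needs it.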

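Tuples

The elements of a tuple cannot be modified or deleted.

Python expression	Result	Description
('Hi!',) * 4	('Hi!', 'Hi!', 'Hi!', 'Hi!')	Repetition
3 in (1, 2, 3)	True	Membership test

Any objects separated by commas and written without enclosing brackets are treated as a tuple by default. Example: x, y = 1, 2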

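Creating a single-element tuple

The trailing comma is required; without it the parentheses only group the expression and the result is not a tuple.

fruit = ("apple",)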
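To make the two points above concrete, here is a minimal sketch written for Python 3. The names `point`, `not_a_tuple`, and `one_tuple` are made up for illustration.

```python
# Tuples are immutable: item assignment raises TypeError.
point = (1, 2, 3)
try:
    point[0] = 99
    modified = True
except TypeError:
    modified = False

# The trailing comma, not the parentheses, makes a one-element tuple.
not_a_tuple = ("apple")
one_tuple = ("apple",)

print(modified)           # False
print(type(not_a_tuple))  # <class 'str'>
print(type(one_tuple))    # <class 'tuple'>
```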

Swapping values with tuples

def test2():
    x = 2
    y = 3
    x, y = y, x   # the right-hand side is packed into a tuple, then unpacked
    print x, y

Viewing the help documentation

Use the built-in help() function:

help(list)

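Dictionaries

d = {}
d["x"] = "value"

If the key "x" is not yet among the keys of the dictionary d, a new entry is added; otherwise the existing value is overwritten.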
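A small sketch of the add-or-update behaviour described above, written for Python 3. The dictionary name `prices` and its keys are hypothetical.

```python
prices = {"apple": 3}

prices["pear"] = 5    # key absent: a new entry is added
prices["apple"] = 4   # key present: the existing value is replaced

print(prices)         # {'apple': 4, 'pear': 5}
```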

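The set() built-in function

set() creates an unordered collection of unique elements. It supports membership and relationship tests, removes duplicate data, and can compute intersections, differences, and unions.

x = set(["1","2"])
y = set(["1","3","4"])
print x&y # intersection
print x|y # union
print x-y # difference
zip(x) # wraps each element in a 1-tuple, e.g. [('1',), ('2',)]; order is not guaranteed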
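The same set operations can be sketched in Python 3 as follows. The variable names are illustrative only, and results are sorted where the order of a set would otherwise be unspecified.

```python
x = {"1", "2"}
y = {"1", "3", "4"}

common = x & y    # intersection: elements in both sets
either = x | y    # union: elements in at least one set
only_x = x - y    # difference: elements in x but not in y

# Passing a list through set() drops the duplicates.
unique = sorted(set([3, 1, 3, 2, 1]))

print(sorted(common))  # ['1']
print(sorted(either))  # ['1', '2', '3', '4']
print(sorted(only_x))  # ['2']
print(unique)          # [1, 2, 3]
```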

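The zip() built-in function

zip() takes iterables as arguments, packs their corresponding elements into tuples, and returns a list of those tuples. (In Python 3, zip() returns an iterator instead of a list.)

If the iterables differ in length, the result is as long as the shortest one. With the * operator, a zipped list can be unpacked back into the original sequences.

a = [1,2,3]
b = [4,5,6]
c = [4,5,6,7,8]
zipped = zip(a,b)     # packs into a list of tuples: [(1, 4), (2, 5), (3, 6)]
zip(a,c)              # length matches the shortest list: [(1, 4), (2, 5), (3, 6)]
zip(*zipped)          # the reverse of zip ("unzipping"); returns the rows of a 2-D matrix:
                      # [(1, 2, 3), (4, 5, 6)]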
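For readers on Python 3, a rough equivalent looks like this. The only real difference is that `zip()` has to be wrapped in `list()` to see the result.

```python
a = [1, 2, 3]
b = [4, 5, 6]
c = [4, 5, 6, 7, 8]

zipped = list(zip(a, b))        # pair up corresponding elements
shortest = list(zip(a, c))      # stops at the end of the shorter input
unzipped = list(zip(*zipped))   # * spreads the pairs back out as arguments

print(zipped)    # [(1, 4), (2, 5), (3, 6)]
print(shortest)  # [(1, 4), (2, 5), (3, 6)]
print(unzipped)  # [(1, 2, 3), (4, 5, 6)]
```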

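Variable arguments

Prefix a function parameter with "*" to accept a variable number of arguments. "*" collects the extra positional arguments into a tuple; "**" collects the keyword arguments into a dictionary.

def search(*t, **d):
    keys = d.keys()
    for arg in t:
        for key in keys:
            if arg == key:
                print "find:", d[key]

search("a", "two", a="1", b="2")  # call; prints: find: 1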
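Here is a minimal Python 3 sketch that shows what actually lands in the two parameters. The function name `describe` is hypothetical.

```python
def describe(*args, **kwargs):
    # args is a tuple of the positional arguments,
    # kwargs is a dict of the keyword arguments.
    return args, kwargs

positional, keyword = describe("a", "two", a="1", b="2")
print(positional)  # ('a', 'two')
print(keyword)     # {'a': '1', 'b': '2'}

# The same operators unpack a sequence or dict at the call site.
more_positional, more_keyword = describe(*["x", "y"], **{"k": "v"})
```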

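Converting between times and strings

To convert a time to a string, use strftime() from the time module.

import time

print time.strftime("%Y-%m-%d", time.localtime())

To convert a string to a time, use strptime() from the time module together with datetime() from the datetime module.

import time
import datetime

t = time.strptime("2018-3-8", "%Y-%m-%d")
y, m, d = t[0:3]   # year, month, day

print datetime.datetime(y, m, d)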
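As a shorter alternative (not the approach used above), `datetime.datetime` has its own `strptime()` and `strftime()`, so the round trip can be sketched in Python 3 without the time module:

```python
from datetime import datetime

# String -> datetime
parsed = datetime.strptime("2018-3-8", "%Y-%m-%d")

# datetime -> string (zero-padded month and day)
formatted = parsed.strftime("%Y-%m-%d")

print(parsed)     # 2018-03-08 00:00:00
print(formatted)  # 2018-03-08
```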

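File and directory operations

For example, renaming, deleting, and searching for files.

os library: renaming files, listing all files under a path, and so on. The os.path module works on paths and file names.

import os

files = os.listdir(".")
print type(os.path)
for filename in files:
    print os.path.splitext(filename)  # split the file name from its extension

shutil library: copying and moving files, and similar operations
glob library: glob.glob("*.txt") finds all files with the .txt extension in the current directory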
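The three libraries listed above can be tried out together. This is a sketch for Python 3 that works inside a temporary directory so nothing on disk is left behind; the file names are made up for illustration.

```python
import glob
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    src = os.path.join(workdir, "notes.txt")
    with open(src, "w") as f:
        f.write("hello")

    # shutil: copy the file, then os: rename the copy
    copy_path = shutil.copy(src, os.path.join(workdir, "notes_copy.txt"))
    os.rename(copy_path, os.path.join(workdir, "renamed.log"))

    # glob: find only the .txt files; os.listdir: list everything
    txt_files = sorted(
        os.path.basename(p) for p in glob.glob(os.path.join(workdir, "*.txt"))
    )
    all_files = sorted(os.listdir(workdir))

# os.path: split a file name from its extension
name_parts = os.path.splitext("notes.txt")

print(txt_files)   # ['notes.txt']
print(all_files)   # ['notes.txt', 'renamed.log']
print(name_parts)  # ('notes', '.txt')
```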

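Reading configuration files

Use the configparser library (Python 3.x; it is named ConfigParser in 2.x) to read, modify, and add to configuration files.

import ConfigParser

config = ConfigParser.ConfigParser()
config.add_section("System")
config.set("System", "name", "iOS")
with open("Sys.ini", "a+") as f:   # append mode; the file is created if missing
    config.write(f)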
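For Python 3, the same idea can be sketched with the lowercase `configparser` module. To keep the example self-contained it writes to an in-memory buffer rather than a file, and the section and option names are illustrative.

```python
import configparser
import io

# Build a configuration and serialize it in INI format.
config = configparser.ConfigParser()
config.add_section("System")
config.set("System", "name", "iOS")

buffer = io.StringIO()
config.write(buffer)

# Parse the INI text back and read the value.
loaded = configparser.ConfigParser()
loaded.read_string(buffer.getvalue())
system_name = loaded.get("System", "name")

print(loaded.sections())  # ['System']
print(system_name)        # iOS
```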

Regular expressions

The re module provides regular-expression matching, searching, and related operations.
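As a quick illustration of matching and searching with `re`, here is a sketch for Python 3. The sample text and the date pattern are invented for the example.

```python
import re

text = "Released 2019-11-30, updated 2020-01-15"
date_pattern = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

first = date_pattern.search(text)          # first match anywhere in the text
all_dates = date_pattern.findall(text)     # every match, as tuples of groups
masked = date_pattern.sub("<date>", text)  # replace every match

print(first.group(0))  # 2019-11-30
print(all_dates)       # [('2019', '11', '30'), ('2020', '01', '15')]
print(masked)          # Released <date>, updated <date>
```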

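Classes

Attributes

Prefix an attribute name with "__" to make it private.

class Fruits:
    price = 0               # class attribute, shared by all instances; readable through both the class and its objects, but it should be modified through the class

    def __init__(self):
        self.color = "red"  # instance variable, accessible only through an object
        zone = "China"      # local variable
        self.__weight = "12" # private variable; cannot be accessed directly, but is reachable as _classname__attribute


if __name__ == "__main__":
    apple = Fruits()
    print (apple._Fruits__weight) # access the private variable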

Methods

Static methods

    # inside class Fruits
    @staticmethod
    def getPrice():
        print (Fruits.price)

Private methods

    # inside class Fruits
    def __getWeight(self):
        print self.__weight

Class methods

    # inside class Fruits
    @classmethod
    def getPrice2(cls):
        print (cls.price)

Adding methods dynamically

Python is a dynamic scripting language, so programs written in it can be highly dynamic as well.

class_name.method_name = function_name
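A minimal sketch of that assignment pattern in Python 3. The class `Fruit` and the function `describe` are hypothetical names.

```python
class Fruit:
    def __init__(self, name):
        self.name = name


def describe(self):
    # An ordinary function; "self" will be the instance once attached.
    return "I am a " + self.name


apple = Fruit("apple")

# Attach the function to the class at runtime.
Fruit.describe = describe

# Existing and future instances can now call it as a method.
result = apple.describe()
print(result)  # I am a apple
```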

Class inheritance

Multiple inheritance is also supported.

Syntax:

class class_name(super_class1, super_class2):

Abstract methods

    # inside an abstract base class; requires: from abc import abstractmethod
    @abstractmethod
    def grow(self):
        pass
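The decorator only takes effect inside an abstract base class. Here is a sketch for Python 3 using `abc.ABC`; the class names `Plant` and `Apple` are made up for illustration.

```python
from abc import ABC, abstractmethod


class Plant(ABC):
    @abstractmethod
    def grow(self):
        """Subclasses must provide an implementation."""


class Apple(Plant):
    def grow(self):
        return "apple is growing"


# A class with unimplemented abstract methods cannot be instantiated.
try:
    Plant()
    can_instantiate_base = True
except TypeError:
    can_instantiate_base = False

message = Apple().grow()
print(can_instantiate_base)  # False
print(message)               # apple is growing
```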

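Operator overloading

Python ties operators to special methods of a class; each operator corresponds to one method. For example, __add__() implements the plus operator and __gt__() implements the greater-than operator.

By overloading operators we can add, subtract, or compare objects.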
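A small sketch of overloading `+` and `>` in Python 3. The `Money` class is hypothetical and only wraps a number.

```python
class Money:
    def __init__(self, amount):
        self.amount = amount

    def __add__(self, other):
        # Called for: self + other
        return Money(self.amount + other.amount)

    def __gt__(self, other):
        # Called for: self > other
        return self.amount > other.amount


total = Money(3) + Money(4)
is_greater = Money(5) > Money(2)

print(total.amount)  # 7
print(is_greater)    # True
```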

Exceptions

Catching exceptions

try: ... except: ... finally: ...

Raising exceptions

Use the raise statement to raise an exception.

Assertions

assert len(t) == 1
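The three constructs above fit together roughly like this. It is a sketch for Python 3, and `safe_divide` and `events` are names invented for the example.

```python
events = []


def safe_divide(a, b):
    if b == 0:
        raise ValueError("b must not be zero")  # raise an exception explicitly
    return a / b


try:
    safe_divide(1, 0)
except ValueError as exc:
    events.append("caught: " + str(exc))
finally:
    events.append("cleanup")  # runs whether or not an exception occurred

t = ("only",)
assert len(t) == 1  # passes silently; a false condition raises AssertionError

print(events)  # ['caught: b must not be zero', 'cleanup']
```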

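File persistence

`shelve`: a local data store

The shelve module provides a way to persist data locally.

import shelve

addresses = shelve.open("addresses") # created locally if it does not exist yet
addresses["city"] = "Beijing"
addresses["pro"] = "Guangdong"
addresses.close()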

Serialization with cPickle

The cPickle and pickle modules both implement serialization; the former is written in C and is faster.

Serialization:

import cPickle as pickle
text = "I need to be serialized"
f = open("serial.txt", "wb")
pickle.dump(text, f)
f.close()

Deserialization:

f = open("serial.txt","rb")
text = pickle.load(f)
f.close()
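In Python 3 there is no separate cPickle module; the plain `pickle` module is used. A minimal in-memory round trip might look like the sketch below, where the `record` contents are invented. Note that unpickling data from an untrusted source is unsafe.

```python
import pickle

record = {"text": "I need to be serialized", "count": 3}

payload = pickle.dumps(record)   # object -> bytes
restored = pickle.loads(payload) # bytes -> a new, equal object

print(type(payload))       # <class 'bytes'>
print(restored == record)  # True
print(restored is record)  # False
```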

Storing JSON files

Python ships with a built-in json module for working with JSON data.

Serializing to a local file

import json
new_str = [{'a': 1}, {'b': 2}]
f = open('json.txt', 'w')
json.dump(new_str, f, ensure_ascii=False)
f.close()

Reading from a local file

import json
f = open('json.txt', 'r')
data = json.load(f)
print data
f.close()
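The string-based counterparts `json.dumps()` and `json.loads()` make the round trip easy to inspect without touching the disk. A sketch for Python 3:

```python
import json

records = [{"a": 1}, {"b": 2}]

encoded = json.dumps(records)  # Python objects -> JSON text
decoded = json.loads(encoded)  # JSON text -> Python objects

print(encoded)             # [{"a": 1}, {"b": 2}]
print(decoded == records)  # True
```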

Threads

The threading module

class threading.Thread(group=None, target=None, name=None, args=(), kwargs={}, *, daemon=None)

Threads and queues

# -*- coding:UTF-8 -*-

import threading
import Queue

class MyJob(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self, name="aa")

    def run(self):
        print threading.currentThread()

        while not q.empty():
            a = q.get()
            print "my item %d" % a
            print "my thread"
            q.task_done()


def job(a, b):
    print a+b
    print threading.activeCount()
    print "multithreading"


thread = threading.Thread(target=job, args=(2, 4), name="mythread")
q = Queue.Queue()
if __name__ == "__main__":
    myjob = MyJob()
    for i in range(100):
        q.put(i)
    myjob.start()
    q.join() # every finished task must call task_done(), otherwise the main thread blocks here forever
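A rough Python 3 counterpart, where the module is named `queue`. Instead of polling `empty()`, this sketch uses several workers and a `None` sentinel to tell each one to stop; that is a design choice of the example, not something taken from the code above.

```python
import queue
import threading

tasks = queue.Queue()
results = []
results_lock = threading.Lock()


def worker():
    while True:
        item = tasks.get()
        if item is None:      # sentinel: no more work for this worker
            tasks.task_done()
            break
        with results_lock:    # protect the shared list
            results.append(item * item)
        tasks.task_done()     # one task_done() per get()


threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()

for n in range(10):
    tasks.put(n)
for _ in threads:
    tasks.put(None)           # one sentinel per worker

tasks.join()                  # returns once every item has been marked done
for t in threads:
    t.join()

print(sorted(results))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```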

Processes

In the multiprocessing module, Process creates a process, and a Pool (process pool) manages a group of processes.

from multiprocessing import Process
import os

def run_pro(name):
    print 'process %s(%s)' % (os.getpid(), name)

if __name__ == "__main__":
    print 'parent process %s' % os.getpid()
    for i in range(5):
        p = Process(target=run_pro, args=(str(i),))  # args must be a tuple, hence the trailing comma
        p.start()

Web crawling

Fetching data

urllib2/urllib are built into Python and are commonly used to implement crawlers.

import urllib2
response = urllib2.urlopen('http://www.baidu.com')
html = response.read()
print html

try:
    request = urllib2.Request('http://www.google.com')
    response = urllib2.urlopen(request, timeout=5)
    html = response.read()
    print html
except urllib2.URLError as e:
    if hasattr(e, 'code'):
        print 'error code:', e.code
    print e

Requests: a more user-friendly third-party library

import requests
r = requests.get('http://www.baidu.com')
print r.content
print r.url
print r.headers

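Parsing the fetched data

Use BeautifulSoup to parse HTML data. The parser from the Python standard library (html.parser) tolerates malformed markup poorly, so the third-party lxml parser is usually used instead for its better performance and error tolerance.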

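Hash algorithm library

Introduction to hashlib

hashlib is a Python standard library module that provides several popular hash algorithms, including md5, sha1, sha224, sha256, sha384, and sha512. In addition, the module's new(name, string='') function builds a hash object for any hash algorithm the system supports, selected by name.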
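A short sketch of both ways of getting a hash object, written for Python 3, where the input must be bytes. The sample data is arbitrary.

```python
import hashlib

data = b"hello"

# Named constructor
direct = hashlib.sha256(data).hexdigest()

# Generic constructor: pick the algorithm by name
by_name = hashlib.new("sha256", data).hexdigest()

# Data can also be fed in pieces with update()
incremental = hashlib.sha256()
incremental.update(b"hel")
incremental.update(b"lo")

print(direct == by_name)                  # True
print(direct == incremental.hexdigest())  # True
print(len(direct))                        # 64
```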