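I keep seeing discussions online comparing Python 2 and Python 3, and my company has recently been considering whether to upgrade our spark-python big data development environment to Python 3. This post records a comparison of Python 2.7.13 and Python 3.5.3 across several aspects.
It continues to use the environment configured in an earlier post.
Because the goal is to compare Python 2 and Python 3, simply upgrading the system Python will not do. I used two tools, pyenv and virtualenv, to build isolated test environments.
References: "Setting up Python virtual environments with pyenv and virtualenv" and "Using pyenv to install multiple Python versions on one system".
The configuration steps are as follows: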

    sudo yum install tkinter -y
    sudo yum install tk-devel tcl-devel -y
    sudo yum install readline readline-devel readline-static -y
    sudo yum install openssl openssl-devel openssl-static -y
    sudo yum install sqlite-devel -y
    sudo yum install bzip2-devel bzip2-libs -y

    git clone https://github.com/yyuu/pyenv.git ~/.pyenv
    chmod 777 -R ~/.pyenv
    echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bash_profile
    echo 'export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bash_profile
    echo 'eval "$(pyenv init -)"' >> ~/.bash_profile
    exec $SHELL
    source ~/.bash_profile
    pyenv install --list
    pyenv install -v 2.7.13
    pyenv install -v 3.5.3

    git clone https://github.com/yyuu/pyenv-virtualenv.git ~/.pyenv/plugins/pyenv-virtualenv
    echo 'eval "$(pyenv virtualenv-init -)"' >> ~/.bash_profile
    source ~/.bash_profile
    pyenv virtualenv 2.7.13 py2
    pyenv virtualenv 3.5.3 py3

That gives us two isolated Python environments. The test below shows the active Python switching from the CentOS 7 default of 2.7.5 to 2.7.13, and then to 3.5.

    [kejun@localhost ~]$ python -V
    Python 2.7.5
    [kejun@localhost ~]$ pyenv activate py2
    (py2) [kejun@localhost ~]$ python -V
    Python 2.7.13
    (py2) [kejun@localhost ~]$ pyenv deactivate
    [kejun@localhost ~]$ pyenv activate py3
    (py3) [kejun@localhost ~]$ python -V
    Python 3.5.

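The same check can also be made from inside the interpreter, which is handy when a script needs to know which environment it is running in. This is a minimal sketch using only the standard library; the helper name `describe_interpreter` is hypothetical and not part of the original setup.

```python
import sys


def describe_interpreter():
    """Return (label, version, prefix) for the interpreter running this code."""
    # sys.version_info is available on both Python 2.7 and Python 3.
    label = "Python2" if sys.version_info[0] == 2 else "Python3"
    version = "%d.%d.%d" % sys.version_info[:3]
    # sys.prefix points inside the virtualenv when one is active.
    return label, version, sys.prefix


if __name__ == "__main__":
    print("%s %s at %s" % describe_interpreter())
```

Comparing `sys.version_info[0]` is a little more robust than searching the version string for "2.7", since it does not depend on how the string is formatted.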
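We installed the third-party data analysis packages we use most often, and ran both an installation test and a sample test for each. The sample test script is at the bottom of this post.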
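Category | Tool | Purpose |
---|---|---|
Data collection | scrapy | Web scraping and crawling |
Data collection | scrapy-redis | Distributed crawling |
Data collection | selenium | Web testing, browser simulation |
Data processing | beautifulsoup | HTML parsing library with lxml support |
Data processing | lxml | XML parsing library |
Data processing | xlrd | Reading Excel files |
Data processing | xlwt | Writing Excel files |
Data processing | xlutils | Simple format edits for Excel files |
Data processing | pywin32 | Reading and writing Excel files with complex custom formatting |
Data processing | Python-docx | Reading and writing Word files |
Data analysis | numpy | Matrix-based numerical computing library |
Data analysis | pandas | Table-based statistical analysis library |
Data analysis | scipy | Scientific computing library with support for higher-level abstractions and complex models |
Data analysis | statsmodels | Statistical modeling and econometrics toolkit |
Data analysis | scikit-learn | Machine learning library |
Data analysis | gensim | Natural language processing library |
Data analysis | jieba | Chinese word segmentation library |
Data storage | MySQL-python | MySQL read/write interface |
Data storage | mysqlclient | MySQL read/write interface |
Data storage | SQLAlchemy | ORM layer for databases |
Data storage | pymssql | SQL Server read/write interface |
Data storage | redis | Redis read/write interface |
Data storage | PyMongo | MongoDB read/write interface |
Data presentation | matplotlib | Popular data visualization library |
Data presentation | seaborn | Attractive data visualization library built on matplotlib |
Utilities | jupyter | Web-based Python IDE, commonly used for data analysis |
Utilities | chardet | Character encoding detection |
Utilities | ConfigParser | Config file read/write support |
Utilities | requests | HTTP library for network access |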
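A quick installation test for a list like this can be sketched with `importlib`. The function name `check_imports` is hypothetical, and the module names passed to it are assumptions about each package's import name, which can differ from the name in the table (for example, beautifulsoup is imported as `bs4` and scikit-learn as `sklearn`).

```python
import importlib


def check_imports(module_names):
    """Map each module name to True if it imports cleanly, else False."""
    status = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            # Covers a missing package; other failures are left to propagate.
            status[name] = False
    return status


if __name__ == "__main__":
    wanted = ["numpy", "pandas", "bs4", "sklearn"]
    for name, ok in sorted(check_imports(wanted).items()):
        print("%-10s %s" % (name, "ok" if ok else "missing"))
```

Running the same file under the py2 and py3 environments gives a side-by-side view of which packages installed successfully in each.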
Sample test script:

    # encoding=utf-8
    import os
    import sys
    import platform
    import traceback
    import gc
    import ctypes

    STD_OUTPUT_HANDLE = -11
    FOREGROUND_BLACK = 0x0
    FOREGROUND_BLUE = 0x01  # text color contains blue.
    FOREGROUND_GREEN = 0x02  # text color contains green.
    FOREGROUND_RED = 0x04  # text color contains red.
    FOREGROUND_INTENSITY = 0x08  # text color is intensified.


    class WinPrint:
        """ Prints colored text on Windows """

        def __init__(self):
            # ctypes.windll only exists on Windows, so the handle is looked up
            # here rather than in the class body, which would fail on CentOS.
            self.std_out_handle = ctypes.windll.kernel32.GetStdHandle(STD_OUTPUT_HANDLE)

        def set_cmd_color(self, color):
            return ctypes.windll.kernel32.SetConsoleTextAttribute(self.std_out_handle, color)

        def reset_color(self):
            self.set_cmd_color(FOREGROUND_RED | FOREGROUND_GREEN | FOREGROUND_BLUE)

        def print_red_text(self, print_text):
            self.set_cmd_color(FOREGROUND_RED | FOREGROUND_INTENSITY)
            print(print_text)
            self.reset_color()

        def print_green_text(self, print_text):
            self.set_cmd_color(FOREGROUND_GREEN | FOREGROUND_INTENSITY)
            print(print_text)
            self.reset_color()


    class UnixPrint:
        """ Prints colored text on CentOS """

        def print_red_text(self, print_text):
            print('\033[1;31m%s\033[0m' % print_text)

        def print_green_text(self, print_text):
            print('\033[1;32m%s\033[0m' % print_text)


    py_env = "Python2" if sys.version.find("2.7") > -1 else "Python3"
    sys_ver = "Windows" if platform.system().find("indows") > -1 else "Centos"
    my_print = WinPrint() if platform.system().find("indows") > -1 else UnixPrint()


    def check(sys_ver, py_env):
        """
        Decorator that unifies input and output.
        It also exercises a decorator with arguments; the arguments are not strictly required.
        """
        def _check(func):
            def __check():
                try:
                    func()
                    my_print.print_green_text(
                        "[%s,%s]: %s pass." % (sys_ver, py_env, func.__name__))
                except Exception:
                    traceback.print_exc()
                    my_print.print_red_text(
                        "[%s,%s]: %s fail." % (sys_ver, py_env, func.__name__))
            return __check
        return _check


    def make_requirement(filepath, filename):
        """
        Strips version pins from a pip requirements file
        """
        result = []
        with open(os.path.join(filepath, filename), "r") as f:
            data = f.readlines()
            for line in data:
                # readlines() keeps the trailing newline, so drop it before re-adding one.
                line = line.rstrip("\n")
                if line.find("==") > -1:
                    result.append(line.split("==")[0] + "\n")
                else:
                    result.append(line + "\n")
        with open(os.path.join(filepath, filename.split(".")[0] + "-clean.txt"), "w") as f1:
            f1.writelines(result)


    @check(sys_ver, py_env)
    def test_scrapy():
        from scrapy import signals
        from selenium import webdriver
        from scrapy.http import HtmlResponse
        from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
        from selenium.webdriver.common.keys import Keys
        from selenium.common.exceptions import NoSuchElementException
        from selenium.common.exceptions import TimeoutException
        from selenium.webdriver.support import expected_conditions as EC
        from selenium.webdriver.common.by import By
        from selenium.webdriver.support.ui import WebDriverWait


    @check(sys_ver, py_env)
    def test_matplotlib():
        import matplotlib.pyplot as plt
        l = [1, 2, 3, 4, 5]
        h = [20, 14, 38, 27, 9]
        w = [0.1, 0.2, 0.3, 0.4, 0.5]
        b = [1, 2, 3, 4, 5]
        fig = plt.figure()
        ax = fig.add_subplot(111)
        rects = ax.bar(l, h, w, b)
        # plt.show()


    @check(sys_ver, py_env)
    def test_beautifulSoup():
        from bs4 import BeautifulSoup
        html_str = "<html><meta/><head><title>Hello</title></head><body onload=crash()>Hi all<p></html>"
        soup = BeautifulSoup(html_str, "lxml")
        # print (soup.get_text())


    @check(sys_ver, py_env)
    def test_lxml():
        from lxml import html
        html_str = "<html><meta/><head><title>Hello</title></head><body onload=crash()>Hi all<p></html>"
        html.fromstring(html_str)


    @check(sys_ver, py_env)
    def test_xls():
        import xlrd
        import xlwt
        from xlutils.copy import copy
        excel_book2 = xlwt.Workbook()
        del excel_book2
        excel_book1 = xlrd.open_workbook("1.xlsx")
        del excel_book1
        import docx
        doc = docx.Document("1.docx")
        # print (doc)
        del doc
        gc.collect()


    @check(sys_ver, py_env)
    def test_data_analysis():
        import pandas as pd
        import numpy as np
        data_list = np.array([x for x in range(100)])
        data_serial = pd.Series(data_list)
        # print (data_serial)
        # The original used "from scipy import fft"; in newer SciPy releases
        # scipy.fft is a module, so the function is imported from fftpack here.
        from scipy.fftpack import fft
        b = fft(data_list)
        # print (b)


    @check(sys_ver, py_env)
    def test_statsmodels():
        import statsmodels.api as sm
        data = sm.datasets.spector.load()
        data.exog = sm.add_constant(data.exog, prepend=False)
        # print data.exog


    @check(sys_ver, py_env)
    def test_sklearn():
        from sklearn import datasets
        iris = datasets.load_iris()
        data = iris.data
        # print(data.shape)


    @check(sys_ver, py_env)
    def test_gensim():
        import warnings
        warnings.filterwarnings(action='ignore', category=UserWarning, module='gensim')
        from gensim import corpora
        from collections import defaultdict
        documents = ["Human machine interface for lab abc computer applications",
                     "A survey of user opinion of computer system response time",
                     "The EPS user interface management system",
                     "System and human system engineering testing of EPS",
                     "Relation of user perceived response time to error measurement",
                     "The generation of random binary unordered trees",
                     "The intersection graph of paths in trees",
                     "Graph minors IV Widths of trees and well quasi ordering",
                     "Graph minors A survey"]
        stoplist = set('for a of the and to in'.split())
        texts = [[word for word in document.lower().split() if word not in stoplist]
                 for document in documents]
        frequency = defaultdict(int)
        for text in texts:
            for token in text:
                frequency[token] += 1
        texts = [[token for token in text if frequency[token] > 1]
                 for text in texts]
        dictionary = corpora.Dictionary(texts)
        dictionary.save('deerwester.dict')


    @check(sys_ver, py_env)
    def test_jieba():
        import jieba
        seg_list = jieba.cut("我來到了北京參觀天安門。", cut_all=False)
        # print("Default Mode: " + "/ ".join(seg_list))  # accurate mode


    @check(sys_ver, py_env)
    def test_mysql():
        import MySQLdb as mysql
        # test the pet_shop connection
        db = mysql.connect(host="xx", user="yy", passwd="12345678", db="zz")
        cur = db.cursor()
        sql = "select id from role;"
        cur.execute(sql)
        result = cur.fetchall()
        db.close()
        # print (result)


    @check(sys_ver, py_env)
    def test_SQLAlchemy():
        from sqlalchemy import Column, String, create_engine, Integer
        from sqlalchemy.orm import sessionmaker
        from sqlalchemy.ext.declarative import declarative_base
        engine = create_engine('mysql://xxx/yy', echo=False)
        DBSession = sessionmaker(bind=engine)
        Base = declarative_base()

        class rule(Base):
            __tablename__ = "role"
            id = Column(Integer, primary_key=True, autoincrement=True)
            role_name = Column(String(100))
            role_desc = Column(String(255))

        new_rule = rule(role_name="test_sqlalchemy", role_desc="forP2&P3")
        session = DBSession()
        session.add(new_rule)
        session.commit()
        session.close()


    @check(sys_ver, py_env)
    def test_redis():
        import redis
        pool = redis.Redis(host='127.0.0.1', port=6379)


    @check(sys_ver, py_env)
    def test_requests():
        import requests
        r = requests.get(url="http://www.cnblogs.com/kendrick/")
        # print (r.status_code)


    @check(sys_ver, py_env)
    def test_PyMongo():
        from pymongo import MongoClient
        conn = MongoClient("localhost", 27017)


    if __name__ == "__main__":
        print("[%s,%s] start checking..." % (sys_ver, py_env))
        test_scrapy()
        test_beautifulSoup()
        test_lxml()
        test_matplotlib()
        test_xls()
        test_data_analysis()
        test_sklearn()
        test_mysql()
        test_SQLAlchemy()
        test_PyMongo()
        test_gensim()
        test_jieba()
        test_redis()
        test_requests()
        test_statsmodels()
        print("[%s,%s] finish checking." % (sys_ver, py_env))
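The parameterized decorator in the script prints a result but returns nothing, so the caller cannot count failures. One possible variation, shown here as a sketch rather than the script's actual behavior, returns a boolean and uses `functools.wraps` to keep the wrapped function's name. The `label` parameter and the two demo tests are hypothetical.

```python
import functools
import traceback


def check(label):
    """Decorator factory: run the test, report pass/fail, return a bool."""
    def decorator(func):
        @functools.wraps(func)  # preserves func.__name__ on the wrapper
        def wrapper():
            try:
                func()
            except Exception:
                traceback.print_exc()
                print("[%s]: %s fail." % (label, func.__name__))
                return False
            print("[%s]: %s pass." % (label, func.__name__))
            return True
        return wrapper
    return decorator


@check("demo")
def test_ok():
    import json
    json.dumps({"a": 1})


@check("demo")
def test_broken():
    import no_such_module_for_demo  # hypothetical name, expected to fail


if __name__ == "__main__":
    results = [test_ok(), test_broken()]
    print("%d of %d passed" % (sum(results), len(results)))
```

With boolean results, the main block could exit with a non-zero status when any check fails, which makes the script usable in automated comparisons of the two environments.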