Python Basics and HDFS Operations

Date: 2019-12-09


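1. Preface

As a full-stack engineer, you must be proficient in all kinds of languages... or at least their "Hello World". Recently I was "forced" down the road of Python development. The task was roughly to write a general-purpose library that lightly cleans files stored locally on a server and then transfers them to HDFS, so it touched on most of the basic Python knowledge. Here I summarize some basic operations and the HDFS operations for future reference.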

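2. Basic Operations

2.1 String Operations

String handling is fundamental in every language. Python provides most of the string functions common in other languages; the frequently used ones are: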

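(1) startswith: tests whether a string starts with a given prefix.

(2) endswith: tests whether a string ends with a given suffix.

(3) contain: Python has no contain function; use 'test' in somestring instead. index also works, but it raises ValueError when the substring is absent (find returns -1 instead).

(4) strip: removes whitespace, or a given set of characters, from both ends of the string.

(5) len: returns the length of a string, as in len(str).

(6) upper, lower: convert case.

(7) split: splits a string on a separator.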
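The helpers listed above can be tried together in a short sketch. The function name `describe` and the sample file name are made up for illustration; only built-in `str` methods are used.

```python
def describe(text):
    """Apply the string helpers listed above to one piece of text."""
    cleaned = text.strip()  # drop leading/trailing whitespace, including the newline
    return {
        "starts_with_log": cleaned.startswith("log"),
        "ends_with_gz": cleaned.endswith(".gz"),
        "contains_2019": "2019" in cleaned,     # the idiomatic "contains" check
        "index_of_2019": cleaned.find("2019"),  # -1 when absent; index() would raise
        "length": len(cleaned),
        "upper": cleaned.upper(),
        "parts": cleaned.split("_"),
    }


info = describe("  log_2019_12_09.tar.gz\n")
print(info)
```

Using `in` for the membership test avoids the exception handling that `index` would require when the substring is missing.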

2.2 File Operations

File and folder operations are also used constantly when writing programs. The commonly used file functions in Python are:

(1) os.walk: recursively traverses a folder to collect all of its files.

(2) os.path: operations on file and folder paths.
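As a minimal sketch of how `os.walk` and `os.path` combine, the following collects every file under a directory. The helper name `list_files` and the throwaway directory tree are assumptions made for the example.

```python
import os
import tempfile


def list_files(root_dir):
    """Return the full path of every file under root_dir, recursively."""
    found = []
    for current_dir, _subdirs, filenames in os.walk(root_dir):
        for filename in filenames:
            found.append(os.path.join(current_dir, filename))
    return sorted(found)


# Build a tiny throwaway tree to walk.
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, "sub"))
for rel in ("a.txt", os.path.join("sub", "b.log")):
    with open(os.path.join(demo_root, rel), "w") as handle:
        handle.write("demo")

files = list_files(demo_root)
for full_path in files:
    # basename gives the file name, splitext()[1] the extension
    print(os.path.basename(full_path), os.path.splitext(full_path)[1])
```

`os.walk` yields one `(directory, subdirectories, files)` tuple per directory, so joining the directory with each file name produces the full path.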

I wrapped the file operations in a few simple helpers; the code is below, for reference only:

    import os
    import shutil

    def isFile(name):
        return os.path.isfile(name)

    def isDir(name):
        return os.path.isdir(name)

    def getDirPath(filename):
        return os.path.dirname(filename)

    def getFilename(path):
        return os.path.basename(path)

    def getExt(filename):
        return os.path.splitext(filename)[1]

    def changeExt(filename, ext):
        # Returns only the file name with the new extension; the directory part is dropped.
        if not ext.startswith('.'):
            ext = '.' + ext
        return getFilenameWithoutExt(filename) + ext

    def getDirAndFileNameWithoutExt(filename):
        return os.path.splitext(filename)[0]

    def getFilenameWithoutExt(filename):
        return getFilename(getDirAndFileNameWithoutExt(filename))

    def deleteFileOrFolder(path):
        try:
            if isFile(path):
                os.remove(path)
            elif isDir(path):
                shutil.rmtree(path)  # os.rmdir(path) works only for empty directories
        except OSError:
            pass  # best effort: ignore failures such as permission errors

2.3 Compression and Decompression

See also http://blog.csdn.net/luoshengkim/article/details/46647423

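(1) tar.gz

To compress or extract .tar.gz files, use the tarfile package directly; first import it with import tarfile. Extraction looks like this:

    import tarfile

    # path: the .tar.gz archive; dest_dir: the directory to extract into
    # (it must not be the archive path itself, which is a file).
    tar = tarfile.open(path, 'r:gz')
    file_names = tar.getnames()
    for file_name in file_names:
        tar.extract(file_name, dest_dir)
    tar.close()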

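Compression looks like this:

    import os
    import tarfile

    tar = tarfile.open(tarpath, 'w:gz')
    if isFile(srcpath):
        tar.add(srcpath, arcname=srcpath)
    elif isDir(srcpath):
        for root, _dirs, files in os.walk(srcpath):
            for file in files:
                fullpath = os.path.join(root, file)
                # arcname=file stores only the base name, so the directory layout
                # is flattened and same-named files in different folders collide.
                tar.add(fullpath, arcname=file)
    tar.close()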

tarfile.open supports the following modes, each with different behavior; choose the one that fits your needs:

mode            action
'r' or 'r:*'    Open for reading with transparent compression (recommended).
'r:'            Open for reading exclusively without compression.
'r:gz'          Open for reading with gzip compression.
'r:bz2'         Open for reading with bzip2 compression.
'a' or 'a:'     Open for appending with no compression. The file is created if it does not exist.
'w' or 'w:'     Open for uncompressed writing.
'w:gz'          Open for gzip compressed writing.
'w:bz2'         Open for bzip2 compressed writing.
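To make the difference between the modes concrete, here is a small round trip: the archive is written with 'w:gz' and read back with 'r:*', which detects the compression by itself. The file names and the temporary directory are assumptions made for the example.

```python
import os
import tarfile
import tempfile

work_dir = tempfile.mkdtemp()
src_path = os.path.join(work_dir, "data.txt")
with open(src_path, "w") as handle:
    handle.write("hello hdfs")

# 'w:gz' writes a gzip-compressed archive.
archive_path = os.path.join(work_dir, "data.tar.gz")
with tarfile.open(archive_path, "w:gz") as tar:
    tar.add(src_path, arcname="data.txt")

# 'r:*' detects the compression on its own, so the reader
# does not need to know the archive was written with gzip.
with tarfile.open(archive_path, "r:*") as tar:
    member_names = tar.getnames()
    extracted = tar.extractfile("data.txt").read().decode("utf-8")

print(member_names, extracted)
```

Opening the archive in a `with` block closes it even if an error occurs, which the explicit `tar.close()` calls do not guarantee.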

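(2) gz

To compress or decompress .gz files, use the gzip package directly; first import it with import gzip. Decompression looks like this:

    import gzip

    # Output name: the archive path with the .gz / .GZ suffix removed.
    fname = path.replace('.gz', '').replace('.GZ', '')
    gfile = gzip.GzipFile(path)
    with open(fname, 'wb') as out:
        out.write(gfile.read())
    gfile.close()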

Compression looks like this:

    import gzip

    gfile = gzip.GzipFile(srcpath + '.gz', mode='w')
    with open(srcpath, 'rb') as src:
        gfile.write(src.read())
    gfile.close()

Here too, choose the mode carefully, and when decompressing also mind the mode used to create the output file (binary, 'wb').
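The snippets above read the whole file into memory. For large files, a streaming variant may be preferable; the following is a sketch using `gzip.open` and `shutil.copyfileobj`. The helper names `gzip_file` and `gunzip_file` and the sample paths are made up for illustration.

```python
import gzip
import os
import shutil
import tempfile


def gzip_file(src_path, dest_path):
    """Compress src_path into dest_path, copying in chunks."""
    with open(src_path, "rb") as src, gzip.open(dest_path, "wb") as dest:
        shutil.copyfileobj(src, dest)


def gunzip_file(src_path, dest_path):
    """Decompress src_path into dest_path, copying in chunks."""
    with gzip.open(src_path, "rb") as src, open(dest_path, "wb") as dest:
        shutil.copyfileobj(src, dest)


work_dir = tempfile.mkdtemp()
plain_path = os.path.join(work_dir, "server.log")
payload = b"line one\nline two\n" * 1000
with open(plain_path, "wb") as handle:
    handle.write(payload)

gzip_file(plain_path, plain_path + ".gz")
gunzip_file(plain_path + ".gz", os.path.join(work_dir, "restored.log"))

with open(os.path.join(work_dir, "restored.log"), "rb") as handle:
    restored = handle.read()
print(len(payload), len(restored))
```

Both sides are opened in binary mode, so the bytes pass through unchanged regardless of the file's text encoding.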

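(3) zip

To compress or extract .zip files, use the zipfile package directly; first import it with import zipfile. Extraction looks like this:

    import zipfile

    zip_file = zipfile.ZipFile(path, mode='r')
    for name in zip_file.namelist():
        # Extract into a folder named after the archive (without its extension),
        # relative to the current working directory.
        zip_file.extract(name, getFilenameWithoutExt(path))
    zip_file.close()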

Compression looks like this:

    import os
    import zipfile

    zip_file = zipfile.ZipFile(zippath, mode='w')
    if isFile(srcpath):
        zip_file.write(srcpath, arcname=srcpath)
    elif isDir(srcpath):
        for root, _dirs, files in os.walk(srcpath):
            for file in files:
                fullpath = os.path.join(root, file)
                # arcname=file stores only the base name, so the directory layout is flattened.
                zip_file.write(fullpath, arcname=file)
    zip_file.close()
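If the directory layout should survive inside the archive, one option is to store each file under its path relative to the source folder. The sketch below shows that variant. The helper name `zip_dir` and the sample tree are assumptions; `ZIP_DEFLATED` is passed because `ZipFile` stores entries uncompressed by default.

```python
import os
import tempfile
import zipfile


def zip_dir(src_dir, zip_path):
    """Zip src_dir, keeping each file's path relative to src_dir."""
    with zipfile.ZipFile(zip_path, mode="w", compression=zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full_path = os.path.join(root, name)
                archive.write(full_path, arcname=os.path.relpath(full_path, src_dir))


# Two folders that each contain a file with the same name.
work_dir = tempfile.mkdtemp()
src_dir = os.path.join(work_dir, "src")
for sub in ("a", "b"):
    os.makedirs(os.path.join(src_dir, sub))
    with open(os.path.join(src_dir, sub, "same.txt"), "w") as handle:
        handle.write(sub)

zip_path = os.path.join(work_dir, "out.zip")
zip_dir(src_dir, zip_path)

with zipfile.ZipFile(zip_path) as archive:
    names = sorted(archive.namelist())
print(names)
```

With relative paths as archive names, the two same-named files end up as separate entries instead of colliding.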

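3. HDFS Operations

HDFS access uses the hdfs3 library, a Python wrapper around the native libhdfs3 library (written in C/C++), which covers most everyday HDFS operations.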

3.1 Importing hdfs3

You only need the NameNode's address and port. The code is as follows:

    from hdfs3 import HDFileSystem

    hdfs = HDFileSystem(host='namenode', port=8020)

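3.2 Creating Directories

To upload files to HDFS, the target directory must exist, otherwise an error is raised. Create the directory first with hdfs.mkdir(dir). This call creates directories recursively, so missing parent directories do not have to be created one level at a time.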

3.3 Uploading Files

Uploading only requires the local file path and the destination path in HDFS; the HDFS path must also include the file name. The call is hdfs.put(localfile, remotefile).

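3.4 Wrapping the HDFS Operations

Likewise, my wrapper code for the HDFS operations is as follows:

    # hdfs is the HDFileSystem instance from section 3.1;
    # getDirPath is the helper from section 2.2.

    def mkdir(remotepath):
        if not exists(remotepath):
            hdfs.mkdir(remotepath)

    def get(remotepath, localpath):
        if exists(remotepath):
            hdfs.get(remotepath, localpath)

    def put(localfile, remotefile):
        # Make sure the remote parent directory exists before uploading.
        parent = getDirPath(remotefile)
        mkdir(parent)
        hdfs.put(localfile, remotefile)

    def exists(remotepath):
        return hdfs.exists(remotepath)

    def delete(remotepath):
        if exists(remotepath):
            hdfs.rm(remotepath, recursive=True)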
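The wrappers above depend on a module-level hdfs client, which makes them hard to try without a cluster. One possible variation, sketched below, passes the client in as a parameter. The `upload` function and the `FakeFS` class are hypothetical; `FakeFS` is an in-memory stand-in that mimics only the `exists`, `mkdir` and `put` calls used in this article, not the real hdfs3 API.

```python
import posixpath


def upload(fs, local_file, remote_file):
    """Ensure the remote parent directory exists, then upload local_file."""
    # HDFS paths always use '/', so posixpath is used instead of os.path.
    parent = posixpath.dirname(remote_file)
    if parent and not fs.exists(parent):
        fs.mkdir(parent)
    fs.put(local_file, remote_file)


class FakeFS:
    """Hypothetical in-memory stand-in exposing only exists/mkdir/put."""

    def __init__(self):
        self.dirs = set()
        self.files = {}

    def exists(self, path):
        return path in self.dirs or path in self.files

    def mkdir(self, path):
        self.dirs.add(path)

    def put(self, local_file, remote_file):
        # Record the upload instead of copying any data.
        self.files[remote_file] = local_file


fake = FakeFS()
upload(fake, "/tmp/a.log", "/data/logs/a.log")
upload(fake, "/tmp/b.log", "/data/logs/b.log")
print(sorted(fake.dirs), sorted(fake.files))
```

Against a real cluster, the `HDFileSystem` instance from section 3.1 would presumably be passed as `fs`, assuming it exposes the same three methods with the argument order used in this article.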