Merging PDFs, Manipulating Images, and Generating CAPTCHAs with Python

Date: 2019-11-11

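Before getting to today's main notes, I want to write down a few things about anonymous functions. My understanding of them had never gone beyond the basics, which made them hard to apply in practice. The material from the last two days happened to need them, so here is a closer look.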

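1. Anonymous Functions

Python uses lambda to create anonymous functions.

"Anonymous" means the function is not defined in the standard way, with a def statement.

A lambda is just an expression, so its body is much simpler than that of a def.
The body of a lambda is a single expression, not a block of code, so only a limited amount of logic can be packed into it.
A lambda has its own local namespace for its parameters; like any other function, it can also read names from the enclosing and global scopes.
Although a lambda looks like it can only be written on one line, it is not the same as an inline function in C or C++, whose purpose is to avoid the call overhead of small functions and so run faster.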
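
A short sketch of how a lambda relates to def and to the surrounding scopes. All names here (OFFSET, square_def, square, add_offset, make_multiplier) are made up for illustration:

```python
# Hypothetical names, for illustration only.
OFFSET = 10  # a global name

def square_def(x):
    return x * x

square = lambda x: x * x            # behaves like square_def
add_offset = lambda x: x + OFFSET   # reads a global name

def make_multiplier(factor):
    # The returned lambda closes over `factor` from the enclosing scope.
    return lambda x: x * factor

triple = make_multiplier(3)
print(square(4), add_offset(1), triple(5))   # 16 11 15
```

In practice a lambda is most useful where a small function is needed as an argument, such as the key of sorted().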

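In some situations we need to sort files whose names follow a regular pattern. However, sorting the list of file names with a plain sorted() compares them character by character in ASCII order, for example: 1.pdf, 10.pdf, 11.pdf, 2.pdf, ...

The difference between sort and sorted:

sort is a method of list, whereas sorted can sort any iterable object.

The list method sort() sorts the existing list in place and returns None, whereas the built-in function sorted() returns a new list and leaves the original untouched.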
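
The difference can be checked with a few lines. This is a minimal sketch; the variable names are arbitrary:

```python
nums = [3, 1, 2]

new_list = sorted(nums)      # returns a new list; nums is untouched
print(new_list, nums)        # [1, 2, 3] [3, 1, 2]

result = nums.sort()         # sorts in place and returns None
print(result, nums)          # None [1, 2, 3]

# sorted() accepts any iterable, for example a string
letters = sorted("pdf")
print(letters)               # ['d', 'f', 'p']
```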

The syntax of sorted() is

sorted(iterable, key=None, reverse=False)  

'''
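Parameters:
iterable -- the iterable to sort.
key      -- a function of one argument that is called on each element of the iterable; its return value is what gets compared when sorting.
reverse  -- sort order: reverse = True sorts in descending order, reverse = False in ascending order (the default).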
'''
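
To make the key and reverse parameters concrete, here is a small sketch on an invented word list:

```python
words = ["banana", "fig", "cherry", "kiwi"]

by_length = sorted(words, key=len)                       # shortest first
by_length_desc = sorted(words, key=len, reverse=True)    # longest first
by_last_char = sorted(words, key=lambda w: w[-1])        # compare by last letter

print(by_length)        # ['fig', 'kiwi', 'banana', 'cherry']
print(by_length_desc)   # ['banana', 'cherry', 'kiwi', 'fig']
print(by_last_char)     # ['banana', 'fig', 'kiwi', 'cherry']
```

The sort is stable, so elements with equal keys ("banana" and "cherry" both have length 6) keep their original relative order, even with reverse=True.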

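When we need to sort an iterable, the value that key compares can usually be produced by a lambda function.

If we want to sort the files in the list below by their numbers, a lambda makes this much more convenient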

File=['1.pdf', '10.pdf', '7.pdf', '2.pdf', '21.pdf', '6.pdf', '3.pdf', '34.pdf']

First we need to work out the indexes of the number

d[0:-4]

'''
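Single digit, taking 1.pdf as an example: counting from the left, 1 is at index 0; counting from the right, the '.' is at index -4, and a slice excludes its end index, so [0:-4] gives 1
Multiple digits, taking 31.pdf as an example: counting from the left, 3 is at index 0; counting from the right, the '.' is at index -4, so [0:-4] gives 31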
'''
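
The slice can be checked interactively. The sketch below also shows os.path.splitext as one possible alternative that avoids counting characters; it is not used in the original code:

```python
import os

name = "31.pdf"
print(name[-4])        # .   (index -4 is the dot; the slice end is exclusive)
print(name[0:-4])      # 31
print("1.pdf"[:-4])    # 1

# Alternative: split off the extension without counting characters.
stem, ext = os.path.splitext(name)
print(stem, ext)       # 31 .pdf
```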

Once we have the value to compare, sorting with sorted is straightforward

File=['1.pdf', '10.pdf', '7.pdf', '2.pdf', '22.pdf', '6.pdf', '3.pdf', '31.pdf']
# newFiles = sorted(File, key=lambda d: int(d.split(".pdf")[0]))  also works, but not for names that do not start with a digit
newFiles = sorted(File, key=lambda d:int(d[0:-4]))
print(newFiles)


''' Output
['1.pdf', '2.pdf', '3.pdf', '6.pdf', '7.pdf', '10.pdf', '22.pdf', '31.pdf']
'''

When a name starts with letters rather than digits, such as 'chapter1.pdf' or 'chapter10.pdf', we have to take the number from the middle of the name, for example

listD=['chapter1.pdf', 'chapter10.pdf', 'chapter11.pdf', 'chapter12.pdf', 'chapter13.pdf', 'chapter14.pdf', 'chapter15.pdf', 'chapter16.pdf', 'chapter17.pdf', 'chapter18.pdf', 'chapter19.pdf', 'chapter2.pdf', 'chapter20.pdf', 'chapter21.pdf', 'chapter22.pdf', 'chapter23.pdf', 'chapter24.pdf', 'chapter25.pdf', 'chapter26.pdf', 'chapter3.pdf', 'chapter4.pdf', 'chapter5.pdf', 'chapter6.pdf', 'chapter7.pdf', 'chapter8.pdf', 'chapter9.pdf']

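d[7:-4]

'''
Single digit, taking chapter1.pdf as an example: counting from the left, 1 is at index 7; counting from the right, the '.' is at index -4, so [7:-4] gives 1
Multiple digits, taking chapter26.pdf as an example: counting from the left, 2 is at index 7; counting from the right, the '.' is at index -4, so [7:-4] gives 26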
'''

Then we can implement it like this

files=sorted(listD,key=lambda x:int(x[7:-4]))
print(files)

''' Output
['chapter1.pdf', 'chapter2.pdf', 'chapter3.pdf', 'chapter4.pdf', 'chapter5.pdf', 'chapter6.pdf', 'chapter7.pdf', 'chapter8.pdf', 'chapter9.pdf', 'chapter10.pdf', 'chapter11.pdf', 'chapter12.pdf', 'chapter13.pdf', 'chapter14.pdf', 'chapter15.pdf', 'chapter16.pdf', 'chapter17.pdf', 'chapter18.pdf', 'chapter19.pdf', 'chapter20.pdf', 'chapter21.pdf', 'chapter22.pdf', 'chapter23.pdf', 'chapter24.pdf', 'chapter25.pdf', 'chapter26.pdf']
'''
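
Fixed slice positions only work when every name has the same prefix length. As a possible alternative (not part of the original approach), a regular expression can pull out the first run of digits wherever it sits. The helper name numeric_key is invented for this sketch:

```python
import re

def numeric_key(filename):
    """Return the first run of digits in filename as an int (hypothetical helper)."""
    match = re.search(r"\d+", filename)
    # Names without any digits sort first; an arbitrary choice for this sketch.
    return int(match.group()) if match else -1

chapters = ["chapter10.pdf", "chapter2.pdf", "chapter1.pdf"]
plain = ["10.pdf", "3.pdf", "1.pdf"]

sorted_chapters = sorted(chapters, key=numeric_key)
sorted_plain = sorted(plain, key=numeric_key)
print(sorted_chapters)   # ['chapter1.pdf', 'chapter2.pdf', 'chapter10.pdf']
print(sorted_plain)      # ['1.pdf', '3.pdf', '10.pdf']
```

The same key function handles both naming schemes, at the cost of assuming that the first number in the name is the one to sort by.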

2. Merging PDFs

First we need to install the PyPDF2 module

pip install PyPDF2

The full code is as follows

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# @Time    : 2018/6/10 20:07
# @Author  : zhouyuyao
# @File    : demon1.py

import codecs
import PyPDF2
import os

filename="E:/GitHub/Python-Learning/LIVE_PYTHON/2018-06-07/deal-with-pdf/aminglinux"
files = list()
for fileName in os.listdir(filename):
    if fileName.endswith(".pdf"):
        files.append(fileName)

newFiles = sorted(files,key=lambda x:int(x[7:-4]))
print(newFiles)

os.chdir(filename)
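# Note: these are the PyPDF2 1.x/2.x names; PyPDF2 3.0 removed them in favor of
# PdfWriter, PdfReader, len(reader.pages), reader.pages[i] and add_page.
pdfWriter = PyPDF2.PdfFileWriter()                 # create an empty PDF writer
for item in newFiles:
    pdfReader = PyPDF2.PdfFileReader(open(item, "rb"))
    for page in range(pdfReader.numPages):
        pdfWriter.addPage(pdfReader.getPage(page))

with codecs.open("aminglinux.pdf", "wb") as f:     # name the output PDF and open it for binary writing
    pdfWriter.write(f)                             # write the merged pages to the file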

Result

'''
Running the script creates aminglinux.pdf, which contains the pages of all the selected PDF files in order.
'''
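
The file-collection step can also be written with pathlib, which returns full paths and so makes os.chdir() unnecessary. This is only a sketch of that one step, run on a throwaway directory with empty placeholder files; the helper list_pdfs_in_order and its prefix_len parameter are assumptions made for illustration:

```python
import tempfile
from pathlib import Path

def list_pdfs_in_order(folder, prefix_len=7):
    """Return PDF paths in folder, sorted by the number after a fixed-length prefix."""
    pdfs = [p for p in Path(folder).iterdir() if p.suffix.lower() == ".pdf"]
    # p.stem drops the extension, so only the prefix has to be sliced off.
    return sorted(pdfs, key=lambda p: int(p.stem[prefix_len:]))

# Demo on a temporary directory; the files are empty stand-ins, not real PDFs.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["chapter10.pdf", "chapter2.pdf", "chapter1.pdf", "notes.txt"]:
        (Path(tmp) / name).touch()
    ordered_names = [p.name for p in list_pdfs_in_order(tmp)]

print(ordered_names)   # ['chapter1.pdf', 'chapter2.pdf', 'chapter10.pdf']
```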

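3. Processing Images

PIL (Python Imaging Library) is the de facto standard for image processing on the Python platform, combining powerful features with a concise API.

3.1 Installation

Pillow is a fork of PIL that is actively maintained and developed. It is compatible with the vast majority of PIL's syntax, and it is what we recommend here.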

pip install pillow

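3.2 Creating an instance of the Image class

PIL's main functionality is defined in the Image class, which lives in the module of the same name, Image. Using PIL generally starts with creating an instance of the Image class, and there are several ways to do that: you can open an existing image file with the Image module's open() function, derive a new instance from another one, or build an instance from scratch.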

from PIL import Image
sourceFileName = "source.png"
avatar = Image.open(sourceFileName)

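The code above imports the Image module, opens the image source.png with the open() function, and creates an instance named avatar. If the file cannot be opened, an IOError is raised (in Python 3, IOError is an alias of OSError).

Next, you can use the show() method to view the instance.

Note that PIL saves the instance to a temporary file and then opens that file. The code is as follows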

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# @Time    : 2018/6/11 16:38
# @Author  : zhouyuyao
# @File    : demon4.py

from PIL import Image
sourceFileName = "E:/GitHub/Python-Learning/LIVE_PYTHON/2018-06-10/theWayToGo.png"
avatar = Image.open(sourceFileName)    # open the image and create an instance named avatar
avatar.show()                          # view the instance

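Running it opens the image

3.3 Inspecting the attributes of an instance

An Image instance has five commonly used attributes:

1) format

Returns the format of the image file as a string (JPEG, PNG, BMP, etc.); returns None if the instance was not created by opening a file.

2) mode

Returns the mode of the image as a string (RGB, CMYK, etc.); for the complete list, see the list of image modes in the official documentation

3) size

Returns the dimensions of the image as a 2-tuple (width, height)

4) palette

Only meaningful when mode is P; returns an ImagePalette instance

5) info

Returns information about the instance as a dictionary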
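
These attributes can be tried without any image file by building an instance from scratch with Image.new(). A minimal sketch, assuming Pillow is installed; the size and color are arbitrary:

```python
from PIL import Image

# Built from scratch rather than opened from a file, so format is None.
blank = Image.new("RGB", (64, 32), (255, 255, 255))

print(blank.format)   # None
print(blank.mode)     # RGB
print(blank.size)     # (64, 32)
```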

If we want the format, size, and mode of the image, we can add the following

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# @Time    : 2018/6/11 16:38
# @Author  : zhouyuyao
# @File    : demon4.py
from PIL import Image
sourceFileName = "E:/GitHub/Python-Learning/LIVE_PYTHON/2018-06-10/theWayToGo.png"
avatar = Image.open(sourceFileName)
# avatar.show()

'''Get the format, size, and mode of the image'''
print("The picture's format is {0}, the size is {1}, the mode is {2}.".format(avatar.format, avatar.size, avatar.mode))

Output

The picture's format is PNG, the size is (718, 726), the mode is RGBA.

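Here we can see that it returned the image format PNG, the image size (718, 726), and the image mode RGBA.

3.4 Instance methods

The Image class defines many methods; the official description is at http://effbot.org/imagingbook/image.htm

# Image methods
# image.show()
# Image.open(file)                                  # module-level function
# image.save(outputfile)
# image.crop((left, upper, right, lower))           # crop a region

1) Image I/O - converting image formats

The Image module provides the open() function for opening image files, and the Image class provides the save() method for saving an image instance to a file.

The save() method can save a file in a specific image format. For example, save('target.jpg', 'JPEG') saves the instance as target.jpg in JPEG format. Most of the time, though, the format can be omitted, in which case save() picks the format from the file extension.

Let's look at a script that converts image formats.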

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# @Time    : 2018/6/11 16:38
# @Author  : zhouyuyao
# @File    : demon4.py

import os, sys
from PIL import Image

for infile in sys.argv[1:]:
    f, e = os.path.splitext(infile)
    outfile = f + ".jpg"
    if infile != outfile:
        try:
            Image.open(infile).save(outfile)
        except IOError:
            print("cannot convert", infile)

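Here, f is the file name without its extension. Inside the try statement we attempt to open the image file and then save it with the .jpg extension; save() infers from the extension that the image should be written in JPEG format. If the image cannot be opened or converted, a message saying so is printed to the terminal.

The script takes the input files as command-line arguments; in PyCharm they are supplied through the run configuration

After running it, an image file was produced, but it was broken and could not be opened. Most of what I found online only showed the code, and the explanation on the official site was rather general, so the exact cause still needed checking

Later I found that the problem was that the particular image I had chosen could not be converted. After downloading another image from the web, the conversion succeeded.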
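
One plausible cause, offered here as an assumption since the original image cannot be checked, is that the source was an RGBA PNG: JPEG has no alpha channel, and recent Pillow versions refuse to write RGBA data as JPEG. The sketch below uses an in-memory stand-in image and converts it to RGB before saving:

```python
import io
from PIL import Image

rgba = Image.new("RGBA", (8, 8), (255, 0, 0, 128))   # stand-in for an RGBA PNG

buffer = io.BytesIO()
try:
    rgba.save(buffer, "JPEG")             # JPEG cannot store an alpha channel
    direct_save_ok = True
except OSError:
    direct_save_ok = False                # what recent Pillow versions do

buffer = io.BytesIO()
rgba.convert("RGB").save(buffer, "JPEG")  # drop the alpha channel, then save
jpeg_bytes = buffer.getvalue()
print(direct_save_ok, len(jpeg_bytes) > 0)
```

In the conversion script, the equivalent change would be Image.open(infile).convert("RGB").save(outfile).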

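2) Making thumbnails

The thumbnail() method of the Image class can be used to make thumbnails. It takes a 2-tuple as the maximum thumbnail size and shrinks the instance in place to fit within it, preserving the aspect ratio.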

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# @Time    : 2018/6/11 22:15
# @Author  : zhouyuyao
# @File    : demon5.py

import os, sys
from PIL import Image

for infile in sys.argv[1:]:
    outfile = os.path.splitext(infile)[0] + ".thumbnail"
    if infile != outfile:
        try:
            im   = Image.open(infile)
            x, y = im.size
            im.thumbnail((x//2, y//2))
            im.save(outfile, "JPEG")
        except IOError:
            print("cannot create thumbnail for", infile)

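Here we use im.size to get the dimensions of the original image and then call thumbnail() to make a thumbnail with half the width and half the height, that is, a quarter of the original area. As before, if the image cannot be opened, a message is printed to the terminal.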
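
Two properties of thumbnail() are easy to miss: it modifies the instance in place and keeps the aspect ratio, and it never enlarges an image. A minimal sketch on blank in-memory images, assuming Pillow is installed:

```python
from PIL import Image

im = Image.new("RGB", (400, 200))
im.thumbnail((100, 100))        # in place; the 2:1 aspect ratio is kept
print(im.size)                  # (100, 50)

small = Image.new("RGB", (50, 20))
small.thumbnail((100, 100))     # already fits, so nothing changes
print(small.size)               # (50, 20)
```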

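3) Cropping images

Split every image in the current directory (including subdirectories) into tiles according to the two variables horizon and vertic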

from PIL import Image as img
import os

imgTypes = ['.png','.jpg','.bmp']

horizon = 8
vertic  = 1

for root, dirs, files in os.walk('.'):
    for currentFile in files:
        crtFile = os.path.join(root, currentFile)
        if crtFile[crtFile.rindex('.'):].lower() in imgTypes:
            crtIm = img.open(crtFile)
            crtW, crtH = crtIm.size
            hStep = crtW // horizon
            vStep = crtH // vertic
            for i in range(vertic):
                for j in range(horizon):
                    crtOutFileName = crtFile[:crtFile.rindex('.')] + \
                        '_' + str(i) + '_' + str(j)\
                        + crtFile[crtFile.rindex('.'):].lower()
                    box = (j * hStep, i * vStep, (j + 1) * hStep, (i + 1) * vStep)
                    cropped = crtIm.crop(box)
                    cropped.save(crtOutFileName)
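
The box arithmetic in the script can be looked at on its own, without touching any files. The helper grid_boxes below is a hypothetical extraction of that calculation, written for illustration:

```python
def grid_boxes(width, height, columns, rows):
    """Return (left, upper, right, lower) crop boxes, row by row (hypothetical helper)."""
    h_step = width // columns
    v_step = height // rows
    return [
        (col * h_step, row * v_step, (col + 1) * h_step, (row + 1) * v_step)
        for row in range(rows)
        for col in range(columns)
    ]

boxes = grid_boxes(800, 100, columns=4, rows=1)
print(boxes)
# [(0, 0, 200, 100), (200, 0, 400, 100), (400, 0, 600, 100), (600, 0, 800, 100)]

# Integer division drops leftover pixels: a 10-pixel-wide image cut into
# 3 columns only covers x = 0..9, so the last column of pixels is lost.
print(grid_boxes(10, 10, columns=3, rows=1)[-1])   # (6, 0, 9, 10)
```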

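4) Transforming and pasting

The transpose() method can flip an image left to right, flip it top to bottom, or rotate it by 90°, 180°, or 270°. The paste() method pastes one Image instance onto another.

Let's crop out the left half of an image, flip it left to right, and then rotate it 180°; paste the unmodified right half onto the left side; and finally paste the modified left half onto the right side.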

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# @Time    : 2018/6/11 22:53
# @Author  : zhouyuyao
# @File    : demon6.py
from PIL import Image

imageFName = 'E:/GitHub/Python-Learning/LIVE_PYTHON/2018-06-10/theWayToGo.png'

def image_transpose(image):
    '''
        Input: an Image instance
        Output: a transposed Image instance
        Function:
            * switches the left and the right part of an Image instance
            * for the left part of the original instance, flips left and right
                and then turns it upside down.
    '''
    xsize, ysize = image.size
    xsizeLeft    = xsize // 2 # while xsizeRight = xsize - xsizeLeft

    boxLeft      = (0, 0, xsizeLeft, ysize)
    boxRight     = (xsizeLeft, 0, xsize, ysize)
    boxLeftNew   = (0, 0, xsize - xsizeLeft, ysize)
    boxRightNew  = (xsize - xsizeLeft, 0, xsize, ysize)

    partLeft     = image.crop(boxLeft).transpose(Image.FLIP_LEFT_RIGHT).\
        transpose(Image.ROTATE_180)
    partRight    = image.crop(boxRight)

    image.paste(partRight, boxLeftNew)
    image.paste(partLeft, boxRightNew)
    return image

avatar = Image.open(imageFName)
avatar = image_transpose(avatar)
avatar.show()

After running it, the image is displayed with the new arrangement produced by the code
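
The effect of transpose() and paste() is easier to verify on a tiny image whose pixels can be read back. A minimal sketch, assuming Pillow is installed; the 2x1 image and its colors are invented for the example:

```python
from PIL import Image

# 2x1 image: left pixel red, right pixel blue
im = Image.new("RGB", (2, 1))
im.putpixel((0, 0), (255, 0, 0))
im.putpixel((1, 0), (0, 0, 255))

flipped = im.transpose(Image.FLIP_LEFT_RIGHT)   # returns a new image
print(flipped.getpixel((0, 0)))                 # (0, 0, 255)

canvas = Image.new("RGB", (4, 1))               # black background
canvas.paste(im, (2, 0))                        # (2, 0) is the upper-left corner
print(canvas.getpixel((2, 0)))                  # (255, 0, 0)
print(canvas.getpixel((0, 0)))                  # (0, 0, 0)
```

Note that transpose() returns a new instance, while paste() modifies the target in place.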

4. Generating CAPTCHAs

First, the Pillow module needs to be installed

pip install pillow

The code is as follows

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# @Time    : 2018/6/11 15:11
# @Author  : zhouyuyao
# @File    : demon3.py

import random
import string
from PIL import Image, ImageDraw, ImageFont, ImageFilter

class VerCode(object):
    def __init__(self):
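        # path to the font file; it differs between systems and versions
        self.font_path = 'E:/GitHub/Python-Learning/LIVE_PYTHON/2018-06-10/msyh.ttc'
        # number of characters in the CAPTCHA
        self.number = 4
        # width and height of the CAPTCHA image
        self.size = (100, 30)
        # background color, white by default
        self.bgcolor = (255, 255, 255)
        # font color, blue by default
        self.fontcolor = (0, 0, 255)
        # color of the interference lines, red by default
        self.linecolor = (255, 0, 0)
        # whether to draw interference lines
        self.draw_line = True
        # number of interference lines to draw
        self.line_number = 20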

    # generate a random string
    def gene_text(self):
        self.source = list(string.ascii_letters)
        for self.index in range(0, 10):
            self.source.append(str(self.index))
        return ''.join(random.sample(self.source, self.number))     # number is the length of the CAPTCHA

    # draw one interference line
    def gene_line(self, draw, width, height):
        self.begin = (random.randint(0, width), random.randint(0, height))
        self.end = (random.randint(0, width), random.randint(0, height))
        draw.line([self.begin, self.end], fill=self.linecolor)

    # generate the CAPTCHA
    def gene_code(self):
        self.width, self.height = self.size                                      # width and height
        self.image = Image.new('RGBA', (self.width, self.height), self.bgcolor)  # create the image
        self.font = ImageFont.truetype(self.font_path, 25)                       # font for the CAPTCHA
        self.draw = ImageDraw.Draw(self.image)                                   # create the drawing object
        self.text = self.gene_text()                                             # generate the string
        left, top, right, bottom = self.font.getbbox(self.text)                  # getsize() was removed in Pillow 10; getbbox() needs Pillow 8+
        self.font_width, self.font_height = right - left, bottom - top
        self.draw.text(((self.width - self.font_width) / self.number, (self.height - self.font_height) / self.number),
                       self.text, font=self.font, fill=self.fontcolor)           # draw the string
        if self.draw_line:
            for i in range(self.line_number):
                self.gene_line(self.draw, self.width, self.height)

    def effect(self):
        # self.image = self.image.transform((self.width + 20, self.height + 10), Image.AFFINE, (1, -0.3, 0, -0.1, 1, 0), Image.BILINEAR)                                            # apply a distortion
        self.image = self.image.filter(ImageFilter.EDGE_ENHANCE_MORE)            # filter: stronger edge enhancement
        self.image.save('validateCodePic.png')                                   # save the CAPTCHA image
        # self.image.show()

if __name__ == "__main__":
    # wrap everything up: create the object, draw the code, then post-process and save
    vco = VerCode()
    vco.gene_code()
    vco.effect()

Running this produces a CAPTCHA image, saved as validateCodePic.png; in real use, a new one is generated on every call
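
One detail of gene_text() worth knowing: random.sample() draws without replacement, so a generated code never contains the same character twice. The sketch below shows both behaviors side by side using only the standard library; ALPHABET and random_text are hypothetical names, not part of the class above:

```python
import random
import string

ALPHABET = string.ascii_letters + string.digits   # 52 letters + 10 digits

def random_text(length=4, allow_repeats=True):
    """Return a random string drawn from ALPHABET (hypothetical helper)."""
    if allow_repeats:
        chars = random.choices(ALPHABET, k=length)   # with replacement
    else:
        chars = random.sample(ALPHABET, length)      # without replacement
    return "".join(chars)

print(random_text())                      # 4 characters, repeats possible
print(random_text(allow_repeats=False))   # 4 distinct characters
```

Allowing repeats gives slightly more possible codes for the same length; which behavior is preferable depends on the application.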