An Introduction to Several Python Audio Processing Libraries

Date: 2019-11-12


1. eyeD3

Search Google for "python mp3 process" and this third-party library is the one recommended most often. Let's start with its official description:

About

eyeD3 is a Python tool for working with audio files, specifically mp3 files containing ID3 metadata (i.e. song info).

It provides a command-line tool (eyeD3) and a Python library (import eyed3) that can be used to write your own applications or plugins that are callable from the command-line tool.

For example, to set some song information in an mp3 file called song.mp3:

$ eyeD3 -a Nobunny -A "Love Visions" -t "I Am a Girlfriend" -n 4 song.mp3

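In short, eyeD3 is mainly used to process MP3 files, especially ones that carry ID3 metadata (most MP3 files include extra information such as the artist and album; how to read it is covered below). eyeD3 can be used in two ways: from the command line, by running eyeD3 --... directly in a terminal, or from Python via import eyed3.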

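The example above is the official command-line example. -a is short for --artist (sets the artist), -A is short for --album (sets the album), -t is short for --title (sets the song title), and -n is short for --track (sets the track number). These are all standard fields of an MP3 file's ID3 tag. If you simply run eyeD3 song.mp3, it prints the song's basic information, roughly like this: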

$ eyeD3 song.mp3

song.mp3      [ 3.06 MB ]
-------------------------------------------------------------------------
ID3 v2.4:
title: I Am a Girlfriend
artist: Nobunny
album: Love Visions
album artist: Various Artists
track: 4
-------------------------------------------------------------------------
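The "ID3 v2.4" line in this output is the tag version. An ID3v2 tag starts with a 10-byte header at the very beginning of the file: the bytes "ID3", a major-version byte and a revision byte, a flags byte, and a 4-byte "syncsafe" size (only the low 7 bits of each byte are used) that excludes the header itself. eyeD3 handles all of this for you; the standard-library sketch below only makes the layout concrete, and the function name parse_id3v2_header is my own, not part of eyeD3.

```python
def parse_id3v2_header(header: bytes):
    """Parse the 10-byte ID3v2 header found at the start of an MP3 file.

    Returns (major_version, revision, flags, tag_size), or None when the
    bytes do not start with the "ID3" marker.
    """
    if len(header) < 10 or header[:3] != b"ID3":
        return None
    major, revision, flags = header[3], header[4], header[5]
    size_bytes = header[6:10]
    # The size is "syncsafe": the top bit of every byte must be zero.
    if any(b & 0x80 for b in size_bytes):
        raise ValueError("invalid syncsafe size field")
    tag_size = 0
    for b in size_bytes:
        tag_size = (tag_size << 7) | b  # 7 useful bits per byte
    return major, revision, flags, tag_size


# A hand-built header: ID3 v2.4.0, no flags, syncsafe size bytes 00 00 02 01.
example = b"ID3" + bytes([4, 0, 0]) + bytes([0x00, 0x00, 0x02, 0x01])
print(parse_id3v2_header(example))  # (4, 0, 0, 257)
```

To try it on a real file, pass it the first 10 bytes, e.g. parse_id3v2_header(open("song.mp3", "rb").read(10)); None means the file does not start with an ID3v2 tag.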
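If you use eyeD3 on Windows, you may find that typing eyeD3 in cmd reports that the command cannot be found; I ran into exactly this problem.
I had installed it with pip, and in the site-packages directory there was no executable called eyeD3.exe, only .py files. The official site doesn't mention this,
and a long search on Google turned up no fix, so I switched to Linux instead. After pip install eyeD3 on Linux, running
eyeD3 song.mp3 in the shell works fine. (Note that pip places command-line scripts in Python's Scripts directory rather than in site-packages, so on Windows that directory, added to PATH, may be the place to look.) Figure 1 shows part of the output on Ubuntu 14: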

Figure 1

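As you can see, the common fields are all displayed. A side note: I copied this MP3 straight from Windows to the Linux machine. I used to move files between Windows and Linux through cloud storage such as Baidu Cloud,

but that is sometimes too cumbersome, so I searched online and found PuTTY, which is very handy. PuTTY is an SSH and telnet client, developed originally by Simon Tatham for the Windows platform.

PuTTY is open source software that is available with source code and is developed and supported by a group of volunteers.
In other words, PuTTY is an SSH (Secure Shell) remote login client, originally written for Windows. It can be downloaded from http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html. After installing it, open PuTTY and you will see the window shown in Figure 2: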



Figure 2
Now start your Linux machine (mine runs in a VMware virtual machine), run the ifconfig command in the shell and find its IP address, as shown in Figure 3:

Figure 3

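My machine's address, for example, is 192.168.152.130. Enter this IP address in the Host Name field shown in Figure 2 and leave the port at its default of 22. Click Open and a login prompt appears; enter your Linux
username and password and you are connected over SSH. The article http://www.infoworld.com/article/2617683/linux/linux-moving-files-between-unix-and-windows-systems.html describes
several ways to move files between Unix and Windows. I'll only cover the first: pscp, PuTTY's secure copy tool. Installing PuTTY also installs an executable called pscp.exe; add its directory to the Windows
PATH environment variable. Then running pscp myfile.txt shs@unixserver:/home/shs in cmd copies myfile.txt to the /home/shs directory on the Linux machine. When you run it, replace shs
with your own username and unixserver (after the @) with your Linux IP address; /home/shs is the destination directory. Once the command finishes, the file has been copied. I won't cover the other methods; search for them if you're interested, or read the article mentioned above.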
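If you copy files often, the same pscp call can be scripted. The sketch below is a convenience wrapper of my own, not part of PuTTY: build_pscp_command and copy_to_linux are hypothetical helper names, it assumes pscp.exe is on PATH, and pscp will still prompt for your password as usual. Passing the arguments as a list rather than one string avoids quoting problems with paths that contain spaces.

```python
import subprocess


def build_pscp_command(local_path, user, host, remote_dir):
    """Build the argument list for: pscp <local_path> <user>@<host>:<remote_dir>"""
    return ["pscp", local_path, f"{user}@{host}:{remote_dir}"]


def copy_to_linux(local_path, user, host, remote_dir):
    """Run pscp (assumed to be on PATH); it prompts for the password itself."""
    cmd = build_pscp_command(local_path, user, host, remote_dir)
    # check=True raises CalledProcessError if pscp exits with a non-zero status.
    subprocess.run(cmd, check=True)


print(build_pscp_command("song.mp3", "shs", "192.168.152.130", "/home/shs"))
# ['pscp', 'song.mp3', 'shs@192.168.152.130:/home/shs']
```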
Back to the main topic. Having covered the eyeD3 command line, let's see how to use it from Python, again with the official example:

import eyed3

audiofile = eyed3.load("song.mp3")
# a file with no ID3 tag at all has audiofile.tag == None, so create one first
if audiofile.tag is None:
    audiofile.initTag()
audiofile.tag.artist = u"Nobunny"
audiofile.tag.album = u"Love Visions"
audiofile.tag.album_artist = u"Various Artists"
audiofile.tag.title = u"I Am a Girlfriend"
audiofile.tag.track_num = 4

audiofile.tag.save()

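The code above imports the eyeD3 library with import eyed3, loads the MP3 file with the load function, and then sets ID3 tag fields such as artist and album; the code is self-explanatory. To display the ID3 tag information stored in an MP3 file, just print the corresponding attribute, e.g. print(audiofile.tag.artist), provided of course that the file's metadata actually contains that field. There are more advanced uses that I won't cover; if you're interested, see the official documentation: http://eyed3.nicfit.net/index.html. eyeD3 is mainly for handling MP3 metadata; for decoding and analyzing the audio itself you need other libraries.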

2. pydub

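The first library, eyeD3, generally handles only MP3 files and is relatively simple in functionality. pydub, introduced next, is much more powerful. As before,
let's look at its official description first:
Manipulate audio with a simple and easy high level interface http://pydub.com. Just one sentence: a simple, easy-to-use, high-level interface
for processing audio. That's exactly what we're looking for. The GitHub project is at https://github.com/jiaaro/pydub/ and had over 1,800 stars at the time of writing, so the
library is quite popular. Installation is simple: pip install pydub. Note, however: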

Dependencies

You can open and save WAV files with pure python. For opening and saving non-wav files – like mp3 – you'll need ffmpeg or libav.

In other words, pure Python (the standard library's wave module) can only handle WAV audio files; to handle formats such as MP3, you need to install ffmpeg or libav.
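To see what "pure Python" can do on its own, here is a minimal sketch using only the standard library's wave module: it synthesizes a one-second 440 Hz tone (the tone and its parameters are arbitrary choices for illustration), writes it as a 16-bit mono WAV file in memory and reads the header back. Anything beyond WAV, such as MP3 or AAC, is where pydub hands the work to ffmpeg or libav.

```python
import io
import math
import struct
import wave

FRAME_RATE = 44100   # samples per second
DURATION_S = 1.0     # length of the test tone in seconds
FREQ_HZ = 440.0      # pitch of the test tone (A4)

# Build one second of a 16-bit mono sine wave as little-endian PCM bytes.
n_frames = int(FRAME_RATE * DURATION_S)
samples = [int(0.5 * 32767 * math.sin(2 * math.pi * FREQ_HZ * i / FRAME_RATE))
           for i in range(n_frames)]
pcm = struct.pack("<%dh" % n_frames, *samples)

# Write it as a WAV file (to memory here; a filename works the same way).
buf = io.BytesIO()
with wave.open(buf, "wb") as wf:
    wf.setnchannels(1)          # mono
    wf.setsampwidth(2)          # 2 bytes = 16-bit samples
    wf.setframerate(FRAME_RATE)
    wf.writeframes(pcm)

# Read it back and inspect the header fields the wave module exposes.
buf.seek(0)
with wave.open(buf, "rb") as wf:
    params = wf.getparams()
    data = wf.readframes(wf.getnframes())

print(params.nchannels, params.sampwidth, params.framerate, params.nframes)
# 1 2 44100 44100
print(params.nframes / params.framerate, "seconds")  # 1.0 seconds
```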

What is ffmpeg?

A complete, cross-platform solution to record, convert and stream audio and video.

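ffmpeg is a cross-platform tool for recording, converting and streaming audio and video; if you have done digital signal processing work, you are probably familiar with it. libav is a fork of ffmpeg with much the same functionality; install either one. On Windows, just download the prebuilt executables and make sure their directory is on PATH so that pydub can find them.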

Let's go through the examples from the official site.

I: Opening mp3, mp4 and other files

You can use the following:

from pydub import AudioSegment

# format-specific shortcuts (the second line simply replaces the first)
song = AudioSegment.from_wav("never_gonna_give_you_up.wav")
song = AudioSegment.from_mp3("never_gonna_give_you_up.mp3")

ogg_version = AudioSegment.from_ogg("never_gonna_give_you_up.ogg")
flv_version = AudioSegment.from_flv("never_gonna_give_you_up.flv")

# generic loader: the second argument names the format
mp4_version = AudioSegment.from_file("never_gonna_give_you_up.mp4", "mp4")
wma_version = AudioSegment.from_file("never_gonna_give_you_up.wma", "wma")
aac_version = AudioSegment.from_file("never_gonna_give_you_up.aac", "aac")

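pydub can open any file type that ffmpeg supports. As shown above, there are shorthand methods named from_<format>() (from_wav, from_mp3, from_ogg, from_flv),

and the general from_file() method, whose second argument gives the file's format (in current pydub versions this argument is optional and the format is otherwise detected automatically). Either way, the result is an AudioSegment object.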

II: Slicing audio

# pydub does things in milliseconds
ten_seconds = 10 * 1000

first_10_seconds = song[:ten_seconds]

last_5_seconds = song[-5000:]

Note that pydub measures time in milliseconds. The code above takes the first 10 seconds and the last 5 seconds of the song; it's very simple.
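Under the hood, a millisecond position has to become a byte offset into the raw samples: frames = ms × frame_rate / 1000, and each frame is sample_width × channels bytes. The standard-library sketch below shows only that arithmetic; ms_to_frames and slice_pcm are illustrative helpers, not pydub functions, and negative positions such as song[-5000:] are not handled.

```python
def ms_to_frames(ms, frame_rate):
    """Convert a time in milliseconds to a frame (sample) index."""
    return int(round(ms * frame_rate / 1000.0))


def slice_pcm(pcm, frame_rate, sample_width, channels, start_ms, end_ms):
    """Return the raw PCM bytes between start_ms and end_ms.

    A frame holds one sample per channel, i.e. sample_width * channels bytes.
    """
    frame_width = sample_width * channels
    start = ms_to_frames(start_ms, frame_rate) * frame_width
    end = ms_to_frames(end_ms, frame_rate) * frame_width
    return pcm[start:end]


# 15 seconds of silent 16-bit stereo audio at 44.1 kHz.
rate, width, chans = 44100, 2, 2
pcm = bytes(15 * rate * width * chans)

first_10_seconds = slice_pcm(pcm, rate, width, chans, 0, 10 * 1000)
print(len(first_10_seconds) // (width * chans) / rate)  # 10.0
```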

III: Adjusting the volume

# boost volume by 6dB
beginning = first_10_seconds + 6

# reduce volume by 3dB
end = last_5_seconds - 3

Adding 6 raises the volume by 6 dB, and subtracting 3 lowers it by 3 dB.
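Decibels are logarithmic: a gain of G dB multiplies the waveform's amplitude by 10^(G/20), so +6 dB roughly doubles it and -3 dB leaves about 71%. The sketch below works that out with the standard library and applies it to plain 16-bit integer samples with clipping. db_to_factor and apply_gain are my own illustrative names; pydub applies the gain to its own sample data internally, so this only shows the math.

```python
def db_to_factor(db):
    """Amplitude ratio for a gain in decibels: 10 ** (dB / 20)."""
    return 10 ** (db / 20.0)


def apply_gain(samples, db, sample_max=32767, sample_min=-32768):
    """Scale 16-bit samples by a dB gain, clipping to the valid range."""
    factor = db_to_factor(db)
    return [max(sample_min, min(sample_max, int(round(s * factor)))) for s in samples]


print(round(db_to_factor(6), 3))    # 1.995
print(round(db_to_factor(-3), 3))   # 0.708
print(apply_gain([1000, -1000, 30000], 6))  # [1995, -1995, 32767]
```

The last value shows why boosting loud audio is risky: samples that would exceed the 16-bit range are clipped, which is heard as distortion.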

IV: Concatenating two clips

without_the_middle = beginning + end

without_the_middle.duration_seconds

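The duration of the concatenated audio is the sum of the two clips' durations. You can read a clip's duration from its duration_seconds property (an attribute, not a method, so no parentheses); it gives the same result as len(audio)/1000.0, because len() returns the length in milliseconds.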
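The arithmetic behind that is simple: duration = number of frames / frame rate, and concatenation just places one frame sequence after the other, so frame counts (and durations) add up. A small sketch with the 10-second and 5-second clips from above, assuming a 44.1 kHz frame rate (duration_seconds here is a plain helper function, not pydub's property):

```python
def duration_seconds(n_frames, frame_rate):
    """Duration of a clip = number of frames / frames per second."""
    return n_frames / frame_rate


rate = 44100                 # assumed frame rate
beginning_frames = 10 * rate # a 10-second clip
end_frames = 5 * rate        # a 5-second clip

# Concatenation appends frames, so the frame counts simply add.
joined_frames = beginning_frames + end_frames
print(duration_seconds(joined_frames, rate))  # 15.0

# pydub's len() is in milliseconds, so len(audio) / 1000.0 is also seconds.
length_ms = joined_frames * 1000 / rate
print(length_ms / 1000.0)  # 15.0
```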

V: Reversing audio

# song is not modified
# AudioSegments are immutable
backwards = song.reverse()

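Note that AudioSegment objects are immutable: reverse() does not modify song but returns a new AudioSegment, and the same applies to the other methods. Reversing simply means the audio plays backwards, from the end to the beginning; I tried it, and the result is actually quite fun.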
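Reversing has to work on whole frames rather than on individual bytes. The standard-library sketch below shows why on two 16-bit little-endian samples; reverse_frames is a hypothetical helper, not pydub's implementation, and like reverse() it returns a new object instead of changing its input.

```python
def reverse_frames(pcm, frame_width):
    """Reverse audio by reversing the order of whole frames.

    Reversing the raw bytes instead would also flip the byte order inside
    each sample (and the channel order inside each frame), producing noise.
    """
    if len(pcm) % frame_width:
        raise ValueError("PCM length is not a multiple of the frame width")
    frames = [pcm[i:i + frame_width] for i in range(0, len(pcm), frame_width)]
    return b"".join(reversed(frames))


# Two 16-bit little-endian mono samples: 1 then 2.
pcm = (1).to_bytes(2, "little", signed=True) + (2).to_bytes(2, "little", signed=True)
print(reverse_frames(pcm, 2))  # b'\x02\x00\x01\x00'  (2 then 1: correct)
print(pcm[::-1])               # b'\x00\x02\x00\x01'  (bytes swapped inside samples)
```

For stereo 16-bit audio the frame width would be 4, so left/right samples stay paired.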

VI: Crossfade

# 1.5 second crossfade
with_style = beginning.append(end, crossfade=1500)

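A crossfade makes one clip transition smoothly into the next; crossfade=1500 above means the transition lasts 1.5 seconds. Because the two clips overlap during that time, the result is 1.5 seconds shorter than a plain concatenation: 10 + 5 - 1.5 = 13.5 seconds here.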
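The idea can be sketched with plain lists of float samples: inside the overlap, the first clip's weight falls while the second clip's rises, and the output has len(a) + len(b) - overlap samples. This is a simple linear crossfade written for illustration with the standard library only; it is not pydub's code, which uses its own fade curves.

```python
def crossfade(a, b, overlap):
    """Join two sample lists, overlapping the last/first `overlap` samples.

    Inside the overlap, `a` fades out linearly while `b` fades in linearly,
    so the result has len(a) + len(b) - overlap samples.
    """
    if not 0 <= overlap <= min(len(a), len(b)):
        raise ValueError("overlap is longer than one of the clips")
    head = a[:len(a) - overlap]      # part of a before the overlap
    tail = b[overlap:]               # part of b after the overlap
    mixed = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of b rises towards 1
        mixed.append(a[len(a) - overlap + i] * (1 - w) + b[i] * w)
    return head + mixed + tail


out = crossfade([1.0] * 6, [0.0] * 6, 3)
print(len(out))                    # 9
print([round(x, 2) for x in out])  # [1.0, 1.0, 1.0, 0.75, 0.5, 0.25, 0.0, 0.0, 0.0]
```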

VII: Repeat

# repeat the clip twice
do_it_over = with_style * 2

The code above repeats the clip twice, so do_it_over is twice as long as with_style.

VIII: Fade in and fade out

# 2 sec fade in, 3 sec fade out
awesome = do_it_over.fade_in(2000).fade_out(3000)

This fades in over the first 2 seconds and fades out over the last 3 seconds.
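A fade is just a gain envelope applied to the start or end of a clip. The sketch below applies linear ramps to a list of float samples with the standard library; fade is a hypothetical helper that works in samples, whereas pydub's fade_in/fade_out take milliseconds and operate on its own sample data, so this only illustrates the shape. At 44.1 kHz, the 2-second fade-in above would span 88,200 samples per channel.

```python
def fade(samples, fade_in_len, fade_out_len):
    """Apply a linear fade-in over the first fade_in_len samples and a
    linear fade-out over the last fade_out_len samples."""
    n = len(samples)
    if fade_in_len + fade_out_len > n:
        raise ValueError("fades are longer than the clip")
    out = list(samples)                       # leave the input untouched
    for i in range(fade_in_len):
        out[i] *= i / fade_in_len             # gain rises from 0 towards 1
    for i in range(fade_out_len):
        out[n - 1 - i] *= i / fade_out_len    # gain falls to 0 at the end
    return out


print(fade([1.0] * 8, 2, 4))
# [0.0, 0.5, 1.0, 1.0, 0.75, 0.5, 0.25, 0.0]
```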

IX: Save

awesome.export("mashup.mp3", format="mp3")

awesome.export("mashup.mp3", format="mp3", tags={'artist': 'Various artists', 'album': 'Best of 2011', 'comments': 'This album is awesome!'})

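Both forms use the export method and specify the output format with the format parameter. The second also passes a tags parameter, which, as you can probably tell, writes the song's ID3 tag information.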

The above is only a first introduction to pydub; it has many more features. See the official API documentation, which is very detailed: https://github.com/jiaaro/pydub/blob/master/API.markdown

3. PyAudio

Another powerful audio library. Its official description:

PyAudio provides Python bindings for PortAudio, the cross-platform audio I/O library. With PyAudio, you can easily use Python to play and record audio on a variety of platforms. PyAudio is inspired by:

pyPortAudio/fastaudio: Python bindings for PortAudio v18 API.
tkSnack: cross-platform sound toolkit for Tcl/Tk and Python.

PyAudio provides bindings for the cross-platform PortAudio library (a library for audio input and output), which lets you record and play audio easily.

Without further ado, here is the quick start code from the official documentation (https://people.csail.mit.edu/hubert/pyaudio/docs/):

"""PyAudio Example: Play a wave file."""

import pyaudio
import wave
import sys

CHUNK = 1024

if len(sys.argv) < 2:
    print("Plays a wave file.\n\nUsage: %s filename.wav" % sys.argv[0])
    sys.exit(-1)

wf = wave.open(sys.argv[1], 'rb')

# instantiate PyAudio (1)
p = pyaudio.PyAudio()

# open stream (2)
stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
                channels=wf.getnchannels(),
                rate=wf.getframerate(),
                output=True)

# read data
data = wf.readframes(CHUNK)

# play stream (3)
while len(data) > 0:
    stream.write(data)
    data = wf.readframes(CHUNK)

# stop stream (4)
stream.stop_stream()
stream.close()
wf.close()  # also close the wave file

# close PyAudio (5)
p.terminate()

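This version takes the audio file as a command-line argument. CHUNK is the number of frames read at a time (a frame holds one sample per channel, so each read returns CHUNK × channels × sample-width bytes). p = pyaudio.PyAudio() creates a

PyAudio object, and its open method opens a stream. output=True makes it an output stream, i.e. we write data into it; with input=True instead it becomes an input stream that reads audio data from the default audio input device, which on a computer is usually the microphone. The wave module opens the .wav file, readframes reads CHUNK frames at a time, and the data is written to the stream until the file is exhausted. Audio data written to the stream is played through the speakers (or headphones), which is how we hear the music. When finished, be sure to close the corresponding objects to release their resources.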
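To make the CHUNK point concrete without a sound card, the sketch below keeps the example's read loop but replaces stream.write with a list that records how many bytes each readframes(CHUNK) call returns. make_wav is a hypothetical helper that builds a short silent 16-bit stereo WAV in memory; only the standard library is used.

```python
import io
import wave

CHUNK = 1024  # frames per read, as in the PyAudio example


def make_wav(n_frames, channels=2, sampwidth=2, rate=44100):
    """Create an in-memory WAV file full of silence (test data only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sampwidth)
        w.setframerate(rate)
        w.writeframes(bytes(n_frames * channels * sampwidth))
    buf.seek(0)
    return buf


wf = wave.open(make_wav(2500), "rb")
chunk_sizes = []
data = wf.readframes(CHUNK)
while len(data) > 0:                # same loop shape as the example
    chunk_sizes.append(len(data))   # stream.write(data) would go here
    data = wf.readframes(CHUNK)
wf.close()

print(chunk_sizes)  # [4096, 4096, 1808]
```

Each full chunk is 1024 frames × 2 channels × 2 bytes = 4096 bytes; the last read returns only the 452 frames that remain.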

There is also a version that uses a callback function:

"""PyAudio Example: Play a wave file (callback version)."""

import pyaudio
import wave
import time
import sys

if len(sys.argv) < 2:
    print("Plays a wave file.\n\nUsage: %s filename.wav" % sys.argv[0])
    sys.exit(-1)

wf = wave.open(sys.argv[1], 'rb')

# instantiate PyAudio (1)
p = pyaudio.PyAudio()

# define callback (2)
def callback(in_data, frame_count, time_info, status):
    data = wf.readframes(frame_count)
    return (data, pyaudio.paContinue)

# open stream using callback (3)
stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
                channels=wf.getnchannels(),
                rate=wf.getframerate(),
                output=True,
                stream_callback=callback)

# start the stream (4)
stream.start_stream()

# wait for stream to finish (5)
while stream.is_active():
    time.sleep(0.1)

# stop stream (6)
stream.stop_stream()
stream.close()
wf.close()

# close PyAudio (7)
p.terminate()

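I won't go into much detail. In short, instead of a write loop, PyAudio calls callback from a separate thread each time the output stream needs frame_count more frames, and the callback returns the audio data together with a status flag (pyaudio.paContinue here). When readframes returns fewer frames than requested at the end of the file, PyAudio treats the stream as finished, is_active() becomes False, and the main thread leaves its sleep loop and cleans up.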
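The callback style inverts control: the audio system pulls data from you instead of your loop pushing it. Since that is hard to demonstrate without audio hardware, here is a toy simulation using only the standard library. run_output_stream is a hypothetical stand-in for PortAudio's playback thread, and PA_CONTINUE/PA_COMPLETE are local stand-ins for pyaudio.paContinue/paComplete. It models PyAudio's documented rule, as I understand it, that a buffer with fewer than frame_count frames ends the stream; it is not PyAudio itself.

```python
import io
import wave

PA_CONTINUE, PA_COMPLETE = 0, 1  # stand-ins for pyaudio.paContinue / paComplete


def run_output_stream(callback, frames_per_buffer, bytes_per_frame):
    """Toy 'driver' loop: pull buffers from the callback until the stream ends.

    A short buffer (fewer than frames_per_buffer frames) is treated as the last one.
    """
    played = []
    while True:
        data, flag = callback(None, frames_per_buffer, None, 0)
        played.append(data)  # a real driver would send this to the speakers
        if flag != PA_CONTINUE or len(data) < frames_per_buffer * bytes_per_frame:
            return b"".join(played)


# Build a short silent 16-bit mono WAV (3000 frames) in memory as test input.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(8000)
    w.writeframes(bytes(2 * 3000))
buf.seek(0)
wf = wave.open(buf, "rb")

calls = []


def callback(in_data, frame_count, time_info, status):
    data = wf.readframes(frame_count)
    calls.append(len(data) // 2)  # frames handed over in this call
    return (data, PA_CONTINUE)


played = run_output_stream(callback, frames_per_buffer=1024, bytes_per_frame=2)
wf.close()
print(calls)             # [1024, 1024, 952]
print(len(played) // 2)  # 3000
```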

Next, some code that uses pyaudio + numpy + pylab to visualize audio: it opens the computer's microphone, receives audio input, and displays it as a plot.

# -*- coding: utf-8 -*-
"""
Created on Fri May 12 10:30:00 2017
@author: Lyrichu
@description: show the sound in graphs
"""
import pyaudio
import numpy as np
import pylab
import time

RATE = 44100
CHUNK = int(RATE/20)  # RATE / number of updates per second -> 50 ms of audio per read

def sound_plot(stream, index):
    t1 = time.time()  # time starting
    # don't raise if input was dropped while we were busy plotting/sleeping
    raw = stream.read(CHUNK, exception_on_overflow=False)
    data = np.frombuffer(raw, dtype=np.int16)  # np.fromstring is deprecated
    pylab.plot(data)
    pylab.title("chunk %d" % index)  # pass the loop index instead of using a global
    pylab.grid()
    pylab.axis([0, len(data), -2**8, 2**8])
    pylab.savefig("sound.png", dpi=50)
    pylab.show(block=False)
    time.sleep(0.5)
    pylab.close('all')
    # compute the elapsed time first, then format it
    print("took %.2f ms." % ((time.time() - t1) * 1000))

if __name__ == '__main__':
    p = pyaudio.PyAudio()
    stream = p.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                    input=True, frames_per_buffer=CHUNK)
    for i in range(int(20*RATE/CHUNK)):
        # 400 reads of CHUNK frames = 20 seconds of audio
        sound_plot(stream, i)
    stream.stop_stream()
    stream.close()
    p.terminate()

The code should be fairly easy to follow. The result looks roughly like this (Figure 4):

Figure 4

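Note that when pylab or matplotlib plotting commands are not run in interactive mode, plt.show() blocks. As a result, the final

plt.close('all'), which closes all windows, would only run after you close the figure manually, so you would not see a continuously updating plot. To solve this, we set show()'s block parameter to False so that show() returns immediately and plt.close('all') can run, and we pause for 0.5 seconds with time.sleep(0.5) so the plot doesn't refresh too fast to follow. (plt.pause(0.5) is a common alternative that also lets matplotlib process GUI events during the pause, which some backends need in order to actually draw the window.)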
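It is worth working out what the script's constants mean in time. The short calculation below uses the script's RATE and its 20 updates per second; the variable names other than RATE are mine. One read covers 50 ms of audio, the loop reads 400 chunks (20 seconds of audio), yet with the 0.5-second pause per iteration it runs for at least 200 seconds, so the plots lag well behind real time and the microphone input can overflow its buffer.

```python
RATE = 44100              # samples per second, as in the script
UPDATES_PER_SECOND = 20   # the "20" in CHUNK = int(RATE/20)

chunk = int(RATE / UPDATES_PER_SECOND)       # frames per stream.read()
chunk_ms = chunk * 1000 / RATE               # audio covered by one read, in ms
iterations = int(20 * RATE / chunk)          # loop count used in the script
audio_seconds = iterations * chunk / RATE    # total audio actually read

print(chunk, chunk_ms, iterations, audio_seconds)
# 2205 50.0 400 20.0

# Each iteration also sleeps 0.5 s, so the wall-clock time is at least:
print(iterations * 0.5, "seconds")  # 200.0 seconds
```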

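That's not everything yet: the pygame module (a Python game development library) and the librosa library (a dedicated audio and music signal processing library) haven't been covered. I'll update this post when I get the chance. Stay tuned!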
