Audio Data Augmentation and Its Python Implementation

Date: 2020-10-03


Blog author: 凌逆戰

Blog URL: http://www.javashuo.com/article/p-xuaoyyer-nt.html

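An audio time-domain waveform has three characteristics: pitch, loudness, and quality (timbre). When augmenting data, it is best to make only small changes, so that the augmented data differs only slightly from the source data. Take care not to alter the structure of the original data, otherwise you will produce "dirty data". Augmenting audio data helps a model avoid overfitting and generalize better.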

I have found the following changes to the waveform useful: noise addition, added reverberation, time shifting, pitch shifting, and time stretching.

Python libraries used in this article:

matplotlib: plotting
librosa: audio data processing
numpy: array and matrix processing

First, plot the spectrogram and the waveform of the original speech data.

import librosa
import numpy as np
import matplotlib.pyplot as plt

fs = 16000  # sampling rate in Hz

# Load the recording as mono, resampled to fs
wav_data, _ = librosa.load("./p225_001.wav", sr=fs, mono=True)

# ########### Plotting
plt.subplot(2, 2, 1)
plt.title("Spectrogram", fontsize=15)
plt.specgram(wav_data, Fs=fs, scale_by_freq=True, sides='default', cmap="jet")
plt.xlabel('Time/s', fontsize=15)
plt.ylabel('Frequency/Hz', fontsize=15)

plt.subplot(2, 2, 2)
plt.title("Waveform", fontsize=15)
time = np.arange(0, len(wav_data)) * (1.0 / fs)  # time axis in seconds
plt.plot(time, wav_data)
plt.xlabel('Time/s', fontsize=15)
plt.ylabel('Amplitude', fontsize=15)

plt.tight_layout()
plt.show()

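Adding noise

The added noise is zero-mean Gaussian white noise, drawn from a standard normal distribution (standard deviation 1) and then scaled. There are two ways to add noise to the data.

Method 1: control the noise factor

def add_noise1(x, w=0.004):
    # w: noise factor (scales the unit-variance noise)
    output = x + w * np.random.normal(loc=0, scale=1, size=len(x))
    return output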

Augmentation = add_noise1(x=wav_data, w=0.004)

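Method 2: control the signal-to-noise ratio

The noise level is derived from the definition of the signal-to-noise ratio (SNR, in dB), where $S^2$ is the signal power and $N^2$ is the noise power:

$$SNR = 10\log_{10}\left(\frac{S^2}{N^2}\right)$$

Solving for the noise amplitude $N$ gives:

$$N = \sqrt{\frac{S^2}{10^{\frac{SNR}{10}}}}$$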

def add_noise2(x, snr):
    # snr: target signal-to-noise ratio of the generated speech, in dB
    P_signal = np.sum(abs(x) ** 2) / len(x)  # signal power
    P_noise = P_signal / 10 ** (snr / 10.0)  # noise power
    return x + np.random.randn(len(x)) * np.sqrt(P_noise)

Augmentation = add_noise2(x=wav_data, snr=50)
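
As a sanity check on the SNR formula, the sketch below adds noise to a synthetic tone and measures the SNR that was actually achieved. The helper names (add_noise_at_snr, measured_snr_db), the 440 Hz test tone, and the 20 dB target are my own assumptions for illustration, not part of the original code; the noise is returned separately only so that it can be measured.

```python
import numpy as np


def add_noise_at_snr(x, snr_db, rng):
    """Add white Gaussian noise scaled for a target SNR (hypothetical helper)."""
    p_signal = np.mean(np.abs(x) ** 2)           # signal power S^2
    p_noise = p_signal / 10 ** (snr_db / 10.0)   # noise power N^2 from the SNR formula
    noise = rng.standard_normal(len(x)) * np.sqrt(p_noise)
    return x + noise, noise


def measured_snr_db(signal, noise):
    """SNR in dB computed from the clean signal and the noise that was added."""
    return 10 * np.log10(np.mean(signal ** 2) / np.mean(noise ** 2))


rng = np.random.default_rng(0)   # fixed seed so the run is repeatable
fs = 16000
t = np.arange(fs) / fs
clean = 0.5 * np.sin(2 * np.pi * 440 * t)   # 1 s synthetic tone (assumed test signal)

noisy, noise = add_noise_at_snr(clean, snr_db=20, rng=rng)
achieved = measured_snr_db(clean, noise)
print(f"target: 20 dB, measured: {achieved:.2f} dB")
```

Because the noise is random, the measured value only approximates the target; the longer the signal, the closer the two should be.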

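Time shifting

The waveform is shifted to the right by shift samples with the numpy.roll function. The shift is circular: samples pushed past the end wrap around to the beginning.

numpy.roll(a, shift, axis=None)

Parameters:

a: the input array
shift: the number of positions to roll
axis: the axis along which to roll. For a 2-D array, 0 rolls vertically and 1 rolls horizontally. When it is None, the array is flattened, rolled, and then restored to its original shape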

x = np.arange(10)
# array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

print(np.roll(x, 2))
# [8 9 0 1 2 3 4 5 6 7]
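
To make the axis parameter concrete, here is a small sketch on a 2-D array. The array m and the variable names are chosen purely for illustration; a mono waveform is 1-D, so this case only matters for multi-dimensional data.

```python
import numpy as np

m = np.arange(6).reshape(2, 3)
# [[0 1 2]
#  [3 4 5]]

rolled_flat = np.roll(m, 1)          # axis=None: flatten, roll by 1, restore the shape
rolled_rows = np.roll(m, 1, axis=0)  # rows move down; the last row wraps to the top
rolled_cols = np.roll(m, 1, axis=1)  # columns move right; the last column wraps to the left

print(rolled_flat)
# [[5 0 1]
#  [2 3 4]]
print(rolled_rows)
# [[3 4 5]
#  [0 1 2]]
print(rolled_cols)
# [[2 0 1]
#  [5 3 4]]
```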

Time-shift function:

def time_shift(x, shift):
    # shift: number of samples to shift by
    return np.roll(x, int(shift))

Augmentation = time_shift(wav_data, shift=fs//2)

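Time stretching

Time stretching changes the speed/duration of a sound without affecting its pitch. This can be done with librosa's time_stretch function.

def time_stretch(x, rate):
    # rate: stretch factor
    # rate > 1 speeds the audio up
    # rate < 1 slows it down
    # rate is passed by keyword, as newer librosa versions require
    return librosa.effects.time_stretch(x, rate=rate)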

Augmentation = time_stretch(wav_data, rate=2)

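Pitch shifting

Pitch shifting changes only the pitch and leaves the speed untouched. I found that step counts between -5 and 5 work best.

def pitch_shifting(x, sr, n_steps, bins_per_octave=12):
    # sr: audio sampling rate
    # n_steps: how many steps to shift by
    # bins_per_octave: how many steps per octave (12 means one step is a semitone)
    # arguments are passed by keyword, as newer librosa versions require
    return librosa.effects.pitch_shift(x, sr=sr, n_steps=n_steps, bins_per_octave=bins_per_octave)

# Shift up by a tritone (six steps if bins_per_octave is 12)
Augmentation = pitch_shifting(wav_data, sr=fs, n_steps=6, bins_per_octave=12)
# Shift up by three quarter-tones (three steps if bins_per_octave is 24)
Augmentation = pitch_shifting(wav_data, sr=fs, n_steps=3, bins_per_octave=24)
# Shift down by a tritone (six steps if bins_per_octave is 12)
Augmentation = pitch_shifting(wav_data, sr=fs, n_steps=-6, bins_per_octave=12)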
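
The relationship between n_steps and bins_per_octave is easy to misread, so here is a minimal sketch of the arithmetic. Since one octave doubles the frequency, a shift of n_steps with bins_per_octave steps per octave scales frequencies by 2^(n_steps / bins_per_octave). The two helper functions are hypothetical names of my own and are not part of librosa.

```python
def pitch_ratio(n_steps, bins_per_octave=12):
    """Frequency ratio produced by a shift of n_steps (hypothetical helper)."""
    return 2.0 ** (n_steps / bins_per_octave)


def steps_in_semitones(n_steps, bins_per_octave=12):
    """Express a shift in semitones, i.e. in 12-steps-per-octave units."""
    return 12.0 * n_steps / bins_per_octave


for n_steps, bins in [(6, 12), (3, 24), (-6, 12)]:
    print(
        f"n_steps={n_steps:>2}, bins_per_octave={bins}: "
        f"{steps_in_semitones(n_steps, bins):+.1f} semitones, "
        f"frequency ratio {pitch_ratio(n_steps, bins):.4f}"
    )
```

Six steps at 12 bins per octave is six semitones, whereas three steps at 24 bins per octave is only 1.5 semitones, so the same shift needs twice as many steps when bins_per_octave is doubled.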