1. Background
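When a time-domain signal is long, we usually cut it into short segments and process them one at a time, as in the short-time Fourier transform (STFT) or the wavelet transform.
To make the transition between segments smooth, each segment usually shares a stretch of samples with the segment that follows it; this shared part is called the overlap.
We already ran into this kind of framing in an earlier post (how to transform an acoustic spectrogram back into a time-domain speech signal).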
2. Implementation (mainly Python code)
Whichever method we use, we first need a few basic quantities:
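Suppose we have a signal sigData of total length sigLen, each frame contains blkSize samples, and the overlap fraction is Overlap (e.g. 0.3 for 30 %).
stepSize: the number of samples we move forward from one frame to the next, stepSize = int(blkSize * (1 - Overlap)); it must be at least 1.
frameNumSize: the total number of frames, frameNumSize = 1 + floor((sigLen - blkSize) / stepSize). This assumes sigLen >= blkSize; samples after the end of the last complete frame do not fit into any frame and are dropped.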
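To make the two formulas concrete, here is a small sketch that evaluates them for the parameters used in this post. The helper name frame_params is hypothetical (it is not part of the original code); besides stepSize and frameNumSize it also reports how many trailing samples are dropped because they do not fill a complete frame.

```python
import math


def frame_params(sig_len, blk_size, overlap):
    """Hypothetical helper: step size, frame count and number of dropped tail samples.

    overlap is a fraction in [0, 1]; the formulas mirror the ones in the text.
    """
    step = max(math.floor(blk_size * (1 - overlap)), 1)  # stepSize, at least 1
    n_frames = 1 + (sig_len - blk_size) // step            # frameNumSize
    covered = (n_frames - 1) * step + blk_size             # samples 0 .. covered-1 land in some frame
    return step, n_frames, sig_len - covered


print(frame_params(20, 7, 0.3))            # small example of section 2   -> (4, 4, 1)
print(frame_params(1_000_000, 1024, 0.3))  # timing test of section 3     -> (716, 1396, 156)
```

So in the small example below, the last sample (19) is not part of any frame.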
2.1 Extracting frames with a loop
```python
#%% method 1
import numpy as np

def cut_to_sigBlks_test1(sigData, blkSize, Overlap):
    # Overlap > 1 is treated as a percentage, e.g. 30 -> 0.3
    if Overlap > 1:
        Overlap = Overlap / 100
    # 1. Step between frame start indices; because of the overlap, stepSize <= blkSize
    sigLen = np.size(sigData)
    stepSize = int(np.floor(blkSize * (1 - Overlap)))
    if stepSize < 1:
        stepSize = 1
    frameNumSize = (sigLen - blkSize) // stepSize + 1  # total number of frames
    # 2. Copy each frame into one row of the output array
    sigBlks = np.zeros((frameNumSize, blkSize), dtype=sigData.dtype)
    for i in range(frameNumSize):
        sigBlks[i, :] = sigData[i * stepSize : i * stepSize + blkSize]
    return sigBlks

#%% Test
sigData = np.arange(20)
blkSize = 7
Overlap = 0.3
sigBlks = cut_to_sigBlks_test1(sigData, blkSize, Overlap)
print('sigData: \n', sigData)
print('sigBlks: \n', sigBlks)
```
Output (the last sample, 19, does not fill a complete frame and is dropped):
```
sigData:
 [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
sigBlks:
 [[ 0  1  2  3  4  5  6]
 [ 4  5  6  7  8  9 10]
 [ 8  9 10 11 12 13 14]
 [12 13 14 15 16 17 18]]
```
2.2 Extracting frames with an index array
```python
#%% method 2
import numpy as np

def cut_to_sigBlks_test2(sigData, blkSize, Overlap):
    if Overlap > 1:
        raise ValueError('Overlap must be a fraction no greater than 1')
    # 1. Step between frame start indices; because of the overlap, stepSize <= blkSize
    sigLen = np.size(sigData)
    stepSize = int(np.floor(blkSize * (1 - Overlap)))
    if stepSize < 1:
        stepSize = 1
    frameNumSize = (sigLen - blkSize) // stepSize + 1  # total number of frames
    # 2. Build the index array (vectorized): frameNumSize rows x blkSize columns
    # start index of each frame, spaced stepSize apart to account for the overlap
    startIdxArry = np.arange(0, stepSize * frameNumSize, stepSize)
    # row i holds the indices startIdxArry[i] .. startIdxArry[i] + blkSize - 1
    idxArray = np.tile(np.r_[0:blkSize], (frameNumSize, 1)) + startIdxArry[:, np.newaxis]
    sigBlks = sigData[idxArray]  # fancy indexing returns a copy
    return sigBlks

#%% Test
sigData = np.arange(20)
blkSize = 7
Overlap = 0.3
sigBlks = cut_to_sigBlks_test2(sigData, blkSize, Overlap)
print('sigData: \n', sigData)
print('sigBlks: \n', sigBlks)
```
Output (identical to method 1):
```
sigData:
 [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
sigBlks:
 [[ 0  1  2  3  4  5  6]
 [ 4  5  6  7  8  9 10]
 [ 8  9 10 11 12 13 14]
 [12 13 14 15 16 17 18]]
```
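To see what method 2 actually indexes with, the sketch below prints the index array for the small example. It builds the array with broadcasting (a column of start indices plus a row 0..blkSize-1) instead of np.tile; this spelling is an alternative assumed here, not the original code, and it yields the same array without first materializing the tiled copy.

```python
import numpy as np

blkSize, stepSize, frameNumSize = 7, 4, 4        # values of the small example above

starts = np.arange(frameNumSize) * stepSize      # start index of each frame: [0 4 8 12]
# broadcasting a (frameNumSize, 1) column against a (1, blkSize) row
idxArray = starts[:, np.newaxis] + np.arange(blkSize)[np.newaxis, :]
print(idxArray)

# same result as the np.tile version used in method 2
tiled = np.tile(np.r_[0:blkSize], (frameNumSize, 1)) + starts[:, np.newaxis]
print(np.array_equal(idxArray, tiled))           # True

sigData = np.arange(20) * 10                     # sample values differ from indices here
print(sigData[idxArray][1])                      # second frame: samples 4 .. 10
```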
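2.3 Using numpy's as_strided
np.lib.stride_tricks.as_strided also works like indexing, but it is built into numpy and copies nothing: it returns a view of the original memory with a new shape and new strides. The code below first converts the signal to a contiguous array with np.ascontiguousarray, so that the byte strides it computes match the memory layout. A stride plays the role of the step above, except that it is measured in bytes: consecutive frames start stepSize * itemsize bytes apart.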
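To see what as_strided does with a given shape and strides, here is a stripped-down call for the small example (a sketch, not the author's method 3). It assumes a float64 signal so that itemsize is 8 bytes, and it passes writeable=False because the overlapping frames share memory: the result is a view, so a change to the signal shows up in every frame that contains that sample.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

sigData = np.arange(20, dtype=np.float64)        # itemsize = 8 bytes (assumed dtype)
blkSize, stepSize = 7, 4                         # values of the small example
frameNumSize = (sigData.size - blkSize) // stepSize + 1   # 4 frames

# row stride: move stepSize samples; column stride: move one sample (both in bytes)
frames = as_strided(sigData,
                    shape=(frameNumSize, blkSize),
                    strides=(stepSize * sigData.itemsize, sigData.itemsize),
                    writeable=False)

print(frames.strides)                            # (32, 8)
print(np.shares_memory(frames, sigData))         # True: no data was copied

# frames overlap in memory, so one change to the signal shows up in two frames
sigData[5] = -1.0
print(frames[0, 5], frames[1, 1])                # -1.0 -1.0
```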
```python
#%% method 3
import numpy as np

def cut_to_sigBlks_test3(sigData, blkSize, Overlap, axis=0):
    if Overlap > 1:
        raise ValueError('Overlap must be a fraction no greater than 1')
    # 1. Step between frame start indices; because of the overlap, stepSize <= blkSize
    sigLen = np.size(sigData)  # intended for 1-D signals
    stepSize = int(np.floor(blkSize * (1 - Overlap)))
    if stepSize < 1:
        stepSize = 1
    frameNumSize = (sigLen - blkSize) // stepSize + 1  # total number of frames
    # 2. Build a strided view (vectorized, no copy)
    sigData = np.ascontiguousarray(sigData)  # make sure the data is contiguous in memory
    strides = np.asarray(sigData.strides)
    # byte distance between consecutive samples (itemsize for a contiguous 1-D signal)
    new_stride = np.prod(strides[strides > 0] // sigData.itemsize) * sigData.itemsize
    if axis == -1:    # one frame per column
        shape = list(sigData.shape)[:-1] + [blkSize, frameNumSize]
        strides = list(strides) + [stepSize * new_stride]
    elif axis == 0:   # one frame per row
        shape = [frameNumSize, blkSize] + list(sigData.shape)[1:]
        strides = [stepSize * new_stride] + list(strides)
    else:
        raise ValueError('axis must be 0 or -1')
    # read-only view: frames overlap in memory, so writing to one would change others
    sigBlks = np.lib.stride_tricks.as_strided(sigData, shape=shape, strides=strides,
                                              writeable=False)
    return sigBlks

#%% Test
sigData = np.arange(20)
blkSize = 7
Overlap = 0.3
sigBlks = cut_to_sigBlks_test3(sigData, blkSize, Overlap)
print('sigData: \n', sigData)
print('sigBlks: \n', sigBlks)
```
Output (identical to methods 1 and 2):
```
sigData:
 [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
sigBlks:
 [[ 0  1  2  3  4  5  6]
 [ 4  5  6  7  8  9 10]
 [ 8  9 10 11 12 13 14]
 [12 13 14 15 16 17 18]]
```
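Since NumPy 1.20 there is also np.lib.stride_tricks.sliding_window_view, which builds the same kind of read-only strided view without computing byte strides by hand. This is a different function from the as_strided approach of method 3; the sketch below assumes NumPy >= 1.20 and a 1-D signal, creates one window per start index, and keeps every stepSize-th window to obtain the overlapping frames.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # requires NumPy >= 1.20

sigData = np.arange(20)
blkSize = 7
stepSize = 4   # floor(7 * (1 - 0.3)), as in the small example

# one window per start index (14 windows here), then keep every stepSize-th one
sigBlks = sliding_window_view(sigData, blkSize)[::stepSize]
print(sigBlks)                                   # same 4 frames as methods 1 to 3
print(np.shares_memory(sigBlks, sigData))        # True: a read-only view, no copy
```

Slicing with [::stepSize] keeps the result a view, so the same caveat as method 3 applies: call .copy() if the frames need to be modified independently.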
3. Comparing the running time of the three methods
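Of the three methods, the first uses a loop and is the easiest to understand, the second is vectorized, and the third is also vectorized and additionally relies on numpy's as_strided. I expected each later method to be faster than the one before it.
The test signal has 1,000,000 samples, split into frames of 1024 samples with overlap = 0.3, and each method is called 100 times. The total times were: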
```python
#%% Test cost time (uses the three functions defined above)
import time
import numpy as np

sigData = np.arange(1000000, dtype=np.float64)
blkSize = 1024
Overlap = 0.3

for func in (cut_to_sigBlks_test1, cut_to_sigBlks_test2, cut_to_sigBlks_test3):
    st = time.perf_counter()
    for _ in range(100):
        sigBlks = func(sigData, blkSize, Overlap)
    et = time.perf_counter()
    print(f'{func.__name__}:', et - st)
```
```
cut_to_sigBlks_test1: 1.0691425800323486
cut_to_sigBlks_test2: 1.8650140762329102
cut_to_sigBlks_test3: 0.003989458084106445
```
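So the running time ranks method 3 < method 1 < method 2. Method 3 is not doing the same amount of work, though: as_strided only creates a view and copies no data, so its cost hardly depends on the signal length, while methods 1 and 2 copy all 1396 frames of 1024 samples into a new array on every call.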
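I had expected method 1 to take longer than method 2, so this result came as a surprise. A likely explanation is that method 2 first builds an index array as large as the output (via np.tile and a broadcast add) and then gathers the samples through it, whereas the loop in method 1 runs only about 1,400 times and each iteration is a fast copy of a contiguous slice. Still, the second version is the more elegant one to write, haha!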
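The timing above compares unequal work: method 3 returns a view, so its time is mostly the cost of creating that view, while methods 1 and 2 copy every frame. If the goal is an independent array of frames, a fairer comparison also times the strided view followed by .copy(). The functions frames_loop, frames_index and frames_view below are hypothetical, condensed re-implementations of the three methods for a 1-D float64 signal with a fixed stepSize and no argument checking; the block prints timings, but no numbers are claimed here since they depend on the machine.

```python
import timeit

import numpy as np
from numpy.lib.stride_tricks import as_strided


def frames_loop(x, blk, step):
    """Method 1 style: copy each frame in a Python loop."""
    n = (x.size - blk) // step + 1
    out = np.empty((n, blk), dtype=x.dtype)
    for i in range(n):
        out[i] = x[i * step : i * step + blk]
    return out


def frames_index(x, blk, step):
    """Method 2 style: gather through a (n, blk) index array built by broadcasting."""
    n = (x.size - blk) // step + 1
    idx = np.arange(n)[:, np.newaxis] * step + np.arange(blk)
    return x[idx]


def frames_view(x, blk, step):
    """Method 3 style: read-only strided view, no copy."""
    x = np.ascontiguousarray(x)
    n = (x.size - blk) // step + 1
    return as_strided(x, shape=(n, blk), strides=(step * x.itemsize, x.itemsize),
                      writeable=False)


x = np.arange(1_000_000, dtype=np.float64)
blk, step = 1024, 716  # blkSize = 1024, Overlap = 0.3

# all variants must produce the same frames before timing means anything
ref = frames_loop(x, blk, step)
assert np.array_equal(ref, frames_index(x, blk, step))
assert np.array_equal(ref, frames_view(x, blk, step))

for name, stmt in [("loop", lambda: frames_loop(x, blk, step)),
                   ("index", lambda: frames_index(x, blk, step)),
                   ("view", lambda: frames_view(x, blk, step)),
                   ("view + copy", lambda: frames_view(x, blk, step).copy())]:
    t = timeit.timeit(stmt, number=10)
    print(f"{name:12s} {t:.4f} s for 10 calls")
```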