Making a word cloud video with Python: watching a dance through word clouds

Date: 2021-02-21


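In this article I use Python to make a word cloud video: the left half shows a dance video, and the right half shows a word cloud video generated from the dancer's movements.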

The process consists of the following steps.

1. Downloading the video

First, download a dance video. I use the you-get tool here, which can be installed with Python's pip command:

pip install you-get

you-get supports downloading from platforms including YouTube, Bilibili, TED, Tencent Video, Youku, and iQiyi, which covers most major video platforms.

Taking a YouTube video as an example, the you-get download command is:

you-get -o ~/Videos -O zoo.webm 'https://www.youtube.com/watch?v=jNQXAC9IVRw'

Here -o sets the directory the video is saved to, and -O sets the file name.

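Here the you-get command is run through the os module. Pass in three arguments: (1) the video URL, (2) the directory to save the video in, and (3) the file name for the video.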

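import os
import shlex


def download(video_url, save_path, video_name):
    '''
    Download a video with you-get.
    :param video_url: video URL
    :param save_path: directory to save the video in
    :param video_name: file name for the video
    :return: None
    '''
    # Quote the URL so characters such as & or ? are not interpreted by the shell
    cmd = 'you-get -o {} -O {} {}'.format(save_path, video_name, shlex.quote(video_url))
    res = os.popen(cmd)
    print(res.read())  # print you-get's output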
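As an alternative sketch (not the author's method), the same command can be assembled as an argument list and run with subprocess.run, so no shell quoting is involved at all. The helper names build_you_get_cmd and download_with_subprocess are hypothetical, and only the -o and -O options shown above are used.

```python
import subprocess


def build_you_get_cmd(video_url, save_path, video_name):
    """Return the you-get command as an argument list (no shell involved)."""
    return ['you-get', '-o', save_path, '-O', video_name, video_url]


def download_with_subprocess(video_url, save_path, video_name):
    """Run you-get and return its exit code; requires you-get on PATH.

    Note: without a shell, '~' in save_path is not expanded,
    so pass a real path (for example via os.path.expanduser).
    """
    cmd = build_you_get_cmd(video_url, save_path, video_name)
    completed = subprocess.run(cmd, capture_output=True, text=True)
    print(completed.stdout)  # show you-get's output
    return completed.returncode
```

Keeping the command construction in its own function makes it easy to inspect the exact arguments before anything is downloaded.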

For more on using you-get, see the official site, which documents its usage in detail:

https://you-get.org/#getting-started

2. Downloading Bilibili danmaku (bullet comments)

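A word cloud needs text data, and here I use Bilibili danmaku as the material. A quick way to download them is to request the video's API endpoint with requests, which returns the danmaku for that video: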

http://comment.bilibili.com/{cid}.xml  # cid is the Bilibili video's cid number

Building this URL, however, requires the video's cid.

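To find a Bilibili video's cid: press F12 to open the developer tools -> Network -> XHR -> the request whose URL begins with v2?cid=.... That URL contains a string of the form "cid=<digits>"; the digits after the equals sign are the video's cid.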

In this example, 291424805 is the video's cid.

With the cid in hand, requesting the API endpoint with requests returns the danmaku data:

http://comment.bilibili.com/291424805.xml
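The cid lookup described above can also be done in code once the request URL has been copied from the developer tools. This is a minimal sketch: the helper names extract_cid and danmu_url are hypothetical, and the request URL in the usage note is a made-up example that only assumes the "v2?cid=<digits>" shape mentioned in the text.

```python
from urllib.parse import urlparse, parse_qs


def extract_cid(request_url):
    """Return the value of the cid query parameter, or None if it is absent."""
    query = parse_qs(urlparse(request_url).query)
    values = query.get('cid')
    return values[0] if values else None


def danmu_url(cid):
    """Build the danmaku XML URL for a given cid."""
    return 'http://comment.bilibili.com/{}.xml'.format(cid)
```

For example, extract_cid('https://example.com/v2?cid=291424805&foo=bar') gives the string 291424805, and passing that to danmu_url yields the URL shown above.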

import jieba
import requests
from bs4 import BeautifulSoup


def download_danmu():
    '''Download the danmaku and store the segmented words.'''
    cid = '141367679'  # cid of the target video; replace with your own
    url = 'http://comment.bilibili.com/{}.xml'.format(cid)
    res = requests.get(url)
    res.encoding = 'utf-8'
    soup = BeautifulSoup(res.text, 'lxml')
    items = soup.find_all('d')  # each comment is stored in a d tag
    with open('danmu.txt', 'w+', encoding='utf-8') as f:  # closed automatically
        for item in items:
            text = item.text
            print('-' * 40)
            print(text)
            # Segment the comment into words, ready for the word cloud
            seg_list = jieba.cut(text, cut_all=True)
            for j in seg_list:
                print(j)
                f.write(j)
                f.write('\n')
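To check the parsing step without touching the network, the comment text can be pulled out of a small XML string with the standard library. This is only a sketch: SAMPLE_XML is made up for illustration (including its p attribute values) and assumes nothing beyond each comment sitting in a d element, as the function above does. The helper name extract_danmu_texts is hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the same shape as the downloaded file: comments in d elements
SAMPLE_XML = '<i><d p="1.0,1,25">第一条弹幕</d><d p="2.5,1,25">hello</d></i>'


def extract_danmu_texts(xml_text):
    """Return the text of every non-empty d element, in document order."""
    root = ET.fromstring(xml_text)
    return [d.text for d in root.iter('d') if d.text]


print(extract_danmu_texts(SAMPLE_XML))
```

Each returned string can then be passed to jieba.cut exactly as in the function above.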

3. Splitting the video into frames and segmenting the person

After downloading the video, first split it into individual frames:

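import os
import cv2

# video_path: the downloaded video; Pic_path: folder for the extracted frames
vc = cv2.VideoCapture(video_path)
c = 0
while vc.isOpened():
    rval, frame = vc.read()  # read the next frame
    if not rval:  # no more frames to read
        break
    # Save every frame, including the first, as <index>.jpg
    cv2.imwrite(os.path.join(Pic_path, '{}.jpg'.format(c)), frame)
    c += 1
    print('Saved image {}'.format(c))
vc.release()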

Next, detect and extract the dancer in each frame, i.e. portrait segmentation. This uses the Baidu API:

import base64
import os
import cv2
import numpy as np
from aip import AipBodyAnalysis

APP_ID = "23633750"
API_KEY = 'uqnHjMZfChbDHvPqWgjeZHCR'
SECRET_KEY = '************************************'
client = AipBodyAnalysis(APP_ID, API_KEY, SECRET_KEY)


def get_file_content(file_path):
    '''Read an image file as bytes (helper not shown in the original; assumed here).'''
    with open(file_path, 'rb') as fp:
        return fp.read()


# jpg_path: input frames; save_path: output masks; crop_path: folder for cropped frames, or None
for i in os.listdir(jpg_path):
    open_file = os.path.join(jpg_path, i)
    save_file = os.path.join(save_path, i)
    if not os.path.exists(save_file):  # only process frames that have no mask yet
        img = cv2.imread(open_file)
        height, width, _ = img.shape  # size of the original frame
        if crop_path:  # if crop_path is not None, crop the frame before uploading
            crop_file = os.path.join(crop_path, i)
            img = img[100:-1, 300:-400]  # frame is too large; adjust the crop to your own video
            cv2.imwrite(crop_file, img)
            image = get_file_content(crop_file)
        else:
            image = get_file_content(open_file)
        res = client.bodySeg(image)  # call the Baidu API to segment the person
        labelmap = base64.b64decode(res['labelmap'])
        labelimg = np.frombuffer(labelmap, np.uint8)  # bytes to a uint8 array
        labelimg = cv2.imdecode(labelimg, 1)
        labelimg = cv2.resize(labelimg, (width, height), interpolation=cv2.INTER_NEAREST)
        img_new = np.where(labelimg == 1, 255, labelimg)  # map label 1 to 255
        cv2.imwrite(save_file, img_new)
        print(save_file, 'save successfully')

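This converts each frame containing a person into a binary image, with the person as foreground and everything else as background.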

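Before using the API, create a body analysis application on the Baidu AI Cloud platform with your own account. It provides the three required parameters: ID, AK, and SK.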

For how to use the Baidu API, see the official documentation.

4. Generating word clouds from the segmented images

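Step 3 produced a portrait mask for each frame.

Using the wordcloud library and the collected danmaku, draw a word cloud for each binary image. (Before doing so, make sure every image is binary, and remove any image that is entirely black.)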
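The check described in the parentheses can be sketched with NumPy. Because masks saved as JPEG may contain values slightly off 0 and 255, this sketch re-thresholds each mask first. The threshold of 128 and the helper names binarize and has_foreground are assumptions for illustration, not part of the original code.

```python
import numpy as np


def binarize(mask, threshold=128):
    """Map every pixel to 0 or 255; the threshold value is an assumption."""
    return np.where(mask >= threshold, 255, 0).astype(np.uint8)


def has_foreground(mask):
    """Return True if the binarized mask contains at least one white pixel."""
    return bool((binarize(mask) == 255).any())


# Tiny made-up arrays standing in for loaded mask images
noisy = np.array([[0, 3, 250, 255]], dtype=np.uint8)
print(binarize(noisy))
print(has_foreground(np.zeros((2, 2), dtype=np.uint8)))
```

Masks for which has_foreground returns False are the all-black images that should be skipped before drawing word clouds.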

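import collections
import os
import random
import re

import numpy as np
from PIL import Image
from wordcloud import WordCloud

word_list = []
with open('danmu.txt', encoding='utf-8') as f:
    con = f.read().split('\n')  # read the segmented words, one per line
    for i in con:
        if re.search('[\u4e00-\u9fa5]', i):  # drop words that contain no Chinese characters
            word_list.append(i)
# Count once, ordered from most to least frequent
sorted_counts = collections.Counter(word_list).most_common()

# mask_path: binary masks from step 3; cloud_path: output folder for word clouds
for i in os.listdir(mask_path):
    open_file = os.path.join(mask_path, i)
    save_file = os.path.join(cloud_path, i)
    if not os.path.exists(save_file):
        # Skip a random number (0 to 15) of the most frequent words
        start = random.randint(0, 15)
        word_counts = dict(sorted_counts[start:])
        # Invert the mask: wordcloud does not draw on white (255) areas
        background = 255 - np.array(Image.open(open_file))
        wc = WordCloud(
            background_color='black',
            max_words=500,
            mask=background,
            mode='RGB',
            font_path="D:/Data/fonts/HGXK_CNKI.ttf",  # font path; needed to render Chinese
        ).generate_from_frequencies(word_counts)
        wc.to_file(save_file)
        print(save_file, 'Save Successfully!')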

5. Stitching the images and composing the video

Once all the word cloud images are generated, viewing them one by one is no fun; combining them into a video is much cooler!

For a before/after comparison I added one more step: before merging, each original frame is stitched side by side with its word cloud image.

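import os
import cv2
import numpy as np

# origin_path: original frames; wordart_path: word cloud images; video_path: output video
num_list = [int(str(i).split('.')[0]) for i in os.listdir(origin_path)]
fps = 24  # output frame rate; match the source video to keep the original playback speed
height, width, _ = cv2.imread(os.path.join(origin_path, '{}.jpg'.format(num_list[0]))).shape
width = width * 2  # two images side by side
# Create the video writer
video_writer = cv2.VideoWriter(video_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (width, height))

for i in sorted(num_list):
    i = '{}.jpg'.format(i)
    ori_jpg = os.path.join(origin_path, str(i))
    word_jpg = os.path.join(wordart_path, str(i))
    # com_jpg = os.path.join(Composite_path, str(i))
    ori_arr = cv2.imread(ori_jpg)
    word_arr = cv2.imread(word_jpg)
    # Stitch the two images horizontally with NumPy
    com_arr = np.hstack((ori_arr, word_arr))
    # cv2.imwrite(com_jpg, com_arr)  # optionally save the stitched image
    video_writer.write(com_arr)  # write the frame to the video stream
    print("{} Save Successfully---------".format(ori_jpg))

video_writer.release()  # finalize the video file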
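The frames are ordered by their numeric index rather than by file name because a plain string sort would put 10.jpg before 2.jpg and scramble the video. A minimal sketch of that ordering, with a hypothetical helper name sorted_frame_names and made-up file names:

```python
def sorted_frame_names(filenames, ext='.jpg'):
    """Return frame file names ordered by their numeric index.

    Files without the expected extension or a purely numeric stem are ignored.
    """
    numbers = [
        int(name[:-len(ext)])
        for name in filenames
        if name.endswith(ext) and name[:-len(ext)].isdigit()
    ]
    return ['{}{}'.format(n, ext) for n in sorted(numbers)]


names = ['10.jpg', '2.jpg', '1.jpg', 'notes.txt']
print(sorted_frame_names(names))  # numeric order
print(sorted(names))              # string order, for comparison
```

Filtering by extension also keeps stray files in the frame folder from breaking the int conversion.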

Add background music and the video goes up another notch~

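Finally

A note on the material used in the video: the danmaku come from the Bilibili uploader 半佛仙人's video 《【半佛】你知道奶茶加盟到底有多坑人嗎？》, and the dance video comes from the YouTube channel Lilifilm Official, 《LILI's FILM #3 - LISA Dance Performance Video》.

Thank you for reading, and see you next time!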
