Sending a Custom Encoding Resolution with WebRTC

Date: 2021-05-31

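If you had to name the hottest technology area of 2020, the answer would be audio and video. The rapid growth of remote work and online education that year depended on it, and video conferencing, online teaching, and live entertainment streaming are all typical audio/video use cases.

Richer use cases require us to offer more configurable capabilities, such as resolution, frame rate, and bit rate, to deliver a better user experience. This article focuses on resolution.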

How to implement a custom encoding resolution

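Let us start with the definition of resolution. Resolution measures how many pixels an image contains, and it is a key indicator of the quality of a frame or a video. The higher the resolution, the larger the image (in bytes) and the better the picture quality. For a 1080p video stream in YUV I420 format, one frame takes 1920x1080x1.5x8/1024/1024 ≈ 23.73 Mbit, so at 30 fps one second of video is 30x23.73 ≈ 711.9 Mbit. That is a very large amount of data and demands a very high bit rate, so video must be compressed by an encoder before it is transmitted. We therefore call the resolution of the raw data produced by the capture device the capture resolution, and the resolution of the data actually fed to the encoder the encoding resolution.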
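The arithmetic above can be checked with a short C++ sketch. The function names are illustrative and not part of WebRTC; the sketch assumes even frame dimensions and, like the text, treats 1 Mbit as 1024 x 1024 bits.

```cpp
#include <cstdint>

// Bytes in one raw I420 frame: a full-resolution Y plane plus two
// quarter-resolution chroma planes, i.e. 1.5 bytes per pixel.
// Assumes even width and height.
constexpr std::int64_t I420FrameBytes(int width, int height) {
  return static_cast<std::int64_t>(width) * height * 3 / 2;
}

// Raw (uncompressed) bit rate in Mbit/s, with 1 Mbit = 1024 * 1024 bits
// to match the arithmetic in the text.
constexpr double RawMbitPerSecond(int width, int height, int fps) {
  return I420FrameBytes(width, height) * 8.0 * fps / (1024.0 * 1024.0);
}
```

For 1920x1080 at 30 fps, RawMbitPerSecond() gives roughly 711.9, matching the figure above.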

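Whether the picture is sharp and correctly proportioned directly affects the user experience. A camera offers only a limited set of capture resolutions, and sometimes the resolution we want cannot be captured directly. Being able to configure a suitable encoding resolution for each scenario is therefore essential. How do we convert the captured video into the encoding resolution we want to send? That is the main topic of this article.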

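WebRTC is Google's open-source, full-featured real-time audio/video project, and most developers build their real-time communication solutions on top of it. Its modules are well abstracted and decoupled, which makes it friendly to secondary development. To build a real-time audio/video solution we need to understand WebRTC's design and code modules and be able to modify and extend them. This article is based on WebRTC Release 72 and discusses how to implement a custom encoding resolution.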

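First, consider the following questions:

What does the pipeline look like for video data, from capture to encoding and sending?
How do we choose a suitable capture resolution for the configured encoding resolution?
How do we obtain exactly the encoding resolution we want?

The rest of this article covers these three points.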

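The video data pipeline

Let us first look at the video data pipeline. Video data is produced by a VideoCapturer. After capture, the data is processed by a VideoAdapter and then distributed by the VideoSource's VideoBroadcaster to the registered VideoSinks, namely the encoder sink and the local preview sink.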

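For resolution, the flow is: the desired resolution is set on the VideoCapturer, which selects a suitable resolution to capture at. The VideoAdapter then evaluates the raw captured frames and, if they do not match the requested size, scales and crops them to the encoding resolution. The resulting frames are fed to the encoder and sent.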
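The shape of this pipeline can be sketched in a few lines of C++. The types below (Frame, Broadcaster, DeliverCapturedFrame) are hypothetical stand-ins written for illustration; they are not the WebRTC classes, and the actual scaling and cropping of pixel data is left out.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a video frame; only the size is modelled.
struct Frame {
  int width;
  int height;
};

// Fans each frame out to every registered sink (encoder, preview, ...).
class Broadcaster {
 public:
  using Sink = std::function<void(const Frame&)>;
  void AddSink(Sink sink) { sinks_.push_back(std::move(sink)); }
  void OnFrame(const Frame& frame) const {
    for (const Sink& sink : sinks_) sink(frame);
  }

 private:
  std::vector<Sink> sinks_;
};

// Capturer -> adapter -> broadcaster. The adapter stage is reduced to
// rewriting the frame size to the encoding resolution; the pixel work
// (scaling and cropping) is elided in this sketch.
inline void DeliverCapturedFrame(const Frame& captured, int encode_width,
                                 int encode_height,
                                 const Broadcaster& broadcaster) {
  (void)captured;  // pixel data would be read from here
  const Frame adapted{encode_width, encode_height};
  broadcaster.OnFrame(adapted);
}
```

Registering one sink for the encoder and one for the preview, then delivering a 1440x1080 frame with an 800x800 encoding resolution, invokes both sinks with an 800x800 frame.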

This raises two key questions:

How does VideoCapturer choose a suitable capture resolution?
How does VideoAdapter convert the capture resolution into the encoding resolution?

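How to choose a suitable capture resolution

Choosing the capture resolution

WebRTC abstracts video capture into a base class, implemented in videocapturer.cc; we refer to this abstraction as VideoCapturer. Parameters such as resolution, frame rate, and supported pixel formats are set on the VideoCapturer, which computes the best capture format from them and then uses that format to call each platform's VDM (Video Device Module). The relevant declarations are:

Code excerpted from src/media/base/videocapturer.h in WebRTC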

VideoCapturer.h
// Iterates over all formats the device supports, calls GetFormatDistance() on each, and picks the one with the smallest distance.
bool GetBestCaptureFormat(const VideoFormat& desired, VideoFormat* best_format);
// Computes how far a supported format is from the desired capture format; a distance of 0 is an exact match.
int64_t GetFormatDistance(const VideoFormat& desired, const VideoFormat& supported);
// Sets the formats the capture device supports: fps, resolution, and pixel format (NV12, I420, MJPEG, and so on).
void SetSupportedFormats(const std::vector<VideoFormat>& formats);

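Depending on the requested parameters, GetBestCaptureFormat() sometimes fails to return a capture format close to what we asked for, because capture capability varies by device. Built-in cameras on iOS, Android, PC, and Mac and external USB cameras support different resolutions, and external USB cameras in particular vary widely in quality. We therefore need to adjust GetFormatDistance() slightly to meet our needs. The following sections describe how.

Analysis of the selection strategy source code

Let us first look at the source of GetFormatDistance(), excerpted in part:

Code excerpted from src/media/base/videocapturer.cc in WebRTC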

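// Get the distance between the supported and desired formats.
int64_t VideoCapturer::GetFormatDistance(const VideoFormat& desired,
                                         const VideoFormat& supported) {
  //.... some code omitted
  // Check resolution and fps.
  int desired_width = desired.width;    // desired (encoding) width
  int desired_height = desired.height;  // desired (encoding) height
  int64_t delta_w = supported.width - desired_width;  // width difference

  // Frame rate supported by the capture device.
  float supported_fps = VideoFormat::IntervalToFpsFloat(supported.interval);
  // Frame rate difference.
  float delta_fps = supported_fps - VideoFormat::IntervalToFpsFloat(desired.interval);
  // Height implied by the desired aspect ratio at the supported width;
  // capture formats are usually wider than they are tall.
  int64_t aspect_h = desired_width
                         ? supported.width * desired_height / desired_width
                         : desired_height;
  int64_t delta_h = supported.height - aspect_h;  // height difference
  // Priority of the preferred pixel formats. For example, if NV12 is preferred,
  // it wins among formats with the same resolution and frame rate.
  int64_t delta_fourcc;

  //.... degradation logic omitted; it handles devices whose supported
  // resolution or frame rate falls short of the request

  // Integer part of the frame rate difference; a float cannot be bit-shifted.
  int64_t idelta_fps = static_cast<int>(delta_fps);
  int64_t distance = 0;
  distance |=
      (delta_w << 28) | (delta_h << 16) | (idelta_fps << 8) | delta_fourcc;

  return distance;
}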

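The key value here is distance. Distance is a WebRTC concept: the difference, computed by a specific weighting scheme, between the requested capture format and a format the device supports. The smaller the distance, the closer the supported format is to the requested one; 0 is an exact match.

Distance is made up of four parts: delta_w, delta_h, delta_fps, and delta_fourcc. delta_w (width) carries the most weight, followed by delta_h (height), then delta_fps (frame rate), and finally delta_fourcc (pixel format). The problem is that width is weighted too heavily and height too lightly, so the closest supported resolution is not always selected.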

Example:

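Take an iPhone XS Max with a requested format of 800x800 at 10 fps. The list below shows the distance for some of the supported capture formats. The native GetFormatDistance() does not meet our needs: we want 800x800, but as the list shows, the best match is 960x540, which is not what we expect: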

Supported NV12 192x144x10 distance 489635708928
Supported NV12 352x288x10 distance 360789835776
Supported NV12 480x360x10 distance 257721630720
Supported NV12 640x480x10 distance 128880476160
Supported NV12 960x540x10 distance 43032248320
Supported NV12 1024x768x10 distance 60179873792
Supported NV12 1280x720x10 distance 128959119360
Supported NV12 1440x1080x10 distance 171869470720
Supported NV12 1920x1080x10 distance 300812861440
Supported NV12 1920x1440x10 distance 300742082560
Supported NV12 3088x2316x10 distance 614332104704
Best NV12 960x540x10 distance 43032248320
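The distances listed above can be reproduced with a simplified re-creation of the resolution part of the calculation. This is a sketch, not the WebRTC source: OriginalDistance is a name chosen here, the frame rate and pixel format terms are assumed to be zero (same fps, preferred format), and the penalty factor of -3 for formats smaller than requested is an assumption about the omitted degradation logic that happens to reproduce the listed values.

```cpp
#include <cstdint>

// Resolution part of the original distance, for equal fps and fourcc.
inline std::int64_t OriginalDistance(int desired_w, int desired_h,
                                     int supported_w, int supported_h) {
  constexpr int kDownPenalty = -3;  // assumed; reproduces the list above
  std::int64_t delta_w = supported_w - desired_w;
  // Height is compared against the height the supported width would need
  // in order to have the desired aspect ratio, not against desired_h.
  const std::int64_t aspect_h =
      desired_w
          ? static_cast<std::int64_t>(supported_w) * desired_h / desired_w
          : desired_h;
  std::int64_t delta_h = supported_h - aspect_h;
  // Formats smaller than requested are penalised and made positive.
  if (delta_w < 0) delta_w *= kDownPenalty;
  if (delta_h < 0) delta_h *= kDownPenalty;
  // Width occupies the higher bits, so it dominates the comparison.
  return (delta_w << 28) | (delta_h << 16);
}
```

With an 800x800 request, 960x540 is only 160 pixels wider, so its width term is small and it wins, even though its height is far below 800. 1440x1080 covers the request in both dimensions but loses because its width difference of 640 is shifted into the dominant bits.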

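Adjusting the selection strategy

To get the resolution we want, the analysis above shows that GetFormatDistance() must be changed so that resolution carries the most weight, frame rate comes next, and, when no pixel format is specified, pixel format comes last. The change is as follows: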

int64_t VideoCapturer::GetFormatDistance(const VideoFormat& desired,
                                         const VideoFormat& supported) {
  //.... some code omitted
  // Check resolution and fps.
  int desired_width = desired.width;    // desired (encoding) width
  int desired_height = desired.height;  // desired (encoding) height
  int64_t delta_w = supported.width - desired_width;
  int64_t delta_h = supported.height - desired_height;
  int64_t delta_fps = supported.framerate() - desired.framerate();
  int64_t distance = std::abs(delta_w) + std::abs(delta_h);
  //.... degradation policy omitted, e.g. 1080p is requested but the camera
  // supports at most 720p, so a lower format has to be chosen
  distance = (distance << 16) | (std::abs(delta_fps) << 8) | delta_fourcc;
  return distance;
}

After the change, distance has three parts: resolution (delta_w + delta_h), frame rate (delta_fps), and pixel format (delta_fourcc). (delta_w + delta_h) carries the most weight, followed by delta_fps, then delta_fourcc.

Example:

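Again take the iPhone XS Max with 800x800 at 10 fps. The list below shows the distance for some of the capture formats after GetFormatDistance() is changed. We want 800x800 and the best match is now 1440x1080, from which 800x800 can be obtained by scaling and cropping, as expected. (If the resolution does not have to be exact, the degradation policy can be adjusted to select 1024x768 instead.)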

Supported NV12 192x144x10 distance 828375040
Supported NV12 352x288x10 distance 629145600
Supported NV12 480x360x10 distance 498073600
Supported NV12 640x480x10 distance 314572800
Supported NV12 960x540x10 distance 275251200
Supported NV12 1024x768x10 distance 167772160
Supported NV12 1280x720x10 distance 367001600
Supported NV12 1440x1080x10 distance 60293120
Supported NV12 1920x1080x10 distance 91750400
Supported NV12 1920x1440x10 distance 115343360
Supported NV12 3088x2316x10 distance 249298944
Best NV12 1440x1080x10 distance 60293120
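The adjusted distances can be reproduced in the same way. The article omits its degradation policy, so the sketch below fills that gap with an assumption: a factor of 10 applied when either dimension is smaller than requested. That factor is inferred from the listed values and is not taken from the source; AdjustedDistance and kTooSmallPenalty are names chosen here, and the fps and fourcc terms are again assumed to be zero.

```cpp
#include <cstdint>
#include <cstdlib>

// Resolution part of the adjusted distance, for equal fps and fourcc.
inline std::int64_t AdjustedDistance(int desired_w, int desired_h,
                                     int supported_w, int supported_h) {
  constexpr std::int64_t kTooSmallPenalty = 10;  // hypothetical, inferred
  const std::int64_t delta_w = supported_w - desired_w;
  const std::int64_t delta_h = supported_h - desired_h;
  // Width and height now contribute equally.
  std::int64_t distance = std::abs(delta_w) + std::abs(delta_h);
  // Assumed degradation rule: a format that cannot cover the request in
  // both dimensions is penalised, so larger formats are preferred.
  if (delta_w < 0 || delta_h < 0) distance *= kTooSmallPenalty;
  return distance << 16;
}
```

Under this rule 1024x768 is close in size (|224| + |-32| = 256) but is penalised because its height is below 800, which is why 1440x1080 wins. Relaxing the penalty is one way to let 1024x768 be selected when an exact resolution is not required.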

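From capture resolution to encoding resolution

After capture, video data is processed by the VideoAdapter (a WebRTC abstraction) and then distributed to the corresponding sinks (also a WebRTC abstraction). We make a small change in VideoAdapter to compute the parameters needed for scaling and cropping, then use libyuv to scale and crop the video data to the encoding resolution. To keep as much of the picture as possible, we scale first and crop the excess pixels only when the aspect ratios differ. We focus on two questions:

Using the same example, we want 800x800 but the best capture resolution is 1440x1080. How do we get from the 1440x1080 capture resolution to the configured 800x800 encoding resolution?
Video data passes through VideoAdapter on its way from VideoCapturer to VideoSink. What exactly does VideoAdapter do?

We analyze these two questions below, starting with what VideoAdapter is.

Introduction to VideoAdapter

WebRTC describes VideoAdapter as follows: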

VideoAdapter adapts an input video frame to an output frame based on the specified input and output formats. The adaptation includes dropping frames to reduce frame rate and scaling frames. VideoAdapter is thread safe.

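In other words, VideoAdapter is the module that controls data input and output; it can limit the frame rate and downgrade the resolution. In the VQC (Video Quality Control) module, configuring VideoAdapter makes it possible to drop frames dynamically and scale the resolution dynamically under low bandwidth or high CPU load, keeping video smooth and improving the user experience.

Excerpted from src/media/base/videoadapter.h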

VideoAdapter.h
bool AdaptFrameResolution(int in_width,
                          int in_height,
                          int64_t in_timestamp_ns,
                          int* cropped_width,
                          int* cropped_height,
                          int* out_width,
                          int* out_height);
void OnOutputFormatRequest(
    const absl::optional<std::pair<int, int>>& target_aspect_ratio,
    const absl::optional<int>& max_pixel_count,
    const absl::optional<int>& max_fps);
void OnOutputFormatRequest(const absl::optional<VideoFormat>& format);

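VideoAdapter source code analysis

Based on the configured desired_format, VideoAdapter calls AdaptFrameResolution() to compute the cropped_width, cropped_height, out_width, and out_height parameters for scaling and cropping from the capture resolution to the encoding resolution. The native AdaptFrameResolution() derives the scale factor from the pixel area, so it cannot produce an exact width and height:

Excerpted from src/media/base/videoadapter.cc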

bool VideoAdapter::AdaptFrameResolution(int in_width,
                                        int in_height,
                                        int64_t in_timestamp_ns,
                                        int* cropped_width,
                                        int* cropped_height,
                                        int* out_width,
                                        int* out_height) {
  //..... some code omitted
  // Calculate how the input should be cropped.
  if (!target_aspect_ratio || target_aspect_ratio->first <= 0 ||
      target_aspect_ratio->second <= 0) {
    *cropped_width = in_width;
    *cropped_height = in_height;
  } else {
    const float requested_aspect =
        target_aspect_ratio->first /
        static_cast<float>(target_aspect_ratio->second);
    *cropped_width =
        std::min(in_width, static_cast<int>(in_height * requested_aspect));
    *cropped_height =
        std::min(in_height, static_cast<int>(in_width / requested_aspect));
  }
  const Fraction scale;  // VQC scale factor .... computation omitted
  // Calculate final output size.
  *out_width = *cropped_width / scale.denominator * scale.numerator;
  *out_height = *cropped_height / scale.denominator * scale.numerator;
}

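Example:

Take the iPhone XS Max with 800x800 at 10 fps again. The encoding resolution is set to 800x800 and the capture resolution is 1440x1080. With the native algorithm, the computed output resolution is 720x720, which is not what we expect.

Adjusting VideoAdapter

VideoAdapter is an important part of how VQC (the video quality control module) adjusts video quality. VQC relies on VideoAdapter for frame rate control, resolution scaling, and similar operations, so any change must take its effect on VQC into account.

To obtain exactly the resolution we want without affecting VQC's control over resolution, we change AdaptFrameResolution() as follows: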

bool VideoAdapter::AdaptFrameResolution(int in_width,
                                        int in_height,
                                        int64_t in_timestamp_ns,
                                        int* cropped_width,
                                        int* cropped_height,
                                        int* out_width,
                                        int* out_height) {
  //.... some code omitted
  // True when the input is at least as wide (relative to its height) as the
  // desired output.
  bool in_more =
      (static_cast<float>(in_width) / static_cast<float>(in_height)) >=
      (static_cast<float>(desired_width_) /
       static_cast<float>(desired_height_));
  if (in_more) {
    // Keep the full height and crop the width to the desired aspect ratio.
    *cropped_height = in_height;
    *cropped_width = *cropped_height * desired_width_ / desired_height_;
  } else {
    // Keep the full width and crop the height to the desired aspect ratio.
    *cropped_width = in_width;
    *cropped_height = *cropped_width * desired_height_ / desired_width_;
  }
  *out_width = desired_width_;
  *out_height = desired_height_;
  //.... some code omitted
  return true;
}

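Example:

With the same iPhone XS Max example at 800x800 and 10 fps, the encoding resolution is set to 800x800 and the capture resolution is 1440x1080. With the adjusted algorithm, the computed encoding resolution is 800x800, as expected.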
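The adjusted calculation can be exercised in isolation with the following sketch. CropAndScale and ComputeCropAndScale are illustrative names, not WebRTC APIs, and all dimensions are assumed to be positive. The aspect ratio comparison is cross-multiplied to avoid floating point, which is equivalent for positive sizes.

```cpp
// Crop region taken from the input, and the final scaled output size.
struct CropAndScale {
  int cropped_width;
  int cropped_height;
  int out_width;
  int out_height;
};

// Mirrors the adjusted logic: crop the input to the desired aspect ratio,
// then scale the cropped region to exactly the desired size.
inline CropAndScale ComputeCropAndScale(int in_width, int in_height,
                                        int desired_width,
                                        int desired_height) {
  CropAndScale r{};
  // in_width / in_height >= desired_width / desired_height, cross-multiplied.
  const bool input_is_wider =
      static_cast<long long>(in_width) * desired_height >=
      static_cast<long long>(desired_width) * in_height;
  if (input_is_wider) {
    r.cropped_height = in_height;  // keep full height, trim the sides
    r.cropped_width = in_height * desired_width / desired_height;
  } else {
    r.cropped_width = in_width;  // keep full width, trim top and bottom
    r.cropped_height = in_width * desired_height / desired_width;
  }
  r.out_width = desired_width;
  r.out_height = desired_height;
  return r;
}
```

For a 1440x1080 input and an 800x800 target, the crop region is 1080x1080 and the output is 800x800, so the frame is scaled by 800/1080 after 360 columns are discarded. Note that when the target is larger than the input, this calculation upscales, which is one reason the interaction with VQC needs care.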