New in TensorFlow: An Introduction to the TensorFlow Probability Probabilistic Programming Toolbox

Date: 2019-12-07

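At the 2018 TensorFlow Developer Summit, we announced TensorFlow Probability: a probabilistic programming toolbox that lets machine learning researchers and other practitioners quickly and reliably build sophisticated models on state-of-the-art hardware. We recommend TensorFlow Probability if any of the following applies: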

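You want to build a generative model of data and reason about its hidden processes.
You need to quantify the uncertainty in your predictions rather than predict a single value.
Your training set has a large number of features relative to the number of data points.
Your data is structured (for example, by groups, space, graphs, or language semantics) and you want to capture that structure with prior information.
You have an inverse problem: see the TFDS'18 talk on reconstructing fusion plasmas from measurements.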

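TensorFlow Probability gives you the tools to solve these problems. It also inherits the strengths of TensorFlow, such as automatic differentiation and the ability to scale performance across a variety of platforms (CPUs, GPUs, and TPUs).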

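What is in TensorFlow Probability?

The machine learning tools in this release provide modular abstractions for probabilistic reasoning and statistical analysis in the TensorFlow ecosystem.

Figure: An overview of TensorFlow Probability. The probabilistic programming toolbox benefits users ranging from data scientists and statisticians to all TensorFlow users.

Layer 0: TensorFlow. Numerical operations. In particular, the LinearOperator class enables matrix-free implementations that can exploit special structure (diagonal, low-rank, and so on) for efficient computation. It is built and maintained by the TensorFlow Probability team and is now part of tf.linalg in core TF.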
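
To make "matrix-free" concrete, here is a minimal plain-Python sketch of a diagonal operator. The class `DiagonalOperator` and its method names are hypothetical stand-ins chosen for illustration; this is not the TensorFlow LinearOperator class, only the idea behind it: products, solves, and log-determinants are computed from the diagonal alone, without ever building the n x n matrix.

```python
import math


class DiagonalOperator:
    """Hypothetical stand-in for a structured linear operator (not the TF class)."""

    def __init__(self, diag):
        self.diag = list(diag)

    def matvec(self, x):
        # O(n) product; the full n x n matrix is never materialized.
        return [d * xi for d, xi in zip(self.diag, x)]

    def solve(self, b):
        # Solving A x = b reduces to elementwise division.
        return [bi / d for d, bi in zip(self.diag, b)]

    def log_abs_determinant(self):
        # For a diagonal matrix, log|det A| is the sum of log|d_i|.
        return sum(math.log(abs(d)) for d in self.diag)


op = DiagonalOperator([2.0, 4.0, 0.5])
y = op.matvec([1.0, 1.0, 2.0])
x = op.solve(y)  # recovers the vector passed to matvec
```

A dense implementation would need O(n^2) memory and O(n^3) time for the solve; exploiting the structure brings both down to O(n).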

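Layer 1: Statistical Building Blocks

Distributions (tf.contrib.distributions, tf.distributions): A large collection of probability distributions and related statistics with batch and broadcasting semantics.
Bijectors (tf.contrib.distributions.bijectors): Reversible and composable transformations of random variables. Bijectors provide a rich class of transformed distributions, from classical examples such as the log-normal distribution to sophisticated deep learning models such as masked autoregressive flows.

(For more information, see the TensorFlow Distributions whitepaper.)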
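
The log-normal example mentioned above can be worked through by hand. The following is a plain-Python sketch of the change-of-variables rule that a transformed distribution relies on, not TFP code; the function names are our own. For Y = exp(X) with X normal, the density of Y is the density of X at log(y) plus the log-Jacobian of the inverse map.

```python
import math


def normal_log_prob(x, loc=0.0, scale=1.0):
    z = (x - loc) / scale
    return -0.5 * z * z - math.log(scale) - 0.5 * math.log(2.0 * math.pi)


def log_normal_log_prob(y, loc=0.0, scale=1.0):
    """Log-density of Y = exp(X), X ~ Normal(loc, scale), via change of variables."""
    x = math.log(y)                          # inverse of the forward map exp
    inverse_log_det_jacobian = -math.log(y)  # log|d(log y)/dy| = -log(y)
    return normal_log_prob(x, loc, scale) + inverse_log_det_jacobian


def log_normal_log_prob_closed_form(y, loc=0.0, scale=1.0):
    # Textbook log-normal density, used here only as a cross-check.
    return (-math.log(y * scale * math.sqrt(2.0 * math.pi))
            - (math.log(y) - loc) ** 2 / (2.0 * scale ** 2))
```

Composing several invertible maps works the same way: the log-Jacobian terms simply add up, which is what makes bijectors composable.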

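Layer 2: Model Building

Edward2 (tfp.edward2): A probabilistic programming language for specifying flexible probabilistic models as programs.
Probabilistic Layers (tfp.layers): Neural network layers with uncertainty over the functions they represent, extending TensorFlow Layers.
Trainable Distributions (tfp.trainable_distributions): Probability distributions parameterized by a single Tensor, making it easy to build neural nets that output probability distributions.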

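Layer 3: Probabilistic Inference

Markov chain Monte Carlo (tfp.mcmc): Algorithms for approximating integrals via sampling. Includes Hamiltonian Monte Carlo, random-walk Metropolis-Hastings, and the ability to build custom transition kernels.
Variational Inference (tfp.vi): Algorithms for approximating integrals via optimization.
Optimizers (tfp.optimizer): Stochastic optimization methods that extend TensorFlow Optimizers, including Stochastic Gradient Langevin Dynamics.
Monte Carlo (tfp.monte_carlo): Tools for computing Monte Carlo expectations.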
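
As a rough illustration of "approximating integrals via sampling", here is a plain-Python sketch of random-walk Metropolis. It does not use tfp.mcmc; the function name, arguments, and the standard-normal target are assumptions made for the example. The chain only needs an unnormalized log-density, and expectations are then estimated as sample averages.

```python
import math
import random


def random_walk_metropolis(log_prob, initial_state, num_steps, step_size, seed=0):
    """Sample from an unnormalized 1-D log-density with Gaussian proposals."""
    rng = random.Random(seed)
    state, state_lp = initial_state, log_prob(initial_state)
    samples = []
    for _ in range(num_steps):
        proposal = state + rng.gauss(0.0, step_size)
        proposal_lp = log_prob(proposal)
        # Accept with probability min(1, p(proposal) / p(state)).
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < proposal_lp - state_lp:
            state, state_lp = proposal, proposal_lp
        samples.append(state)
    return samples


def standard_normal_log_prob(x):
    return -0.5 * x * x  # normalizing constant omitted on purpose


samples = random_walk_metropolis(standard_normal_log_prob, 0.0, 20000, 1.0)
# Monte Carlo estimates of E[x] and E[x^2]; the exact values are 0 and 1.
mean_estimate = sum(samples) / len(samples)
second_moment_estimate = sum(s * s for s in samples) / len(samples)
```

Hamiltonian Monte Carlo follows the same accept/reject pattern but uses gradients of the log-density to propose distant states, which is where automatic differentiation pays off.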

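Layer 4: Pre-made Models and Inference (analogous to TensorFlow's pre-made Estimators)

Bayesian structural time series: A high-level interface for fitting time-series models (similar to R's BSTS package).
Generalized Linear Mixed Models: A high-level interface for fitting mixed-effects regression models (similar to R's lme4 package).

The TensorFlow Probability team is committed to supporting users and contributors with cutting-edge features, continuous code updates, and bug fixes. We will continue to add end-to-end examples and tutorials.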

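Let's look at some examples!

Linear Mixed Effects Models with Edward2

A linear mixed effects model is a simple approach for modeling structured relationships in data. Also known as a hierarchical linear model, it shares statistical strength across groups of data points in order to improve inferences about any individual data point.

As a demonstration, consider the InstEval data set from the popular lme4 package in R, which consists of university courses and their evaluation ratings. Using TensorFlow Probability, we specify the model as an Edward2 probabilistic program (tfp.edward2), which extends Edward. The program below specifies the model in terms of its generative process: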

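import tensorflow as tf
from tensorflow_probability import edward2 as ed

# Assumes `num_students` and `num_instructors` (the number of distinct
# students and instructors in the data set) are defined in the enclosing scope.
def model(features):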
  # Set up fixed effects and other parameters.
  intercept = tf.get_variable("intercept", [])
  service_effects = tf.get_variable("service_effects", [])
  student_stddev_unconstrained = tf.get_variable(
      "student_stddev_pre", [])
  instructor_stddev_unconstrained = tf.get_variable(
      "instructor_stddev_pre", [])
  # Set up random effects.
  student_effects = ed.MultivariateNormalDiag(
      loc=tf.zeros(num_students),
      scale_identity_multiplier=tf.exp(
          student_stddev_unconstrained),
      name="student_effects")
  instructor_effects = ed.MultivariateNormalDiag(
      loc=tf.zeros(num_instructors),
      scale_identity_multiplier=tf.exp(
          instructor_stddev_unconstrained),
      name="instructor_effects")
  # Set up likelihood given fixed and random effects.
  ratings = ed.Normal(
      loc=(service_effects * features["service"] +
           tf.gather(student_effects, features["students"]) +
           tf.gather(instructor_effects, features["instructors"]) +
           intercept),
      scale=1.,
      name="ratings")
  return ratings

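The model takes as input a features dictionary with the keys "service", "students", and "instructors"; each is a vector in which every element describes an individual course. The model regresses on these inputs, posits latent random variables, and returns a distribution over the courses' evaluation ratings. Running this output in a TensorFlow session returns one generated draw of the ratings.

See the "Linear Mixed Effects Models" tutorial for details on how to train the model with the tfp.mcmc.HamiltonianMonteCarlo algorithm and how to explore and interpret the model using posterior predictions.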
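
To see the generative process without any TensorFlow machinery, here is a plain-Python sketch that simulates ratings in the same order as the model: draw the random effects, then draw each rating around its linear predictor. The function `simulate_ratings`, its arguments, and the toy feature values are hypothetical; the parameter values are made up rather than fitted.

```python
import random


def simulate_ratings(features, num_students, num_instructors,
                     intercept, service_effect,
                     student_stddev, instructor_stddev, seed=0):
    rng = random.Random(seed)
    # Random effects: one draw per student and one per instructor.
    student_effects = [rng.gauss(0.0, student_stddev)
                       for _ in range(num_students)]
    instructor_effects = [rng.gauss(0.0, instructor_stddev)
                          for _ in range(num_instructors)]
    ratings = []
    for service, student, instructor in zip(features["service"],
                                            features["students"],
                                            features["instructors"]):
        loc = (service_effect * service + student_effects[student]
               + instructor_effects[instructor] + intercept)
        ratings.append(rng.gauss(loc, 1.0))  # unit-scale normal likelihood
    return ratings


# Three toy courses; students and instructors are given as integer indices.
features = {"service": [0.0, 1.0, 1.0],
            "students": [0, 1, 1],
            "instructors": [0, 0, 1]}
ratings = simulate_ratings(features, num_students=2, num_instructors=2,
                           intercept=3.0, service_effect=0.5,
                           student_stddev=1.0, instructor_stddev=0.5)
```

Courses that share a student or an instructor share the corresponding random effect, which is how the model pools information across groups.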

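Gaussian Copulas with TFP Bijectors

A copula is a multivariate probability distribution in which the marginal distribution of each variable is uniform. To build a copula from TFP intrinsics, you can use Bijectors and TransformedDistribution. These abstractions make it easy to create complex distributions, for example: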

import tensorflow_probability as tfp
tfd = tfp.distributions
tfb = tfp.bijectors
# Example: Log-Normal Distribution
log_normal = tfd.TransformedDistribution(
    distribution=tfd.Normal(loc=0., scale=1.),
    bijector=tfb.Exp())
# Example: Kumaraswamy Distribution
kumaraswamy = tfd.TransformedDistribution(
    distribution=tfd.Uniform(low=0., high=1.),
    bijector=tfb.Kumaraswamy(
        concentration1=2.,
        concentration0=2.))
# Example: Masked Autoregressive Flow
# https://arxiv.org/abs/1705.07057
shift_and_log_scale_fn = tfb.masked_autoregressive_default_template(
    hidden_layers=[512, 512])
maf = tfd.TransformedDistribution(
    distribution=tfd.Normal(loc=0., scale=1.),
    bijector=tfb.MaskedAutoregressiveFlow(
        shift_and_log_scale_fn=shift_and_log_scale_fn),
    # The event shape belongs to the transformed distribution, not the template.
    event_shape=[28 * 28])

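The "Gaussian Copula" example creates a few custom Bijectors and then shows how to easily build several different copulas. For more background on distributions, see "Understanding TensorFlow Distributions Shapes", which explains how to manage shapes for sampling, batch training, and modeling events.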
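
For intuition about what a Gaussian copula produces, here is a plain-Python sketch for the two-dimensional case. It is not the TFP implementation; the function names and the correlation value are our own choices. Correlated standard normals are pushed through the normal CDF, so each coordinate becomes uniform on [0, 1] while the dependence between coordinates is kept.

```python
import math
import random


def standard_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def sample_gaussian_copula(correlation, num_samples, seed=0):
    """Draw (u1, u2) pairs with uniform marginals and Gaussian dependence."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(num_samples):
        z1 = rng.gauss(0.0, 1.0)
        eps = rng.gauss(0.0, 1.0)
        # Cholesky factor of [[1, rho], [rho, 1]] applied to independent normals.
        z2 = correlation * z1 + math.sqrt(1.0 - correlation ** 2) * eps
        pairs.append((standard_normal_cdf(z1), standard_normal_cdf(z2)))
    return pairs


pairs = sample_gaussian_copula(correlation=0.8, num_samples=5000)
mean_u1 = sum(u1 for u1, _ in pairs) / len(pairs)
mean_u2 = sum(u2 for _, u2 in pairs) / len(pairs)
covariance = sum((u1 - mean_u1) * (u2 - mean_u2) for u1, u2 in pairs) / len(pairs)
```

Applying an inverse CDF of your choice to each coordinate then yields arbitrary marginals with the same dependence structure, which is the usual reason for building a copula.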

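Variational Autoencoders with TFP Utilities

A variational autoencoder is a machine learning model that uses one learned system to represent data in a low-dimensional space and a second learned system to restore the low-dimensional representation to what the input would have been. Because TF supports automatic differentiation, black-box variational inference is a breeze! Example: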

import tensorflow as tf
import tensorflow_probability as tfp
# Assumes the user supplies `likelihood`, `prior`, and `surrogate_posterior`
# functions, each returning a tf.distributions.Distribution-like object,
# and that `x` holds the observed data.
elbo_loss = tfp.vi.monte_carlo_csiszar_f_divergence(
    f=tfp.vi.kl_reverse,  # Equivalent to "Evidence Lower BOund"
    p_log_prob=lambda z: likelihood(z).log_prob(x) + prior().log_prob(z),
    q=surrogate_posterior(x),
    num_draws=1)
train = tf.train.AdamOptimizer(
    learning_rate=0.01).minimize(elbo_loss)
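
To show what the loss above estimates, here is a plain-Python sketch of a Monte Carlo ELBO for a toy model of our own choosing: z ~ Normal(0, 1) and x | z ~ Normal(z, 1). The function names are hypothetical and nothing here calls tfp.vi. The ELBO averages log p(x, z) - log q(z) over draws from q; minimizing the reverse KL divergence against the unnormalized joint amounts to maximizing this quantity.

```python
import math
import random


def normal_log_prob(x, loc, scale):
    z = (x - loc) / scale
    return -0.5 * z * z - math.log(scale) - 0.5 * math.log(2.0 * math.pi)


def elbo_estimate(x, q_loc, q_scale, num_draws, seed=0):
    """Monte Carlo ELBO for z ~ N(0, 1), x | z ~ N(z, 1), q(z) = N(q_loc, q_scale)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_draws):
        z = rng.gauss(q_loc, q_scale)
        log_joint = normal_log_prob(x, z, 1.0) + normal_log_prob(z, 0.0, 1.0)
        total += log_joint - normal_log_prob(z, q_loc, q_scale)
    return total / num_draws


x_observed = 1.0
# For this toy model the evidence is N(x; 0, sqrt(2)) and the exact
# posterior is N(x / 2, sqrt(1 / 2)).
log_evidence = normal_log_prob(x_observed, 0.0, math.sqrt(2.0))
elbo_exact_q = elbo_estimate(x_observed, x_observed / 2.0, math.sqrt(0.5), num_draws=100)
elbo_poor_q = elbo_estimate(x_observed, 3.0, 1.0, num_draws=2000)
```

When q equals the exact posterior, every draw contributes exactly the log evidence, so the bound is tight; a poorly matched q gives a strictly lower value, and the gap is the KL divergence from q to the posterior.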

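Bayesian Neural Networks with TFP Probabilistic Layers

A Bayesian neural network is a neural network with a prior distribution on its weights and biases. Through these priors it provides improved uncertainty estimates for its predictions. A Bayesian neural network can also be interpreted as an infinite ensemble of neural networks: the probability assigned to each network configuration is given by the prior.

As a small example, we use the CIFAR-10 data set, which has features (images of shape 32 x 32 x 3) and labels (values from 0 to 9). To fit the neural network, we use variational inference, a suite of methods for approximating the network's posterior distribution over weights and biases. Specifically, we use the recently published Flipout estimator in the TensorFlow Probabilistic Layers module (tfp.layers).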

import tensorflow as tf
import tensorflow_probability as tfp


class MNISTModel(tf.keras.Model):
  def __init__(self):
    super(MNISTModel, self).__init__()
    self.dense1 = tfp.layers.DenseFlipout(units=10)
    self.dense2 = tfp.layers.DenseFlipout(units=10)

  def call(self, inputs):
    """Run the model."""
    result = self.dense1(inputs)
    result = self.dense2(result)
    # Apply dense2 a second time; this reuses the variables of that layer.
    result = self.dense2(result)
    return result


model = MNISTModel()
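
The "ensemble of networks" reading can be illustrated with a toy layer in plain Python. This sketch is not Flipout and not tfp.layers: `SampledDense` is a hypothetical single-output layer that simply redraws its weights from independent Gaussians on every call, with made-up means and standard deviations. Repeated forward passes on the same input then give a spread of outputs instead of a single value.

```python
import random
import statistics


class SampledDense:
    """Hypothetical toy layer: y = w . x + b with (w, b) resampled on each call."""

    def __init__(self, weight_means, weight_stddevs, bias_mean, bias_stddev, seed=0):
        self.weight_means = list(weight_means)
        self.weight_stddevs = list(weight_stddevs)
        self.bias_mean = bias_mean
        self.bias_stddev = bias_stddev
        self.rng = random.Random(seed)

    def __call__(self, inputs):
        # Each call draws a fresh set of weights, i.e. a fresh ensemble member.
        weights = [self.rng.gauss(m, s)
                   for m, s in zip(self.weight_means, self.weight_stddevs)]
        bias = self.rng.gauss(self.bias_mean, self.bias_stddev)
        return sum(w * x for w, x in zip(weights, inputs)) + bias


layer = SampledDense([1.0, -2.0], [0.1, 0.1], bias_mean=0.5, bias_stddev=0.1)
outputs = [layer([1.0, 1.0]) for _ in range(2000)]
# Theoretical mean is 1 - 2 + 0.5 = -0.5; theoretical stddev is sqrt(0.03).
predictive_mean = statistics.mean(outputs)
predictive_stddev = statistics.stdev(outputs)
```

The spread of `outputs` is the layer's uncertainty about its prediction for that input. Flipout targets the same kind of weight-perturbation estimate but is designed to reduce the variance of its gradients across a mini-batch.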