PyTorch 0.4.0 Migration Guide

Date: 2019-11-18



Overview

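The PyTorch 0.4 update is very large, so code written for earlier versions needs some updating. The main change is the merging of Variable and Tensor. There is also Windows support. The rest is mostly support for scalar (0-dimensional) tensors, bug fixes and performance improvements. The Variable/Tensor merge breaks some older code, so it has to be migrated, but the migration cost is actually small.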

Merging Tensor and Variable

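It is called a merge, but in terms of the old (0.1-0.3) view it is more accurate to say this: a Tensor is now a Variable whose requires_grad defaults to False. torch.Tensor and torch.autograd.Variable are now effectively the same class, with no essential difference! In other words, there is no longer a "pure" Tensor: every Tensor supports autograd. Wrapping a Tensor in Variable still works, but it just gives you back a torch.Tensor, so the wrapping is no longer needed.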
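A quick check of the merge, as a minimal sketch that assumes a recent PyTorch release (where torch.autograd.Variable is deprecated but still importable). Wrapping a tensor in Variable simply hands back a torch.Tensor with the usual requires_grad default.

```python
import torch
from torch.autograd import Variable  # legacy name, kept only for backward compatibility

t = torch.ones(2)
v = Variable(t)            # the old-style wrapper still runs...
print(type(v))             # ...but the result is an ordinary torch.Tensor
print(v.requires_grad)     # False, the same default as a plain tensor

# Equivalent to torch.ones(2, requires_grad=True) in the new style
w = Variable(torch.ones(2), requires_grad=True)
print(isinstance(w, torch.Tensor), w.requires_grad)
```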

Checking the type of a `Tensor`

Use isinstance() or x.type(). The built-in type() can no longer show a tensor's specific type; it always reports torch.Tensor.

>>> x = torch.DoubleTensor([1, 1, 1])
>>> print(type(x))  # was torch.DoubleTensor
<class 'torch.Tensor'>
>>> print(x.type())  # OK: 'torch.DoubleTensor'
torch.DoubleTensor
>>> print(isinstance(x, torch.DoubleTensor))  # OK: True
True

requires_grad is now an attribute of Tensor

>>> x = torch.ones(1)
>>> x.requires_grad  # False by default
False
>>> y = torch.ones(1)
>>> z = x + y
>>> # naturally, z's requires_grad is also False
>>> z.requires_grad
False
>>> # no tensor involved requires grad, so backward raises an error
>>> z.backward()
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
>>>
>>> # requires_grad can be passed as an argument when constructing a tensor
>>> w = torch.ones(1, requires_grad=True)
>>> w.requires_grad
True
>>> total = w + z
>>> total.requires_grad
True
>>> # now backward works
>>> total.backward()
>>> w.grad
tensor([ 1.])
>>> # x, y and z don't require grad, so their grads were never computed
>>> z.grad == x.grad == y.grad == None
True

Use .requires_grad_() (note the trailing underscore, which marks an in-place method) to make an existing Tensor require gradients.
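A minimal sketch of switching gradient tracking on after a tensor has been created. The tensor and the toy computation are illustrative only.

```python
import torch

x = torch.ones(3)          # created without gradient tracking
print(x.requires_grad)     # False
x.requires_grad_()         # in place: sets the flag to True and returns x itself
y = (x * 2).sum()
y.backward()
print(x.grad)              # gradient of sum(2 * x) w.r.t. x: all twos
```

Calling x.requires_grad_(False) turns tracking off again in the same way.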

Don't use .data casually

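Previously, .data was how you got the Tensor out of a Variable, but the two have now been merged. As a result, .data returns a new Tensor with requires_grad=False! However, this new Tensor shares memory with the original one. That is unsafe, because: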

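y = x.data  # x requires autograd
# y shares memory with x, but y no longer requires grad, so
# in-place changes made through y modify x without autograd knowing;
# if x is needed for backward, the gradients computed can be silently wrong!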

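So x.detach() is recommended instead. It also shares memory with x, and it also gives y requires_grad=False. The difference is what happens if x is needed for backward: autograd detects in-place changes made through y and raises an error, instead of silently computing wrong gradients.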
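The difference can be sketched as follows, assuming a recent PyTorch release. sigmoid is used because its backward pass reuses its own output. Zeroing that output through .data goes unnoticed and yields a wrong (all-zero) gradient. Doing the same through .detach() makes backward fail loudly.

```python
import torch

# .data: the in-place edit is invisible to autograd, so the gradient is silently wrong
a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
out = a.sigmoid()             # sigmoid's backward reuses this output
out.data.zero_()              # overwrite the output in place, untracked
out.sum().backward()
silent_grad = a.grad.clone()  # all zeros instead of sigmoid'(a)
print(silent_grad)

# .detach(): same shared memory, but autograd notices the in-place change
b = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
out2 = b.sigmoid()
out2.detach().zero_()         # also overwrites out2's storage
try:
    out2.sum().backward()
    detach_error_raised = False
except RuntimeError as err:   # autograd reports the in-place modification
    detach_error_raised = True
    print("RuntimeError:", err)
```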

Scalar support

This one is very important! Previously, indexing a 1-D Tensor returned a Python number, but indexing a Variable returned a vector of size (1,). Reductions behaved the same way: tensor.sum() returned a number, but variable.sum() returned a vector of size (1,).

A scalar is a 0-dimensional Tensor, so it can't simply be created the old way. Instead, use torch.tensor (note the lowercase t!).

>>> torch.tensor(3.1416)  # create a scalar with torch.tensor
tensor(3.1416)  # note: a scalar prints without []
>>> torch.tensor(3.1416).size()  # the size is empty (0 dimensions)
torch.Size([])
>>> torch.tensor([3]).size()  # compare to a vector of size 1
torch.Size([1])  # a 1-D tensor prints wrapped in []
>>>
>>> vector = torch.arange(2, 6)  # this is a vector
>>> vector
tensor([ 2.,  3.,  4.,  5.])
>>> vector[3]  # indexing a 1-D tensor now returns a tensor!
tensor(5.)
>>> vector[3].item()  # use .item() to get the Python value
5.0
>>> mysum = torch.tensor([2, 3]).sum()  # reductions now return a scalar (a 0-dim tensor)
>>> mysum
tensor(5)
>>> mysum.size()
torch.Size([])

The examples above show that introducing scalars unifies the return types.
Key points:
1. To get a tensor's value as a Python number, use .item()
2. To create a scalar, use torch.tensor(number)
3. torch.tensor(list) can also be used to create tensors
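The three key points can be checked in a few lines. This is a minimal sketch with arbitrary values.

```python
import torch

s = torch.tensor(3.5)                 # scalar: a 0-dimensional tensor
v = torch.tensor([1.0, 2.0, 3.0])     # 1-D tensor built from a Python list
first = v[0]                          # indexing a 1-D tensor gives a 0-dim tensor
total = v.sum()                       # so does a full reduction
print(s.dim(), first.dim(), total.dim())  # 0 0 0
print(first.item(), total.item())         # 1.0 6.0  (plain Python floats)
```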

Accumulating the loss

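Previously, to accumulate the loss (to keep an eye on its magnitude) you would usually write total_loss += loss.data[0]. Oddly, why .data[0]? Because loss was a Variable of size (1,), so .data[0] pulled out the Python number. Now that loss is a 0-dimensional scalar, accumulate it with loss.item() instead.
This is essential. If you add the loss tensors directly, total_loss stays attached to every iteration's graph. As training goes on, it drags along a very large graph and may run out of memory. Since total_loss is only for monitoring, there is no need to keep that graph!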
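A minimal sketch contrasting the two ways of accumulating. The toy parameter w and the squared-error "loss" are made up for illustration. Adding the tensors keeps a grad_fn (and therefore the whole history) alive, while .item() yields a plain float.

```python
import torch

torch.manual_seed(0)
w = torch.randn(3, requires_grad=True)  # a toy "parameter"
running_tensor = 0     # anti-pattern: accumulate the loss tensors themselves
running_float = 0.0    # recommended: accumulate Python numbers via .item()
for step in range(3):
    x = torch.randn(3)
    loss = ((w * x).sum() - 1.0) ** 2       # 0-dim loss tensor with a graph behind it
    running_tensor = running_tensor + loss  # still references every step's graph
    running_float += loss.item()            # detached Python float
print(running_tensor.grad_fn is not None)   # True: still attached to the autograd graph
print(type(running_float))                  # <class 'float'>
```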

Deprecation of `volatile`

This flag no longer has any effect. It has been replaced by torch.no_grad(), torch.set_grad_enabled(grad_mode) and similar functions.

>>> x = torch.zeros(1, requires_grad=True)
>>> with torch.no_grad():
...     y = x * 2
>>> y.requires_grad
False
>>>
>>> is_train = False
>>> with torch.set_grad_enabled(is_train):
...     y = x * 2
>>> y.requires_grad
False
>>> torch.set_grad_enabled(True)  # this can also be used as a function
>>> y = x * 2
>>> y.requires_grad
True
>>> torch.set_grad_enabled(False)
>>> y = x * 2
>>> y.requires_grad
False
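In practice these usually wrap whole functions. The sketch below assumes a recent PyTorch release, where torch.no_grad() also works as a decorator. The evaluate and train_step helpers and the small nn.Linear model are hypothetical.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)   # its parameters require grad

@torch.no_grad()          # no_grad used as a function decorator
def evaluate(batch):
    return model(batch)   # no graph is recorded inside

def train_step(batch, is_train):
    # set_grad_enabled takes a bool, handy when one function serves both modes
    with torch.set_grad_enabled(is_train):
        return model(batch)

x = torch.randn(3, 4)
print(evaluate(x).requires_grad)           # False
print(train_step(x, True).requires_grad)   # True
print(train_step(x, False).requires_grad)  # False
```

Both forms restore the previous grad mode on exit, so global state is unchanged afterwards.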

`dtypes`, `devices` and NumPy-style creation functions

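dtype stands for data type. Each torch.dtype corresponds to one of the old tensor types. For example, torch.float32 (alias torch.float) maps to torch.FloatTensor, torch.float64 (torch.double) to torch.DoubleTensor, and torch.int64 (torch.long) to torch.LongTensor.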

A tensor's dtype can be read from its .dtype attribute.
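A sketch that checks the dtype-to-legacy-type correspondence on CPU. The pairing list is written out by hand for this example. Each entry is verified against what .dtype and .type() actually report.

```python
import torch

# Each torch.dtype paired with the legacy CPU tensor type name reported by .type()
pairs = [
    (torch.float32, "torch.FloatTensor"),   # alias: torch.float
    (torch.float64, "torch.DoubleTensor"),  # alias: torch.double
    (torch.float16, "torch.HalfTensor"),    # alias: torch.half
    (torch.uint8, "torch.ByteTensor"),
    (torch.int8, "torch.CharTensor"),
    (torch.int16, "torch.ShortTensor"),     # alias: torch.short
    (torch.int32, "torch.IntTensor"),       # alias: torch.int
    (torch.int64, "torch.LongTensor"),      # alias: torch.long
]
for dtype, legacy in pairs:
    t = torch.zeros(2, dtype=dtype)
    assert t.dtype == dtype       # the new-style attribute
    assert t.type() == legacy     # the old-style type string
    print(f"{str(dtype):15} -> {legacy}")
```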

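Also, device placement used to be expressed with .cpu() or .cuda(). Now it has its own object, torch.device, which can be passed directly when creating a tensor: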

>>> device = torch.device("cuda:1")
>>> x = torch.randn(3, 3, dtype=torch.float64, device=device)
>>> x
tensor([[-0.6344,  0.8562, -1.2758],
        [ 0.8414,  1.7962,  1.0589],
        [-0.1369, -1.0462, -0.4373]], dtype=torch.float64, device='cuda:1')
>>> x.requires_grad  # default is False
False
>>> x = torch.zeros(3, requires_grad=True)
>>> x.requires_grad
True

New ways to create a `Tensor`

The main point is that dtype and device can be specified directly at creation time.
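A CPU-only sketch of the same creation functions, so it runs without a GPU. The device here is an assumption; on a CUDA machine you could swap in torch.device("cuda:0"). The float32 default assumes torch.set_default_dtype has not been changed.

```python
import torch

device = torch.device("cpu")  # swap in torch.device("cuda:0") on a GPU machine
x = torch.randn(3, 3, dtype=torch.float64, device=device)
z = torch.zeros(2, 4, dtype=torch.int32, device=device)
o = torch.ones(5, requires_grad=True)  # no dtype/device given: default float dtype, CPU
print(x.dtype, x.device)
print(z.dtype, z.shape)
print(o.dtype, o.requires_grad)
```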


Creating Tensors with torch.tensor

This is the equivalent of numpy.array. Uses:
1. Creating a Tensor from data in a Python list
2. Creating a scalar

# create a tensor from a list
>>> cuda = torch.device("cuda")
>>> torch.tensor([[1], [2], [3]], dtype=torch.half, device=cuda)
tensor([[ 1],
        [ 2],
        [ 3]], device='cuda:0')
>>> torch.tensor(1)  # create a scalar
tensor(1)
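A CPU-only sketch of how torch.tensor picks a dtype. Like numpy.array, it infers the type from the data unless dtype is passed explicitly. The sample lists are arbitrary, and the float32 default assumes the default dtype has not been changed.

```python
import torch

data = [[1], [2], [3]]
a = torch.tensor(data)                       # Python ints -> torch.int64
b = torch.tensor(data, dtype=torch.float16)  # or set the dtype explicitly
c = torch.tensor([0.5, 1.5])                 # Python floats -> default float dtype (float32)
s = torch.tensor(1)                          # a scalar (0-dim tensor)
print(a.shape, a.dtype, b.dtype, c.dtype, s.dim())
```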

`torch.*_like` and `tensor.new_*`

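The first kind (for example torch.zeros_like) creates a tensor with the same shape and the same data type as its input; the dtype can also be overridden: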

>>> x = torch.randn(3, dtype=torch.float64)
>>> torch.zeros_like(x)
tensor([ 0.,  0.,  0.], dtype=torch.float64)
>>> torch.zeros_like(x, dtype=torch.int)
tensor([ 0,  0,  0], dtype=torch.int32)

If you just want a Tensor with the same attributes (dtype, device) as an existing one, but a different shape:

>>> x = torch.randn(3, dtype=torch.float64)
>>> x.new_ones(2)  # same attributes as x
tensor([ 1.,  1.], dtype=torch.float64)
>>> x.new_ones(4, dtype=torch.int)
tensor([ 1,  1,  1,  1], dtype=torch.int32)
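A side-by-side sketch of the two families. *_like copies the shape and attributes of its argument. new_* copies only the attributes and takes a new shape (or, for new_tensor, new data). The shapes and fill value are arbitrary.

```python
import torch

x = torch.randn(2, 3, dtype=torch.float64)
a = torch.zeros_like(x)                    # same shape, dtype and device as x
b = torch.ones_like(x, dtype=torch.int32)  # same shape, dtype overridden
c = x.new_zeros(4)                         # new shape, same dtype and device as x
d = x.new_full((2, 2), 7.0)                # new shape filled with 7.0, still float64
e = x.new_tensor([1, 2, 3])                # copies the data, inherits x's dtype
for name, t in zip("abcde", (a, b, c, d, e)):
    print(name, tuple(t.shape), t.dtype)
```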

Writing device-agnostic code

This means not hard-coding GPU or CPU anywhere; instead, move things to the chosen device with .to().

# at beginning of the script
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

...

# then whenever you get a new Tensor or Module
# this won't copy if they are already on the desired device
input = data.to(device)
model = MyModule(...).to(device)
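A runnable version of the pattern above, using a small nn.Linear as a hypothetical stand-in for MyModule and random data in place of a real batch. It runs on either CPU or GPU. It also confirms the "won't copy" remark: .to() returns the very same tensor when it is already on the target device.

```python
import torch
import torch.nn as nn

# Pick the device once; everything else follows it.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 2).to(device)  # Module.to moves the parameters
data = torch.randn(8, 4)            # e.g. a batch fresh from a CPU data loader
inp = data.to(device)               # Tensor.to returns a tensor on `device`
out = model(inp)
same = inp.to(device)               # already there: the same object is returned
print(out.device, same is inp)
```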

Migration code comparison

The old way

model = MyRNN()
if use_cuda:
    model = model.cuda()

# train
total_loss = 0
for input, target in train_loader:
    input, target = Variable(input), Variable(target)
    hidden = Variable(torch.zeros(*h_shape))  # init hidden
    if use_cuda:
        input, target, hidden = input.cuda(), target.cuda(), hidden.cuda()
    ...
    # get loss and optimize
    total_loss += loss.data[0]

# evaluate
for input, target in test_loader:
    input = Variable(input, volatile=True)
    if use_cuda:
        ...
    ...

The new way

# torch.device object used throughout this script
device = torch.device("cuda" if use_cuda else "cpu")

model = MyRNN().to(device)

# train
total_loss = 0
for input, target in train_loader:
    input, target = input.to(device), target.to(device)
    hidden = input.new_zeros(*h_shape)  # has the same device & dtype as `input`
    ...
    # get loss and optimize
    total_loss += loss.item()  # get Python number from 1-element Tensor

# evaluate
with torch.no_grad():  # operations inside don't track history
    for input, target in test_loader:
        ...

References: https://zhuanlan.zhihu.com/p/36116749
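To see the new-style loop run end to end, here is a self-contained sketch. TinyRNN is a hypothetical stand-in for MyRNN (one nn.RNN layer plus a linear head). The "loaders" are plain lists of random batches rather than real DataLoaders, and the shapes and learning rate are arbitrary.

```python
import torch
import torch.nn as nn

use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")
torch.manual_seed(0)


class TinyRNN(nn.Module):
    """Hypothetical stand-in for MyRNN: one nn.RNN layer plus a linear head."""

    def __init__(self, n_in=3, n_hidden=5):
        super().__init__()
        self.rnn = nn.RNN(n_in, n_hidden, batch_first=True)
        self.head = nn.Linear(n_hidden, 1)

    def forward(self, x, h):
        out, h = self.rnn(x, h)          # out: (batch, seq_len, n_hidden)
        return self.head(out[:, -1]), h  # predict from the last time step


h_shape = (1, 4, 5)  # (num_layers, batch, n_hidden); hidden keeps this layout even with batch_first
# Hypothetical toy data standing in for DataLoaders: (input, target) batches
train_loader = [(torch.randn(4, 6, 3), torch.randn(4, 1)) for _ in range(3)]
test_loader = [(torch.randn(4, 6, 3), torch.randn(4, 1)) for _ in range(2)]

model = TinyRNN().to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.MSELoss()

# train
total_loss = 0.0
for input, target in train_loader:
    input, target = input.to(device), target.to(device)
    hidden = input.new_zeros(*h_shape)  # same device & dtype as `input`
    optimizer.zero_grad()
    pred, _ = model(input, hidden)
    loss = loss_fn(pred, target)
    loss.backward()
    optimizer.step()
    total_loss += loss.item()  # Python float, no graph kept

# evaluate
predictions = []
with torch.no_grad():  # nothing inside records history
    for input, target in test_loader:
        input = input.to(device)
        pred, _ = model(input, input.new_zeros(*h_shape))
        predictions.append(pred)
print(total_loss, [tuple(p.shape) for p in predictions])
```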