Preface
In the previous post, 《淺談深度學習:如何計算模型以及中間變量的顯存佔用大小》 (A Brief Discussion of Deep Learning: How to Calculate the GPU Memory Occupied by Models and Intermediate Variables), we explored how to calculate the GPU memory occupied by various kinds of variables. This post focuses on how to use features of the PyTorch deep-learning framework to inspect how much memory our current variables occupy, along with some optimization techniques. All code below uses the PyTorch framework.
Optimizing GPU Memory
Optimizing GPU memory usage in PyTorch is a necessity when we work with large amounts of data, because we do not have unlimited memory. Memory is finite while data is effectively unlimited; only by optimizing memory usage can we make the most of our data and implement a wide variety of algorithms.
Estimating the Memory a Model Occupies
As noted in the previous post, a model's GPU memory footprint comes from just two sources:
- the model's weight parameters
- the intermediate variables the model stores
In general the weight parameters do not occupy much memory; most of it goes to the intermediate variables produced during computation. Once we have defined a model, we can compute the number of values in its weight parameters with the following code:
import numpy as np

# model is the neural network we defined in PyTorch
# model.parameters() returns all of the model's weight parameters
para = sum([np.prod(list(p.size())) for p in model.parameters()])
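Equivalently, every PyTorch tensor exposes numel(), so para = sum(p.numel() for p in model.parameters()) gives the same count without going through NumPy.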
Suppose we have a model like this:
Sequential(
(conv_1): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(relu_1): ReLU(inplace)
(conv_2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(relu_2): ReLU(inplace)
(pool_2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(conv_3): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
The resulting para is 112576. However, what we have computed is only the *count* of the weight parameters, not their size in bytes, so we still need a conversion:
# type_size below is 4, because our parameters are float32, i.e. 4 B (4 bytes) each
print('Model {} : params: {:4f}M'.format(model._get_name(), para * type_size / 1000 / 1000))
This prints:
Model Sequential : params: 0.450304M
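As a sanity check, counting by hand gives the same number: conv_1 has 3×64×3×3 + 64 = 1,792 parameters, conv_2 has 64×64×3×3 + 64 = 36,928, and conv_3 has 64×128×3×3 + 128 = 73,856, while the ReLU and pooling layers have none. The total is 1,792 + 36,928 + 73,856 = 112,576 parameters, and 112,576 × 4 B ≈ 0.45 MB.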
But as we said before, a neural network model holds not only weight parameters; we must also account for the intermediate variables. How do we compute them? We can construct an input variable, feed it through the model, and collect the intermediate results ourselves:
# model is the loaded model
# input is the actual input Tensor used in practice
# clone() the input so the original is unaffected
input_ = input.clone()
# make sure no gradients are computed; we only want the intermediate variables
input_.requires_grad_(requires_grad=False)

mods = list(model.modules())
out_sizes = []

for i in range(1, len(mods)):
    m = mods[i]
    # note: an inplace ReLU produces no new intermediate variable, so skip it
    if isinstance(m, nn.ReLU):
        if m.inplace:
            continue
    out = m(input_)
    out_sizes.append(np.array(out.size()))
    input_ = out

total_nums = 0
for i in range(len(out_sizes)):
    s = out_sizes[i]
    nums = np.prod(np.array(s))
    total_nums += nums
The value obtained above is the total *number* of intermediate values the model produces while running; again we need to convert it:
# print both cases: forward only, and forward + backward
print('Model {} : intermediate variables: {:3f} M (without backward)'
      .format(model._get_name(), total_nums * type_size / 1000 / 1000))
print('Model {} : intermediate variables: {:3f} M (with backward)'
      .format(model._get_name(), total_nums * type_size*2 / 1000 / 1000))
Because during backward all the intermediate variables must be kept around for the gradient computation, in the backward case we multiply the intermediate-variable count by 2.

This gives us the memory the model's intermediate variables occupy. Clearly they use far more than the model's own weights, and a backward pass needs even more:
Model Sequential : intermediate variables: 336.089600 M (without backward)
Model Sequential : intermediate variables: 672.179200 M (with backward)
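These figures are consistent with an input of shape (1, 3, 682, 700) (an assumption here, matching the shapes in the profiling report later in this post): conv_1 and conv_2 each output 64×682×700 = 30,553,600 values, the pooling layer outputs 64×341×350 = 7,638,400, and conv_3 outputs 128×341×350 = 15,276,800, while the inplace ReLUs add nothing. That totals 84,022,400 values, and 84,022,400 × 4 B ≈ 336.09 MB.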
Putting the code above together:
# GPU memory usage estimator
# model: the model to inspect
# input: the Tensor that will actually be fed to the model
# type_size: bytes per element, default 4 (float32)
def modelsize(model, input, type_size=4):
    para = sum([np.prod(list(p.size())) for p in model.parameters()])
    print('Model {} : params: {:4f}M'.format(model._get_name(), para * type_size / 1000 / 1000))

    input_ = input.clone()
    input_.requires_grad_(requires_grad=False)

    mods = list(model.modules())
    out_sizes = []

    for i in range(1, len(mods)):
        m = mods[i]
        if isinstance(m, nn.ReLU):
            if m.inplace:
                continue
        out = m(input_)
        out_sizes.append(np.array(out.size()))
        input_ = out

    total_nums = 0
    for i in range(len(out_sizes)):
        s = out_sizes[i]
        nums = np.prod(np.array(s))
        total_nums += nums

    print('Model {} : intermediate variables: {:3f} M (without backward)'
          .format(model._get_name(), total_nums * type_size / 1000 / 1000))
    print('Model {} : intermediate variables: {:3f} M (with backward)'
          .format(model._get_name(), total_nums * type_size*2 / 1000 / 1000))
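Calling it is straightforward; here is a minimal sketch (the model and input shape below are illustrative assumptions, mirroring the example above):

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
    nn.MaxPool2d(2, 2),
    nn.Conv2d(64, 128, 3, padding=1),
)
x = torch.rand(1, 3, 682, 700)  # same input shape as the example above
modelsize(model, x)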
Of course, the memory figures we compute are only for reference, because PyTorch incurs additional memory overhead at runtime, so actual usage will be somewhat larger than our estimate.
On inplace=False
As we all know, the activation function ReLU() has a parameter inplace, which defaults to False. When it is set to True, the new values computed by relu() do not occupy new memory; they directly overwrite the original values. That is why setting inplace=True saves some memory.
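A minimal sketch showing the effect (the shapes here are arbitrary):

import torch
import torch.nn as nn

x = torch.randn(4, 8)
y = nn.ReLU(inplace=False)(x)   # default: allocates a new tensor for the result
print(y is x)                   # False

z = nn.ReLU(inplace=True)(x)    # overwrites x's storage; no new allocation
print(z is x)                   # True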
Trading Computation Speed for Lower Memory Usage
PyTorch 0.4.0 introduced a new feature that lets us split a computation in two: if a model would occupy too much memory, we can compute the first half, keep only the intermediate results the second half needs, and then compute the second half.

In other words, the new checkpoint mechanism lets us store only the parts needed for backpropagation. If an intermediate output is missing (because it was discarded to save memory), checkpoint recomputes it from the nearest checkpoint, reducing memory usage at the cost of extra computation time:
# the input
input = torch.rand(1, 10)
# suppose we have a very deep network
layers = [nn.Linear(10, 10) for _ in range(1000)]
model = nn.Sequential(*layers)
output = model(input)
The model above occupies a lot of memory because the computation produces many intermediate variables. Here checkpoint can help us cut the memory footprint.
# first create the input with requires_grad=True
# without it the computed gradients may be zero
input = torch.rand(1, 10, requires_grad=True)
layers = [nn.Linear(10, 10) for _ in range(1000)]

# define the segment functions to checkpoint; here we define two:
# one running the first 500 layers, the other running the rest except the last
def run_first_half(*args):
    x = args[0]
    for layer in layers[:500]:
        x = layer(x)
    return x

def run_second_half(*args):
    x = args[0]
    for layer in layers[500:-1]:
        x = layer(x)
    return x

# import the newly added checkpoint
from torch.utils.checkpoint import checkpoint

x = checkpoint(run_first_half, input)
x = checkpoint(run_second_half, x)
# run the last layer separately, outside checkpoint
x = layers[-1](x)
x.sum().backward()  # and that's it
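One way to sanity-check the checkpointed run (a sketch, continuing from the snippet above) is to compare its gradients against a plain forward/backward pass through the same layers:

# gradients from the checkpointed run should match an ordinary pass
input2 = input.clone().detach().requires_grad_(True)
y = input2
for layer in layers:
    y = layer(y)
y.sum().backward()
print(torch.allclose(input.grad, input2.grad))  # expected: True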
For a Sequential model, since Sequential() may contain many blocks, an official helper is provided as well:
input = torch.rand(1, 10, requires_grad=True)
layers = [nn.Linear(10, 10) for _ in range(1000)]
model = nn.Sequential(*layers)

from torch.utils.checkpoint import checkpoint_sequential

# split into two segments
num_segments = 2
x = checkpoint_sequential(model, num_segments, input)
x.sum().backward()  # and that's it
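Roughly speaking, checkpoint_sequential splits the model's top-level modules into num_segments chunks and applies checkpoint to each; only the activations at segment boundaries are kept during forward, and everything inside a segment is recomputed during backward, so num_segments controls the trade-off between stored memory and recomputation.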
Tracking GPU Memory Usage
We cannot compute memory usage precisely while writing a program, but using pynvml (NVIDIA's Python library) together with Python's garbage-collection facilities, we can print, in real time, how much GPU memory we are using and which Tensors are using it.

The output looks something like this:
# 08-Jun-18-17:56:51-gpu_mem_prof
At __main__ <module>: line 39   Total Used Memory:399.4  Mb
At __main__ <module>: line 40   Total Used Memory:992.5  Mb
+ __main__ <module>: line 40   (1, 1, 682, 700)   1.82 M <class 'torch.Tensor'>
+ __main__ <module>: line 40   (1, 3, 682, 700)   5.46 M <class 'torch.Tensor'>
At __main__ <module>: line 126  Total Used Memory:1088.5 Mb
+ __main__ <module>: line 126  (64, 64, 3, 3)     0.14 M <class 'torch.nn.parameter.Parameter'>
+ __main__ <module>: line 126  (128, 64, 3, 3)    0.28 M <class 'torch.nn.parameter.Parameter'>
+ __main__ <module>: line 126  (128, 128, 3, 3)   0.56 M <class 'torch.nn.parameter.Parameter'>
+ __main__ <module>: line 126  (64, 3, 3, 3)      0.00 M <class 'torch.nn.parameter.Parameter'>
+ __main__ <module>: line 126  (256, 256, 3, 3)   2.25 M <class 'torch.nn.parameter.Parameter'>
+ __main__ <module>: line 126  (512, 256, 3, 3)   4.5 M <class 'torch.nn.parameter.Parameter'>
+ __main__ <module>: line 126  (512, 512, 3, 3)   9.0 M <class 'torch.nn.parameter.Parameter'>
+ __main__ <module>: line 126  (64,)              0.00 M <class 'torch.nn.parameter.Parameter'>
+ __main__ <module>: line 126  (1, 3, 682, 700)   5.46 M <class 'torch.Tensor'>
+ __main__ <module>: line 126  (128,)             0.00 M <class 'torch.nn.parameter.Parameter'>
+ __main__ <module>: line 126  (256,)             0.00 M <class 'torch.nn.parameter.Parameter'>
+ __main__ <module>: line 126  (512,)             0.00 M <class 'torch.nn.parameter.Parameter'>
+ __main__ <module>: line 126  (3,)               1.14 M <class 'torch.Tensor'>
+ __main__ <module>: line 126  (256, 128, 3, 3)   1.12 M <class 'torch.nn.parameter.Parameter'>
...
The relevant code follows. Some parts still need fixing; once it is polished, I will put the complete code and usage instructions on GitHub at https://github.com/Oldpan/Pytorch-Memory-Utils, so please keep an eye on it.
import datetime
import linecache
import os
import gc
import pynvml
import torch
import numpy as np
print_tensor_sizes = True
last_tensor_sizes = set()
gpu_profile_fn = f'{datetime.datetime.now():%d-%b-%y-%H:%M:%S}-gpu_mem_prof.txt'
# if 'GPU_DEBUG' in os.environ:
# print('profiling gpu usage to ', gpu_profile_fn)
lineno = None
func_name = None
filename = None
module_name = None
# fram = inspect.currentframe()
# func_name = fram.f_code.co_name
# filename = fram.f_globals["__file__"]
# ss = os.path.dirname(os.path.abspath(filename))
# module_name = fram.f_globals["__name__"]
def gpu_profile(frame, event, arg):
    # trace callback for sys.settrace; it is called as (frame, event, arg)
    # the reported line is _about to_ execute (!)
    global last_tensor_sizes
    global lineno, func_name, filename, module_name

    if event == 'line':
        try:
            # report about the _previous_ line (!)
            if lineno is not None:
                pynvml.nvmlInit()
                # handle = pynvml.nvmlDeviceGetHandleByIndex(int(os.environ['GPU_DEBUG']))
                handle = pynvml.nvmlDeviceGetHandleByIndex(0)
                meminfo = pynvml.nvmlDeviceGetMemoryInfo(handle)
                line = linecache.getline(filename, lineno)
                where_str = module_name+' '+func_name+':'+' line '+str(lineno)

                with open(gpu_profile_fn, 'a+') as f:
                    f.write(f"At {where_str:<50}"
                            f"Total Used Memory:{meminfo.used/1024**2:<7.1f}Mb\n")

                    if print_tensor_sizes is True:
                        for tensor in get_tensors():
                            if not hasattr(tensor, 'dbg_alloc_where'):
                                tensor.dbg_alloc_where = where_str
                        new_tensor_sizes = {(type(x), tuple(x.size()), np.prod(np.array(x.size()))*4/1024**2,
                                             x.dbg_alloc_where) for x in get_tensors()}
                        for t, s, m, loc in new_tensor_sizes - last_tensor_sizes:
                            f.write(f'+ {loc:<50} {str(s):<20} {str(m)[:4]} M {str(t):<10}\n')
                        for t, s, m, loc in last_tensor_sizes - new_tensor_sizes:
                            f.write(f'- {loc:<50} {str(s):<20} {str(m)[:4]} M {str(t):<10}\n')
                        last_tensor_sizes = new_tensor_sizes
                pynvml.nvmlShutdown()

            # save details about the line _to be_ executed
            lineno = None
            func_name = frame.f_code.co_name
            filename = frame.f_globals["__file__"]
            if (filename.endswith(".pyc") or
                    filename.endswith(".pyo")):
                filename = filename[:-1]
            module_name = frame.f_globals["__name__"]
            lineno = frame.f_lineno

            return gpu_profile

        except Exception as e:
            print('An exception occurred: {}'.format(e))

    return gpu_profile


def get_tensors():
    # walk every object the garbage collector tracks and yield the CUDA tensors
    for obj in gc.get_objects():
        try:
            if torch.is_tensor(obj):
                tensor = obj
            else:
                continue
            if tensor.is_cuda:
                yield tensor
        except Exception as e:
            print('An exception occurred: {}'.format(e))
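A minimal sketch of hooking the profiler in (assuming the code above is on the import path; sys.settrace invokes the callback on every executed line):

import sys

sys.settrace(gpu_profile)   # register the trace function before the code to profile
# ... run the model / training code here ...
sys.settrace(None)          # detach the profiler when finished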
Note that linecache's getline can only read lines from files it has cached; for a file that has never been executed, it returns an empty result. Python's garbage collector reclaims a variable as soon as nothing references it, so why do the intermediate variables computed inside the model still exist after execution finishes? If nothing references them, why do they still occupy memory?

One possibility is that these references live not in our Python code but inside the running network layers: the values are saved for backward as part of the computation graph, where our program cannot see them.
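A small demonstration of this (a sketch; it needs a CUDA device, and the shapes are arbitrary):

import torch

x = torch.randn(1024, 1024, device='cuda')

with torch.no_grad():
    (x.sigmoid().tanh() * 2).sum()
print(torch.cuda.memory_allocated())  # temporaries were freed as soon as they were unused

x.requires_grad_(True)
z = (x.sigmoid().tanh() * 2).sum()
print(torch.cuda.memory_allocated())  # higher: the graph keeps sigmoid's and tanh's outputs alive for backward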
Postscript
In practice we sometimes have models that are used only once. To save memory in such cases, we need to free intermediate variables with del as we compute. Space does not permit covering that here; the next post will explain it.
Original article: 如何在Pytorch中精細化利用顯存 (https://ptorch.com/news/181.html)
This is an original article; please credit the source when reposting: 如何在Pytorch中精細化利用顯存以及提升Pytorch顯存利用率 - pytorch中文網
Discussion group (QQ): 168117787