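- A recent project of mine used the K-means algorithm. Because CPU implementations were too slow, GPU acceleration was needed, which led me to the libKMCUDA library.
- This post records some of the problems I ran into while installing and using it.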
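Project address: kmcuda (https://github.com/src-d/kmcuda)
Project description: Large scale K-means and K-nn implementation on NVIDIA GPU / CUDA
For a detailed introduction to the project, see the README on GitHub.
Performance is as follows: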
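Technically, the project is a shared library that exports the two functions defined in kmcuda.h: kmeans_cuda and knn_cuda. It has built-in native extension support for Python3 and R, so you can use from libKMCUDA import kmeans_cuda in Python or dyn.load("libKMCUDA.so") in R.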
The installation commands given on GitHub (with the cd path adjusted to the cloned directory):
git clone https://github.com/src-d/kmcuda
cd kmcuda/src
cmake -DCMAKE_BUILD_TYPE=Release . && make
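A few parameters deserve attention:
-D DISABLE_PYTHON: if you do not want to build the Python support module, set this to y, i.e. add -D DISABLE_PYTHON=y
-D DISABLE_R: if you do not want to build the R support module, add -D DISABLE_R=y
-D CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-10.0 (change this to your own path): add this if CUDA cannot be found automatically
-D CUDA_ARCH=52: specifies the CUDA compute capability (GPU Compute Capability) of the current machine
gcc: some articles mention that older versions of the gcc compiler are not supported. My version is 5.4, which meets the requirement.
If your version is too old, you can install gcc-5.4. For installation details, see the following blog post: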
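Look up the compute capability of your GPU server on the NVIDIA website, at the following address:
The server I am using has a GeForce RTX 2070, whose compute capability is 7.5. CUDA_ARCH is therefore set to 75: -D CUDA_ARCH=75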
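The mapping from compute capability to CUDA_ARCH, and the way the flags above combine into one cmake call, can be sketched in a few lines of Python. This is only an illustration: the helper names cuda_arch_from_capability and cmake_command are hypothetical and not part of kmcuda, and the flags are limited to the ones listed above.

```python
def cuda_arch_from_capability(capability):
    """Turn a compute capability such as "7.5" into the CUDA_ARCH value "75"."""
    major, minor = capability.strip().split(".")
    return f"{int(major)}{int(minor)}"


def cmake_command(capability, disable_python=False, disable_r=False, cuda_root=None):
    """Assemble the cmake argument list from the flags described in this post.

    Hypothetical helper for illustration only; it builds the command, it does not run it.
    """
    args = ["cmake", "-DCMAKE_BUILD_TYPE=Release"]
    if disable_python:
        args += ["-D", "DISABLE_PYTHON=y"]
    if disable_r:
        args += ["-D", "DISABLE_R=y"]
    if cuda_root:
        args += ["-D", f"CUDA_TOOLKIT_ROOT_DIR={cuda_root}"]
    args += ["-D", f"CUDA_ARCH={cuda_arch_from_capability(capability)}", "."]
    return args


# Compute capability 7.5 (GeForce RTX 2070), R support disabled.
print(" ".join(cmake_command("7.5", disable_r=True)))
```

Passing the resulting list to subprocess.run from inside the src directory would start the configuration step; make still has to be run afterwards.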
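So that the relevant library paths can be found automatically, add the cuda paths to the shell configuration file. The shell on this system is zsh, so add the following entries to ~/.zshrc: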
export PATH=$PATH:/usr/local/cuda/bin
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
export CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda
export CUDA_INCLUDE_DIRS=/usr/local/cuda/include
Apply the changes:
source ~/.zshrc
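A misspelled variable name in the shell configuration fails silently, so it can help to check the environment from Python after reloading the shell. The sketch below assumes the default /usr/local/cuda prefix used in this post; check_cuda_env is a hypothetical helper name.

```python
import os


def check_cuda_env(env=None, cuda_root="/usr/local/cuda"):
    """Report whether the CUDA directories appear in PATH and LD_LIBRARY_PATH.

    env defaults to the current process environment; cuda_root is an assumed prefix.
    """
    env = os.environ if env is None else env
    path_entries = env.get("PATH", "").split(":")
    lib_entries = env.get("LD_LIBRARY_PATH", "").split(":")
    return {
        "path_ok": cuda_root + "/bin" in path_entries,
        "lib_ok": cuda_root + "/lib64" in lib_entries,
    }


# Inspect the environment of the current process.
print(check_cuda_env())
```

If lib_ok is False even though the export line is present in ~/.zshrc, the variable name in that line is the first thing to check.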
Current device parameters:
Complete installation command:
git clone https://github.com/src-d/kmcuda
cd kmcuda/src
cmake -DCMAKE_BUILD_TYPE=Release -D DISABLE_R=y -D CUDA_ARCH=75 . && make
Test:
The installation command is as follows:
CUDA_ARCH=75 pip install libKMCUDA
An error occurred:
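Installing from source with pip
Installation command:
pip install git+https://github.com/src-d/kmcuda.git#subdirectory=src
The following error occurred:
The error shows that this installation method uses -DCUDA_ARCH=61 by default, which does not match the actual hardware.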
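GPU compute capability not specified
Installation command:
cmake -DCMAKE_BUILD_TYPE=Release -D DISABLE_R=y . && make
As the figure below shows, the installation succeeds.
Running the test, however, produces the following error:
It reports that the compute capability does not match the device.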
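A mismatch like this can be caught before building by comparing the intended CUDA_ARCH with what the driver reports. The sketch below rests on two assumptions: that the installed nvidia-smi supports the compute_cap query field (older drivers may not), and that an exact match is the right check, which is a conservative simplification. All function names are hypothetical.

```python
import subprocess


def parse_compute_caps(text):
    """Convert lines such as "7.5" into CUDA_ARCH-style strings such as "75"."""
    return [line.strip().replace(".", "") for line in text.splitlines() if line.strip()]


def query_compute_caps():
    """Ask nvidia-smi for the compute capability of every GPU (assumes a recent driver)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return parse_compute_caps(result.stdout)


def arch_matches(built_arch, device_archs):
    """True only if at least one device was found and all of them equal built_arch."""
    return bool(device_archs) and all(arch == str(built_arch) for arch in device_archs)


# Sample input for illustration, not captured output: one GPU with capability 7.5.
sample = parse_compute_caps("7.5\n")
print(arch_matches(61, sample), arch_matches(75, sample))
```

On a machine with a GPU, arch_matches(75, query_compute_caps()) performs the same comparison against the real hardware.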
import numpy
from matplotlib import pyplot
from libKMCUDA import kmeans_cuda
numpy.random.seed(0)
arr = numpy.empty((10000, 2), dtype=numpy.float32)
arr[:2500] = numpy.random.rand(2500, 2) + [0, 2]
arr[2500:5000] = numpy.random.rand(2500, 2) - [0, 2]
arr[5000:7500] = numpy.random.rand(2500, 2) + [2, 0]
arr[7500:] = numpy.random.rand(2500, 2) - [2, 0]
centroids, assignments = kmeans_cuda(arr, 4, verbosity=1, seed=3)
print(centroids)
pyplot.scatter(arr[:, 0], arr[:, 1], c=assignments)
pyplot.scatter(centroids[:, 0], centroids[:, 1], c="white", s=150)
pyplot.show()
import numpy
from matplotlib import pyplot
from libKMCUDA import kmeans_cuda
numpy.random.seed(0)
arr = numpy.empty((10000, 2), dtype=numpy.float32)
angs = numpy.random.rand(10000) * 2 * numpy.pi
for i in range(10000):
    arr[i] = numpy.sin(angs[i]), numpy.cos(angs[i])
centroids, assignments, avg_distance = kmeans_cuda(
    arr, 4, metric="cos", verbosity=1, seed=3, average_distance=True)
print("Average distance between centroids and members:", avg_distance)
print(centroids)
pyplot.scatter(arr[:, 0], arr[:, 1], c=assignments)
pyplot.scatter(centroids[:, 0], centroids[:, 1], c="white", s=150)
pyplot.show()
The result is as follows:
def kmeans_cuda(samples, clusters, tolerance=0.01, init="k-means++",
                yinyang_t=0.1, metric="L2", average_distance=False,
                seed=time(), device=0, verbosity=0)
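Parameters:
samples: numpy array of shape [number of samples, number of features], or a tuple starting with (raw device pointer (int), device index (int))
clusters: int, the number of clusters
tolerance: float, the algorithm stops if the relative number of reassignments drops below this value
init: string or numpy array, sets the centroid initialization method. It can be k-means++, afk-mc2, random, or a numpy array of shape [number of clusters, number of features] whose dtype must be float32
yinyang_t: float, usually set to 0.1
metric: str, the name of the distance metric to use. The default is Euclidean (L2); it can be changed to cos. Note that in the latter case the samples must be normalized.
average_distance: boolean, whether to calculate the average distance between cluster members and the corresponding centroids. This is useful for finding the best K and is returned as the third tuple element.
seed: int, random generator seed, used to reproduce results
device: int, CUDA device indices combined as a bit mask: 1 means the first device, 2 means the second, 3 means using the first and the second. The value 0 enables all devices. The default is 0.
verbosity: int, 0 means complete silence, 1 means progress logging only, 2 means lots of output.
Return value: tuple (centroids, assignments, [average_distance]). If samples is a numpy array or a host pointer tuple, the elements are numpy arrays; otherwise they are raw pointers (integers) allocated on the same device. If samples is float16, the returned centroids are float16 as well.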
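Since metric="cos" expects normalized samples, one way to prepare the input is to scale every row to unit L2 length with numpy before passing it to kmeans_cuda. This is a minimal sketch; normalize_rows is a hypothetical helper, and rejecting zero-length rows is a design choice made here, not something the library prescribes.

```python
import numpy


def normalize_rows(samples):
    """Scale each row to unit L2 norm and return a float32 array."""
    samples = numpy.asarray(samples, dtype=numpy.float32)
    norms = numpy.linalg.norm(samples, axis=1, keepdims=True)
    if numpy.any(norms == 0):
        # A zero vector has no direction, so it cannot be placed on the unit sphere.
        raise ValueError("zero-length rows cannot be normalized")
    return samples / norms


arr = normalize_rows([[3.0, 4.0], [0.0, 2.0]])
# Every row now has length 1, up to float32 rounding.
print(numpy.linalg.norm(arr, axis=1))
```

The normalized array can then be used in place of arr in the cos example above.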
def knn_cuda(k, samples, centroids, assignments, metric="L2", device=0, verbosity=0)
Parameters:
k: integer, the number of neighbors to search for each sample. Must be ≤ 116.
samples: numpy array of shape [number of samples, number of features] or tuple(raw device pointer (int), device index (int), shape (tuple(number of samples, number of features[, fp16x2 marker]))). In the latter case, negative device index means host pointer. Optionally, the tuple can be 1 item longer with the preallocated device pointer for neighbors. dtype must be either float16 or convertible to float32.
centroids: numpy array with precalculated clusters' centroids (e.g., using K-means/kmcuda/kmeans_cuda()). dtype must match samples. If samples is a tuple then centroids must be a length-2 tuple, the first element is the pointer and the second is the number of clusters. The shape is (number of clusters, number of features).
assignments: numpy array with sample-cluster associations. dtype is expected to be compatible with uint32. If samples is a tuple then assignments is a pointer. The shape is (number of samples,).
metric: str, the name of the distance metric to use. The default is Euclidean (L2), it can be changed to "cos" to change the algorithm to Spherical K-means with the angular distance. Please note that samples must be normalized in the latter case.
device: integer, bitwise OR-ed CUDA device indices, e.g. 1 means first device, 2 means second device, 3 means using first and second device. Special value 0 enables all available devices. The default is 0.
verbosity: integer, 0 means complete silence, 1 means mere progress logging, 2 means lots of output.
Return value: neighbor indices. If samples was a numpy array or a host pointer tuple, the return type is numpy array, otherwise, a raw pointer (integer) allocated on the same device. The shape is (number of samples, k).
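The device argument is a bit mask rather than a plain index, which is easy to get wrong. The sketch below converts between zero-based GPU indices and the mask as described above; device_mask and devices_in_mask are hypothetical helper names, and an empty list is rejected because a mask of 0 would select every available device.

```python
def device_mask(indices):
    """Combine zero-based GPU indices into the bitwise OR-ed device value."""
    indices = list(indices)
    if not indices:
        raise ValueError("empty selection: a mask of 0 would enable all devices")
    mask = 0
    for index in indices:
        if index < 0:
            raise ValueError("device indices must be non-negative")
        mask |= 1 << index  # GPU 0 -> bit 0 (value 1), GPU 1 -> bit 1 (value 2)
    return mask


def devices_in_mask(mask):
    """List the zero-based GPU indices selected by a non-zero mask."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


print(device_mask([0]), device_mask([1]), device_mask([0, 1]))
print(devices_in_mask(3))
```

For example, device=device_mask([0, 1]) asks for the first and second GPU, matching the value 3 in the parameter description.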