KMCUDA: K-means on GPU/CUDA

Date: 2019-11-06

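A recent project of mine uses the K-means algorithm. The CPU implementation was too slow, so I needed GPU acceleration, which led me to the libKMCUDA library.

This post records some problems I ran into while installing and using it.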

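I. Introduction to kmcuda

Project page: kmcuda (https://github.com/src-d/kmcuda)

Project description: Large scale K-means and K-nn implementation on NVIDIA GPU / CUDA

See the README on GitHub for a detailed description of the project.

Performance is as follows:

Technically, the project is a shared library that exports two functions defined in kmcuda.h: kmeans_cuda and knn_cuda. It has built-in native extension support for Python 3 and R, so you can either run from libKMCUDA import kmeans_cuda in Python or dyn.load("libKMCUDA.so") in R.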

II. Installation

The installation commands given on GitHub:

git clone https://github.com/src-d/kmcuda
cd kmcuda/src
cmake -DCMAKE_BUILD_TYPE=Release . && make

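A few options deserve attention:

-D DISABLE_PYTHON: set this to y to skip building the Python module, i.e. add -D DISABLE_PYTHON=y
-D DISABLE_R: add -D DISABLE_R=y to skip building the R module
-D CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-10.0 (change this to your own path): add this if CUDA cannot be found automatically
-D CUDA_ARCH=52: specifies the CUDA compute capability (GPU Compute Capability) of the current machine
gcc: some articles mention that old gcc versions are not supported. My version is 5.4, which meets the requirement.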

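1. Check the gcc version

Run gcc --version to check. If the version is too old, install gcc-5.4; see the following post for details:

A detailed guide to installing gcc on Linux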

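2. Look up the GPU compute capability

Look up the compute capability of your GPU on the NVIDIA website:

CUDA GPUs | NVIDIA Developer

My server uses a GeForce RTX 2070, whose compute capability is 7.5. CUDA_ARCH is therefore set to 75: -D CUDA_ARCH=75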
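The mapping from the compute capability listed by NVIDIA to the CUDA_ARCH value is just the version number with the dot removed. A minimal sketch of that conversion follows; the helper name cuda_arch_flag is hypothetical and not part of kmcuda.

```python
def cuda_arch_flag(compute_capability: str) -> str:
    """Turn a compute capability such as "7.5" into a cmake flag.

    Hypothetical helper for illustration; it only formats the string
    and does not query the GPU.
    """
    major, minor = compute_capability.strip().split(".")
    return f"-D CUDA_ARCH={int(major)}{int(minor)}"


print(cuda_arch_flag("7.5"))  # -D CUDA_ARCH=75
```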

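3. Configure the CUDA paths

So that the relevant libraries can be found automatically, add the CUDA paths to the shell configuration file. The shell on this system is zsh.

Add the following to ~/.zshrc: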

export PATH=$PATH:/usr/local/cuda/bin
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
export CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda
export CUDA_INCLUDE_DIRS=/usr/local/cuda/include

Apply the changes:

source ~/.zshrc
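To confirm that the new settings are visible to processes started from the shell, a small check such as the one below can help. It is only a sketch: the function name check_cuda_env is hypothetical, and /usr/local/cuda is assumed to be the install prefix, as in the exports above.

```python
import os
import shutil


def check_cuda_env(cuda_root="/usr/local/cuda"):
    """Report whether the CUDA-related settings are visible to this process.

    cuda_root is an assumed install prefix; adjust it to your own path.
    """
    lib_dirs = [p for p in os.environ.get("LD_LIBRARY_PATH", "").split(":") if p]
    return {
        # True if an nvcc executable can be found through PATH
        "nvcc_on_path": shutil.which("nvcc") is not None,
        # True if <cuda_root>/lib64 is listed in LD_LIBRARY_PATH
        "lib64_in_ld_library_path": os.path.join(cuda_root, "lib64") in lib_dirs,
        # Value of CUDA_TOOLKIT_ROOT_DIR, or None if it is unset
        "toolkit_root_dir": os.environ.get("CUDA_TOOLKIT_ROOT_DIR"),
    }


for key, value in check_cuda_env().items():
    print(f"{key}: {value}")
```

A misspelled variable name in ~/.zshrc shows up here as a False or None entry.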

III. Complete installation commands

Current machine:

gcc: version 5.4
GPU compute capability: 7.5
Only Python support is needed

Complete installation commands:

git clone https://github.com/src-d/kmcuda
cd kmcuda/src
cmake -DCMAKE_BUILD_TYPE=Release -D DISABLE_R=y -D CUDA_ARCH=75 . && make

Test:
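Before running the full examples, one quick check is whether Python can locate the module at all. The sketch below uses only the standard library; the helper name kmcuda_available is hypothetical, and it assumes the built libKMCUDA.so has to sit in a directory on sys.path to be importable.

```python
import importlib.util


def kmcuda_available(module_name="libKMCUDA"):
    """Return True if the named module can be located on the Python path.

    This only locates the module; it does not load it or touch the GPU.
    """
    return importlib.util.find_spec(module_name) is not None


if kmcuda_available():
    print("libKMCUDA found")
else:
    # Assumption: the directory holding libKMCUDA.so must be on sys.path.
    print("libKMCUDA not found on sys.path")
```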

IV. Problems encountered during installation

1. Installing with pip

The install command:

CUDA_ARCH=75 pip install libKMCUDA

This fails with an error:

2. Compute capability not specified or left at the default

Installing from source with pip

The install command:

pip install git+https://github.com/src-d/kmcuda.git#subdirectory=src

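The following error appears:

The error shows that this installation method uses -DCUDA_ARCH=61 by default, which does not match this machine.

Building without specifying the compute capability

The install command: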

cmake -DCMAKE_BUILD_TYPE=Release -D DISABLE_R=y . && make

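As the figure below shows, the build reports success.

However, the following error appears when running the test:

It reports that the compute capability does not match the device.

V. Python test cases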

1. K-means, L2 (Euclidean) distance

import numpy
from matplotlib import pyplot
from libKMCUDA import kmeans_cuda

numpy.random.seed(0)
# Four square blobs of 2500 points each, shifted up, down, right and left.
arr = numpy.empty((10000, 2), dtype=numpy.float32)
arr[:2500] = numpy.random.rand(2500, 2) + [0, 2]
arr[2500:5000] = numpy.random.rand(2500, 2) - [0, 2]
arr[5000:7500] = numpy.random.rand(2500, 2) + [2, 0]
arr[7500:] = numpy.random.rand(2500, 2) - [2, 0]
# Cluster into 4 groups; returns the centroids and one cluster index per sample.
centroids, assignments = kmeans_cuda(arr, 4, verbosity=1, seed=3)
print(centroids)
# Color the samples by cluster and overlay the centroids in white.
pyplot.scatter(arr[:, 0], arr[:, 1], c=assignments)
pyplot.scatter(centroids[:, 0], centroids[:, 1], c="white", s=150)
pyplot.show()

2. K-means, angular (cosine) distance + average

import numpy
from matplotlib import pyplot
from libKMCUDA import kmeans_cuda

numpy.random.seed(0)
# Random points on the unit circle, so every sample already has unit norm.
arr = numpy.empty((10000, 2), dtype=numpy.float32)
angs = numpy.random.rand(10000) * 2 * numpy.pi
arr[:, 0] = numpy.sin(angs)
arr[:, 1] = numpy.cos(angs)
# metric="cos" selects the angular distance; average_distance=True adds a third return value.
centroids, assignments, avg_distance = kmeans_cuda(
    arr, 4, metric="cos", verbosity=1, seed=3, average_distance=True)
print("Average distance between centroids and members:", avg_distance)
print(centroids)
pyplot.scatter(arr[:, 0], arr[:, 1], c=assignments)
pyplot.scatter(centroids[:, 0], centroids[:, 1], c="white", s=150)
pyplot.show()

The results are as follows:

VI. Python API

1. kmeans_cuda()

def kmeans_cuda(samples, clusters, tolerance=0.01, init="k-means++",
                yinyang_t=0.1, metric="L2", average_distance=False,
                seed=time(), device=0, verbosity=0)

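Parameters:

samples: numpy array of shape [number of samples, number of features], or a tuple (raw device pointer (int), device index (int), shape (tuple(number of samples, number of features[, fp16x2 marker])))
- Note: "samples" must be a 2D float32 or float16 numpy array
clusters: int, the number of clusters
- Note: "clusters" must be greater than 1 and less than (1 << 32) - 1
tolerance: float, the algorithm stops once the relative number of reassignments drops below this value.
init: string or numpy array, sets the centroid initialization method. It can be "k-means++", "afk-mc2", "random", or a numpy array of shape [number of clusters, number of features] whose dtype must be float32.
yinyang_t: float, usually set to 0.1
metric: str, the name of the distance metric to use. The default is Euclidean (L2); it can be changed to "cos". Note that samples must be normalized in the latter case.
average_distance: boolean, whether to calculate the average distance between cluster members and the corresponding centroids. This is useful for finding the best K. It is returned as the third tuple element.
seed: int, random generator seed for reproducible results.
device: int, bitwise OR-ed CUDA device indices, e.g. 1 means the first device, 2 means the second device, and 3 means using the first and second devices. The special value 0 enables all available devices. The default is 0.
verbosity: int, 0 means complete silence, 1 means progress logging only, 2 means lots of output.

Returns: tuple (centroids, assignments, [average_distance]). If samples was a numpy array or a host pointer tuple, the types are numpy arrays; otherwise they are raw pointers (integers) allocated on the same device. If samples are float16, the returned centroids are float16 too.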
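Because metric="cos" expects normalized samples and the input must be float32 or float16, a preprocessing step along the following lines may be useful. This is a sketch under assumptions: normalize_rows is a hypothetical helper, not part of libKMCUDA, and leaving all-zero rows untouched is an arbitrary choice made here.

```python
import numpy


def normalize_rows(samples):
    """Scale each row to unit L2 norm and return a float32 array.

    Hypothetical helper for illustration; not part of libKMCUDA.
    """
    samples = numpy.asarray(samples, dtype=numpy.float32)
    norms = numpy.linalg.norm(samples, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero rows unchanged instead of dividing by zero
    return samples / norms


arr = normalize_rows(numpy.random.rand(1000, 8))
print(arr.dtype, arr.shape)
```

The result could then be passed as the samples argument together with metric="cos".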

2. knn_cuda()

def knn_cuda(k, samples, centroids, assignments, metric="L2", device=0, verbosity=0)

Parameters:

k: integer, the number of neighbors to search for each sample. Must be ≤ 116.
samples: numpy array of shape [number of samples, number of features] or tuple(raw device pointer (int), device index (int), shape (tuple(number of samples, number of features[, fp16x2 marker]))). In the latter case, negative device index means host pointer. Optionally, the tuple can be 1 item longer with the preallocated device pointer for neighbors. dtype must be either float16 or convertible to float32.
centroids: numpy array with precalculated clusters' centroids (e.g., using K-means/kmcuda/kmeans_cuda()). dtype must match samples. If samples is a tuple then centroids must be a length-2 tuple, the first element is the pointer and the second is the number of clusters. The shape is (number of clusters, number of features).
assignments: numpy array with sample-cluster associations. dtype is expected to be compatible with uint32. If samples is a tuple then assignments is a pointer. The shape is (number of samples,).
metric: str, the name of the distance metric to use. The default is Euclidean (L2), it can be changed to "cos" to change the algorithm to Spherical K-means with the angular distance. Please note that samples must be normalized in the latter case.
device: integer, bitwise OR-ed CUDA device indices, e.g. 1 means first device, 2 means second device, 3 means using first and second device. Special value 0 enables all available devices. The default is 0.
verbosity: integer, 0 means complete silence, 1 means mere progress logging, 2 means lots of output.

Returns: neighbor indices. If samples was a numpy array or a host pointer tuple, the return type is numpy array, otherwise, a raw pointer (integer) allocated on the same device. The shape is (number of samples, k).
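The device argument of both functions is a bit mask rather than a plain index, which is easy to get wrong. The sketch below builds the mask from zero-based device positions; device_mask is a hypothetical helper, not part of libKMCUDA.

```python
def device_mask(indices):
    """Combine zero-based CUDA device positions into a bitwise OR-ed mask.

    Hypothetical helper for illustration. An empty list gives 0, which
    the API treats as "all available devices".
    """
    mask = 0
    for index in indices:
        if index < 0:
            raise ValueError("device indices must be non-negative")
        mask |= 1 << index
    return mask


print(device_mask([0]))     # 1: first device
print(device_mask([1]))     # 2: second device
print(device_mask([0, 1]))  # 3: first and second devices
```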
