KMCUDA: K-means on GPU/CUDA

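  • A recent project of mine uses the K-means algorithm. The CPU implementation was too slow and GPU acceleration was needed, which led me to the libKMCUDA library.
  • These notes record some problems I ran into while installing and using it.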

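I. Introduction to kmcuda

Project page: kmcuda (https://github.com/src-d/kmcuda)

Project description: Large scale K-means and K-nn implementation on NVIDIA GPU / CUDA

See the README on GitHub for a detailed introduction to the project.

Performance: [figure: benchmark results]

Technically, the project is a shared library that exports two functions defined in kmcuda.h: kmeans_cuda and knn_cuda. It has built-in native extension support for Python 3 and R, so you can use from libKMCUDA import kmeans_cuda in Python or dyn.load("libKMCUDA.so") in R.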

II. Installation

The installation commands given on GitHub:

git clone https://github.com/src-d/kmcuda
cd kmcuda/src
cmake -DCMAKE_BUILD_TYPE=Release . && make

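A few parameters deserve attention:

  • -D DISABLE_PYTHON: set this to y if you do not want to build the Python module, i.e. add -D DISABLE_PYTHON=y

  • -D DISABLE_R: add -D DISABLE_R=y if you do not want to build the R module

  • -D CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-10.0 (change to your own path): add this if CUDA cannot be found automatically

  • -D CUDA_ARCH=52: the CUDA compute capability (GPU Compute Capability) of the current machine

  • gcc: some articles mention that older gcc versions are not supported. My version is 5.4, which is sufficient.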
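The options above can also be assembled programmatically. The following is a minimal Python sketch and not part of kmcuda: the helper name build_cmake_args and its parameters are hypothetical, and only the -D flags listed above come from the project. It formats the command line without running cmake.

```python
def build_cmake_args(cuda_arch=None, disable_python=False,
                     disable_r=False, cuda_root=None):
    """Assemble a cmake argument list from the options described above.

    Hypothetical helper for illustration; it only formats strings and
    does not invoke cmake.
    """
    args = ["cmake", "-DCMAKE_BUILD_TYPE=Release"]
    if disable_python:
        args += ["-D", "DISABLE_PYTHON=y"]
    if disable_r:
        args += ["-D", "DISABLE_R=y"]
    if cuda_root is not None:
        args += ["-D", f"CUDA_TOOLKIT_ROOT_DIR={cuda_root}"]
    if cuda_arch is not None:
        args += ["-D", f"CUDA_ARCH={cuda_arch}"]
    args.append(".")  # configure in the current (src) directory
    return args


if __name__ == "__main__":
    # Python-only build for a GPU with compute capability 7.5.
    print(" ".join(build_cmake_args(cuda_arch=75, disable_r=True)))
```

The resulting list can be passed to subprocess.run from inside the src directory, followed by make.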

1. Check the gcc version
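On the command line, gcc --version prints the installed version. The same check can be scripted; the sketch below is an illustration with hypothetical helper names, and the 5.4 threshold is only the version that worked on my machine, not an official requirement of kmcuda.

```python
import shutil
import subprocess


def parse_version(text):
    """Turn a version string such as '5.4.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def installed_gcc_version():
    """Return the gcc version as a tuple, or None if gcc is not on PATH."""
    if shutil.which("gcc") is None:
        return None
    result = subprocess.run(["gcc", "-dumpversion"],
                            capture_output=True, text=True, check=True)
    return parse_version(result.stdout)


if __name__ == "__main__":
    version = installed_gcc_version()
    if version is None:
        print("gcc not found on PATH")
    else:
        # (5, 4) is an assumed lower bound taken from this post.
        print(version, "ok" if version >= (5, 4) else "possibly too old")
```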

If the version is too old, install gcc 5.4. For installation details, see the following post:

A detailed guide to installing gcc on Linux

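2. Look up the GPU compute capability

Look up the compute capability of your server's GPU on the NVIDIA website:

CUDA GPUs | NVIDIA Developer

The server I am using has a GeForce RTX 2070, whose compute capability is 7.5. CUDA_ARCH is therefore set to 75: -D CUDA_ARCH=75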
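The mapping from compute capability to CUDA_ARCH simply drops the dot. A small sketch, with a hypothetical helper name, that does the conversion on strings so that no floating-point rounding is involved:

```python
def cuda_arch_from_capability(capability):
    """Convert a compute capability string such as '7.5' to '75'."""
    major, minor = capability.strip().split(".")
    if not (major.isdigit() and minor.isdigit()):
        raise ValueError(f"not a compute capability: {capability!r}")
    return major + minor


if __name__ == "__main__":
    # GeForce RTX 2070 has compute capability 7.5.
    print("-D CUDA_ARCH=" + cuda_arch_from_capability("7.5"))
```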

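3. Configure the CUDA paths

So that the relevant libraries can be found automatically, add the CUDA paths to the shell configuration file. The shell on this system is zsh.

Add the following to ~/.zshrc: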

export PATH=$PATH:/usr/local/cuda/bin
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
export CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda
export CUDA_INCLUDE_DIRS=/usr/local/cuda/include

Apply the changes:

source ~/.zshrc
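To confirm that the new settings are visible, the environment can be inspected after sourcing the file. The sketch below is an illustration only: the helper name is hypothetical and /usr/local/cuda is assumed as the install location, as in the exports above. A misspelled variable name shows up as a missing entry.

```python
import os


def missing_cuda_entries(env, cuda_home="/usr/local/cuda"):
    """List expected CUDA directories absent from PATH-style variables.

    `env` is any mapping of environment variables, e.g. os.environ.
    Entries are split on ':' because the setup described here is Linux.
    """
    expected = {
        "PATH": cuda_home + "/bin",
        "LD_LIBRARY_PATH": cuda_home + "/lib64",
    }
    missing = []
    for name, directory in expected.items():
        entries = env.get(name, "").split(":")
        if directory not in entries:
            missing.append((name, directory))
    return missing


if __name__ == "__main__":
    for name, directory in missing_cuda_entries(os.environ):
        print(f"{name} does not contain {directory}")
```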

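III. Complete installation commands

Current machine parameters:

  • gcc: version 5.4
  • GPU compute capability: 7.5
  • Only Python support is needed

Complete installation commands: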

git clone https://github.com/src-d/kmcuda
cd kmcuda/src
cmake -DCMAKE_BUILD_TYPE=Release -D DISABLE_R=y -D CUDA_ARCH=75 . && make

Test:
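A quick smoke test can confirm that the build is importable and runs on the GPU. The sketch below is one way to do it, not the project's own test suite; it assumes it is run from the directory containing the built libKMCUDA.so (or with that directory on PYTHONPATH) and that numpy is installed. The smoke_test name is hypothetical; the kmeans_cuda call uses only arguments that appear in the project's examples.

```python
def smoke_test():
    """Try a tiny clustering run and report the outcome as a string."""
    try:
        import numpy
        from libKMCUDA import kmeans_cuda
    except ImportError as exc:
        return f"import failed: {exc}"
    rng = numpy.random.RandomState(0)
    samples = rng.rand(1000, 2).astype(numpy.float32)
    try:
        centroids, assignments = kmeans_cuda(samples, 4, verbosity=0, seed=3)
    except Exception as exc:  # e.g. a compute capability mismatch
        return f"kmeans_cuda failed: {exc}"
    return f"ok: centroids {centroids.shape}, assignments {assignments.shape}"


if __name__ == "__main__":
    print(smoke_test())
```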

IV. Problems encountered during installation

1. Installing with pip

The install command:

CUDA_ARCH=75 pip install libKMCUDA

This fails with an error.

2. Compute capability unspecified or left at the default

  • Installing from source with pip

    Install command:

    pip install git+https://github.com/src-d/kmcuda.git#subdirectory=src

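    The following error appears:

    The error shows that this installation method defaults to -DCUDA_ARCH=61, which does not match the actual compute capability of this machine.

  • Compute capability not specified

    Install command: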

    cmake -DCMAKE_BUILD_TYPE=Release -D DISABLE_R=y . && make

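    As shown below, the build reports success.

Running the test, however, produces the following error:

It reports that the compute capability does not match the device.

V. Python test cases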

1. K-means, L2 (Euclidean) distance

import numpy
from matplotlib import pyplot
from libKMCUDA import kmeans_cuda

numpy.random.seed(0)
arr = numpy.empty((10000, 2), dtype=numpy.float32)
arr[:2500] = numpy.random.rand(2500, 2) + [0, 2]
arr[2500:5000] = numpy.random.rand(2500, 2) - [0, 2]
arr[5000:7500] = numpy.random.rand(2500, 2) + [2, 0]
arr[7500:] = numpy.random.rand(2500, 2) - [2, 0]
centroids, assignments = kmeans_cuda(arr, 4, verbosity=1, seed=3)
print(centroids)
pyplot.scatter(arr[:, 0], arr[:, 1], c=assignments)
pyplot.scatter(centroids[:, 0], centroids[:, 1], c="white", s=150)
pyplot.show()

2. K-means, angular (cosine) distance + average

import numpy
from matplotlib import pyplot
from libKMCUDA import kmeans_cuda

numpy.random.seed(0)
arr = numpy.empty((10000, 2), dtype=numpy.float32)
angs = numpy.random.rand(10000) * 2 * numpy.pi
for i in range(10000):
    arr[i] = numpy.sin(angs[i]), numpy.cos(angs[i])
centroids, assignments, avg_distance = kmeans_cuda(
    arr, 4, metric="cos", verbosity=1, seed=3, average_distance=True)
print("Average distance between centroids and members:", avg_distance)
print(centroids)
pyplot.scatter(arr[:, 0], arr[:, 1], c=assignments)
pyplot.scatter(centroids[:, 0], centroids[:, 1], c="white", s=150)
pyplot.show()

The result is as follows:

VI. Python API

1. kmeans_cuda()

def kmeans_cuda(samples, clusters, tolerance=0.01, init="k-means++",
                yinyang_t=0.1, metric="L2", average_distance=False,
                seed=time(), device=0, verbosity=0)

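Parameters:

  • samples: numpy array of shape [number of samples, number of features], or a tuple (raw device pointer (int), device index (int), shape (tuple(number of samples, number of features[, fp16x2 marker])))
    • Note: "samples" must be a 2D float32 or float16 numpy array
  • clusters: int, the number of clusters
    • Note: "clusters" must be greater than 1 and less than (1 << 32) - 1
  • tolerance: float. The algorithm stops when the relative number of reassignments drops below this value.
  • init: string or numpy array, the centroid initialization method. It can be "k-means++", "afk-mc2", "random", or a numpy array of shape [number of clusters, number of features] whose dtype must be float32.
  • yinyang_t: float, usually set to 0.1.
  • metric: str, the name of the distance metric. The default is Euclidean ("L2"); it can be changed to "cos". Note that samples must be normalized in the latter case.
  • average_distance: boolean, whether to compute the average distance between cluster members and their centroids. Useful for finding the best K; returned as the third tuple element.
  • seed: int, the random generator seed, for reproducible results.
  • device: int, bitwise OR-ed CUDA device indices: 1 means the first device, 2 the second, 3 the first and second together. 0 enables all available devices. The default is 0.
  • verbosity: int. 0 means complete silence, 1 means progress logging only, 2 means lots of output.

Return value: tuple (centroids, assignments, [average_distance]). If samples was a numpy array or a host pointer tuple, the types are numpy arrays; otherwise they are raw pointers (integers) allocated on the same device. If samples are float16, the returned centroids are float16 as well.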
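The tolerance parameter is easiest to read as a ratio: the share of samples that changed cluster in the last iteration. The sketch below illustrates that wording in plain Python; the function names are hypothetical and this is not kmcuda's internal code, only a reading of the parameter description.

```python
def reassigned_fraction(previous, current):
    """Share of samples whose cluster changed between two iterations."""
    if len(previous) != len(current) or not current:
        raise ValueError("assignment lists must be non-empty and equal length")
    changed = sum(1 for old, new in zip(previous, current) if old != new)
    return changed / len(current)


def should_stop(previous, current, tolerance=0.01):
    """True when the reassigned share has dropped below the tolerance."""
    return reassigned_fraction(previous, current) < tolerance


if __name__ == "__main__":
    before = [0, 0, 1, 1, 2, 2, 3, 3]
    after = [0, 0, 1, 1, 2, 2, 3, 0]  # one of eight samples moved
    print(reassigned_fraction(before, after), should_stop(before, after))
```

With the default tolerance of 0.01 and 10000 samples, iteration would stop once fewer than 100 samples change cluster in a pass.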

2. knn_cuda()

def knn_cuda(k, samples, centroids, assignments, metric="L2", device=0, verbosity=0)

Parameters:

  • k: integer, the number of neighbors to search for each sample. Must be ≤ 116.

  • samples: numpy array of shape [number of samples, number of features] or tuple(raw device pointer (int), device index (int), shape (tuple(number of samples, number of features[, fp16x2 marker]))). In the latter case, negative device index means host pointer. Optionally, the tuple can be 1 item longer with the preallocated device pointer for neighbors. dtype must be either float16 or convertible to float32.

  • centroids: numpy array with precalculated clusters' centroids (e.g., using K-means/kmcuda/kmeans_cuda()). dtype must match samples. If samples is a tuple then centroids must be a length-2 tuple, the first element is the pointer and the second is the number of clusters. The shape is (number of clusters, number of features).

  • assignments: numpy array with sample-cluster associations. dtype is expected to be compatible with uint32. If samples is a tuple then assignments is a pointer. The shape is (number of samples,).

  • metric: str, the name of the distance metric to use. The default is Euclidean (L2), it can be changed to "cos" to change the algorithm to Spherical K-means with the angular distance. Please note that samples must be normalized in the latter case.

  • device: integer, bitwise OR-ed CUDA device indices, e.g. 1 means first device, 2 means second device, 3 means using first and second device. Special value 0 enables all available devices. The default is 0.

  • verbosity: integer, 0 means complete silence, 1 means mere progress logging, 2 means lots of output.

Return value: neighbor indices. If samples was a numpy array or a host pointer tuple, the return type is numpy array, otherwise, a raw pointer (integer) allocated on the same device. The shape is (number of samples, k).
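The device parameter of both functions is a bitmask, which is easy to get wrong by passing a plain device index. The sketch below shows how zero-based device indices map to the mask values described above; the helper names are hypothetical and not part of libKMCUDA.

```python
def device_mask(indices):
    """Combine zero-based CUDA device indices into a bitmask.

    An empty list gives 0, which the parameter description treats as
    'use all available devices'.
    """
    mask = 0
    for index in indices:
        if index < 0:
            raise ValueError("device indices must be non-negative")
        mask |= 1 << index
    return mask


def devices_in_mask(mask):
    """List the zero-based device indices selected by a non-zero mask."""
    return [i for i in range(mask.bit_length()) if mask & (1 << i)]


if __name__ == "__main__":
    # First and second device together.
    print(device_mask([0, 1]), devices_in_mask(3))
```

For example, to run only on the second GPU, pass device=device_mask([1]), which is 2 rather than 1.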
