The previous post, "Gradient Descent for Multiple Linear Regression and a Java Implementation," gave a brief introduction to gradient descent and used plain Java to implement a regression example. In this post, we implement gradient descent for multiple linear regression with ND4J, the tensor library behind DL4J, and compare the performance of GPU and CPU computation.
1. ND4J Overview
ND4J is the tensor computation library provided by DL4J; it wraps a rich set of tensor operations. The following overview is adapted from the ND4J website:
ND4J and ND4S are scientific computing libraries for the JVM, designed for production environments: routines run fast and have low RAM requirements.
Main features:
- A versatile n-dimensional array object
- Multiplatform functionality, including GPUs
- Linear algebra and signal processing functions
Because of this usability gap, Java, Scala, and Clojure programmers have not been able to take full advantage of the most powerful data-analysis tools, such as NumPy or Matlab. Other libraries, such as Breeze, lack support for multi-dimensional arrays or tensors, which are essential to deep learning and other tasks. ND4J and ND4S are used by national laboratories for tasks such as climate modeling, which demand computation-heavy simulations.
As an open-source, distributed, GPU-enabled library, ND4J brings the intuitive scientific computing tools familiar to Python programmers to the JVM. Structurally, ND4J is similar to SLF4J. It lets engineers in production environments easily port algorithms and interfaces to other libraries in the Java and Scala ecosystems.
More detailed features are described on the ND4J website: https://nd4j.org/cn
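To give a flavor of the API before we dive in, here is a minimal sketch (the class name is our own invention, not from the original post) that creates a 2x3 array of ones and doubles every element:

import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

// Hypothetical hello-world example
public class Nd4jHello {
    public static void main(String[] args) {
        INDArray ones = Nd4j.ones(2, 3);   // 2x3 array filled with 1.0
        System.out.println(ones.mul(2));   // element-wise multiplication by a scalar
    }
}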
2. Implementation
Step 1: Maven configuration
First, create a Maven project. The complete pom.xml is as follows:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.dl4j</groupId>
    <artifactId>linear-regression</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <packaging>jar</packaging>
    <name>linear-regression</name>
    <url>http://maven.apache.org</url>
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <logback.version>1.1.7</logback.version>
        <nd4j.version>1.0.0-beta</nd4j.version>
        <!-- Change the nd4j.backend property to nd4j-cuda-8.0-platform,
             nd4j-cuda-9.0-platform or nd4j-cuda-9.1-platform to use CUDA GPUs -->
        <!-- <nd4j.backend>nd4j-cuda-8.0-platform</nd4j.backend> -->
        <nd4j.backend>nd4j-native-platform</nd4j.backend>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.nd4j</groupId>
            <artifactId>${nd4j.backend}</artifactId>
            <version>${nd4j.version}</version>
        </dependency>
        <dependency>
            <groupId>ch.qos.logback</groupId>
            <artifactId>logback-classic</artifactId>
            <version>${logback.version}</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>3.8.1</version>
            <scope>test</scope>
        </dependency>
    </dependencies>
</project>
Note: <nd4j.backend>nd4j-native-platform</nd4j.backend> selects the CPU backend.
You can also switch to CUDA GPU computation. The 1.0.0-beta release supports CUDA 8.0, 9.0, and 9.1; change the <nd4j.backend> property to nd4j-cuda-8.0-platform, nd4j-cuda-9.0-platform, or nd4j-cuda-9.1-platform accordingly.
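Whichever backend you configure, it can be worth confirming which one actually loaded at runtime. A minimal sketch, assuming the standard Nd4j factory API (the class name is our own invention):

import org.nd4j.linalg.factory.Nd4j;

// Hypothetical helper, not part of the original post
public class BackendCheck {
    public static void main(String[] args) {
        // Prints the concrete backend class, e.g. a CPU (native) or CUDA implementation
        System.out.println(Nd4j.getBackend().getClass().getName());
    }
}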
Step 2: Build the training set. As in the previous post, the samples are generated from the function y = 3*x1 + 4*x2 + 5*x3 + 10. The code is as follows:
int exampleCount = 100;
Random random = new Random();
double[] data = new double[exampleCount * 3];
double[] param = new double[exampleCount * 3];
// random feature values in [0, 1)
for (int i = 0; i < exampleCount * 3; i++) {
    data[i] = random.nextDouble();
}
// the true parameters 3, 4, 5, repeated for every sample
for (int i = 0; i < exampleCount * 3; i++) {
    param[i] = 3;
    param[++i] = 4;
    param[++i] = 5;
}
INDArray features = Nd4j.create(data, new int[] { exampleCount, 3 });
INDArray params = Nd4j.create(param, new int[] { exampleCount, 3 });
// label = 3*x1 + 4*x2 + 5*x3 + 10 for each row
INDArray label = features.mul(params).sum(1).add(10);
mul: element-wise multiplication of two matrices with matching shapes
sum: sums along the given dimension; sum(1) here sums over dimension 1, producing one row sum per sample
add: adds a scalar to every element of the tensor (a tensor can also be added element-wise)
A short sketch of these three operations follows.
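For instance, on a small 2x2 example (values chosen arbitrarily, class name our own invention):

import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

// Hypothetical demo, not from the original post
public class Nd4jOpsDemo {
    public static void main(String[] args) {
        INDArray a = Nd4j.create(new double[] { 1, 2, 3, 4 }, new int[] { 2, 2 });
        INDArray b = Nd4j.create(new double[] { 10, 20, 30, 40 }, new int[] { 2, 2 });
        INDArray prod = a.mul(b);         // element-wise: [[10, 40], [90, 160]]
        INDArray rowSum = prod.sum(1);    // sum over dimension 1: [50, 250]
        INDArray shifted = rowSum.add(5); // scalar add: [55, 255]
        System.out.println(shifted);
    }
}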
Step 3: Batch gradient descent (BGD) implementation
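It helps to first write out the update rule the method below implements. For the squared-error loss over m samples

L(w, b) = \sum_{i=1}^{m} \left( w_1 x_{i1} + w_2 x_{i2} + w_3 x_{i3} + b - y_i \right)^2

the gradient with respect to each weight is

\frac{\partial L}{\partial w_j} = 2 \sum_{i=1}^{m} \left( f(x_i) - y_i \right) x_{ij}

and the code applies the averaged update

w_j \leftarrow w_j - \frac{2\eta}{m} \sum_{i=1}^{m} \left( f(x_i) - y_i \right) x_{ij}

(analogously for the bias b, with x_{ij} replaced by 1). Dividing by m keeps the step size independent of the sample count; note that the totalLoss printed below is the unaveraged sum of squares.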
private static void BGD(INDArray features, INDArray label, double learningRate, double[] parameter) {
    // residual f(x) - y for every sample, using the current parameters
    INDArray temp = features.getColumn(0).mul(parameter[0])
            .add(features.getColumn(1).mul(parameter[1]))
            .add(features.getColumn(2).mul(parameter[2]))
            .add(parameter[3]).sub(label);
    // gradient step: w_j -= 2 * lr * sum(residual * x_j) / m
    parameter[0] = parameter[0] - 2 * learningRate * temp.mul(features.getColumn(0)).sum(0).getDouble(0) / features.size(0);
    parameter[1] = parameter[1] - 2 * learningRate * temp.mul(features.getColumn(1)).sum(0).getDouble(0) / features.size(0);
    parameter[2] = parameter[2] - 2 * learningRate * temp.mul(features.getColumn(2)).sum(0).getDouble(0) / features.size(0);
    parameter[3] = parameter[3] - 2 * learningRate * temp.sum(0).getDouble(0) / features.size(0);
    // recompute the residual with the updated parameters to report the total loss
    INDArray functionResult = features.getColumn(0).mul(parameter[0])
            .add(features.getColumn(1).mul(parameter[1]))
            .add(features.getColumn(2).mul(parameter[2]))
            .add(parameter[3]).sub(label);
    double totalLoss = functionResult.mul(functionResult).sum(0).getDouble(0);
    System.out.println("totalLoss:" + totalLoss);
    System.out.println(parameter[0] + " " + parameter[1] + " " + parameter[2] + " " + parameter[3]);
}
Step 4: The complete code is below. We run 3,000 iterations, which essentially recovers the parameters.
import java.util.Random;

import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

public class LinearRegression {

    public static void main(String[] args) {
        int exampleCount = 100;
        double learningRate = 0.01;
        Random random = new Random();
        double[] data = new double[exampleCount * 3];
        double[] param = new double[exampleCount * 3];
        for (int i = 0; i < exampleCount * 3; i++) {
            data[i] = random.nextDouble();
        }
        for (int i = 0; i < exampleCount * 3; i++) {
            param[i] = 3;
            param[++i] = 4;
            param[++i] = 5;
        }
        INDArray features = Nd4j.create(data, new int[] { exampleCount, 3 });
        INDArray params = Nd4j.create(param, new int[] { exampleCount, 3 });
        INDArray label = features.mul(params).sum(1).add(10);
        double[] parameter = new double[] { 1.0, 1.0, 1.0, 1.0 };
        long startTime = System.currentTimeMillis();
        for (int i = 0; i < 3000; i++) {
            BGD(features, label, learningRate, parameter);
        }
        System.out.println("elapsed: " + (System.currentTimeMillis() - startTime) + "ms");
    }

    private static void BGD(INDArray features, INDArray label, double learningRate, double[] parameter) {
        INDArray temp = features.getColumn(0).mul(parameter[0])
                .add(features.getColumn(1).mul(parameter[1]))
                .add(features.getColumn(2).mul(parameter[2]))
                .add(parameter[3]).sub(label);
        parameter[0] = parameter[0] - 2 * learningRate * temp.mul(features.getColumn(0)).sum(0).getDouble(0) / features.size(0);
        parameter[1] = parameter[1] - 2 * learningRate * temp.mul(features.getColumn(1)).sum(0).getDouble(0) / features.size(0);
        parameter[2] = parameter[2] - 2 * learningRate * temp.mul(features.getColumn(2)).sum(0).getDouble(0) / features.size(0);
        parameter[3] = parameter[3] - 2 * learningRate * temp.sum(0).getDouble(0) / features.size(0);
        // compute the total loss with the updated parameters
        INDArray functionResult = features.getColumn(0).mul(parameter[0])
                .add(features.getColumn(1).mul(parameter[1]))
                .add(features.getColumn(2).mul(parameter[2]))
                .add(parameter[3]).sub(label);
        double totalLoss = functionResult.mul(functionResult).sum(0).getDouble(0);
        System.out.println("totalLoss:" + totalLoss);
        System.out.println(parameter[0] + " " + parameter[1] + " " + parameter[2] + " " + parameter[3]);
    }
}
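As a side note, the four per-parameter updates can be collapsed into matrix operations. The following is a minimal sketch of such a vectorized variant, not the code benchmarked below; the method name, the [3, 1] weight array w, and the one-element bias array are our own inventions:

// Hypothetical vectorized BGD step (a sketch under the assumptions above)
private static void bgdVectorized(INDArray features, INDArray label, double learningRate,
                                  INDArray w, double[] bias) {
    int m = (int) features.size(0);
    INDArray y = label.reshape(m, 1);                          // targets as a column vector
    INDArray residual = features.mmul(w).add(bias[0]).sub(y);  // f(x) - y, shape [m, 1]
    INDArray grad = features.transpose().mmul(residual).mul(2.0 * learningRate / m);
    w.subi(grad);                                              // in-place weight update
    bias[0] -= 2 * learningRate * residual.sumNumber().doubleValue() / m;
}

The single features.transpose().mmul(residual) call hands the whole gradient computation to the BLAS backend at once, which is exactly the kind of operation where a GPU backend can shine.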
Step 5: Results
totalLoss:0.2690272927284241
Parameters: 3.1580112185120623, 4.092539676684143, 5.0879424870036525, 9.820665847778292, close to the true values 3, 4, 5, 10.
3. GPU vs. CPU Performance Comparison
操做系統:window10
CPU:Intel core(TM) i7-5700HQ CPU @2.70GHz
GPU:NVIDIA Geforce GTX 950M
CUDA:Cuda 8.0,V8.0.44
| Samples | CPU time | GPU time |
| ------- | -------- | -------- |
| 1000 | 1314 ms | 11013 ms |
| 10000 | 3430 ms | 11608 ms |
| 100000 | 16544 ms | 17873 ms |
| 500000 | 78713 ms | 49151 ms |
| 1000000 | 156387 ms | 83001 ms |
GPUs have a clear advantage in accelerating matrix operations, but the table also shows the overhead: at small sample sizes the CPU wins, and the GPU only pulls ahead once the workload reaches several hundred thousand samples. In deep learning, where parameter and sample counts are large, choosing GPU acceleration brings a marked speedup.
Happiness comes from sharing.
This post is the author's original work; please credit the source when reposting.