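The previous two posts, "Make Your Code Fly: High-Performance Julia Study Notes (1)" and "Make Your Code Fly: High-Performance Julia Study Notes (2)", covered how to write high-performance Julia code. In this post, drawing on a recent project of mine, I run a quick comparison of how fast several languages compute pi with a Monte Carlo algorithm.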
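A disclaimer first: this is not a rigorous benchmark, and I have no wish to start a language holy war. My abilities are limited, and the implementations in the different languages are not necessarily the most efficient ones; they are basically all naive implementations.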
If you are not familiar with the Monte Carlo method, see the two references below; I will not spend time repeating them here:
The machine is a 2015 MacBook Pro:

    Processor: 2.5GHz Intel Core i7
    Memory:    16GB 1600 MHz DDR3
    OS:        macOS High Sierra Version 10.13.4
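JavaScript (mc.js):

    function pi(n) {
      let inCircle = 0;
      // Draw exactly n points (i < n, not i <= n).
      for (let i = 0; i < n; i++) {
        // Random point in the unit square, declared as locals.
        const x = Math.random();
        const y = Math.random();
        // Count the point if it falls inside the quarter circle of radius 1.
        if (x * x + y * y < 1.0) {
          inCircle += 1;
        }
      }
      return (4.0 * inCircle) / n;
    }

    const N = 100000000;
    console.log(pi(N));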
Result:

    ➜ me.magicly.performance git:(master) ✗ node --version
    v10.11.0
    ➜ me.magicly.performance git:(master) ✗ time node mc.js
    3.14174988
    node mc.js  10.92s user 0.99s system 167% cpu 7.091 total
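Go (monte_carlo.go):

    package main

    import (
        "math/rand"
    )

    // PI estimates pi from `samples` random points in the unit square.
    func PI(samples int) (result float64) {
        inCircle := 0
        r := rand.New(rand.NewSource(42)) // fixed seed, private generator
        for i := 0; i < samples; i++ {
            x := r.Float64()
            y := r.Float64()
            if (x*x + y*y) < 1 {
                inCircle++
            }
        }
        return float64(inCircle) / float64(samples) * 4.0
    }

    func main() {
        samples := 100000000
        PI(samples) // the result is not printed in this version
    }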
Result:

    ➜ me.magicly.performance git:(master) ✗ go version
    go version go1.11 darwin/amd64
    ➜ me.magicly.performance git:(master) ✗ time go run monte_carlo.go
    go run monte_carlo.go  2.17s user 0.10s system 101% cpu 2.231 total
C (mc-pi.c):

    #include <stdio.h>
    #include <stdlib.h>

    #define SEED 42

    int main(int argc, char **argv)
    {
        int niter = 100000000;
        double x, y, z;
        int i, count = 0;
        double pi;

        srand(SEED);
        for (i = 0; i < niter; i++) {
            /* Random point in the unit square. */
            x = (double)rand() / RAND_MAX;
            y = (double)rand() / RAND_MAX;
            z = x * x + y * y;
            if (z <= 1)
                count++;
        }
        pi = (double)count / niter * 4;
        printf("# of trials= %d , estimate of pi is %g \n", niter, pi);
        return 0;
    }
Result:

    ➜ me.magicly.performance git:(master) ✗ gcc --version
    Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
    Apple LLVM version 9.1.0 (clang-902.0.39.2)
    Target: x86_64-apple-darwin17.5.0
    Thread model: posix
    InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
    ➜ me.magicly.performance git:(master) ✗ gcc -O2 -o mc-pi-c mc-pi.c
    ➜ me.magicly.performance git:(master) ✗ time ./mc-pi-c
    # of trials= 100000000 , estimate of pi is 3.14155
    ./mc-pi-c  1.22s user 0.00s system 99% cpu 1.226 total
C++ (mc-pi.cpp):

    #include <iostream>
    #include <cstdlib> // defines rand(), srand(), RAND_MAX

    using namespace std;

    int main()
    {
        const int SEED = 42;
        const int N = 100000000;
        double x, y, z, pi;
        int inCircle = 0;

        srand(SEED);
        for (int i = 0; i < N; i++) {
            // Random point in the unit square.
            x = (double)rand() / RAND_MAX;
            y = (double)rand() / RAND_MAX;
            z = x * x + y * y;
            if (z < 1) {
                inCircle++;
            }
        }
        // Multiply in floating point so 4 * inCircle cannot overflow an int.
        pi = 4.0 * inCircle / N;
        cout << "\nFinal Estimation of Pi = " << pi << endl;
        return 0;
    }
Result:

    ➜ me.magicly.performance git:(master) ✗ c++ --version
    Apple LLVM version 9.1.0 (clang-902.0.39.2)
    Target: x86_64-apple-darwin17.5.0
    Thread model: posix
    InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
    ➜ me.magicly.performance git:(master) ✗ c++ -O2 -o mc-pi-cpp mc-pi.cpp
    ➜ me.magicly.performance git:(master) ✗ time ./mc-pi-cpp

    Final Estimation of Pi = 3.14155
    ./mc-pi-cpp  1.23s user 0.01s system 99% cpu 1.239 total
Julia (mc.jl):

    function pi(N::Int)
        inCircle = 0
        for i = 1:N
            # Random point in the square [-1, 1) x [-1, 1).
            x = rand() * 2 - 1
            y = rand() * 2 - 1

            r2 = x*x + y*y
            if r2 < 1.0
                inCircle += 1
            end
        end

        return inCircle / N * 4.0
    end

    N = 100_000_000
    println(pi(N))
Result:

    ➜ me.magicly.performance git:(master) ✗ julia
                   _
       _       _ _(_)_     |  Documentation: https://docs.julialang.org
      (_)     | (_) (_)    |
       _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
      | | | | | | |/ _` |  |
      | | |_| | | | (_| |  |  Version 1.0.1 (2018-09-29)
     _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
    |__/                   |

    julia> versioninfo()
    Julia Version 1.0.1
    Commit 0d713926f8 (2018-09-29 19:05 UTC)
    Platform Info:
      OS: macOS (x86_64-apple-darwin14.5.0)
      CPU: Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
      WORD_SIZE: 64
      LIBM: libopenlibm
      LLVM: libLLVM-6.0.0 (ORCJIT, haswell)

    ➜ me.magicly.performance git:(master) ✗ time julia mc.jl
    3.14179496
    julia mc.jl  0.85s user 0.17s system 144% cpu 0.705 total
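The single-threaded runs above print slightly different estimates (3.14174988, 3.14155, 3.14179496). As a rough sanity check, and not something taken from the original measurements, one can treat each sample as a Bernoulli trial with success probability pi/4; the standard error of the estimate is then 4 * sqrt(p * (1 - p) / N). The sketch below works that arithmetic out in Go. The function name `stdError` is my own, chosen for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// stdError returns the standard error of the estimate 4*inCircle/n,
// assuming each sample is an independent Bernoulli trial with
// success probability pi/4 (an idealised model of the generators).
func stdError(n int) float64 {
	p := math.Pi / 4
	return 4 * math.Sqrt(p*(1-p)/float64(n))
}

func main() {
	n := 100000000
	se := stdError(n)
	fmt.Printf("standard error for n=%d: %.2e\n", n, se)

	// Estimates printed by the single-threaded runs in this post.
	for _, est := range []float64{3.14174988, 3.14155, 3.14179496} {
		fmt.Printf("estimate %.8f is %+.2f standard errors from pi\n",
			est, (est-math.Pi)/se)
	}
}
```

For N = 100,000,000 this gives a standard error of roughly 1.6e-4, so under this model differences in the fourth decimal place between languages are ordinary sampling noise rather than a sign that one implementation is wrong.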
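I also wanted to test Rust, but upgrading my Rust development environment broke something and I did not get it working again. Based on past experience, I expect it to be about as fast as C++.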
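I found a comparison on GitHub that covers more languages, in case you are interested: https://gist.github.com/jmoir... . Surprisingly, LuaJIT is about as fast as Rust there, which is broadly consistent with the benchmarks on the Julia website: https://julialang.org/benchma... .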
I also implemented two concurrent versions in Go:
    package main

    import (
        "fmt"
        "math/rand"
        "runtime"
        "time"
    )

    type Job struct {
        n int
    }

    var threads = runtime.NumCPU()
    var rands = make([]*rand.Rand, 0, threads)

    // init creates one random generator per logical CPU for worker2.
    func init() {
        fmt.Printf("cpus: %d\n", threads)
        runtime.GOMAXPROCS(threads)
        for i := 0; i < threads; i++ {
            rands = append(rands, rand.New(rand.NewSource(time.Now().UnixNano())))
        }
    }

    // MultiPI2 uses a worker pool: one job per thread is sent over a channel.
    func MultiPI2(samples int) float64 {
        t1 := time.Now()
        threadSamples := samples / threads
        jobs := make(chan Job, 100)
        results := make(chan int, 100)

        for w := 0; w < threads; w++ {
            go worker2(w, jobs, results, threadSamples)
        }

        go func() {
            for i := 0; i < threads; i++ {
                jobs <- Job{
                    n: i,
                }
            }
            close(jobs)
        }()

        var total int
        for i := 0; i < threads; i++ {
            total += <-results
        }

        result := float64(total) / float64(samples) * 4
        fmt.Printf("MultiPI2: %d times, value: %f, cost: %s\n", samples, result, time.Since(t1))
        return result
    }

    func worker2(id int, jobs <-chan Job, results chan<- int, threadSamples int) {
        for range jobs {
            var inside int
            // Reuse the generator created for this worker in init().
            r := rands[id]
            for i := 0; i < threadSamples; i++ {
                x, y := r.Float64(), r.Float64()
                if x*x+y*y <= 1 {
                    inside++
                }
            }
            results <- inside
        }
    }

    // MultiPI starts one goroutine per thread, each with its own generator.
    func MultiPI(samples int) float64 {
        t1 := time.Now()
        threadSamples := samples / threads
        results := make(chan int, threads)

        for j := 0; j < threads; j++ {
            go func() {
                var inside int
                r := rand.New(rand.NewSource(time.Now().UnixNano()))
                for i := 0; i < threadSamples; i++ {
                    x, y := r.Float64(), r.Float64()
                    if x*x+y*y <= 1 {
                        inside++
                    }
                }
                results <- inside
            }()
        }

        var total int
        for i := 0; i < threads; i++ {
            total += <-results
        }

        result := float64(total) / float64(samples) * 4
        fmt.Printf("MultiPI: %d times, value: %f, cost: %s\n", samples, result, time.Since(t1))
        return result
    }

    // PI is the single-threaded baseline.
    func PI(samples int) (result float64) {
        t1 := time.Now()
        var inside int = 0
        r := rand.New(rand.NewSource(time.Now().UnixNano()))
        for i := 0; i < samples; i++ {
            x := r.Float64()
            y := r.Float64()
            if (x*x + y*y) < 1 {
                inside++
            }
        }

        ratio := float64(inside) / float64(samples)
        result = ratio * 4
        fmt.Printf("PI: %d times, value: %f, cost: %s\n", samples, result, time.Since(t1))
        return
    }

    func main() {
        samples := 100000000
        PI(samples)
        MultiPI(samples)
        MultiPI2(samples)
    }
Result:

    ➜ me.magicly.performance git:(master) ✗ time go run monte_carlo.1.go
    cpus: 8
    PI: 100000000 times, value: 3.141778, cost: 2.098006252s
    MultiPI: 100000000 times, value: 3.141721, cost: 513.008435ms
    MultiPI2: 100000000 times, value: 3.141272, cost: 485.336029ms
    go run monte_carlo.1.go  9.41s user 0.18s system 285% cpu 3.357 total
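As you can see, the concurrent versions are about 4 times faster (2.098s versus 513ms and 485ms). Why only 4 times when there are 8 CPUs? My MacBook Pro actually has 4 physical cores; the 8 reported are logical cores produced by hyper-threading, which gives no additional speedup for CPU-bound computation. For details, see the article "Physical CPUs, CPU Cores, Logical CPUs, and Hyper-Threading".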
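One way to look at this scaling more directly is to time the same estimate with 1, 2, 4, ... goroutines and see where the speedup flattens. The following is a minimal sketch of that idea, not the code used for the measurements above: `estimatePi`, the fixed per-worker seeds, and the smaller sample count are all my own assumptions, chosen to keep the run short.

```go
package main

import (
	"fmt"
	"math/rand"
	"runtime"
	"sync"
	"time"
)

// estimatePi splits the work across `workers` goroutines, each with its own
// generator (seeded with a fixed, hypothetical seed), and returns the
// Monte Carlo estimate of pi.
func estimatePi(samples, workers int) float64 {
	perWorker := samples / workers
	counts := make([]int, workers) // one slot per goroutine, no sharing
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			r := rand.New(rand.NewSource(int64(w) + 1))
			inside := 0
			for i := 0; i < perWorker; i++ {
				x, y := r.Float64(), r.Float64()
				if x*x+y*y < 1 {
					inside++
				}
			}
			counts[w] = inside
		}(w)
	}
	wg.Wait()

	total := 0
	for _, c := range counts {
		total += c
	}
	// Divide by the number of points actually drawn.
	return 4 * float64(total) / float64(perWorker*workers)
}

func main() {
	const samples = 10000000 // smaller than the post's 100,000,000
	var base time.Duration
	// Double the worker count up to the number of logical CPUs.
	for workers := 1; workers <= runtime.NumCPU(); workers *= 2 {
		start := time.Now()
		est := estimatePi(samples, workers)
		elapsed := time.Since(start)
		if workers == 1 {
			base = elapsed
		}
		fmt.Printf("workers=%d pi=%.5f time=%s speedup=%.2fx\n",
			workers, est, elapsed, float64(base)/float64(elapsed))
	}
}
```

On a machine with 4 physical cores and hyper-threading, I would expect the speedup column to grow up to 4 workers and then level off, but the exact numbers depend on the hardware and on what else is running.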
In the next post, we will look at how to use parallelism in Julia to improve performance further.