In Depth | DolphinDB Just-in-Time Compilation (JIT) Explained

Date: 2021-03-05

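DolphinDB is a high-performance distributed time-series database with rich built-in computing functions and a powerful multi-paradigm programming language. To improve the execution efficiency of DolphinDB scripts, DolphinDB has supported just-in-time (JIT) compilation since version 1.01.

1 Introduction to JIT

Just-in-time compilation (JIT) is a form of dynamic compilation that can improve a program's runtime performance.

Programs are generally run in one of two ways: compiled or interpreted. A compiled program is translated entirely into machine code before it runs, which gives high runtime efficiency; C and C++ are typical examples. An interpreted program is translated and executed statement by statement by an interpreter, which is more flexible but slower; Python is a typical example.

JIT compilation combines the advantages of both: it translates code into machine code at run time and can approach the execution efficiency of statically compiled languages. PyPy, an alternative implementation of Python, uses JIT to significantly improve on interpreter performance. Most Java implementations also rely on JIT to speed up code execution.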

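2 The Role of JIT in DolphinDB

DolphinDB's programming language is interpreted. To run a program, DolphinDB first parses it into a syntax tree and then executes the tree recursively. When vectorization cannot be used, the cost of interpretation is relatively high. This is because DolphinDB is implemented in C++ under the hood, and a single function call in a script turns into multiple virtual function calls in C++. Statements such as for loops, while loops, and if-else invoke these functions repeatedly, which is very time-consuming and in some scenarios cannot meet real-time requirements.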
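To make the interpretation cost concrete, here is a minimal Python sketch of a tree-walking interpreter. It is not DolphinDB's implementation: the node classes `Var` and `Add`, the `evaluate` method, and the call counter are hypothetical names chosen for illustration. It only shows that evaluating an expression as small as `s + x` takes several dynamically dispatched calls, and that a loop pays this cost on every iteration.

```python
# Illustrative tree-walking interpreter (hypothetical, not DolphinDB code).
class Node:
    calls = 0  # counts dynamically dispatched evaluate() calls

    def evaluate(self, env):
        raise NotImplementedError


class Var(Node):
    """Leaf node: look up a variable in the environment."""

    def __init__(self, name):
        self.name = name

    def evaluate(self, env):
        Node.calls += 1
        return env[self.name]


class Add(Node):
    """Inner node: evaluate both children recursively, then add."""

    def __init__(self, left, right):
        self.left, self.right = left, right

    def evaluate(self, env):
        Node.calls += 1
        return self.left.evaluate(env) + self.right.evaluate(env)


def interpreted_sum(values):
    """Sum values by re-evaluating the tree for `s + x` once per element."""
    expr = Add(Var("s"), Var("x"))
    env = {"s": 0}
    for x in values:
        env["x"] = x
        env["s"] = expr.evaluate(env)
    return env["s"]


total = interpreted_sum(range(1, 101))
print(total, Node.calls)  # 5050 300
```

Each loop iteration costs three `evaluate` calls for a single addition. A JIT compiler aims to replace this per-iteration dispatch with straight-line machine code.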

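JIT in DolphinDB significantly speeds up for loops, while loops, if-else, and similar statements. It is especially suitable for scenarios where vectorized operations cannot be used but speed is critical, such as high-frequency factor calculation and real-time stream data processing.

The following example compares the time needed, with and without JIT, to sum the integers from 1 to 1000000 in a do-while loop, repeated 100 times.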

def sum_without_jit(v) {
  s = 0l
  i = 0          // vector indices start at 0
  n = size(v)
  do {
    s += v[i]
    i += 1
  } while(i < n)
  return s
}

@jit
def sum_with_jit(v) {
  s = 0l
  i = 0          // vector indices start at 0
  n = size(v)
  do {
    s += v[i]
    i += 1
  } while(i < n)
  return s
}

vec = 1..1000000

timer(100) sum_without_jit(vec)     // 120552.740 ms
timer(100) sum_with_jit(vec)        //    290.065 ms
timer(100) sum(vec)                 //     48.922 ms

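Without JIT, the loop takes about 415 times as long as with JIT. The built-in sum function takes about one-sixth of the time that JIT needs. The built-in function is faster here because the code generated by JIT contains many instructions that check for NULL values, whereas the built-in sum skips this step when it finds that the input array contains no NULL values.

vec[100] = NULL
timer(100) sum(vec)        // 118.063 ms

Once the vector contains a NULL value, the built-in sum is only about 2.5 times as fast as JIT. The remaining gap comes from manual loop unrolling in the built-in sum. When a function involves more complex computation, JIT can even outperform vectorized operations, as discussed later in this section.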
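The fast path described above can be sketched in Python. This is only an illustration of the idea, not DolphinDB's implementation: `None` stands in for NULL, the `contains_null` flag is assumed to be tracked by the vector itself, and the function names are hypothetical. The sketch also assumes that the aggregate skips NULL elements.

```python
def sum_checked(values):
    """General path: test every element for NULL (None here) before adding."""
    total = 0
    for x in values:
        if x is not None:
            total += x
    return total


def sum_unchecked(values):
    """Fast path: no per-element NULL test at all."""
    total = 0
    for x in values:
        total += x
    return total


def vector_sum(values, contains_null):
    """Pick the loop once, up front, instead of checking inside the loop."""
    if contains_null:
        return sum_checked(values)
    return sum_unchecked(values)


print(vector_sum([1, 2, 3], contains_null=False))    # 6
print(vector_sum([1, None, 3], contains_null=True))  # 4
```

The decision is made once per call rather than once per element, which is why a vector without NULL values can be summed faster.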

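If a task can be vectorized, JIT may not be necessary. In practice, however, for applications such as high-frequency factor generation, converting loop-based computation into vectorized operations takes some skill.

In a Zhihu column article, we showed how to use vectorized operations in DolphinDB. The expression for computing buy and sell signals was as follows: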

direction = (iif(signal>t1, 1h, iif(signal<t10, 0h, 00h)) - iif(signal<t2, 1h, iif(signal>t20, 0h, 00h))).ffill().nullFill(0h)

A newcomer to DolphinDB needs to be familiar with the iif function to write this statement. Rewriting it with a for loop is considerably easier:

@jit
def calculate_with_jit(signal, n, t1, t10, t20, t2) {
  cur = 0
  idx = 0
  output = array(INT, n, n)
  for (s in signal) {
    if(s > t1) {           // (t1, inf)
      cur = 1
    } else if(s >= t10) {  // [t10, t1]
      if(cur == -1) cur = 0
    } else if(s > t20) {   // [t20, t10)
      cur = 0
    } else if(s >= t2) {   // [t2, t20]
      if(cur == 1) cur = 0
    } else {               // (-inf, t2)
      cur = -1
    }
    output[idx] = cur
    idx += 1
  }
  return output
}

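Removing @jit from calculate_with_jit gives the user-defined function calculate_without_jit, which does not use JIT. The running times of the three approaches compare as follows: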

n = 10000000
t1= 60
t10 = 50
t20 = 30
t2 = 20
signal = rand(100.0, n)

timer(100) (iif(signal >t1, 1h, iif(signal < t10, 0h, 00h)) - iif(signal <t2, 1h, iif(signal > t20, 0h, 00h))).ffill().nullFill(0h) // 41092.019 ms
timer(100) calculate_with_jit(signal, size(signal), t1, t10, t20, t2)                  //    17075.127 ms
timer(100) calculate_without_jit(signal, size(signal), t1, t10, t20, t2)               //  1404406.413 ms

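In this example, JIT is about 2.4 times as fast as the vectorized version and about 82 times as fast as the non-JIT version. JIT beats vectorization here because the vectorized expression calls DolphinDB built-in functions many times and produces many intermediate results, which involves repeated memory allocation and virtual function calls. The code generated by JIT has none of this overhead.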
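To trace the loop logic of calculate_with_jit by hand, the following Python port runs it on a small input. It is a sketch for reading the control flow only; the function name `calculate_direction` and the sample signal values are made up for illustration, and the thresholds are the ones used in the timing test above.

```python
def calculate_direction(signal, t1, t10, t20, t2):
    """Python port of the loop in calculate_with_jit (for tracing only)."""
    cur = 0
    output = []
    for s in signal:
        if s > t1:        # (t1, inf): state becomes 1
            cur = 1
        elif s >= t10:    # [t10, t1]: a -1 state is reset to 0, others are kept
            if cur == -1:
                cur = 0
        elif s > t20:     # (t20, t10): state becomes 0
            cur = 0
        elif s >= t2:     # [t2, t20]: a 1 state is reset to 0, others are kept
            if cur == 1:
                cur = 0
        else:             # (-inf, t2): state becomes -1
            cur = -1
        output.append(cur)
    return output


print(calculate_direction([65, 55, 40, 25, 10, 25, 55], t1=60, t10=50, t20=30, t2=20))
# [1, 1, 0, 0, -1, -1, 0]
```

The state `cur` carries over from one element to the next, which is exactly the kind of dependency that is awkward to express with vectorized functions.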

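In other cases a computation cannot be vectorized at all. Calculating the implied volatility of an option, for example, requires an iterative root-finding method such as Newton's method, which cannot be vectorized. If such a computation must meet real-time requirements, you can use either a DolphinDB plugin or JIT. The difference is that plugins work in any scenario but must be written in C++, which is more complex; JIT functions are easier to write but apply to a narrower range of scenarios. JIT runs at a speed very close to that of a C++ plugin.

3 How to Use JIT in DolphinDB

3.1 Usage

DolphinDB currently supports JIT only for user-defined functions. Simply add the @jit annotation on the line before the function definition: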

@jit
def myFunc(/* arguments */) {
  /* implementation */
}

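When this function is called, DolphinDB compiles its code into machine code on the fly and then executes it.

3.2 Supported Statements

DolphinDB currently supports the following statements in JIT:

Assignment statements, for example: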

@jit
def func() {
  y = 1
}

Note that multiple assignment is not currently supported. For example:

@jit
def func() {
  a, b = 1, 2
}
func()

Running the code above throws an exception.

return statements, for example:

@jit
def func() {
  return 1
}

if-else statements, for example:

@jit
def myAbs(x) {
  if(x > 0) return x
  else return -x
}

do-while statements, for example:

@jit
def mySqrt(x) {
    diff = 0.0000001
    guess = 1.0
    guess = (x / guess + guess) / 2.0
    do {
        guess = (x / guess + guess) / 2.0
    } while(abs(guess * guess - x) >= diff)
    return guess
}

for statements, for example:

@jit
def mySum(vec) {
  s = 0
  for(i in vec) {
    s += i
  }
  return s
}

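These statements can be nested arbitrarily in JIT.

3.3 Supported Operators and Functions

DolphinDB currently supports the following operators in JIT: add(+), sub(-), multiply(*), divide(/), and(&&), or(||), bitand(&), bitor(|), bitxor(^), eq(==), neq(!=), ge(>=), gt(>), le(<=), lt(<), neg(-), mod(%), seq(..), at([]). For all data types, these operators behave the same as their non-JIT implementations.

DolphinDB currently supports the following math functions in JIT: exp, log, sin, asin, cos, acos, tan, atan, abs, ceil, floor, sqrt. When one of these functions appears in JIT code with a scalar argument, the generated machine code calls the corresponding glibc function or an optimized C implementation. With an array argument, it calls the math function provided by DolphinDB. Calling the C implementation directly improves performance by avoiding unnecessary virtual function calls and memory allocation.

DolphinDB currently supports the following built-in functions in JIT: take, array, size, isValid, rand, cdfNormal.

Note that the first argument of the array function must specify a concrete data type directly; it cannot be passed in through a variable. JIT compilation must know the type of every variable, and the type of the value returned by array is determined by its first argument, so that argument must be known at compile time.

3.4 Handling of NULL Values

All functions and operators in JIT handle NULL values the same way as the native functions and operators: each data type uses its minimum value to represent NULL, so users do not need to handle NULL values specially.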
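The sentinel idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not DolphinDB code: it assumes a 32-bit INT whose minimum value marks NULL, and it assumes that NULL propagates through addition. The names `INT_NULL`, `is_valid`, and `add_int` are hypothetical.

```python
INT_NULL = -2**31  # assumption: the minimum 32-bit signed value marks NULL


def is_valid(x):
    """True when x is an ordinary value rather than the NULL sentinel."""
    return x != INT_NULL


def add_int(a, b):
    """Add two INT values; the result is NULL if either operand is NULL."""
    if not is_valid(a) or not is_valid(b):
        return INT_NULL
    return a + b


print(add_int(1, 2))                     # 3
print(add_int(INT_NULL, 2) == INT_NULL)  # True
```

Because NULL is an ordinary value of the same type, a compiled function needs no separate mask or wrapper object to carry it; the cost is the extra comparison on each operation.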

3.5 Calls Between JIT Functions

A JIT function in DolphinDB can call another JIT function. For example:

@jit
def myfunc1(x) {
  return sqrt(x) + exp(x)
}

@jit
def myfunc2(x) {
  return myfunc1(x)
}

myfunc2(1.5)

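In the example above, DolphinDB first compiles myfunc1 into a native function with the signature double myfunc1(double). The machine code generated for myfunc2 calls this function directly, rather than checking at run time whether myfunc1 is a JIT function before executing it, which gives the best execution efficiency.

Note that a JIT function cannot call a non-JIT user-defined function, because type inference would not be possible. Type inference is described in section 4.

3.6 JIT Compilation Cost and Caching

DolphinDB's JIT is built on LLVM. Each user-defined function generates its own independent module when compiled. Compilation consists of the following steps:

1. Initialize the LLVM-related variables and environment.
2. Generate LLVM IR from the syntax tree of the DolphinDB script.
3. Call LLVM to optimize the IR generated in step 2, then compile it into machine code.

Step 1 usually takes less than 5 ms. The time taken by steps 2 and 3 is proportional to the complexity of the script. Overall, compilation generally takes less than 50 ms.

For a given JIT function and combination of argument types, DolphinDB compiles only once and caches the result. When a JIT function is called, the system derives a string from the data types of the arguments and looks up that string in a hash table. If a compiled result exists, it is called directly; otherwise the function is compiled, the result is stored in the hash table, and then it is executed.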

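For tasks that run repeatedly, or whose running time far exceeds the compilation time, JIT significantly improves speed.

3.7 Limitations

JIT in DolphinDB currently applies to a limited range of scenarios:

Only user-defined functions can be JIT-compiled.
Only scalar and array arguments are accepted. Other types such as table, dict, pair, string, and symbol are not yet supported.
A subarray is not accepted as an argument.

4 Type Inference

Before generating IR with LLVM, the types of all variables in the script must be known. This step is type inference. DolphinDB's JIT uses local type inference. For example: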

@jit
def foo() {
  x = 1
  y = 1.1
  z = x + y
  return z
}

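x = 1 determines that x is an int; y = 1.1 determines that y is a double; z = x + y, together with the inferred types of x and y, determines that z is also a double; and return z determines that the return type of foo is double.

If the function takes parameters, for example: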

@jit
def foo(x) {
  return x + 1
}

the return type of foo depends on the type of the input x.
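A toy version of this kind of local inference can be written in Python. The sketch below is not DolphinDB's algorithm: it handles only int and double, only straight-line assignments, and only the `+` operator, and the statement encoding and function names are invented for illustration.

```python
RANK = {"int": 0, "double": 1}  # the wider type wins in a mixed expression


def infer_literal(value):
    """Map a Python literal to a type name; anything else is unsupported."""
    if isinstance(value, bool):
        raise TypeError("unsupported literal: %r" % (value,))
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "double"
    raise TypeError("unsupported literal: %r" % (value,))


def infer_types(statements, param_types=None):
    """Infer variable types from a list of (target, expr) assignments.

    expr is a numeric literal, a variable name, or a tuple ("+", lhs, rhs).
    """
    types = dict(param_types or {})

    def infer(expr):
        if isinstance(expr, str):        # variable: must already be known
            return types[expr]
        if isinstance(expr, tuple):      # binary "+": take the wider type
            _, lhs, rhs = expr
            left, right = infer(lhs), infer(rhs)
            return left if RANK[left] >= RANK[right] else right
        return infer_literal(expr)

    for target, expr in statements:
        types[target] = infer(expr)
    return types


foo_body = [("x", 1), ("y", 1.1), ("z", ("+", "x", "y"))]
print(infer_types(foo_body))  # {'x': 'int', 'y': 'double', 'z': 'double'}

# With a parameter, the result depends on the type supplied by the caller.
print(infer_types([("ret", ("+", "x", 1))], {"x": "double"})["ret"])  # double
```

When a variable's type cannot be determined, the lookup fails, which mirrors how an unsupported type or function makes inference fail for the whole function.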

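The data types currently supported by JIT were listed above. If an unsupported type appears inside a function, or an argument has an unsupported type, type inference fails for the whole function and an exception is thrown at run time. For example: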

@jit
def foo(x) {
  return x + 1
}

foo(123)             // runs normally
foo("abc")           // throws an exception: STRING is not yet supported
foo(1:2)             // throws an exception: pair is not yet supported
foo((1 2, 3 4, 5 6)) // throws an exception: tuple is not yet supported

@jit
def foo(x) {
  y = cumprod(x)
  z = y + 1
  return z
}

foo(1..10)             // throws an exception: cumprod is not yet supported, so its return type is unknown and type inference fails

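Therefore, to use JIT functions successfully, avoid types that are not yet supported, such as tuple or string, both inside the function and in its arguments, and do not call functions that are not yet supported.

5 Examples

5.1 Calculating Implied Volatility

As mentioned above, some computations cannot be vectorized. Calculating implied volatility is one example: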

@jit
def GBlackScholes(future_price, strike, input_ttm, risk_rate, b_rate, input_vol, is_call) {
  ttm = input_ttm + 0.000000000000001;
  vol = input_vol + 0.000000000000001;

  d1 = (log(future_price/strike) + (b_rate + vol*vol/2) * ttm) / (vol * sqrt(ttm));
  d2 = d1 - vol * sqrt(ttm);

  if (is_call) {
    return future_price * exp((b_rate - risk_rate) * ttm) * cdfNormal(0, 1, d1) - strike * exp(-risk_rate*ttm) * cdfNormal(0, 1, d2);
  } else {
    return strike * exp(-risk_rate*ttm) * cdfNormal(0, 1, -d2) - future_price * exp((b_rate - risk_rate) * ttm) * cdfNormal(0, 1, -d1);
  }
}

@jit
def ImpliedVolatility(future_price, strike, ttm, risk_rate, b_rate, option_price, is_call) {
  high=5.0;
  low = 0.0;

  do {
    if (GBlackScholes(future_price, strike, ttm, risk_rate, b_rate, (high+low)/2, is_call) > option_price) {
      high = (high+low)/2;
    } else {
      low = (high + low) /2;
    }
  } while ((high-low) > 0.00001);

  return (high + low) /2;
}

@jit
def test_jit(future_price, strike, ttm, risk_rate, b_rate, option_price, is_call) {
	n = size(future_price)
	ret = array(DOUBLE, n, n)
	i = 0
	do {
		ret[i] = ImpliedVolatility(future_price[i], strike[i], ttm[i], risk_rate[i], b_rate[i], option_price[i], is_call[i])
		i += 1
	} while(i < n)
	return ret
}

n = 100000
future_price=take(rand(10.0,1)[0], n)
strike_price=take(rand(10.0,1)[0], n)
strike=take(rand(10.0,1)[0], n)
input_ttm=take(rand(10.0,1)[0], n)
risk_rate=take(rand(10.0,1)[0], n)
b_rate=take(rand(10.0,1)[0], n)
vol=take(rand(10.0,1)[0], n)
input_vol=take(rand(10.0,1)[0], n)
multi=take(rand(10.0,1)[0], n)
is_call=take(rand(10.0,1)[0], n)
ttm=take(rand(10.0,1)[0], n)
option_price=take(rand(10.0,1)[0], n)

timer(10) test_jit(future_price, strike, ttm, risk_rate, b_rate, option_price, is_call)          //  2621.73 ms
timer(10) test_non_jit(future_price, strike, ttm, risk_rate, b_rate, option_price, is_call)      //   302714.74 ms

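In the example above, ImpliedVolatility calls GBlackScholes. The function test_non_jit can be obtained by removing @jit before the definition of test_jit. The JIT version test_jit runs about 115 times as fast as the non-JIT version test_non_jit.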
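The root search in ImpliedVolatility is a bisection on the volatility: each iteration depends on the result of the previous one, which is why it does not vectorize. The Python sketch below ports the two functions so the logic can be checked with a round trip. It is not part of the original benchmark: the function names, the sample inputs, and the use of `math.erf` for the normal CDF are choices made here, and the tiny offsets added to ttm and vol in the DolphinDB version are omitted.

```python
import math


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def g_black_scholes(future_price, strike, ttm, risk_rate, b_rate, vol, is_call):
    """Option price for a given volatility (port of GBlackScholes)."""
    d1 = (math.log(future_price / strike) + (b_rate + vol * vol / 2) * ttm) / (vol * math.sqrt(ttm))
    d2 = d1 - vol * math.sqrt(ttm)
    carried = future_price * math.exp((b_rate - risk_rate) * ttm)
    discounted = strike * math.exp(-risk_rate * ttm)
    if is_call:
        return carried * norm_cdf(d1) - discounted * norm_cdf(d2)
    return discounted * norm_cdf(-d2) - carried * norm_cdf(-d1)


def implied_volatility(future_price, strike, ttm, risk_rate, b_rate, option_price, is_call):
    """Bisection on vol in [0, 5], stopping when the bracket is 1e-5 wide."""
    high, low = 5.0, 0.0
    while True:  # mirrors the do-while loop: the body runs at least once
        mid = (high + low) / 2
        price = g_black_scholes(future_price, strike, ttm, risk_rate, b_rate, mid, is_call)
        if price > option_price:
            high = mid
        else:
            low = mid
        if high - low <= 0.00001:
            break
    return (high + low) / 2


# Round trip: price an option at vol = 0.3, then recover the volatility.
price = g_black_scholes(100.0, 95.0, 0.5, 0.03, 0.0, 0.3, True)
print(abs(implied_volatility(100.0, 95.0, 0.5, 0.03, 0.0, price, True) - 0.3) < 1e-4)  # True
```

Bisection works here because the option price increases with volatility, so the sign of the price difference tells the search which half of the bracket to keep.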

5.2 Calculating Greeks

Greeks are often used for risk assessment in quantitative finance. The following example uses Charm to demonstrate JIT:

@jit
def myMax(a,b){
	if(a>b){
		return a
	}else{
		return b
	}
}

@jit
def NormDist(x) {
  return cdfNormal(0, 1, x);
}

@jit
def ND(x) {
  return (1.0/sqrt(2*pi)) * exp(-(x*x)/2.0)
}

@jit
def CalculateCharm(future_price, strike_price, input_ttm, risk_rate, b_rate, vol, multi, is_call) {
  day_year = 245.0;

  d1 = (log(future_price/strike_price) + (b_rate + (vol*vol)/2.0) * input_ttm) / (myMax(vol,0.00001) * sqrt(input_ttm));
  d2 = d1 - vol * sqrt(input_ttm);

  if (is_call) {
    return -exp((b_rate - risk_rate) * input_ttm) * (ND(d1) * (b_rate/vol/sqrt(input_ttm) - d2/2.0/input_ttm) + (b_rate-risk_rate) * NormDist(d1)) * future_price * multi / day_year;
  } else {
    return -exp((b_rate - risk_rate) * input_ttm) * (ND(d1) * (b_rate/vol/sqrt(input_ttm) - d2/2.0/input_ttm) - (b_rate-risk_rate) * NormDist(-d1)) * future_price * multi / day_year;
  }
}

@jit
def test_jit(future_price, strike_price, input_ttm, risk_rate, b_rate, vol, multi, is_call) {
	n = size(future_price)
	ret = array(DOUBLE, n, n)
	i = 0
	do {
		ret[i] = CalculateCharm(future_price[i], strike_price[i], input_ttm[i], risk_rate[i], b_rate[i], vol[i], multi[i], is_call[i])
		i += 1
	} while(i < n)
	return ret
}


def ND_validate(x) {
  return (1.0/sqrt(2*pi)) * exp(-(x*x)/2.0)
}

def NormDist_validate(x) {
  return cdfNormal(0, 1, x);
}

def CalculateCharm_vectorized(future_price, strike_price, input_ttm, risk_rate, b_rate, vol, multi, is_call) {
	day_year = 245.0;

	d1 = (log(future_price/strike_price) + (b_rate + pow(vol, 2)/2.0) * input_ttm) / (max(vol, 0.00001) * sqrt(input_ttm));
	d2 = d1 - vol * sqrt(input_ttm);
	return iif(is_call,-exp((b_rate - risk_rate) * input_ttm) * (ND_validate(d1) * (b_rate/vol/sqrt(input_ttm) - d2/2.0/input_ttm) + (b_rate-risk_rate) * NormDist_validate(d1)) * future_price * multi / day_year,-exp((b_rate - risk_rate) * input_ttm) * (ND_validate(d1) * (b_rate/vol/sqrt(input_ttm) - d2/2.0/input_ttm) - (b_rate-risk_rate) * NormDist_validate(-d1)) * future_price * multi / day_year)
}

n = 1000000
future_price=rand(10.0,n)
strike_price=rand(10.0,n)
strike=rand(10.0,n)
input_ttm=rand(10.0,n)
risk_rate=rand(10.0,n)
b_rate=rand(10.0,n)
vol=rand(10.0,n)
input_vol=rand(10.0,n)
multi=rand(10.0,n)
is_call=rand(true false,n)
ttm=rand(10.0,n)
option_price=rand(10.0,n)

timer(10) test_jit(future_price, strike_price, input_ttm, risk_rate, b_rate, vol, multi, is_call)                     //   1834.342 ms
timer(10) test_none_jit(future_price, strike_price, input_ttm, risk_rate, b_rate, vol, multi, is_call)                // 224099.805 ms
timer(10) CalculateCharm_vectorized(future_price, strike_price, input_ttm, risk_rate, b_rate, vol, multi, is_call)    //   3117.761 ms

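This is a more complex example involving more function calls and more complicated computation. The JIT version is about 122 times as fast as the non-JIT version and about 1.7 times as fast as the vectorized version.

5.3 Calculating Stop Loss Points (stoploss)

In a Zhihu column article, we showed how to backtest technical signals with DolphinDB. Below, the stoploss function from that article is implemented with JIT: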

@jit
def stoploss_JIT(ret, threshold) {
	n = ret.size()
	i = 0
	curRet = 1.0
	curMaxRet = 1.0
	indicator = take(true, n)

	do {
		indicator[i] = false
		curRet *= (1 + ret[i])
		if(curRet > curMaxRet) { curMaxRet = curRet }
		drawDown = 1 - curRet / curMaxRet;
		if(drawDown >= threshold) {
			i = n // break is not supported for now
		}
		i += 1
	} while(i < n)

	return indicator
}

def stoploss_no_JIT(ret, threshold) {
	n = ret.size()
	i = 0
	curRet = 1.0
	curMaxRet = 1.0
	indicator = take(true, n)

	do {
		indicator[i] = false
		curRet *= (1 + ret[i])
		if(curRet > curMaxRet) { curMaxRet = curRet }
		drawDown = 1 - curRet / curMaxRet;
		if(drawDown >= threshold) {
			i = n // break is not supported for now
		}
		i += 1
	} while(i < n)

	return indicator
}

def stoploss_vectorization(ret, threshold){
	cumret = cumprod(1+ret)
 	drawDown = 1 - cumret / cumret.cummax()
	firstCutIndex = at(drawDown >= threshold).first() + 1
	indicator = take(false, ret.size())
	if(isValid(firstCutIndex) and firstCutIndex < ret.size())
		indicator[firstCutIndex:] = true
	return indicator
}
ret = take(0.0008 -0.0008, 1000000)
threshold = 0.10
timer(10) stoploss_JIT(ret, threshold)              //      58.674 ms
timer(10) stoploss_no_JIT(ret, threshold)           //   14622.142 ms
timer(10) stoploss_vectorization(ret, threshold)    //     151.884 ms

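The stoploss function only needs to find the first day on which the drawdown reaches the threshold; it does not need to compute cumprod and cummax over the entire series. The JIT version is therefore about 2.6 times as fast as the vectorized version and about 249 times as fast as the non-JIT version.

If the stop loss is triggered only on the last day of the data, the JIT version runs at about the same speed as the vectorized version, but still far faster than the non-JIT version.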
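The early exit is easiest to see on a small input. The following Python port of stoploss_JIT is a sketch for tracing the logic only; the function name and the sample returns are made up, and Python's `break` replaces the `i = n` workaround used in the DolphinDB version.

```python
def stoploss(ret, threshold):
    """Mark days up to and including the first stop-loss day as False."""
    n = len(ret)
    indicator = [True] * n
    cur_ret = 1.0       # cumulative return so far
    cur_max_ret = 1.0   # running maximum of the cumulative return
    for i in range(n):
        indicator[i] = False
        cur_ret *= 1 + ret[i]
        if cur_ret > cur_max_ret:
            cur_max_ret = cur_ret
        draw_down = 1 - cur_ret / cur_max_ret
        if draw_down >= threshold:
            break  # stop scanning: later days are never touched
    return indicator


print(stoploss([0.05, -0.20, 0.10, 0.10], 0.10))  # [False, False, True, True]
```

On the second day the drawdown from the peak is about 20%, which exceeds the 10% threshold, so the loop stops and the remaining days keep their initial value. The vectorized version always processes the whole series, while the loop's work is proportional to the position of the first stop-loss day.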