Local vector
Labeled point
Local matrix
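MLlib supports local vectors and local matrices stored on a single machine, as well as distributed matrices backed by one or more RDDs. Local vectors and local matrices are simple data models that serve as public interfaces. The underlying linear algebra operations are provided by Breeze and jblas. A training example used in supervised learning is called a "labeled point" in MLlib.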
A local vector has integer-typed and 0-based indices and double-typed values, stored on a single machine. MLlib supports two types of local vectors: dense and sparse. A dense vector is backed by a double array representing its entry values, while a sparse vector is backed by two parallel arrays: indices and values. For example, a vector (1.0, 0.0, 3.0) can be represented in dense format as [1.0, 0.0, 3.0] or in sparse format as (3, [0, 2], [1.0, 3.0]), where 3 is the size of the vector (its number of entries, not the number of vectors), [0, 2] holds the indices of the nonzero entries, and [1.0, 3.0] holds their values.
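The dense and sparse layouts described above can be sketched in plain Scala, without any Spark dependency. This is only an illustration of the data layout, not MLlib's implementation: the names `DenseSparseSketch`, `SparseRepr`, `toSparse` and `toDense` are hypothetical and do not exist in MLlib.

```scala
// Plain-Scala sketch of the two layouts. All names here are hypothetical
// and are NOT part of the MLlib API.
object DenseSparseSketch {
  // Sparse layout: total size plus two parallel arrays (indices and values).
  final case class SparseRepr(size: Int, indices: Array[Int], values: Array[Double])

  // Keep only the nonzero entries, remembering their 0-based positions.
  def toSparse(dense: Array[Double]): SparseRepr = {
    val nonzero = dense.zipWithIndex.filter { case (v, _) => v != 0.0 }
    SparseRepr(dense.length, nonzero.map(_._2), nonzero.map(_._1))
  }

  // Rebuild the full array: start from zeros and write each stored value back.
  def toDense(sparse: SparseRepr): Array[Double] = {
    val out = new Array[Double](sparse.size)
    sparse.indices.zip(sparse.values).foreach { case (i, v) => out(i) = v }
    out
  }

  def main(args: Array[String]): Unit = {
    val sparse = toSparse(Array(1.0, 0.0, 3.0))
    // Print in the (size, [indices], [values]) notation used in the text.
    println(s"(${sparse.size}, ${sparse.indices.mkString("[", ", ", "]")}, " +
      sparse.values.mkString("[", ", ", "]") + ")")
    println(toDense(sparse).mkString("[", ", ", "]"))
  }
}
```

The sparse form only pays off when most entries are zero, since each stored entry costs an index in addition to its value.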
The base class of local vectors is Vector, and we provide two implementations: DenseVector and SparseVector. We recommend using the factory methods implemented in Vectors to create local vectors.
Refer to the Vector Scala docs and Vectors Scala docs for details on the API.
import org.apache.spark.mllib.linalg.{Vector, Vectors}

// Create a dense vector (1.0, 0.0, 3.0) from individual values.
val dv: Vector = Vectors.dense(1.0, 0.0, 3.0)
// Create the same dense vector from an Array[Double].
val densityVector: Vector = Vectors.dense(Array(1.0, 0.0, 3.0))

// Create a sparse vector (1.0, 0.0, 3.0) in one of two ways.
// 1. Parallel arrays: (size, indices: Array[Int], values: Array[Double])
val sv1: Vector = Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0))
// 2. A Seq of (index, value) pairs for the nonzero entries: (size, Seq((index, value), ...))
val sv2: Vector = Vectors.sparse(3, Seq((0, 1.0), (2, 3.0)))

println(dv)
println(densityVector)
println(sv1)
println(sv2)

result:
[1.0,0.0,3.0]
[1.0,0.0,3.0]
(3,[0,2],[1.0,3.0])
(3,[0,2],[1.0,3.0])
Note: Scala imports scala.collection.immutable.Vector by default, so you have to import org.apache.spark.mllib.linalg.Vector explicitly to use MLlib's Vector.
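One way to keep both types usable in the same file is a renaming import. The sketch below assumes spark-mllib is on the classpath. The alias `MLVector` and the object name `VectorNameSketch` are our own choices for illustration, not anything MLlib defines.

```scala
// Assumes spark-mllib is on the classpath.
// MLVector is a local alias chosen for this sketch, not an MLlib name.
import org.apache.spark.mllib.linalg.{Vector => MLVector, Vectors}

object VectorNameSketch {
  def main(args: Array[String]): Unit = {
    // The unqualified name Vector still refers to Scala's immutable collection,
    // because the MLlib class was imported under a different name.
    val collection: Vector[Double] = Vector(1.0, 0.0, 3.0)

    // MLlib's local vector, built with the Vectors factory.
    val mlVector: MLVector = Vectors.dense(1.0, 0.0, 3.0)

    println(collection.length) // number of elements in the Scala collection
    println(mlVector.size)     // number of entries in the MLlib vector
  }
}
```

Importing `org.apache.spark.mllib.linalg.Vector` without renaming, as the note describes, works equally well when the Scala collection is not needed in the same scope.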