Local vector
Labeled point
Local matrix
RowMatrix
BlockMatrix
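MLlib supports local vectors and local matrices stored on a single machine, as well as distributed matrices backed by one or more RDDs. Local vectors and local matrices are simple data models that serve as public interfaces. The underlying linear algebra operations are provided by Breeze and jblas. In MLlib, a training example used in supervised learning is called a "labeled point".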
A local vector has integer-typed, 0-based indices and double-typed values, and is stored on a single machine. MLlib supports two types of local vectors: dense and sparse. A dense vector is backed by a double array representing its entry values, while a sparse vector is backed by two parallel arrays: indices and values. For example, a vector (1.0, 0.0, 3.0) can be represented in dense format as [1.0, 0.0, 3.0] or in sparse format as (3, [0, 2], [1.0, 3.0]), where 3 is the size of the vector.
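The relationship between the two layouts can be sketched in plain Scala, without Spark. The classes DenseSketch and SparseSketch below are hypothetical stand-ins invented for this illustration, not MLlib's DenseVector and SparseVector, and the sketch assumes the sparse indices are sorted in increasing order.

```scala
// Illustrative stand-in for a dense layout: every entry is stored.
final case class DenseSketch(values: Array[Double]) {
  def size: Int = values.length
  def apply(i: Int): Double = values(i)
}

// Illustrative stand-in for a sparse layout: two parallel arrays plus the size.
// Assumption: `indices` is sorted in increasing order.
final case class SparseSketch(size: Int, indices: Array[Int], values: Array[Double]) {
  require(indices.length == values.length, "indices and values must be parallel arrays")

  // Entries whose index is not listed are implicitly 0.0.
  def apply(i: Int): Double = {
    val pos = java.util.Arrays.binarySearch(indices, i)
    if (pos >= 0) values(pos) else 0.0
  }

  // Expand back to the dense layout.
  def toDense: DenseSketch = {
    val out = new Array[Double](size)
    indices.zip(values).foreach { case (i, v) => out(i) = v }
    DenseSketch(out)
  }
}

object SparseSketch {
  // Keep only the nonzero entries of a dense layout.
  def fromDense(d: DenseSketch): SparseSketch = {
    val nonzero = d.values.zipWithIndex.filter { case (v, _) => v != 0.0 }
    SparseSketch(d.size, nonzero.map(_._2), nonzero.map(_._1))
  }
}
```

Calling SparseSketch.fromDense(DenseSketch(Array(1.0, 0.0, 3.0))) yields size 3 with indices [0, 2] and values [1.0, 3.0], matching the sparse format described above. The sparse layout saves space only when most entries are zero, since each stored value also costs one stored index.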
The base class of local vectors is Vector, and we provide two implementations: DenseVector and SparseVector. We recommend using the factory methods implemented in Vectors to create local vectors.
Refer to the Vector Scala docs and the Vectors Scala docs for details on the API.
import org.apache.spark.mllib.linalg.{Vector, Vectors}

// Create a dense vector (1.0, 0.0, 3.0) from individual values.
val dv: Vector = Vectors.dense(1.0, 0.0, 3.0)
// Create the same dense vector from an array.
val dvFromArray: Vector = Vectors.dense(Array(1.0, 0.0, 3.0))

// Create a sparse vector (1.0, 0.0, 3.0) in two ways.
// 1. Parallel arrays of the nonzero entries: (size, indices, values).
val sv1: Vector = Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0))
// 2. A Seq of the nonzero entries: (size, Seq((index, value), ...)).
val sv2: Vector = Vectors.sparse(3, Seq((0, 1.0), (2, 3.0)))

println(dv)
println(dvFromArray)
println(sv1)
println(sv2)

Result:
[1.0,0.0,3.0]
[1.0,0.0,3.0]
(3,[0,2],[1.0,3.0])
(3,[0,2],[1.0,3.0])
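Once a vector has been created, it can be useful to check which layout it actually uses. The following is a minimal sketch, assuming spark-mllib is on the classpath and the code is run in spark-shell or a similar REPL. The helper describe is a hypothetical name introduced for this example and is not part of MLlib.

```scala
import org.apache.spark.mllib.linalg.{DenseVector, SparseVector, Vector, Vectors}

// Hypothetical helper (not part of MLlib): report how a local vector is stored.
def describe(v: Vector): String = v match {
  case d: DenseVector  => s"dense, ${d.values.length} stored values"
  case s: SparseVector => s"sparse, ${s.indices.length} stored values out of ${s.size}"
}

val dense: Vector = Vectors.dense(1.0, 0.0, 3.0)
val sparse: Vector = Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0))

println(describe(dense))
println(describe(sparse))

// Both layouts expand to the same entries when converted to a plain array.
println(dense.toArray.sameElements(sparse.toArray))
```

Declaring the values with the base type Vector, as the example does, keeps calling code independent of the layout, and pattern matching recovers the concrete type only where it matters.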
Note: Scala imports scala.collection.immutable.Vector by default, so you have to import org.apache.spark.mllib.linalg.Vector explicitly to use MLlib's Vector.
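If both kinds of Vector are needed in the same file, one option is a renaming import. This is a sketch under the assumption that spark-mllib is on the classpath and the code runs in spark-shell or a similar REPL. The alias MLVector is a name chosen for this example, not something MLlib defines.

```scala
import org.apache.spark.mllib.linalg.{Vector => MLVector, Vectors}

// MLVector is a local alias for MLlib's Vector; any unused name would work.
val features: MLVector = Vectors.dense(1.0, 0.0, 3.0)

// The unqualified name Vector still refers to Scala's immutable collection.
val words: Vector[String] = Vector("a", "b", "c")

println(features.size)
println(words.length)
```

The alias avoids shadowing the standard collection, at the cost of a name that differs from the one used in the MLlib documentation.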