Into Cassandra, Part 2: The Data Model

Date: 2019-11-06

In the official 1.1 documentation, column families are divided into two kinds:
static column family
dynamic column family

The static kind is the traditional one, and you will grasp it quickly: it resembles a table in a relational database. The official definition is:
A static column family uses a relatively static set of column names and is similar to a relational database table. For
example, a column family storing user data might have columns for the user name, address, email, phone number and
so on. Although the rows generally have the same set of columns, they are not required to have all of the columns
defined. Static column families typically have column metadata pre-defined for each column.
This passage is easy to read and easy to understand: a static column family closely resembles a table.
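To make the table analogy concrete, here is a minimal sketch in plain Python. It is not a Cassandra client, and the `users` data and `KNOWN_COLUMNS` list are hypothetical. It models a static column family as a map from row key to a sparse map of columns: every row draws its column names from the same small set, but a row simply stores nothing for a column it lacks.

```python
from typing import Dict, List

# Hypothetical column set for a "users" column family.
KNOWN_COLUMNS: List[str] = ["name", "address", "email", "phone"]

# row key -> {column name -> value}; absent columns are simply not stored.
users: Dict[str, Dict[str, str]] = {
    "alice": {"name": "Alice", "email": "alice@example.com"},
    "bob": {"name": "Bob", "address": "1 Main St", "phone": "555-0100"},
}


def as_table(cf: Dict[str, Dict[str, str]], columns: List[str]) -> List[List[str]]:
    """Lay the rows out like a relational table, using '' for missing columns."""
    return [[key] + [cf[key].get(c, "") for c in columns] for key in sorted(cf)]


for line in as_table(users, KNOWN_COLUMNS):
    print(line)
```

The empty strings exist only in the printed view; unlike a relational NULL, a missing column takes no space in the row itself.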

The dynamic kind is a new concept. The official definition:
A dynamic column family takes advantage of Cassandra's ability to use arbitrary application-supplied column names to
store data. A dynamic column family allows you to pre-compute result sets and store them in a single row for efficient
data retrieval. Each row is a snapshot of data meant to satisfy a given query, sort of like a materialized view. For
example, a column family that tracks the users that subscribe to a particular user's blog is dynamic.

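This passage is a bit hard to digest, especially if you are new to NoSQL.
Its example, however, is easier to grasp. Everyone uses Weibo these days: if you want to track all the followers of a celebrity and create a column family for that purpose, the dynamic kind is the one to use. The row key is the celebrity, and each follower becomes one column whose name is supplied by the application at write time.
(You could store this relationship in a relational table too, but remember, we are in the NoSQL world now.)
In short, the columns are not fixed but dynamic, and that is what "dynamic" means.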
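A minimal sketch of the follower example, again in plain Python rather than any Cassandra API; `follow` and `slice_followers` are hypothetical helpers. Assumption: the row key is the followed user, each follower ID becomes a column name with an unused value, and the names are kept sorted roughly as a UTF8Type comparator would keep ASCII names, so one page of followers is one contiguous slice of the row.

```python
import bisect
from typing import Dict, List

# row key (the followed user) -> sorted list of column names (follower IDs)
followers: Dict[str, List[str]] = {}


def follow(celebrity: str, fan: str) -> None:
    names = followers.setdefault(celebrity, [])
    i = bisect.bisect_left(names, fan)
    if i == len(names) or names[i] != fan:  # a column name appears once per row
        names.insert(i, fan)


def slice_followers(celebrity: str, start: str = "", count: int = 100) -> List[str]:
    """Return up to `count` column names >= start, in sorted order."""
    names = followers.get(celebrity, [])
    i = bisect.bisect_left(names, start)
    return names[i:i + count]


for fan in ["zoe", "adam", "mike", "adam"]:
    follow("celebrity_1", fan)

print(slice_followers("celebrity_1"))          # ['adam', 'mike', 'zoe']
print(slice_followers("celebrity_1", "b", 1))  # ['mike']
```

Each row can end up with a completely different set of column names, which is exactly what separates this from the static case.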

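The 1.1 documentation also describes the columns themselves, and it draws finer distinctions: besides the familiar standard columns and super columns, there are other kinds, as follows: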

Column families consist of these kinds of columns:
• Standard: Has one primary key.
• Composite: Has more than one primary key, recommended for managing wide rows.
• Expiring: Gets deleted during compaction.
• Counter: Counts occurrences of an event.
• Super: Used to manage wide rows, inferior to using composite columns.
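The expiring and composite kinds can be sketched in plain Python. This is an illustration under stated assumptions, not Cassandra's implementation: `Column`, `session_row` and `events_row` are hypothetical names, an expiring column is modeled as a value with a TTL that stops being visible once the TTL has passed, and a composite column name is modeled as a tuple that sorts component by component.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Column:
    value: str
    written_at: float
    ttl: Optional[float] = None  # seconds; a TTL makes this an "expiring" column

    def is_live(self, now: float) -> bool:
        # Once the TTL has passed, the column is no longer returned by reads;
        # Cassandra removes the data physically later, during compaction.
        return self.ttl is None or now < self.written_at + self.ttl


now = time.time()

# Standard and expiring columns side by side in one row.
session_row: Dict[str, Column] = {
    "user": Column("alice", now),
    "token": Column("abc123", now, ttl=60.0),
}
visible = sorted(name for name, col in session_row.items() if col.is_live(now + 120))
print(visible)  # ['user']

# Composite column names: tuples compare component by component,
# so all columns sharing the first component end up next to each other.
events_row: Dict[Tuple[str, str], str] = {
    ("2019-11", "login"): "",
    ("2019-10", "signup"): "",
    ("2019-11", "comment"): "",
}
print(sorted(events_row))
# [('2019-10', 'signup'), ('2019-11', 'comment'), ('2019-11', 'login')]
```

Grouping by the first component is what makes composite names useful for wide rows: a range over one month is a single contiguous slice.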

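Many documents describe Cassandra as schema-less. That description is actually a little inaccurate. What is a schema? It is a set of rules and standards for a database. Whether a database is relational or NoSQL, it has to have some rules; it cannot have none at all. The difference is that Cassandra's schema is coarse-grained and rather loose.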

So how coarse is it?

Let's take an example schema file, found under:
xxxx\services\schema

Open one of them, cv.cass, and take a look:
CREATE KEYSPACE CV;

USE CV;

CREATE COLUMN FAMILY Comments
WITH column_type = Super
AND comparator = TimeUUIDType
AND key_validation_class = BytesType;

CREATE COLUMN FAMILY CommentCounter
WITH default_validation_class = CounterColumnType
AND key_validation_class = BytesType
AND comparator = UTF8Type;

CREATE COLUMN FAMILY Group
WITH comparator = UTF8Type
AND key_validation_class = BytesType
AND column_metadata=[
{column_name: name, validation_class: UTF8Type, index_type: KEYS},
{column_name: type, validation_class: UTF8Type, index_type: KEYS},
{column_name: timecreated, validation_class: IntegerType, index_type: KEYS},
{column_name: timeupdated, validation_class: IntegerType, index_type: KEYS}];

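As you can see, it only lists the names of the column families and a few basic attributes. Compared with the E-R diagram of a relational database it is quite loose, which is why people go as far as calling Cassandra schema-less.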

So what exactly does this rough schema specify? What is it actually saying?

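CREATE COLUMN FAMILY Comments gives the column family a name; that part is easy.
WITH column_type = Super says that this is a super column family.
AND comparator = TimeUUIDType sets the data type of the column names and the order in which columns are sorted. The official explanation: "The comparator specifies the data type for the column name, as well as the sort order in which columns are stored within a row." In a super column family such as this one, it applies to the super column names; subcolumn names are governed by a separate subcomparator.
AND key_validation_class = BytesType; defines the validator for the row key. BytesType accepts any bytes.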
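Sort order is the part of the comparator that is easiest to overlook. This sketch uses Python's standard `uuid` module to build two version-1 (time-based) UUIDs with chosen timestamps; `make_time_uuid` is a hypothetical helper. It approximates TimeUUIDType ordering as "by embedded timestamp, bytes as tie-breaker" and BytesType as plain byte order, and shows that the two can disagree, because the low 32 bits of the timestamp come first in a UUID's byte layout.

```python
import uuid
from typing import List


def make_time_uuid(ts: int, node: int = 0x123456789ABC) -> uuid.UUID:
    """Hypothetical helper: a version-1 UUID carrying a chosen 60-bit timestamp."""
    time_low = ts & 0xFFFFFFFF
    time_mid = (ts >> 32) & 0xFFFF
    time_hi_version = ((ts >> 48) & 0x0FFF) | 0x1000  # top nibble = version 1
    # 0x80 sets the RFC 4122 variant bits; the clock sequence is left at 0.
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version, 0x80, 0x00, node))


early = make_time_uuid(0xFFFFFFFF)    # earlier timestamp
late = make_time_uuid(0x1_0000_0000)  # later timestamp

names: List[uuid.UUID] = [late, early]

# Roughly TimeUUIDType: order by the embedded timestamp, bytes only break ties.
by_time = sorted(names, key=lambda u: (u.time, u.bytes))
# Roughly BytesType: plain byte-by-byte comparison.
by_bytes = sorted(names, key=lambda u: u.bytes)

print(by_time == [early, late])   # True
print(by_bytes == [late, early])  # True: the timestamp's low 32 bits lead the bytes
```

This is why Comments uses TimeUUIDType rather than BytesType: with it, comments come back in the order they were written.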

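WITH default_validation_class = CounterColumnType, from the CommentCounter definition, is also a validator: it sets the default type of column values, and CounterColumnType makes the columns of this family counters.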
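Read together, the three settings of CommentCounter act like checks applied to every write. A hypothetical in-memory sketch in plain Python (`comment_counter`, `add` and the row key are made up for illustration): a BytesType key is accepted as-is, a UTF8Type comparator means column names must decode as UTF-8, and CounterColumnType means a write adds to the stored value instead of replacing it.

```python
from typing import Dict

# Hypothetical in-memory stand-in for CommentCounter:
#   key_validation_class     = BytesType         -> any bytes are a valid row key
#   comparator               = UTF8Type          -> column names must be UTF-8
#   default_validation_class = CounterColumnType -> values are counters
comment_counter: Dict[bytes, Dict[str, int]] = {}


def add(row_key: bytes, column: bytes, delta: int) -> None:
    # UTF8Type check: raises UnicodeDecodeError for an invalid name.
    name = column.decode("utf-8")
    if not isinstance(delta, int):
        raise TypeError("counter columns only accept integer increments")
    row = comment_counter.setdefault(row_key, {})
    # CounterColumnType: a write adds to the current value instead of replacing it.
    row[name] = row.get(name, 0) + delta


add(b"\x00\x01post-42", b"comments", 1)
add(b"\x00\x01post-42", b"comments", 2)
print(comment_counter[b"\x00\x01post-42"]["comments"])  # 3
```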

AND column_metadata=[
{column_name: name, validation_class: UTF8Type, index_type: KEYS},
{column_name: type, validation_class: UTF8Type, index_type: KEYS},
{column_name: timecreated, validation_class: IntegerType, index_type: KEYS},
{column_name: timeupdated, validation_class: IntegerType, index_type: KEYS}];
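This block applies to static column families: it lets you predefine some of the columns. These columns are declared in advance and fixed, rather than added at run time. Each entry gives the column's value type (validation_class), and index_type: KEYS also builds a secondary index on that column so that rows can be looked up by its value.
Predefined columns suit rows that share a common part. For example, in a "person" column family every person has a gender, so gender is a natural column to predefine. Note, though, that column_metadata only validates the values that are written; Cassandra does not force every row to contain the column.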
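A heavily simplified sketch of how the Group definition behaves. It is plain Python: each validation_class is approximated by a Python type, and `insert`, `find` and the `index` dictionary are hypothetical stand-ins for Cassandra's validation and its KEYS secondary index. Declared columns are checked against their own type, undeclared columns fall back to the default (BytesType here), and no declared column is mandatory.

```python
from typing import Dict, List, Set, Tuple

# Rough analogy only: each validation_class approximated by a Python type.
VALIDATORS = {"UTF8Type": str, "IntegerType": int, "BytesType": bytes}

# Mirrors the column_metadata of the Group column family above.
COLUMN_METADATA = {
    "name": "UTF8Type",
    "type": "UTF8Type",
    "timecreated": "IntegerType",
    "timeupdated": "IntegerType",
}
INDEXED = set(COLUMN_METADATA)  # all four declare index_type: KEYS
DEFAULT_CLASS = "BytesType"     # Group sets no default_validation_class

group: Dict[bytes, Dict[str, object]] = {}
index: Dict[Tuple[str, object], Set[bytes]] = {}  # (column, value) -> row keys


def insert(row_key: bytes, columns: Dict[str, object]) -> None:
    # Validate everything first so that one bad column rejects the whole write.
    for col, val in columns.items():
        cls = COLUMN_METADATA.get(col, DEFAULT_CLASS)
        if not isinstance(val, VALIDATORS[cls]):
            raise TypeError(f"column {col!r} expects {cls}")
    row = group.setdefault(row_key, {})
    for col, val in columns.items():
        if col in INDEXED and col in row:
            index[(col, row[col])].discard(row_key)  # drop the stale index entry
        row[col] = val
        if col in INDEXED:
            index.setdefault((col, val), set()).add(row_key)


def find(col: str, value: object) -> List[bytes]:
    """Equality lookup through the index, the kind of query a KEYS index serves."""
    return sorted(index.get((col, value), set()))


insert(b"g1", {"name": "cassandra", "type": "public", "timecreated": 1573000000})
insert(b"g2", {"name": "nosql", "type": "public", "avatar": b"\x89PNG"})
print(find("type", "public"))  # [b'g1', b'g2']
```

Row g2 has no timecreated column at all, and that is allowed: the metadata constrains types, not presence.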

Generally, as programmers we only need to care about the column family configuration; the keyspace settings can mostly be left alone.

The data model is the "hello world" step in learning Cassandra. Once you are past this post, what follows is more advanced material.