Vitess is a database solution for deploying, scaling and managing large clusters of MySQL instances. It’s architected to run as effectively in a public or private cloud architecture as it does on dedicated hardware. It combines and extends many important MySQL features with the scalability of a NoSQL database. Vitess can help you with the following problems:
Vitess includes compliant JDBC and Go database drivers using a native query protocol. Additionally, it implements the MySQL server protocol which is compatible with virtually any other language.
Vitess has been serving all YouTube database traffic since 2011, and has now been adopted by many enterprises for their production needs.
The following example will use a simple commerce database to illustrate how Vitess can take you through the journey of scaling from a single database to a fully distributed and sharded cluster. This is a fairly common story, and it applies to many use cases beyond e-commerce.
It’s 2018 and, no surprise to anyone, people are still buying stuff online. You recently attended the first half of a seminar on disruption in the tech industry and want to create a completely revolutionary e-commerce site. In classic tech postmodern fashion, you call your products widgets instead of a more meaningful identifier and it somehow fits.
Naturally, you realize the need for a reliable transactional datastore. Because of the new generation of hipsters, you’re probably going to pull traffic away from the main industry players just because you’re not them. You’re smart enough to foresee the scalability you need, so you choose Vitess as your best scaling solution.
Before we get started, let’s get a few things out of the way.
Start a Minikube engine:
minikube start --cpus=4 --memory=5000
Note the additional resource requirements. In order to go through all the use cases, many vttablet and mysql instances will be launched. These require more resources than the defaults used by minikube.
Initialize helm:
helm init
Install the MySQL client:
apt-get install mysql-client
Install vtctlclient:
go get vitess.io/vitess/go/cmd/vtctlclient
The vtctlclient binary will be installed in $GOPATH/bin/.
So you searched keyspace on Google and got a bunch of stuff about NoSQL… what’s the deal? It took a few hours, but after diving through the ancient Vitess scrolls you figure out that in the NewSQL world, keyspaces and databases are essentially the same thing when unsharded. Finally, it’s time to get started.
Change to the helm example directory:
cd examples/helm
In this directory, you will see a group of yaml files. The first digit of each file name indicates the phase of the example. The next two digits indicate the order in which to execute them. For example, ‘101_initial_cluster.yaml’ is the first file of the first phase. We shall execute that now:
helm install ../../helm/vitess -f 101_initial_cluster.yaml
This will bring up the initial Vitess cluster with a single keyspace.
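The file-naming convention described above can be decoded mechanically. The following is a minimal sketch in POSIX shell; the function name describe_example_file is hypothetical and not part of the Vitess examples.

```shell
# Decode an example file name such as 101_initial_cluster.yaml.
# describe_example_file is a hypothetical helper, not shipped with Vitess.
describe_example_file() {
  def_name=$(basename "$1")
  def_prefix=${def_name%%_*}   # text before the first underscore, e.g. "101"
  case "$def_prefix" in
    [0-9][0-9][0-9]) ;;        # exactly three digits: phase + two-digit order
    *) echo "unexpected file name: $def_name" >&2; return 1 ;;
  esac
  # The first digit is the phase; the remaining two digits are the order.
  echo "phase ${def_prefix%??}, step ${def_prefix#?}: $def_name"
}

describe_example_file 101_initial_cluster.yaml
```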
Once successful, you should see the following state:
~/...vitess/helm/vitess/templates> kubectl get pods,jobs
NAME                               READY     STATUS    RESTARTS   AGE
po/etcd-global-2cwwqfkf8d          1/1       Running   0          14m
po/etcd-operator-9db58db94-25crx   1/1       Running   0          15m
po/etcd-zone1-btv8p7pxsg           1/1       Running   0          14m
po/vtctld-55c47c8b6c-5v82t         1/1       Running   1          14m
po/vtgate-zone1-569f7b64b4-zkxgp   1/1       Running   2          14m
po/zone1-commerce-0-rdonly-0       6/6       Running   0          14m
po/zone1-commerce-0-replica-0      6/6       Running   0          14m
po/zone1-commerce-0-replica-1      6/6       Running   0          14m

NAME                                      DESIRED   SUCCESSFUL   AGE
jobs/commerce-apply-schema-initial        1         1            14m
jobs/commerce-apply-vschema-initial       1         1            14m
jobs/zone1-commerce-0-init-shard-master   1         1            14m
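If you would rather check the READY and STATUS columns of the pod listing with a script than by eye, a small filter is enough. This is a sketch only: all_pods_ready is a hypothetical helper, it assumes the column layout shown above (header row first, READY in column 2, STATUS in column 3), and the sample rows are adapted from that output.

```shell
# Succeeds only if every pod row has READY n/n and STATUS Running.
# all_pods_ready is a hypothetical helper; it reads the listing on stdin.
all_pods_ready() {
  awk '
    NR == 1 || NF == 0 { next }            # skip the header row and blank lines
    {
      split($2, ready, "/")                # READY column looks like "6/6"
      if (ready[1] != ready[2] || $3 != "Running") bad = 1
    }
    END { exit bad + 0 }
  '
}

# Sample rows adapted from the listing above.
sample='NAME READY STATUS RESTARTS AGE
vtctld-55c47c8b6c-5v82t 1/1 Running 1 14m
zone1-commerce-0-rdonly-0 6/6 Running 0 14m
zone1-commerce-0-replica-0 6/6 Running 0 14m'

printf '%s\n' "$sample" | all_pods_ready && echo "all pods ready"
```

Against a live cluster, the same function could be fed from `kubectl get pods | all_pods_ready`.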
If you have installed the mysql client, you should now be able to connect to the cluster using the following command:
~/...vitess/examples/helm> ./kmysql.sh
mysql> show tables;
+--------------------+
| Tables_in_commerce |
+--------------------+
| corder             |
| customer           |
| product            |
+--------------------+
3 rows in set (0.01 sec)
You can also browse to the vtctld console using the following command (Ubuntu):
./kvtctld.sh
The helm example is based on the values.yaml file provided as the default helm chart for Vitess. The following overrides have been performed in order to run under minikube:
- resources: have been nulled out. This instructs the Kubernetes environment to use whatever is available. Note that this is not recommended for a production environment. In such cases, you should start with the baseline values provided in helm/vitess/values.yaml and iterate from those.
- mysqlProtocol.authType is set to none. This should be changed to secret and the credentials should be stored as Kubernetes secrets.
- NodePort is not recommended in production. You may choose not to expose these endpoints to anyone outside Kubernetes at all. Another option is to create Ingress controllers.
The helm chart specifies a single unsharded keyspace: commerce. Unsharded keyspaces have a single shard named 0.
NOTE: keyspace/shards are global entities of a cluster, independent of a cell. Ideally, you should list the keyspace/shards separately. For a cell, you should only have to specify which of those keyspace/shards are deployed in that cell. However, for simplicity, the existence of keyspace/shards is implicitly inferred from the fact that they are mentioned under each cell.
In this deployment, we are requesting two replica type tablets and one rdonly type tablet. When deployed, one of the replica tablets will automatically be elected as master. In the vtctld console, you should see one master, one replica and one rdonly vttablet.
The purpose of a replica tablet is to serve OLTP read traffic, whereas rdonly tablets serve analytics or perform cluster maintenance operations like backups or resharding. rdonly replicas are allowed to lag far behind the master because replication needs to be stopped to perform some of these functions.
In our use case, we are provisioning one rdonly replica per shard in order to perform resharding operations.
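The requested tablet counts can be cross-checked against the pod names. The sketch below assumes that tablet pod names end in type-index (as in zone1-commerce-0-replica-1 from the listing earlier); that pattern is inferred from the sample output, and count_tablets_by_type is a hypothetical helper. Because the master is elected from among the replica tablets, it is counted here as a replica; the vtctld console is where the current roles show up.

```shell
# Count tablet pods by requested type from pod names on stdin (one per line).
# Assumes names end in <type>-<index>; count_tablets_by_type is hypothetical.
count_tablets_by_type() {
  awk -F- '
    NF >= 2 && $NF ~ /^[0-9]+$/ && ($(NF-1) == "replica" || $(NF-1) == "rdonly") {
      n[$(NF-1)]++
    }
    END { printf "replica=%d rdonly=%d\n", n["replica"], n["rdonly"] }
  '
}

# Pod names taken from the sample listing; non-tablet pods are ignored.
printf '%s\n' \
  etcd-zone1-btv8p7pxsg \
  vtctld-55c47c8b6c-5v82t \
  zone1-commerce-0-rdonly-0 \
  zone1-commerce-0-replica-0 \
  zone1-commerce-0-replica-1 |
  count_tablets_by_type
```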
create table product(
  sku varbinary(128),
  description varbinary(128),
  price bigint,
  primary key(sku)
);
create table customer(
  customer_id bigint not null auto_increment,
  email varbinary(128),
  primary key(customer_id)
);
create table corder(
  order_id bigint not null auto_increment,
  customer_id bigint,
  sku varbinary(128),
  price bigint,
  primary key(order_id)
);
The schema has been simplified to include only those fields that are significant to the example:
- The product table contains the product information for all of the products.
- The customer table has a customer_id column that auto-increments. A typical customer table would have a lot more columns, and sometimes additional detail tables.
- The corder table (named so because order is an SQL reserved word) has an order_id auto-increment column. It also has foreign keys into customer(customer_id) and product(sku).
Since Vitess is a distributed system, a VSchema (Vitess schema) is usually required to describe how the keyspaces are organized.
{
  "tables": {
    "product": {},
    "customer": {},
    "corder": {}
  }
}
With a single unsharded keyspace, the VSchema is very simple; it just lists all the tables in that keyspace.
NOTE: In the case of a single unsharded keyspace, a VSchema is not strictly necessary because Vitess knows that there are no other keyspaces, and will therefore redirect all queries to the only one present.
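Because an unsharded VSchema only lists table names, it can be generated from the names alone. The sketch below is illustrative: unsharded_vschema is a hypothetical helper, not a Vitess tool, and it assumes plain table names with no characters that need JSON escaping.

```shell
# Print a VSchema for an unsharded keyspace, given table names as arguments.
# unsharded_vschema is a hypothetical helper; names must not need JSON escaping.
unsharded_vschema() {
  printf '{\n  "tables": {'
  uv_sep=''
  for uv_table in "$@"; do
    # Each table maps to an empty object; a comma precedes all but the first.
    printf '%s\n    "%s": {}' "$uv_sep" "$uv_table"
    uv_sep=','
  done
  printf '\n  }\n}\n'
}

unsharded_vschema product customer corder
```

Called with the three commerce tables, this prints the same structure as the VSchema shown above.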
Due to a massive ingress of free-trade, single-origin yerba mate merchants to your website, hipsters are swarming to buy stuff from you. As more users flock to your website and app, the customer and corder tables start growing at an alarming rate. To keep up, you’ll want to separate those tables by moving customer and corder to their own keyspace. Since you only have as many products as there are types of yerba mate, you won’t need to shard the product table!
Let us add some data into our tables to illustrate how the vertical split works.
./kmysql.sh < ../common/insert_commerce_data.sql