Nomad Service Orchestration

Nomad is a tool for managing a cluster of machines and running applications on them.

Quick Start

Environment Setup

Prepare three VMs as described in the earlier post 《Consul 搭建集羣》 (Building a Consul Cluster).

Node  IP
n1    172.20.20.10
n2    172.20.20.11
n3    172.20.20.12

Single-Node Installation

Log in to VM n1 and switch to the root user:

» vagrant ssh n1
[vagrant@n1 ~]$ su
Password:
[root@n1 vagrant]#

Install a few dependencies:

[root@n1 vagrant]# yum install -y epel-release
[root@n1 vagrant]# yum install -y jq
[root@n1 vagrant]# yum install -y unzip

Download version 0.8.1 to the /tmp directory:

Note: the latest 0.8.3 release has a bug where services are repeatedly re-registered when used with Consul, so 0.8.1 is used here.
[root@n1 vagrant]# cd /tmp/
[root@n1 vagrant]# curl -s https://releases.hashicorp.com/nomad/0.8.1/nomad_0.8.1_linux_amd64.zip -o nomad.zip

Unzip it, make the nomad binary executable, and move it to /usr/bin/:

[root@n1 vagrant]# unzip nomad.zip
[root@n1 vagrant]# chmod +x nomad
[root@n1 vagrant]# mv nomad /usr/bin/nomad

Verify that Nomad installed successfully:

[root@n1 vagrant]# nomad
Usage: nomad [-version] [-help] [-autocomplete-(un)install] <command> [args]

Common commands:
    run         Run a new job or update an existing job
    stop        Stop a running job
    status      Display the status output for a resource
    alloc       Interact with allocations
    job         Interact with jobs
    node        Interact with nodes
    agent       Runs a Nomad agent

Other commands:
    acl             Interact with ACL policies and tokens
    agent-info      Display status information about the local agent
    deployment      Interact with deployments
    eval            Interact with evaluations
    namespace       Interact with namespaces
    operator        Provides cluster-level tools for Nomad operators
    quota           Interact with quotas
    sentinel        Interact with Sentinel policies
    server          Interact with servers
    ui              Open the Nomad Web UI
    version         Prints the Nomad version

Output like the above means the installation succeeded.

Batch Installation

Refer to the "Batch Installation" section of the earlier post 《Consul 搭建集羣》.

The following script installs Nomad on every VM in one pass and installs Docker on each machine at the same time:

$script = <<SCRIPT

echo "Installing dependencies ..."
yum install -y epel-release
yum install -y net-tools
yum install -y wget
yum install -y jq
yum install -y unzip
yum install -y bind-utils

echo "Determining Consul version to install ..."
CHECKPOINT_URL="https://checkpoint-api.hashicorp.com/v1/check"
if [ -z "$CONSUL_DEMO_VERSION" ]; then
    CONSUL_DEMO_VERSION=$(curl -s "${CHECKPOINT_URL}"/consul | jq .current_version | tr -d '"')
fi

echo "Fetching Consul version ${CONSUL_DEMO_VERSION} ..."
cd /tmp/
curl -s https://releases.hashicorp.com/consul/${CONSUL_DEMO_VERSION}/consul_${CONSUL_DEMO_VERSION}_linux_amd64.zip -o consul.zip

echo "Installing Consul version ${CONSUL_DEMO_VERSION} ..."
unzip consul.zip
sudo chmod +x consul
sudo mv consul /usr/bin/consul

sudo mkdir /etc/consul.d
sudo chmod a+w /etc/consul.d

echo "Determining Nomad 0.8.1 to install ..."
#CHECKPOINT_URL="https://checkpoint-api.hashicorp.com/v1/check"
#if [ -z "$NOMAD_DEMO_VERSION" ]; then
#    NOMAD_DEMO_VERSION=$(curl -s "${CHECKPOINT_URL}"/nomad | jq .current_version | tr -d '"')
#fi

echo "Fetching Nomad version ${NOMAD_DEMO_VERSION} ..."
cd /tmp/
curl -s https://releases.hashicorp.com/nomad/0.8.1/nomad_0.8.1_linux_amd64.zip -o nomad.zip

echo "Installing Nomad version 0.8.1 ..."
unzip nomad.zip
sudo chmod +x nomad
sudo mv nomad /usr/bin/nomad

echo "Installing nginx ..."
#yum install -y nginx

echo "Installing docker ..."
yum install -y docker

SCRIPT
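
For reference, this heredoc is meant to live in a Vagrantfile and be wired up as a shell provisioner. A minimal sketch, assuming the same Vagrantfile layout as the Consul post (the box name is illustrative):

# Vagrantfile (sketch): $script is the heredoc defined above
Vagrant.configure("2") do |config|
  config.vm.box = "centos/7"   # illustrative box
  # Run the installation script on every VM at provision time
  config.vm.provision "shell", inline: $script
end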

Starting the Agents

First start Consul on each node to form a cluster (see 《Consul 搭建集羣》 for details). With the default configuration, Nomad detects the local Consul agent after starting and automatically registers its services with it.

n1

[root@n1 vagrant]# consul agent -server -bootstrap-expect 3 -data-dir /etc/consul.d -node=node1 -bind=172.20.20.10 -ui -client 0.0.0.0

n2

[root@n2 vagrant]# consul agent -server -bootstrap-expect 3 -data-dir /etc/consul.d -node=node2 -bind=172.20.20.11 -ui -client 0.0.0.0 -join 172.20.20.10

n3

[root@n3 vagrant]# consul agent -server -bootstrap-expect 3 -data-dir /etc/consul.d -node=node3 -bind=172.20.20.12 -ui -client 0.0.0.0 -join 172.20.20.10

Check the membership from n1:

[root@n1 vagrant]# consul members
Node   Address            Status  Type    Build  Protocol  DC   Segment
node1  172.20.20.10:8301  alive   server  1.1.0  2         dc1  <all>
node2  172.20.20.11:8301  alive   server  1.1.0  2         dc1  <all>
node3  172.20.20.12:8301  alive   server  1.1.0  2         dc1  <all>

Basic Concepts

  • server: accepts and schedules submitted jobs
  • client: executes job tasks

Starting the Servers

Define the server configuration file server.hcl:

log_level = "DEBUG"

bind_addr = "0.0.0.0"

data_dir = "/home/vagrant/data_server"

name = "server1"

advertise {
  http = "172.20.20.10:4646"
  rpc = "172.20.20.10:4647"
  serf = "172.20.20.10:4648"
}

server {
  enabled = true
  # Self-elect, should be 3 or 5 for production
  bootstrap_expect = 3
}

Run it from the command line:

[root@n1 vagrant]# nomad agent -config=server.hcl

On n2 and n3, first adjust name and the advertise addresses in server.hcl to each node's own values, then run:

nomad agent -config=server.hcl

Open http://172.20.20.10:8500/ui/#/dc1/services in a browser; Consul shows that all the Nomad servers have started.
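
The same check can be done from the command line against Consul's standard catalog API:

# List services registered in Consul; a "nomad" service should appear
[root@n1 vagrant]# curl -s http://172.20.20.10:8500/v1/catalog/services | jq .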

Then open Nomad's built-in UI at http://172.20.20.10:4646/ui/servers to confirm all three servers are running.

Starting the Clients

Before starting a client, Docker must be running; clients use it to execute jobs.

[root@n1 vagrant]# systemctl start docker

Start Docker on n2 and n3 as well.
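
Optionally, enable Docker at boot and confirm the daemon is responding before starting any client (standard systemd and Docker commands):

[root@n1 vagrant]# systemctl enable docker     # start automatically on boot
[root@n1 vagrant]# systemctl is-active docker  # should print "active"
[root@n1 vagrant]# docker info > /dev/null && echo ok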

Define the client configuration file client.hcl:

log_level = "DEBUG"
data_dir = "/home/vagrant/data_clinet"
name = "client1"
advertise {
  http = "172.20.20.10:4646"
  rpc = "172.20.20.10:4647"
  serf = "172.20.20.10:4648"
}
client {
  enabled = true
  servers = ["172.20.20.10:4647"]
}

ports {
  http = 5656
}

On n1, run:

[root@n1 vagrant]# nomad agent -config=client.hcl

Open http://172.20.20.10:8500/ui/#/dc1/services/nomad-client in a browser.

You can see that nomad-client has started successfully. Start clients on n2 and n3 the same way (again adjusting name and the advertise addresses per node).

In the end, all three clients show up as registered.
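
The same verification works from the CLI: nomad node status lists every client registered with the servers. The output below is illustrative; IDs and names will differ:

[root@n1 vagrant]# nomad node status
ID        DC   Name     Class   Drain  Eligibility  Status
9df69026  dc1  client1  <none>  false  eligible     ready
6166e031  dc1  client2  <none>  false  eligible     ready
f97b5095  dc1  client3  <none>  false  eligible     ready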

Running a Job

On n2, create a job directory and run nomad init:

[root@n2 vagrant]# mkdir job
[root@n2 vagrant]# cd job/
[root@n2 job]# nomad init
Example job file written to example.nomad

The command above writes an example job named example to example.nomad.
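
The generated file defines a Redis task run by the Docker driver. An abridged excerpt (roughly what the 0.8.x nomad init template contains, with its explanatory comments trimmed):

job "example" {
  datacenters = ["dc1"]
  type        = "service"

  group "cache" {
    count = 1

    task "redis" {
      driver = "docker"

      config {
        image = "redis:3.2"
        port_map {
          db = 6379
        }
      }

      resources {
        cpu    = 500 # MHz
        memory = 256 # MB
        network {
          mbits = 10
          port "db" {}
        }
      }
    }
  }
}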

Run the job:

[root@n2 job]# nomad run example.nomad
==> Monitoring evaluation "97f8a1fe"
    Evaluation triggered by job "example"
    Evaluation within deployment: "3c89e74a"
    Allocation "47bf1f20" created: node "9df69026", group "cache"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "97f8a1fe" finished with status "complete"

You can see that the client on node 9df69026 executed the job.
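
To dig into the allocation itself, use the alloc subcommands seen in the help output earlier (the allocation ID comes from the run output above):

# Detailed allocation status: task events, resource usage, addresses
[root@n2 job]# nomad alloc status 47bf1f20

# Stream the redis task's stdout logs
[root@n2 job]# nomad alloc logs -f 47bf1f20 redis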

進階操做

Cluster Members

[root@n1 vagrant]# nomad server members
Name            Address       Port  Status  Leader  Protocol  Build  Datacenter  Region
server1.global  172.20.20.10  4648  alive   false   2         0.8.1  dc1         global
server2.global  172.20.20.11  4648  alive   false   2         0.8.1  dc1         global
server3.global  172.20.20.12  4648  alive   true    2         0.8.1  dc1         global

Querying Job Status

[root@n1 vagrant]# nomad status example
ID            = example
Name          = example
Submit Date   = 2018-06-13T08:42:57Z
Type          = service
Priority      = 50
Datacenters   = dc1
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
cache       0       0         1        0       0         0

Latest Deployment
ID          = 3c89e74a
Status      = successful
Description = Deployment completed successfully

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy
cache       1        1       1        0

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created    Modified
47bf1f20  9df69026  cache       0        run      running  8m44s ago  8m26s ago

Modifying a Job

Edit example.nomad, find count = 1, and change it to count = 3.
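
After the edit, the group stanza should read:

group "cache" {
  # Scale from one instance to three
  count = 3

  # (rest of the group unchanged)
}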

Preview the planned change from the command line:

[root@n2 job]# nomad plan example.nomad
+/- Job: "example"
+/- Task Group: "cache" (2 create, 1 in-place update)
  +/- Count: "1" => "3" (forces create)
      Task: "redis"

Scheduler dry-run:
- All tasks successfully allocated.

Job Modify Index: 70
To submit the job with version verification run:

nomad job run -check-index 70 example.nomad

When running the job with the check-index flag, the job will only be run if the
server side version matches the job modify index returned. If the index has
changed, another user has modified the job and the plan's results are
potentially invalid.

執行Job的變動任務

[root@n2 job]# nomad job run -check-index 70 example.nomad
==> Monitoring evaluation "3a0ff5e0"
    Evaluation triggered by job "example"
    Evaluation within deployment: "2b5b803f"
    Allocation "34086acb" created: node "6166e031", group "cache"
    Allocation "4d01cd92" created: node "f97b5095", group "cache"
    Allocation "47bf1f20" modified: node "9df69026", group "cache"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "3a0ff5e0" finished with status "complete"

Two more client nodes are now executing the job.

The browser now shows three instances in total, along with the job's version history.
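
The version history is also available from the CLI via nomad job history (output abridged and illustrative):

[root@n2 job]# nomad job history example
Version     = 1
Stable      = true
Submit Date = 2018-06-13T08:56:03Z

Version     = 0
Stable      = true
Submit Date = 2018-06-13T08:42:57Z

Checking the job status again now reports three running allocations: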

[root@n2 job]# nomad status example
ID            = example
Name          = example
Submit Date   = 2018-06-13T08:56:03Z
Type          = service
Priority      = 50
Datacenters   = dc1
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
cache       0       0         3        0       0         0

Latest Deployment
ID          = 2b5b803f
Status      = successful
Description = Deployment completed successfully

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy
cache       3        3       3        0

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created     Modified
34086acb  6166e031  cache       1        run      running  3m38s ago   3m25s ago
4d01cd92  f97b5095  cache       1        run      running  3m38s ago   3m26s ago
47bf1f20  9df69026  cache       1        run      running  16m43s ago  3m27s ago

Leaving the Cluster

First stop the Nomad server on n1 with Ctrl-C, then query the members from n2:

[root@n2 job]# nomad server members
Name            Address       Port  Status  Leader  Protocol  Build  Datacenter  Region
server1.global  172.20.20.10  4648  failed  false   2         0.8.1  dc1         global
server2.global  172.20.20.11  4648  alive   true    2         0.8.1  dc1         global
server3.global  172.20.20.12  4648  alive   false   2         0.8.1  dc1         global

server1's status is now failed; force it out of the cluster:

[root@n2 job]# nomad server force-leave server1.global
[root@n2 job]# nomad server members
Name            Address       Port  Status  Leader  Protocol  Build  Datacenter  Region
server1.global  172.20.20.10  4648  left    false   2         0.8.1  dc1         global
server2.global  172.20.20.11  4648  alive   true    2         0.8.1  dc1         global
server3.global  172.20.20.12  4648  alive   false   2         0.8.1  dc1         global

server1's status is left; it has been removed from the cluster.
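
If n1 is brought back up, its agent can rejoin the cluster. A sketch: restart the agent, then point it at any live server's serf address from a second terminal:

# On n1: restart the server agent (runs in the foreground)
[root@n1 vagrant]# nomad agent -config=server.hcl

# In another terminal on n1: rejoin via a live server's serf port
[root@n1 vagrant]# nomad server join 172.20.20.11:4648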
