Configuring a SQL Server 2019 AG on Azure CentOS VMs (Part 1)

Preface

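  • Assumes basic knowledge of Azure and SQL Server HA
  • Assumes basic knowledge of the Azure CLI
  • The goal is to build a three-replica availability group on Azure Linux VMs, with a listener and fencing configured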

Environment

  • SQL Server 2019 Developer on Linux
  • Azure VM Fencing agent
  • Azure CLI for part of the configuration
  • CentOS 7.7 Azure VMs SQL19N1, SQL19N2 and SQL19N3, in the same VNet

Steps

  • Create a resource group and an availability set for the VMs

# Create the resource group in China East 2
az group create --name SQL-DEMO-RG --location chinaeast2

# Create an availability set for the VMs with 2 fault domains and 2 update domains
az vm availability-set create \
    --resource-group SQL-DEMO-RG \
    --name AGLinux-AvailabilitySet \
    --platform-fault-domain-count 2 \
    --platform-update-domain-count 2
  • Deploy the 3 VMs from a template

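When you create the first VM, a template is generated. Download and save it, then edit its parameter values to create more VMs with a similar configuration. The main VM settings are: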

  • Use the availability set created above
  • Use the same subnet
  • Use a Standard public IP
  • Configure an SSH public key

The template and parameter files are too long to show here. You can get them from the Azure Portal.

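# The following deploys SQL19N2; after editing the parameter file, the same commands create SQL19N3
templateFile="./templateFile"
paramFile="./vmParams-sql19n2.json"
# validate only checks the template; create performs the actual deployment
az deployment group validate --name sql19n2vm \
     -g SQL-DEMO-RG --template-file "$templateFile" --parameters "$paramFile"
az deployment group create --name sql19n2vm \
     -g SQL-DEMO-RG --template-file "$templateFile" --parameters "$paramFile"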
  • Configure each VM with a static private IP and a public DNS label

All three VMs need this change. The following shows the configuration for one VM as an example.

# List NIC and IP information
az network nic list -g SQL-DEMO-RG --query "[].{nicName:name,configuration:ipConfigurations[].{ipName:name,ip:privateIpAddress,method:privateIpAllocationMethod}}" -o yaml

# Change privateIpAllocationMethod to Static
az network nic ip-config update -g SQL-DEMO-RG --nic-name sql19n1152 --name ipconfig1 --set privateIpAllocationMethod=Static

# Find the public IP name
az network public-ip list -g SQL-DEMO-RG --query "[].name" -o tsv

# Set the public IP's DNS name label: lowercase letters, digits and hyphens only, starting with a letter
az network public-ip update -g SQL-DEMO-RG -n SQL19N1ip851 --dns-name sql19n1
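Deriving the label from the VM name is easy to script. This is a minimal sketch. The helper names `to_dns_label` and `is_valid_dns_label` are hypothetical, and the regular expression assumes the Azure rule for public IP labels: 3 to 63 characters, starting with a lowercase letter, containing only lowercase letters, digits and hyphens, and ending in a letter or digit.

```shell
# Hypothetical helper: lowercase a VM name and drop every character
# outside [a-z0-9-], e.g. SQL19N1 -> sql19n1
to_dns_label() {
    printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -cd 'a-z0-9-'
}

# Hypothetical check against the assumed Azure label rule:
# lowercase letter first, letter or digit last, 3-63 characters in total
is_valid_dns_label() {
    printf '%s\n' "$1" | grep -Eq '^[a-z][a-z0-9-]{1,61}[a-z0-9]$'
}
```

For example: `az network public-ip update -g SQL-DEMO-RG -n SQL19N1ip851 --dns-name "$(to_dns_label SQL19N1)"`.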
  • Install the HA packages

It is best to update the system packages first, then install the HA software.

yum update -y
yum install -y pacemaker pcs fence-agents-all resource-agents fence-agents-azure-arm
reboot
  • Open firewall ports for the cluster and SQL Server

# Pacemaker and Corosync ports
# TCP: Ports 2224,3121,21064,5405
# UDP: Port 5405
firewall-cmd --add-port=2224/tcp --permanent
firewall-cmd --add-port=3121/tcp --permanent
firewall-cmd --add-port=21064/tcp --permanent
firewall-cmd --add-port=5405/tcp --permanent
firewall-cmd --add-port=5405/udp --permanent

# SQL Server port and AG mirroring endpoint port
# TCP: 1433,5022
firewall-cmd --add-port=1433/tcp --permanent
firewall-cmd --add-port=5022/tcp --permanent
firewall-cmd --reload
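Every rule above is the same `firewall-cmd` call with a different port. Keeping the ports in one list makes a repeated or missing port easy to spot. This is a minimal sketch. `open_ag_ports` and the `FIREWALL_CMD` override (set it to `echo` for a dry run) are hypothetical names, and the port list is taken from this step.

```shell
# Hypothetical helper: open the cluster and SQL Server ports from one list.
# FIREWALL_CMD defaults to firewall-cmd; FIREWALL_CMD=echo prints the calls instead.
open_ag_ports() {
    local fw="${FIREWALL_CMD:-firewall-cmd}"
    local p
    # Pacemaker/Corosync: 2224, 3121, 21064, 5405 (TCP) and 5405 (UDP);
    # SQL Server: 1433; AG mirroring endpoint: 5022
    for p in 2224/tcp 3121/tcp 21064/tcp 5405/tcp 5405/udp 1433/tcp 5022/tcp; do
        "$fw" --add-port="$p" --permanent || return 1
    done
    # Load the permanent rules into the running firewall
    "$fw" --reload
}
```

On CentOS 7, firewalld also ships a predefined `high-availability` service that covers the cluster ports. The explicit list is kept here to mirror the steps above.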
  • Add hosts entries (on all three VMs)

vi /etc/hosts
172.17.2.8      SQL19N1
172.17.2.9      SQL19N2
172.17.2.10     SQL19N3
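Editing `/etc/hosts` by hand on three VMs is error-prone, and appending blindly duplicates entries on a re-run. This is a sketch of an idempotent alternative. `add_host_entry` is a hypothetical helper that only appends a line when the hostname is not already listed on a non-comment line.

```shell
# Hypothetical helper: append "IP<TAB>HOSTNAME" to a hosts file unless the
# hostname is already listed (comment lines are ignored), so re-runs are safe.
# Usage: add_host_entry FILE IP HOSTNAME
add_host_entry() {
    local file="$1" ip="$2" name="$3"
    # awk exits 0 only if some non-comment line lists the name after its IP field
    if ! awk -v n="$name" '
        /^[[:space:]]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }' "$file" 2>/dev/null; then
        printf '%s\t%s\n' "$ip" "$name" >> "$file"
    fi
}
```

Run it as root on each VM, once per entry, for example `add_host_entry /etc/hosts 172.17.2.8 SQL19N1`.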
  • Create the Pacemaker cluster

# Set the password of the default Pacemaker user (hacluster), on all three VMs
passwd hacluster

# Enable pcsd and pacemaker at boot and start pcsd, on all three VMs
systemctl enable pcsd
systemctl start pcsd
systemctl enable pacemaker

# Create the cluster, on the master node only
sudo pcs cluster auth SQL19N1 SQL19N2 SQL19N3 -u hacluster 
sudo pcs cluster setup --name agcluster SQL19N1 SQL19N2 SQL19N3 --token 30000 --force
sudo pcs cluster start --all
sudo pcs cluster enable --all
# Check cluster status
pcs status
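# On the three nodes, set the quorum expected-votes to 3 (a three-node cluster already defaults to 3)
# expected-votes is the total number of votes the cluster expects; quorum requires a majority of them (2 of 3).
# This change only affects the running cluster and is not saved as permanent cluster configuration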
pcs quorum expected-votes 3
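The relationship between expected votes and quorum is simple arithmetic. The sketch below assumes Corosync's default votequorum rule, where a partition is quorate when it holds more than half of the expected votes. `quorum_votes` is a hypothetical helper.

```shell
# Hypothetical helper: votes needed for quorum under the default majority rule,
# i.e. floor(expected_votes / 2) + 1 (shell arithmetic is integer division)
quorum_votes() {
    echo $(( $1 / 2 + 1 ))
}
```

With `expected-votes 3`, `quorum_votes 3` gives 2. The cluster therefore stays quorate after losing any one node, but not two.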
  • Configure a service principal in Azure for the fencing agent

# 1. Create an AAD app and note the appId it returns
az ad app create --display-name sqldemorg-app --identifier-uris http://localhost \
 --password "1qaz@WSX3edc" --end-date '2030-04-27' --credential-description "sql19 ag secret"
 
# 2. Create a service principal for the AAD app
az ad sp create --id <appID>

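# 3. Assign the service principal a management role on each VM; run this once per VM
# I assigned the Owner role here, which is not a secure practice.
# A custom role that grants only the minimum permissions the fence agent needs is the safer choice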

az role assignment create --assignee <appID> --role owner \
--scope /subscriptions/<subscription-ID>/resourceGroups/<resourceGroup-Name>/providers/Microsoft.Compute/virtualMachines/SQL19N1
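The role assignment has to be repeated for every VM, so a small loop saves typing the long scope path three times. This is a sketch under stated assumptions. `vm_scope` and `assign_role_all` are hypothetical helpers, the scope format is copied from the command above, and `AZ=echo` can be set for a dry run.

```shell
# Hypothetical helper: print the --scope path of one VM
vm_scope() {
    local sub="$1" rg="$2" vm="$3"
    printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Compute/virtualMachines/%s\n' \
        "$sub" "$rg" "$vm"
}

# Hypothetical helper: assign ROLE to the app on every VM listed after the first four arguments
# Usage: assign_role_all APP_ID SUBSCRIPTION_ID RESOURCE_GROUP ROLE VM...
assign_role_all() {
    local app="$1" sub="$2" rg="$3" role="$4" vm
    shift 4
    for vm in "$@"; do
        "${AZ:-az}" role assignment create --assignee "$app" --role "$role" \
            --scope "$(vm_scope "$sub" "$rg" "$vm")" || return 1
    done
}
```

For example: `assign_role_all <appID> <subscription-ID> SQL-DEMO-RG owner SQL19N1 SQL19N2 SQL19N3`.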
  • Create the Azure STONITH device

I am using Azure China, so cloud=china must be specified; with global Azure, omit this parameter.
Run fence_azure_arm -h for more help on this resource agent.

pcs property set stonith-timeout=900
pcs stonith create rsc_st_azure fence_azure_arm login="<ApplicationID>" passwd="<servicePrincipalPassword>" resourceGroup="<resourceGroupName>" tenantId="<tenantID>" subscriptionId="<subscriptionId>" power_timeout=240 pcmk_reboot_timeout=900 cloud=china
  • Install SQL Server 2019 and tools

# Install SQL Server 2019 and the HA resource agent
sudo curl -o /etc/yum.repos.d/mssql-server.repo https://packages.microsoft.com/config/rhel/7/mssql-server-2019.repo
sudo yum install -y mssql-server
sudo /opt/mssql/bin/mssql-conf setup
sudo yum install mssql-server-ha

# Install mssql-tools
sudo curl -o /etc/yum.repos.d/msprod.repo https://packages.microsoft.com/config/rhel/7/prod.repo
sudo yum install -y mssql-tools unixODBC-devel
# Add the mssql-tools directory to PATH for convenience
echo 'export PATH="$PATH:/opt/mssql-tools/bin"' >> ~/.bash_profile
echo 'export PATH="$PATH:/opt/mssql-tools/bin"' >> ~/.bashrc
source ~/.bashrc

# Install mssql-cli
sudo rpm --import https://packages.microsoft.com/keys/microsoft.asc
sudo curl -o /etc/yum.repos.d/mssql-cli.repo https://packages.microsoft.com/config/rhel/7/prod.repo
sudo yum install mssql-cli

# Check the SQL Server service status
systemctl status mssql-server

If you are familiar with the SQL Server PowerShell tooling, I recommend also installing PowerShell and the SqlServer module; configuring SQL Server is much more convenient with PowerShell.

yum install powershell -y
pwsh
Install-Module SQLServer
# List the commands in the SqlServer module
Get-Command -Module SQLServer
  • Configure the AG

  • Create a PowerShell function to simplify running T-SQL later on
# Open the PowerShell profile file; create it if it does not exist
vi /root/.config/powershell/Microsoft.PowerShell_profile.ps1

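# Add the following function to the profile so it can be called directly every time pwsh starts
# The function has two parameters: $sql is the T-SQL to run; a here-string is best to avoid escaping problems
# $servers is the array of target instances; it defaults to the three instances in this environment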
function run-sql ($sql,$servers=("SQL19N1","SQL19N2","SQL19N3"))
{
        $secpasswd = "1qaz@WSX"|ConvertTo-SecureString -AsPlainText -Force
        $cred=New-Object System.Management.Automation.PSCredential -ArgumentList 'sa', $secpasswd
        $sql
        "---------"
        foreach($svr in $servers) {"Running T-SQL on $svr..."; Invoke-Sqlcmd -ServerInstance $svr -Credential $cred -Query $sql}
}
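If you would rather not install PowerShell, the same fan-out can be done from bash with `sqlcmd` from mssql-tools. This is a hypothetical equivalent, not part of the original setup. It reads the sa password from `SQLCMDPASSWORD`, an environment variable that sqlcmd honours, instead of hard-coding it, and `SQLCMD=echo` gives a dry run.

```shell
# Hypothetical shell counterpart to run-sql: run one T-SQL batch on several instances.
# Usage: run_sql "T-SQL" [server ...]   (defaults to SQL19N1 SQL19N2 SQL19N3)
run_sql() {
    local sql="$1" svr
    shift
    if [ -z "${SQLCMDPASSWORD:-}" ]; then
        echo "SQLCMDPASSWORD is not set" >&2
        return 1
    fi
    # Default targets mirror the $servers default of the PowerShell function
    if [ "$#" -eq 0 ]; then
        set -- SQL19N1 SQL19N2 SQL19N3
    fi
    printf '%s\n---------\n' "$sql"
    for svr in "$@"; do
        echo "Running T-SQL on $svr..."
        # -b makes sqlcmd exit non-zero when the batch raises an error
        SQLCMDPASSWORD="$SQLCMDPASSWORD" "${SQLCMD:-sqlcmd}" -S "$svr" -U sa -b -Q "$sql" || return 1
    done
}
```

For example: `SQLCMDPASSWORD='<sa password>' run_sql "SELECT @@SERVERNAME;"`.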

  • Enable the HADR feature, on every instance
sudo /opt/mssql/bin/mssql-conf set hadr.hadrenabled 1
sudo systemctl restart mssql-server
  • Start the AG extended event session (AlwaysOn_health)
-- T-SQL, on every instance
ALTER EVENT SESSION  AlwaysOn_health ON SERVER WITH (STARTUP_STATE=ON);
GO
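  • On the primary replica instance, create the certificate used to authenticate mirroring endpoint traffic. Copy the certificate and private key to the same directory on the other nodes and give the mssql user access to them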
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '1qaz@WSX';
GO
CREATE CERTIFICATE dbm_certificate WITH SUBJECT = 'dbm';
GO
BACKUP CERTIFICATE dbm_certificate
   TO FILE = '/var/opt/mssql/data/dbm_certificate.cer'
   WITH PRIVATE KEY (
           FILE = '/var/opt/mssql/data/dbm_certificate.pvk',
           ENCRYPTION BY PASSWORD = '1qaz@WSX'
       );
# Copy the certificate and private key to the secondary replica hosts SQL19N2 and SQL19N3
cd /var/opt/mssql/data
scp dbm_certificate.* root@SQL19N2:/var/opt/mssql/data/
scp dbm_certificate.* root@SQL19N3:/var/opt/mssql/data/

# On the secondary replica nodes, give the mssql user ownership of the files
cd /var/opt/mssql/data
chown mssql:mssql dbm_certificate.*
  • On the secondary replica instances, create the master key and import the certificate
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '1qaz@WSX';
GO
CREATE CERTIFICATE dbm_certificate
    FROM FILE = '/var/opt/mssql/data/dbm_certificate.cer'
    WITH PRIVATE KEY (
    FILE = '/var/opt/mssql/data/dbm_certificate.pvk',
    DECRYPTION BY PASSWORD = '1qaz@WSX'
            );
  • Create the AG mirroring endpoint, on every instance; make sure port 5022 is allowed in both the firewall and the NSG
CREATE ENDPOINT [Hadr_endpoint]
    AS TCP (LISTENER_PORT = 5022)
    FOR DATABASE_MIRRORING (
        ROLE = ALL,
        AUTHENTICATION = CERTIFICATE dbm_certificate,
        ENCRYPTION = REQUIRED ALGORITHM AES
    );
GO
ALTER ENDPOINT [Hadr_endpoint] STATE = STARTED;
  • Create a three-replica AG in synchronous-commit mode; run on the primary replica instance
CREATE AVAILABILITY GROUP [ag1]
     WITH (DB_FAILOVER = ON, CLUSTER_TYPE = EXTERNAL)
     FOR REPLICA ON
         N'SQL19N1'
             WITH (
                 ENDPOINT_URL = N'tcp://SQL19N1:5022',
                 AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
                 FAILOVER_MODE = EXTERNAL,
                 SEEDING_MODE = AUTOMATIC,
                 SECONDARY_ROLE(ALLOW_CONNECTIONS = ALL)
             ),
         N'SQL19N2'
             WITH (
                 ENDPOINT_URL = N'tcp://SQL19N2:5022',
                 AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
                 FAILOVER_MODE = EXTERNAL,
                 SEEDING_MODE = AUTOMATIC,
                 SECONDARY_ROLE(ALLOW_CONNECTIONS = ALL)
             ),
         N'SQL19N3'
             WITH (
                 ENDPOINT_URL = N'tcp://SQL19N3:5022',
                 AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
                 FAILOVER_MODE = EXTERNAL,
                 SEEDING_MODE = AUTOMATIC,
                 SECONDARY_ROLE(ALLOW_CONNECTIONS = ALL)
             );
GO
ALTER AVAILABILITY GROUP [ag1] GRANT CREATE ANY DATABASE;
GO
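The three replica clauses above differ only in the host name, so the statement can also be generated. This is a sketch with a hypothetical `ag_create_sql` function. It reproduces the options used above for any list of hosts. Review its output before running it.

```shell
# Hypothetical generator: print CREATE AVAILABILITY GROUP for the given hosts,
# using the same per-replica options as the statement above.
# Usage: ag_create_sql AG_NAME HOST...
ag_create_sql() {
    local ag="$1" n i=0 host end
    shift
    n=$#
    printf 'CREATE AVAILABILITY GROUP [%s]\n' "$ag"
    printf '     WITH (DB_FAILOVER = ON, CLUSTER_TYPE = EXTERNAL)\n'
    printf '     FOR REPLICA ON\n'
    for host in "$@"; do
        i=$((i + 1))
        # Replica clauses are separated by commas; the last one ends the statement
        if [ "$i" -lt "$n" ]; then end=','; else end=';'; fi
        cat <<EOF
         N'$host'
             WITH (
                 ENDPOINT_URL = N'tcp://$host:5022',
                 AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
                 FAILOVER_MODE = EXTERNAL,
                 SEEDING_MODE = AUTOMATIC,
                 SECONDARY_ROLE(ALLOW_CONNECTIONS = ALL)
             )$end
EOF
    done
    echo 'GO'
}
```

For example, run `ag_create_sql ag1 SQL19N1 SQL19N2 SQL19N3 > create_ag.sql`, then `sqlcmd -S SQL19N1 -U sa -i create_ag.sql` on the primary. The `GRANT CREATE ANY DATABASE` statement still has to be run separately.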
  • Create a SQL login for Pacemaker and grant it permissions, on every instance
USE [master]
GO
CREATE LOGIN [pacemakerLogin] with PASSWORD= N'1qaz@WSX'
go
ALTER SERVER ROLE [sysadmin] ADD MEMBER [pacemakerLogin];
GO
  • Save the Pacemaker login credentials to a local file, on every node
echo "pacemakerLogin" >> /var/opt/mssql/secrets/passwd
echo "1qaz@WSX" >> /var/opt/mssql/secrets/passwd

# Readable by root only
chown root:root /var/opt/mssql/secrets/passwd
chmod 400 /var/opt/mssql/secrets/passwd
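Appending with `>>` adds duplicate lines if this step is re-run. This is a sketch of a re-runnable version using a hypothetical `write_pacemaker_secret` helper. It rewrites the file and creates it under a restrictive umask so it is never world-readable. It then applies the same `chmod 400` and `chown root:root` as above, skipping the chown when not running as root.

```shell
# Hypothetical helper: write "user" on line 1 and "password" on line 2 of FILE.
# Usage: write_pacemaker_secret FILE USER PASSWORD
write_pacemaker_secret() {
    local file="$1" user="$2" pass="$3"
    # Start from a fresh file so re-running never appends duplicate lines
    rm -f "$file" || return 1
    # umask 077 in a subshell: the file is created private without changing the caller's umask
    ( umask 077 && printf '%s\n%s\n' "$user" "$pass" > "$file" ) || return 1
    chmod 400 "$file" || return 1
    # chown to root only works as root; skip it when run unprivileged
    if [ "$(id -u)" -eq 0 ]; then
        chown root:root "$file" || return 1
    fi
}
```

For example: `write_pacemaker_secret /var/opt/mssql/secrets/passwd pacemakerLogin '<password>'`. The password still ends up in the shell history this way.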
  • Join the secondary replicas to the AG; run on each secondary replica
ALTER AVAILABILITY GROUP [ag1] JOIN WITH (CLUSTER_TYPE = EXTERNAL);
GO
-- Permission required by automatic seeding
ALTER AVAILABILITY GROUP [ag1] GRANT CREATE ANY DATABASE;
GO
  • If you do not want pacemakerLogin to have sysadmin rights, remove it from sysadmin and grant the following permissions instead, on every instance
ALTER SERVER ROLE [sysadmin] DROP MEMBER [pacemakerLogin]
GO
GRANT ALTER, CONTROL, VIEW DEFINITION ON AVAILABILITY GROUP::ag1 TO pacemakerLogin;
GO
GRANT VIEW SERVER STATE TO pacemakerLogin;
GO
  • Add a database to the AG; run on the primary replica
CREATE DATABASE [db1];
GO
ALTER DATABASE [db1] SET RECOVERY FULL;
GO
BACKUP DATABASE [db1]
   TO DISK = N'nul';
GO
ALTER AVAILABILITY GROUP [ag1] ADD DATABASE [db1];
GO
  • Check the availability database state
SELECT * FROM sys.databases WHERE name = 'db1';
GO
SELECT DB_NAME(database_id) AS 'database', synchronization_state_desc FROM sys.dm_hadr_database_replica_states;
  • Configure the AG in the Pacemaker cluster


  • Create the AG resource; ag_name must be the name of the AG created earlier
pcs resource create agcluster ocf:mssql:ag ag_name=ag1 meta failure-timeout=30s master notify=true
  • Create the virtual IP resource
# Temporarily disable fencing
pcs property set stonith-enabled=false

# 建立VIP
pcs resource create virtualip ocf:heartbeat:IPaddr2 ip=172.17.2.7
  • Create a colocation constraint: the VIP and the master must run on the same node
pcs constraint colocation add virtualip agcluster-master INFINITY with-rsc-role=Master
  • Create an ordering constraint: the master replica must be promoted before the VIP starts
pcs constraint order promote agcluster-master then start virtualip

# Show the current constraints
pcs constraint show --full
  • Re-enable STONITH and check the cluster status
pcs property set stonith-enabled=true
pcs status
# Status output from my environment
---------------------------------------
Cluster name: agcluster
Stack: corosync
Current DC: SQL19N3 (version 1.1.20-5.el7_7.2-3c4c782f70) - partition with quorum
Last updated: Wed Apr 29 04:24:50 2020
Last change: Wed Apr 29 04:24:45 2020 by root via cibadmin on SQL19N1

3 nodes configured
5 resources configured

Online: [ SQL19N1 SQL19N2 SQL19N3 ]

Full list of resources:

 rsc_st_azure   (stonith:fence_azure_arm):      Started SQL19N1
 Master/Slave Set: agcluster-master [agcluster]
     Masters: [ SQL19N1 ]
     Slaves: [ SQL19N2 SQL19N3 ]
 virtualip      (ocf::heartbeat:IPaddr2):       Started SQL19N1

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
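To check the results of the failover and fencing tests from a script, the current primary and the quorum state can be pulled out of `pcs status`. This is a sketch. It parses the Pacemaker 1.1 output format shown above, the function names are hypothetical, and other Pacemaker versions may word these lines differently.

```shell
# Hypothetical helper: read `pcs status` on stdin and print the AG master node,
# e.g. "     Masters: [ SQL19N1 ]" -> SQL19N1
ag_master_node() {
    sed -n 's/^[[:space:]]*Masters: \[ \(.*\) ]$/\1/p'
}

# Hypothetical helper: succeed only if the status text reports a quorate partition
has_quorum() {
    grep -q 'partition with quorum'
}
```

For example, after a manual failover, `pcs status | ag_master_node` should print the new primary.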
  • Test failover and fencing
# Manual failover
pcs resource move agcluster-master SQL19N2 --master
pcs status

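# A manual failover creates a constraint that keeps the AG resource from moving back to the original node
# If you want the AG to be able to fail back later, remove that constraint manually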
pcs constraint show --full
pcs constraint remove cli-prefer-agcluster-master

# Try fencing the cluster nodes, each one in turn
# The command below fences by rebooting the node; to power the node off instead, use the --off option
pcs stonith fence SQL19N3 --debug
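The `cli-prefer-*` constraint left behind by `pcs resource move` can also be found and removed from a script. This sketch extracts constraint IDs that start with `cli-` from `pcs constraint show --full` output. The function name is hypothetical, and the `(id:...)` format is an assumption based on pcs 0.9 output. pcs also provides `pcs resource clear agcluster-master`, which removes a resource's move/ban constraints directly.

```shell
# Hypothetical helper: read `pcs constraint show --full` on stdin and print the IDs
# of constraints created by move/ban, e.g. "(id:cli-prefer-agcluster-master)"
cli_constraint_ids() {
    grep -o '(id:cli-[^)]*)' | sed 's/^(id:\(.*\))$/\1/'
}
```

For example: `pcs constraint show --full | cli_constraint_ids | xargs -r -n 1 pcs constraint remove`.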