Original article: http://highscalability.com/blog/2019/4/8/from-bare-metal-to-kubernetes.html
This is a guest post by Hugues Alary, Lead Engineer at Betabrand, a retail clothing company and crowdfunding platform, based in San Francisco. This article was originally published here.
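retail: selling goods directly to consumers
crowdfunding: raising money from a large number of people
This is a guest post by Hugues Alary (translator's question: what exactly is a guest post?), Lead Engineer at Betabrand. Betabrand is a clothing retailer based in San Francisco and also a crowdfunding platform. The article was first published here.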
How migrating Betabrand's bare-metal infrastructure to a Kubernetes cluster hosted on Google Container Engine solved many engineering issues (from hardware failures, to lack of scalability of our production services, complex configuration management and highly heterogeneous development-staging-production environments) and allowed us to achieve a reliable, available and scalable infrastructure.
This post will walk you through the many infrastructure changes and challenges Betabrand met from 2011 to 2018.
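migrating: moving from one system to another
bare: (adj.) uncovered, exposed; barely sufficient; (v.) to uncover, to expose
metal: (n.) metal
infrastructure: the basic facilities and systems something runs on
Kubernetes cluster: a K8s cluster
failures: (n.) failures; breakdowns; bankruptcies
scalability: (n.) the ability to scale up or down
heterogeneous: (adj.) varied; mixed
staging: (n.) transport in stages; scaffolding; putting on a play; (v.) performing; exhibiting; carrying out in stages (the -ing form of "stage")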
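How Betabrand's infrastructure was migrated step by step to a Kubernetes cluster, which helped us solve a range of engineering problems, such as hardware failures, poorly scalable production services, complex configuration management and highly heterogeneous development, staging and production environments, and gave us a reliable, available and scalable infrastructure.
This post walks you through the many changes and challenges Betabrand met while migrating its infrastructure between 2011 and 2018.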
Betabrand’s infrastructure has changed many times over the course of the 7 years I’ve worked here.
In 2011, the year our CTO hired me, the website was hosted on a shared server with a Plesk interface and no root access (of course). Every newsletter send (to at most a few hundred people) would bring the website to its knees and make it crawl, at times leaving it completely unresponsive.
My first order of business became finding a replacement and moving the website to its own dedicated server.
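course: (n.) a course of study; a process; a heading or route; a dish in a meal
newsletter: a bulletin sent regularly to subscribers or staff
dedicated: reserved for a single purpose or customer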
After a few days of online research, we settled on a VPS (8 GB RAM, 320 GB disk, 4 virtual CPUs, 150 Mbps of bandwidth) at Rackspace. A few more days and we were live on our new infrastructure composed of… 1 server, running your typical Linux, Apache, PHP, MySQL stack, with a hint of Memcached.
Unsurprisingly, this infrastructure quickly became obsolete.
Not only didn’t it scale at all but, more importantly, every part of it was a Single Point Of Failure. Apache down? Website down. Rackspace instance down? Website down. MySQL down… you get the idea.
Another aspect of it was its cost.
Our average monthly bill quickly climbed over $1,000, which was quite a price tag for a single machine and the low amount of traffic we generated at the time.
After a couple of years running this stack, in mid-2013, I decided it was time to make our website more scalable and redundant, but also more cost-effective.
I estimated that we needed a minimum of 3 servers to make our website somewhat redundant, which would amount to a whopping $14,400/year at Rackspace. Being a really small startup, we couldn’t justify that "high" of an infrastructure bill, so I kept looking.
The cheapest option ended up being to run our stack on bare-metal servers.
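Rackspace: a managed hosting and cloud computing provider founded in 1998 and headquartered in the United States, with offices in the UK, Australia, Switzerland, the Netherlands and Hong Kong
settled: (adj.) fixed; stable; (v.) resolved; took up residence (past participle of "settle")
settled on: decided on
composed: (adj.) calm; (v.) made up of
typical: (adj.) representative; characteristic
Unsurprisingly: as expected
obsolete: (adj.) outdated; no longer in use; (n.) an obsolete word or person; (v.) to discard
scale: (n.) graduation; ratio; range of values; balance; size; fish scale. In this article, used as a verb: to grow capacity with demand
aspect: (n.) a side or facet of something
generate: to produce
redundant: (adj.) laid off because of overstaffing; unnecessary; superfluous. Here the author means adding backup servers so that the system stays up when one of them fails
estimate: (v.) to judge or calculate approximately; (n.) an approximate judgment or calculation
whopping: (adj.) huge; (adv.) extremely
justify: (v.) to show that something is reasonable; to defend
cheapest: the least expensive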
I had worked in the past with OVH and had always been fairly satisfied (despite mixed reviews online). I estimated that running 3 servers at OVH would amount to $3,240/year, roughly 4.4 times less expensive than Rackspace.
Not only was OVH cheaper, but their servers were also 4 times more powerful than Rackspace’s: 32GB RAM, 8 CPUs, SSDs and unlimited bandwidth.
To top it off they had just opened a new datacenter in North America.
A few weeks later Betabrand.com was hosted at OVH in Beauharnois, Canada.
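The cost comparison above is easy to check by hand, but a minimal sketch makes the arithmetic explicit. It uses only the two yearly figures quoted in the article; the function names are illustrative and not part of any tool mentioned here.

```python
# Yearly figures quoted in the article (USD, for three servers).
RACKSPACE_YEARLY = 14_400
OVH_YEARLY = 3_240


def cost_ratio(expensive: float, cheap: float) -> float:
    """Return how many times more the expensive option costs."""
    return expensive / cheap


def monthly_per_server(yearly_total: float, servers: int = 3) -> float:
    """Return the average monthly cost of a single server."""
    return yearly_total / 12 / servers


if __name__ == "__main__":
    print(f"ratio: {cost_ratio(RACKSPACE_YEARLY, OVH_YEARLY):.2f}x")
    print(f"Rackspace, per server per month: ${monthly_per_server(RACKSPACE_YEARLY):.0f}")
    print(f"OVH, per server per month: ${monthly_per_server(OVH_YEARLY):.0f}")
```

With these numbers the ratio works out to a little under 4.5, that is, $400 versus $90 per server per month.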
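In the 7 years I have worked here, Betabrand's infrastructure has been rebuilt many times.
In 2011, when our CTO hired me, the website was hosted on a shared server managed through a Plesk interface, with no root access. Every newsletter sent to a few hundred people would bring the website to its knees and slow it to a crawl, sometimes leaving it completely unresponsive. My first task was therefore to find a replacement and move the website to its own dedicated server.
After a few days of searching online, we settled on a VPS (virtual private server) at Rackspace: 8 GB RAM, 320 GB disk, 4 virtual CPUs, 150 Mbps of bandwidth. A few days later we were live on an infrastructure made up of a single server running Linux, Apache, PHP and MySQL, with a bit of Memcached for caching.
Unsurprisingly, this infrastructure soon became obsolete.
Not only did it not scale but, more importantly, every part of it was a single point of failure: if Apache went down, the website went down; if the Rackspace instance went down, the website went down; if MySQL went down... well, you get the idea.
Another aspect was its cost.
Our average monthly bill quickly exceeded $1,000, a very high price for a single machine given how little traffic we generated at the time.
After running this stack for a couple of years, in mid-2013 I decided it was time to make our website more scalable, more redundant and more cost-effective.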
Between 2013 and 2017, our hardware infrastructure went through a few architectural changes.
Towards the end of 2017, our stack was significantly larger than it used to be, both in terms of software and hardware.
Betabrand.com ran on 17 bare-metal servers:
2 HAProxy machines in charge of SSL Offloading configured as hot-standby
2 varnish-cache machines configured as hot-standby, load-balancing to our webservers
5 machines running Apache and PHP-FPM
2 redis servers, each running 2 separate instances of redis. 1 instance for some application caching, 1 instance for our PHP sessions
3 MariaDB servers configured as master-master, though used in a master-slave manner
3 Glusterd servers serving all our static assets
Each machine would otherwise run one or multiple processes like keepalived, Ganglia, Munin, logstash, exim, backup-manager, supervisord, sshd, fail2ban, prerender, rabbitmq and… docker.
However, while this infrastructure was very cheap, redundant and had no single point of failure, it still wasn’t scalable and was also much harder to maintain.
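architectural: (adj.) relating to architecture or to the structure of a system
architectural changes: changes to the system's structure
significantly: (adv.) notably; to an important degree
Offloading: moving work off one component onto another
hot-standby: a backup that is already running and ready to take over
separate: (v.) to divide; to set apart; (adj.) distinct; individual
instances: running copies of a program
assets: (n.) property; useful things; advantages. Here: static files such as images, CSS and JavaScript
Varnish Cache is a web application accelerator and an HTTP reverse proxy
HAProxy is free, open-source software written in C that provides high availability, load balancing and proxying for TCP- and HTTP-based applications
SSL is a security protocol that provides security and data integrity for network communication
Between 2013 and 2017, our hardware infrastructure went through several architectural changes.
By the end of 2017, our stack was much larger than it used to be, in both software and hardware.
Betabrand.com ran on 17 bare-metal servers.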
Administering our server "fleet" now involved writing a set of Ansible scripts and maintaining them, which, despite Ansible being an amazing piece of software, was no easy feat.
Even though it will make its best effort to get you there, Ansible doesn’t guarantee the state of your system.
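fleet: (adj.) fast, nimble; (n.) a group of ships; here, a group of servers
involved: (adj.) caught up in; related; complicated; (v.) required; included
guarantee: (v.) to promise; to ensure; (n.) an assurance; a warranty; a security or pledge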
For example, running your Ansible scripts on a server fleet made of heterogeneous OSes (say Debian 8 and Debian 9) will bring all your machines to a state close to what you defined, but you will most likely end up with discrepancies; the first one being that you’re running on Debian 8 and Debian 9, but also software versions and configurations differing from one server to another.
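The kind of discrepancy described here can be made concrete with a small sketch that compares the facts reported by each host and lists every fact that is not identical across the fleet. The host names and version strings are hypothetical, and the function is an illustration only; it is not something Ansible provides under this name.

```python
from collections import defaultdict


def find_drift(inventory: dict[str, dict[str, str]]) -> dict[str, dict[str, list[str]]]:
    """Return every fact that has more than one value, with hosts grouped by value."""
    by_fact: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for host, facts in sorted(inventory.items()):
        for fact, value in facts.items():
            by_fact[fact][value].append(host)
    # Keep only the facts on which the hosts disagree.
    return {fact: dict(values) for fact, values in by_fact.items() if len(values) > 1}


# Hypothetical facts gathered from three web servers.
inventory = {
    "web1": {"os": "debian 8", "php": "5.6", "apache": "2.4"},
    "web2": {"os": "debian 9", "php": "7.0", "apache": "2.4"},
    "web3": {"os": "debian 9", "php": "7.0", "apache": "2.4"},
}

if __name__ == "__main__":
    for fact, values in find_drift(inventory).items():
        print(fact, values)
```

Here the OS and PHP versions are reported as drift, while Apache, identical everywhere, is not.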
I searched quite often for an Ansible replacement, but never found better.
I looked into Puppet but found its learning curve too steep, and, from reading other people’s recipes, was taken aback by what seemed to be too many different ways of doing the same thing. Some people might think of this as flexibility; I see it as complexity.
SaltStack caught my eye, but I also found it very hard to learn; despite their extensive, in-depth documentation, their nomenclature choices (mine, pillar, salt, etc.) never stuck with me, and it seemed to suffer from the same issue as Puppet regarding complexity.
Nix package manager and NixOS sounded amazing, except that I didn’t feel comfortable learning a whole new OS (I’ve been using Debian for years) and was worried that, despite their huge package selection, I would eventually need packages not already available, which would then become something new to maintain.
Those are the only 3 I looked at, but I’m sure there are many other tools out there I’ve never heard of.
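heterogeneous: (adj.) varied; mixed
Puppet: an IT infrastructure automation and management tool
curve: (n.) a curved line; a bend; a curve on a chart
steep: (adj.) rising sharply; exaggerated; unreasonable
recipe: (n.) cooking instructions; a method or trick for doing something
flexibility: (n.) suppleness; adaptability
complexity: (n.) the state of being complicated
taken aback: surprised
aback: (adv.) backwards; with the wind from ahead
SaltStack: a platform for centrally managing server infrastructure
caught: (v.) past participle of "catch"
extensive: (adj.) broad; wide-ranging; large in amount or scale
nomenclature: (n.) a naming system; terminology
stuck: (v.) past tense of "stick"; (adj.) unable to move; jammed
suffer: (v.) to endure; to undergo; to be harmed
regarding: (prep.) about; concerning
Unix and Unix-like systems are collectively called *nix (not to be confused with the Nix package manager and NixOS discussed in the article).
exception: (n.) something that does not follow the rule
eventually: (adv.) in the end; finally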
Writing Ansible scripts and maintaining them, however, wasn’t our only issue; adding capacity was another one.
With bare-metal, it is impossible to add and remove capacity on the fly. You need to plan your needs well in advance: buy a machine (usually leased for a minimum of 1 month), wait for it to be ready (which can take from 2 minutes to 3 days), install its base OS, install Ansible’s dependencies (mainly Python and a few other packages), then, finally, run your Ansible scripts against it.
For us this entire process was wholly impractical, and what usually happened is that we’d add capacity for an anticipated peak load but never remove it afterwards, which in turn added to our costs.
It is worth noting, however, that even though having unused capacity in your infrastructure is akin to setting cash on fire, it is still an order of magnitude less expensive on bare-metal than in the cloud. On the other hand, the engineering headaches that come with using bare-metal servers simply shift the costs from purely material ones to administrative ones.
In our bare-metal setup, capacity planning, server administration and Ansible scripting were just the tip of the iceberg.
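capacity: (n.) ability; volume; productive power; role or position
in advance: (adv.) beforehand
leased: (adj.) rented
entire: (adj.) whole; complete
wholly: (adv.) completely; entirely
unpractical: (adj.) not practical; unworkable ("impractical" is the more common form)
anticipated: (v.) acted ahead of; expected
peak: (n.) summit; highest point; (v.) to reach a maximum; (adj.) highest; maximum
infrastructure: the basic facilities and systems something runs on
akin: (adj.) related; similar
magnitude: (n.) great size; importance ("an order of magnitude" means roughly a factor of ten)
shift: (n.) a move; a work period; a change; (v.) to move; to transfer; to change
purely: (adv.) entirely; merely; only
iceberg: a large mass of floating ice; "the tip of the iceberg" is the small visible part of a much larger problem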
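Administering our server "fleet" now meant writing and maintaining a set of Ansible scripts, which, even though Ansible is an amazing piece of software, was by no means easy.
Ansible makes its best effort to get you there, but it cannot guarantee the state of your system.
For example, running our Ansible scripts against a fleet of servers with different operating systems (say Debian 8 and Debian 9) brings every machine close to the state we defined, but discrepancies remain: first the OS versions themselves, and then software versions and configurations that differ from server to server.
I often searched for a replacement for Ansible but never found anything better.
I looked into Puppet, but its learning curve was too steep. Reading other people's recipes, I was also taken aback by how many different ways there were to do the same thing. Some people may see that as flexibility; I see it as complexity.
SaltStack caught my eye, but I found it hard to learn as well. Despite its extensive, detailed documentation, its terminology never stuck with me, and it seemed to share Puppet's complexity problem.
The Nix package manager and NixOS sounded great, except that I was not comfortable learning a whole new OS (I have used Debian for many years), and I worried that, despite the large package selection, I would eventually need packages that were not available, which would become one more thing to maintain.
Those are only the 3 tools I looked at; I am sure there are many others I have never heard of.
Writing and maintaining Ansible scripts was not our only problem, however. Adding capacity was another.
With bare-metal you cannot add or remove capacity on the fly. You have to plan your needs in advance: buy a machine (usually leased for at least one month), wait for it to be ready (anywhere from 2 minutes to 3 days), install the base OS, install Ansible's dependencies (mainly Python and a few other packages), and finally run your Ansible scripts against it, possibly adjusting and repeating.
For us this whole process was impractical. What usually happened was that we added capacity for an expected peak load and never removed it afterwards, which added to our costs.
It is worth noting that although unused capacity is like burning money, it is still far cheaper on bare-metal than in the cloud. On the other hand, the engineering headaches of bare-metal servers simply shift the cost from purely material to administrative.
In our bare-metal setup, capacity planning, server administration and Ansible scripting were only the tip of the iceberg.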
In early 2017, while our infrastructure had grown, so had our team.
We hired 7 more engineers, making us a small 9-person team, with skillsets distributed all over the spectrum from backend to frontend and varying levels of seniority.
Even in a small 9-person team, being productive and limiting the number of bugs deployed to production warrants a simple, easy-to-set-up and easy-to-use development-staging-production trifecta.
Setting up your development environment as a new hire shouldn’t take hours, nor should upgrading or re-creating it.
Moreover, a company-wide accessible staging environment should exist and match 99% of your production, if not 100%.
Unfortunately, in our hardware infrastructure, reaching this harmonious trifecta was impossible.
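Scaling: (n.) resizing by a ratio; arrangement of scales; tooth scaling; (v.) the -ing form of "scale". In this article: growing capacity with demand
infrastructure: (n.) the basic facilities and systems something runs on
distributed: (adj.) spread out
spectrum: (n.) the range of light frequencies; a range or series
frontend: the user-facing part of an application
seniority: (n.) being older or higher in rank; length of service
productive: (adj.) fruitful; effective; useful
warrants: (n.) authorizations; permits; (v.) justifies; calls for; guarantees
trifecta: (n.) a horse-racing bet on the first three finishers; here, a set of three things that work together
Moreover: in addition
company-wide: across the whole company
accessible: (adj.) easy to reach or obtain
harmonious: (adj.) in agreement; well coordinated; pleasant sounding
In early 2017, as our infrastructure grew, so did our team.
We hired 7 more engineers, making a 9-person team whose skills ranged from backend to frontend, at every level of seniority.
Even in a small team of 9, being productive and limiting the number of bugs that reach production calls for a simple development-staging-production trio that is easy to set up and easy to use.
Setting up a development environment as a new hire should not take hours, and neither should upgrading or re-creating it.
In addition, there must be a staging environment that the whole company can reach and that matches production 99%, if not 100%.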
First of all, everybody in our engineering team uses MacBook Pros, which is an issue since our stack is Linux-based.
However, asking everybody to switch to linux and potentially change their precious workflow wasn’t really ideal. This meant that the best solution was to provide a development environment agnostic of developers' personal preferences in machines.
I could only see two obvious options:
Either provide a Vagrant stack that would run multiple virtual machines (17 potentially, though, more realistically, 1 machine running our entire stack), or re-use the already written Ansible scripts and run them against our local MacBooks.
After investigating Vagrant, I felt that using virtual machines would hinder performance too much and wasn’t worth it. I decided, for better or worse, to go the Ansible route (in hindsight, this probably wasn’t the best decision).
We would use the same set of Ansible scripts on production, staging and dev. The caveat being, of course, that our development stack, although close to production, was not a 100% match.
This worked well enough for a while; however, the mismatch caused issues later when, for example, our development and production MySQL versions weren’t aligned. Some queries that ran on dev wouldn’t run on production.
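potentially: (adv.) possibly
precious: (adj.) valuable; cherished; affected
agnostic: (n.) a person who holds that certain things cannot be known; (adj.) here, not depending on a particular choice
obvious: clear; easy to see
realistically: (adv.) in practical terms
investigating: looking into
hinder: (v.) to hold back; to get in the way of
hindsight: (n.) understanding that comes only after the event
caveat: (n.) a warning; a qualification
Development environment
First of all, every engineer on our team uses a MacBook Pro, which is a problem because our stack runs on Linux.
However, asking everyone to switch to Linux, and possibly change their precious workflow, was not ideal. The best solution was therefore a development environment that does not depend on each developer's choice of machine.
I could see only two obvious options:
Either provide a Vagrant stack running multiple virtual machines (potentially 17, though more realistically 1 machine running our entire stack), or re-use the Ansible scripts we had already written and run them against our local MacBooks.
After investigating Vagrant, I felt that virtual machines would hurt performance too much and were not worth it. For better or worse, I chose the Ansible route (in hindsight, probably not the best decision).
We would use the same Ansible scripts for production, staging and development. The caveat was that our development stack, although close to production, was not a 100% match.
This worked well enough at first, but the mismatch caused problems later, for example when our development and production MySQL versions were not aligned: some queries that ran in development would not run in production.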
Secondly, having development and production environments running on widely different software (macOS versus Debian) meant that we absolutely needed a staging environment.
Not only because of potential bugs caused by version mismatches, but also because we needed a way to share new features with external members before launch.
Once again I had multiple choices:
Buy 17 servers and run Ansible against them. This would double our costs, though, and we were trying to save money.
Set up our entire stack on a single Linux server, accessible from the outside. A cheaper solution, but once again not an exact replica of our production system.
I decided to implement the cost-saving solution.
An early version of the staging environment involved 3 independent Linux servers, each running the entire stack. Developers would then yell across the room (or on HipChat) "taking over dev1", "is anybody using dev3?", "dev2 is down :/".
Overall, our development-staging-production setup was far from optimal: it did the job; but definitely needed improvements.
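absolutely: (adv.) completely; without question
replica: an exact copy
implement: (v.) to put into effect; to carry out; (n.) a tool
independent (spelled "independant" in the source): (adj.) standing alone; separate; not affiliated; unconstrained
definitely: (adv.) clearly; certainly
Staging environment (pre-release environment)
Secondly, running our development and production environments on different systems (macOS versus Debian) meant that we had to have a staging environment.
This was not only because of potential bugs caused by version mismatches, but also because we needed to share new features with external members before launch.
Once again I had several choices.
I decided to go with the cost-saving option.
An early version of the staging environment consisted of 3 independent Linux servers, each running the entire stack. Developers would then shout across the room (or on HipChat) "taking over dev1", "is anybody using dev3?", "dev2 is down".
Overall, our development-staging-production setup was far from ideal: it worked, but it definitely needed improvement.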
In 2013 Dotcloud released Docker.
The Betabrand use case for Docker was immediately obvious. I saw it as the solution to simplify our development and staging environments; by getting rid of the ansible scripts (well, almost; more on that later).
Those scripts would now only be used for production.
At the time, one main pain point for the team was competing for our three physical staging servers: dev1, dev2 and dev3; and for me maintaining those 3 servers was a major annoyance.
After observing docker for a few months, I decided to give it a go in April 2014.
After installing docker on one of the staging servers, I created a single docker image containing our entire stack (haproxy, varnish, redis, apache, etc.) then over the next few months wrote a tool (sailor) allowing us to create, destroy and manage an infinite number of staging environments accessible via individual unique URLs.
Worth noting that docker-compose didn’t exist at that time; and that putting your entire stack inside one docker image is of course a big no-no but that’s an unimportant detail here.
From this point on, the team wasn’t competing anymore for access to the staging servers. Anybody could create a new, fully configured, staging container from the docker image using sailor. I didn’t need to maintain the servers anymore either; better yet, I shut down and cancelled 2 of them.
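The article does not show how sailor maps an environment to its URL, so the following is only a hypothetical sketch of the general idea: derive a DNS-safe name from a branch or feature name and use it as a subdomain. The base domain and both function names are assumptions made for illustration.

```python
import re


def env_slug(name: str) -> str:
    """Turn an arbitrary branch or feature name into a DNS-safe label."""
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    if not slug:
        raise ValueError(f"cannot derive an environment name from {name!r}")
    # A single DNS label is limited to 63 characters.
    return slug[:63].rstrip("-")


def env_url(name: str, base_domain: str = "staging.example.com") -> str:
    """Return the unique URL under which the environment would be reachable."""
    return f"https://{env_slug(name)}.{base_domain}"


if __name__ == "__main__":
    print(env_url("Feature/New Checkout"))
```

Because the name is derived deterministically, two developers asking for the same branch get the same environment, while different branches never collide.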
Our development environment, however, still was running on macos (well, "Mac OS X" at the time) and using the Ansible scripts.
Then, sometime around 2016 docker-machine was released.
Docker machine is a tool taking care of deploying a docker daemon on any stack of your choice: virtualbox, aws, gce, bare-metal, azure, you name it, docker-machine does it; in one command line.
I saw it as the opportunity to easily and quickly migrate our ansible-based development environment to a docker based one. I modified sailor to use docker-machine as its backend.
Setting up a development environment was now a matter of creating a new docker-machine then passing a flag for sailor to use it.
At this point, our development-staging process had been simplified tremendously; at least from a dev-ops perspective: anytime I needed to upgrade any software of our stack to a newer version or change the configuration, instead of modifying my ansible scripts, asking all the team to run them, then running them myself on all 3 staging servers; I could now simply push a new docker image.
Ironically enough, I ended up needing virtual machines (which I had deliberately avoided) to run docker on our MacBooks. Using Vagrant instead of Ansible would have been a better choice from the get-go. Hindsight is always 20/20.
Using docker for our development and staging systems paved the way to the better solution that Betabrand.com now runs on.
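immediately: (adv.) at once; directly
rid: (v.) to free from; to clear of
Worth noting: deserving attention
compose: (v.) to make up; to form
opportunity: a chance
modified: changed; improved
tremendously: to a very great extent
perspective: (n.) a view; an outlook; a point of view
modify: to change
Ironically: (adv.) in a way that is the opposite of what was intended
deliberately: (adv.) carefully; on purpose; unhurriedly
Hindsight: (n.) understanding that comes only after the event
20/20: perfect (as in perfect vision)
annoyance: (n.) irritation; a nuisance
infinite: (adj.) without limit; countless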
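In 2013, Dotcloud released Docker.
For Betabrand, the use case for Docker was immediately obvious. I saw it as the way to simplify our development and staging environments by getting rid of the Ansible scripts (well, almost; more on that later).
Those scripts would now be used only for production.
At the time, one of the team's main pain points was competing for our three staging servers, dev1, dev2 and dev3; for me, maintaining those 3 servers was a major annoyance.
After watching docker for a few months, I decided to give it a go in 2014.
I installed docker on one of the staging servers and created a single docker image containing our entire stack (haproxy, varnish, redis, apache, etc.). Over the next few months I wrote a tool (sailor) that let us create, destroy and manage any number of staging environments, each reachable through its own unique URL.
It is worth noting that docker-compose did not exist at the time, and that putting an entire stack inside one docker image is of course a big no-no, but that is an unimportant detail here.
From then on, the team no longer competed for access to the staging servers. Anybody could use sailor to create a new, fully configured staging container from the docker image. I no longer needed to maintain the servers either; better yet, I shut down and cancelled 2 of them.
Our development environment, however, was still running on macOS ("Mac OS X" at the time) and using the Ansible scripts.
Then, around 2016, docker-machine was released.
docker-machine is a tool that deploys a docker daemon on the platform of your choice (virtualbox, aws, gce, bare-metal, azure, you name it) with a single command line.
I saw it as an opportunity to migrate our Ansible-based development environment to a docker-based one quickly and easily. I modified sailor to use docker-machine as its backend.
Setting up a development environment was now a matter of creating a new docker-machine and passing a flag telling sailor to use it.
At this point our development and staging process had been greatly simplified, at least from a dev-ops perspective: whenever I needed to upgrade any software in our stack or change its configuration, instead of modifying my Ansible scripts, asking the whole team to run them, and then running them myself on all 3 staging servers, I could simply push a new docker image.
Ironically, I ended up needing virtual machines (which I had deliberately avoided) to run docker on our MacBooks. Using Vagrant instead of Ansible would have been the better choice from the start. Hindsight is always 20/20.
Using docker for our development and staging systems paved the way for the better solution that Betabrand.com runs on today.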
Because Betabrand is primarily an e-commerce platform, Black Friday loomed over our website more and more each year.
To our surprise, the website had handled increasingly higher loads since 2013 without failing in any major catastrophe, but, it did require a month long preparation beforehand: adding capacity, load testing and optimizing our checkout code paths as much as we possibly could.
After preparing for Black Friday 2016, however, it became evident the infrastructure wouldn’t scale for Black Friday 2017; I worried the website would become inaccessible under the load.
Luckily, sometime in 2015, the release of Kubernetes 1.0 caught my attention.
Just like I saw in docker an obvious use-case, I knew k8s was what we needed to solve many of our issues. First of all, it would finally allow us to run an almost identical dev-staging-production environment. It would also solve our scalability issues.
I also evaluated 2 other solutions, Nomad and Docker Swarm, but Kubernetes seemed to be the most promising.
For Black Friday 2017, I set out to migrate our entire infra to k8s.
Although I considered it, I quickly ruled out using our current OVH bare-metal servers for our k8s nodes since it would play against my goal of getting rid of Ansible and not dealing with all the issues that come with hardware servers. Moreover, soon after I started investigating Kubernetes, Google released their managed Kubernetes (GKE) offer, which I rapidly came to choose.
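loom: (n.) a weaving machine; a shape seen dimly; (v.) to appear in a threatening way; to be faintly visible
increasingly: (adv.) more and more
catastrophe: (n.) a disaster; a crushing failure
preparation: (n.) getting ready
evident: (adj.) obvious; clear
infrastructure: (n.) the basic facilities and systems something runs on
inaccessible (the translator notes that the source spelled it "inacessible"): (adj.) unreachable; out of reach
identical: (adj.) the same; exactly alike
scalability: (n.) the ability to scale up or down
Because Betabrand is primarily an e-commerce platform, Black Friday weighed more heavily on our website every year.
To our surprise, the website had handled higher and higher loads since 2013 without any major disaster, but it did require a month of preparation beforehand: adding capacity, load testing and optimizing our checkout code paths as much as we could.
After preparing for Black Friday 2016, however, it was clear that the infrastructure would not scale for Black Friday 2017; I worried the website would become unreachable under the load.
Luckily, sometime in 2015, the release of Kubernetes 1.0 caught my attention.
Just as I had seen an obvious use case in docker, I knew k8s could solve many of the problems we faced. First, it would finally let us run nearly identical development, staging and production environments. It would also solve our scalability problems.
I evaluated 2 other solutions as well, Nomad and Docker Swarm, but Kubernetes seemed the most promising.
For Black Friday 2017, I set out to migrate our entire infrastructure to k8s.
Although I considered using our existing servers as k8s nodes, I quickly ruled it out because it would work against my goal of getting rid of Ansible and of all the problems that come with hardware servers. Moreover, soon after I started investigating Kubernetes, Google released its managed Kubernetes offering (GKE), which I quickly chose.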
Migrating to k8s first involved gaining a strong understanding of its architecture and its concepts by reading the online documentation.
Most importantly, understanding containers, Pods, Deployments and Services and how they all fit together. Then, in order, ConfigMaps, Secrets, DaemonSets, StatefulSets, Volumes, PersistentVolumes and PersistentVolumeClaims.
Other concepts are important, though less necessary to get a cluster going.
Once I assimilated those concepts, the second, and hardest, step involved translating our bare-metal architecture into a set of YAML manifests.
From the beginning I set out to have one, and only one, set of manifests to be used for the creation of all three environments: development, staging and production. I quickly ran into needing to parameterize my YAML manifests, which isn’t supported out of the box by Kubernetes. This is where Helm [1] comes in handy.
From the Helm website: Helm helps you manage Kubernetes applications. Helm Charts help you define, install, and upgrade even the most complex Kubernetes application.
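To illustrate what parameterizing a manifest means, here is a minimal sketch that renders one ConfigMap template with different values per environment. It uses Python's string.Template purely for illustration; Helm itself uses Go templates with {{ .Values.… }} placeholders, and the keys, host names and worker counts below are hypothetical, not Betabrand's actual values.

```python
from string import Template

# One manifest shared by every environment; only the values change.
MANIFEST = Template("""\
apiVersion: v1
kind: ConfigMap
metadata:
  name: web-config
  labels:
    environment: $environment
data:
  DATABASE_HOST: "$database_host"
  PHP_WORKERS: "$php_workers"
""")

# Hypothetical per-environment values (the role played by values files in Helm).
VALUES = {
    "development": {"database_host": "127.0.0.1", "php_workers": "2"},
    "production": {"database_host": "10.0.0.5", "php_workers": "16"},
}


def render(environment: str) -> str:
    """Render the shared manifest for one environment."""
    # substitute() raises KeyError if a placeholder has no value,
    # so a missing setting fails loudly instead of producing a broken manifest.
    return MANIFEST.substitute(environment=environment, **VALUES[environment])


if __name__ == "__main__":
    print(render("development"))
```

The design point is the same as in the article: a single source of truth for the manifests, with the differences between environments reduced to a small set of values.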
Helm markets itself as a package manager for Kubernetes, though I originally used it solely for its templating feature. I have now also come to appreciate its package manager aspect and used it to install Grafana [2] and Prometheus [3].
After a bit of sweat and a few tears, our infrastructure was now neatly organized into 1 Helm package, 17 Deployments, 9 ConfigMaps, 5 PersistentVolumeClaims, 5 Secrets, 18 Services, 1 StatefulSet, 2 StorageClasses and 22 container images.
All that was left was to migrate to this new infrastructure and shut down all our hardware servers.
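gain: (n.) an increase; profit; (v.) to obtain; to increase; to earn
concepts: (n.) ideas; notions
assimilate: (v.) to absorb; to take in and understand
architecture: (n.) the design of buildings; building style; the structure of a system
manifest: (n.) a cargo or passenger list; (v.) to show clearly. In Kubernetes, a file describing the desired state of a resource
set out: (v.) to plan; to present; to begin; to depart
parameterized: expressed in terms of parameters
out-of-the-box: working immediately, without extra setup
handy: (adj.) close at hand; convenient; easy to use
sweat: perspiration
Migrating to k8s first required a solid understanding of its architecture and concepts, gained by reading the online documentation.
Most important is understanding containers, Pods, Deployments and Services and how they fit together. Then, in order, ConfigMaps, Secrets, DaemonSets, StatefulSets, Volumes, PersistentVolumes and PersistentVolumeClaims.
Other concepts matter too, but are less necessary for getting a cluster running.
Once I had absorbed those concepts, the second and hardest step was translating our bare-metal architecture into a set of YAML manifests.
From the start I planned to have one, and only one, set of manifests for creating all three environments: development, staging and production. I soon needed to parameterize my YAML manifests, which Kubernetes does not support out of the box. This is where Helm comes in handy.
From the Helm website: Helm helps you define, install and upgrade even the most complex Kubernetes application.
Helm presents itself as a package manager for Kubernetes, but at first I used it only for its templating feature. I have since come to appreciate the package manager side as well and used it to install Grafana [2] and Prometheus [3].
After some sweat and a few tears, our infrastructure was neatly organized into 1 Helm package, 17 Deployments, 9 ConfigMaps, 5 PersistentVolumeClaims, 5 Secrets, 18 Services, 1 StatefulSet, 2 StorageClasses and 22 container images.
With that done, all that was left was to migrate to the new infrastructure and shut down all our hardware servers.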
October 5th 2017 was the night.
Pulling the trigger was extremely easy and went without a hitch.
I created a new GKE cluster, ran helm install betabrand --name production, imported our MySQL database to Google Cloud SQL, then, after what actually took about 2 hours, we were live in the Clouds.
The migration was that simple.
What helped a lot, of course, was the ability to create multiple clusters in Google GKE: before migrating our production, I was able to rehearse through many test migrations, jotting down every step needed for a successful launch.
Black Friday 2017 was very successful for Betabrand, and the few technical issues we ran into weren’t related to the migration.
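pull: to drag; to draw towards oneself
trigger: (v.) to set off; to cause; (n.) the lever that fires a gun; something that sets off an event
extremely: to a very high degree
hitch: (n.) a hook; a sudden pull; a snag or problem
rehearse: to practise beforehand
jotting: writing brief notes
associated: connected; related
October 5th 2017 was the night.
Pulling the trigger was extremely easy and went without a hitch.
I created a new GKE cluster, ran helm install betabrand --name production and imported our MySQL database into Google Cloud SQL. Then, after what took about 2 hours, we were live in the cloud.
The migration was that simple.
What helped most was the ability to create multiple clusters in Google GKE: before migrating production, I rehearsed the migration many times and wrote down every step needed for a successful launch.
Black Friday 2017 was very successful for Betabrand, and the few technical issues we ran into were unrelated to the migration.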
Our development machines run a Kubernetes cluster via Minikube [4].
The same YAML manifests are being used to create a local development environment or a "production-like" environment.
Everything that runs on Production, also runs in Development. The only difference between the two environments is that our development environment talks to a local MySQL database, whereas production talks to Google Cloud SQL.
Creating a staging environment is exactly the same as creating a new production cluster: all that is needed is to clone the production database instance (which is only a few clicks or one command line) then point the staging cluster to this database via a --set database parameter in helm.
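As a sketch of how the production and staging installs differ only by parameters, the helper below builds the helm command line quoted in the article and adds the --set database override. The chart name and the --name and --set flags come from the text (--name is Helm 2 syntax; Helm 3 takes the release name as a positional argument); the database value and the helper itself are hypothetical.

```python
import shlex
from typing import Optional


def helm_install_command(release: str, database: Optional[str] = None,
                         chart: str = "betabrand") -> list[str]:
    """Build the argument list for installing the chart as a named release."""
    command = ["helm", "install", chart, "--name", release]
    if database is not None:
        # Point this release at a specific database, e.g. a clone of production.
        command += ["--set", f"database={database}"]
    return command


if __name__ == "__main__":
    print(shlex.join(helm_install_command("production")))
    print(shlex.join(helm_install_command("staging", database="staging-db-clone")))
```

Building the command as a list rather than a single string keeps it safe to hand to subprocess.run without shell quoting problems.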
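parameter: a value passed to a command or function to control its behaviour
Our development machines run a Kubernetes cluster via Minikube.
The same YAML manifests are used to create a local development environment or a "production-like" environment.
Everything that runs in production also runs in development. The only difference between the two is that our development environment uses a local MySQL database, whereas production uses Google Cloud SQL.
Creating a staging environment is exactly the same as creating a new production cluster. All that is needed is to clone the production database instance (a few clicks or one command) and then point the staging cluster at the clone through the --set database parameter in helm.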
It’s now been a year and 2 months since we moved our infrastructure to Kubernetes and I couldn’t be happier.
Kubernetes has been rock solid in production and we have yet to experience an outage.
In anticipation of a lot of traffic for Black Friday 2018, we were able to create an exact replica of our production services in a few minutes and do a lot of load testing. Those load tests revealed specific code paths performing extremely poorly that only a lot of traffic could reveal and allowed us to fix them before Black Friday.
As expected, Black Friday 2018 brought more traffic than ever to Betabrand.com, but k8s met its promises, and, features like the HorizontalPodAutoscaler coupled to GKE’s node autoscaling allowed our website to absorb peak loads without any issues.
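The scaling rule documented for the HorizontalPodAutoscaler is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The sketch below implements that formula in a simplified form: it ignores the tolerance band and stabilization behaviour of the real controller, and the replica bounds and metric values are hypothetical.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Return the replica count the HPA formula asks for, clamped to the allowed range."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * (current_metric / target_metric))
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # 5 pods averaging 90% CPU against a 60% target.
    print(desired_replicas(5, current_metric=90, target_metric=60))
```

When the desired pods no longer fit on the existing nodes, node autoscaling (as offered by GKE) adds machines, which is the combination the article credits for absorbing peak load.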
K8s, combined with GKE, gave us the tools we needed to make our infrastructure reliable, available, scalable and maintainable.
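solid: firm; in "rock solid", extremely reliable
outage: a period during which a service is unavailable
anticipation: expectation
replica: an exact copy
perform: to carry out; to function
extremely: to a very high degree
reveal: to show; to make visible
promises: commitments
coupled: (v.) joined; paired; combined
It has now been a year and 2 months since we moved to K8s, and I could not be happier.
K8s has been rock solid in production, and we have yet to experience an outage.
Anticipating heavy traffic for Black Friday 2018, we created an exact replica of our production services in a few minutes and ran extensive load tests. Those tests revealed specific code paths that performed very poorly, something only heavy traffic could expose, and let us fix them before Black Friday.
As expected, Black Friday 2018 brought more traffic than ever to Betabrand.com, but K8s kept its promises. Features such as the HorizontalPodAutoscaler, combined with GKE's node autoscaling, allowed our website to absorb peak loads without any issues.
K8s, combined with GKE, gave us the tools we needed to make our infrastructure reliable, available, scalable and maintainable.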