Cloud Native Infrastructure: Chapter 1 (Part 2)

Date: 2019-11-30

Cloud Native Infrastructure

“Cloud native” is a loaded term. As much as it has been hijacked by marketing departments, it still can be meaningful for engineering and management. To us, it is the evolution of technology in the world where public cloud providers exist.

Cloud native infrastructure is infrastructure that is hidden behind useful abstractions, controlled by APIs, managed by software, and has the purpose of running applications. Running infrastructure with these traits gives rise to a new pattern for managing that infrastructure in a scalable, efficient way.

Abstractions are useful when they successfully hide complexity for their consumer. They can enable more complex uses of the technology, but they also limit how the technology is used. They apply to low-level technology, such as how TCP abstracts IP, or higher levels, such as how VMs abstract physical servers. Abstractions should always allow the consumer to “move up the stack” and not reimplement the lower layers.

Cloud native infrastructure needs to abstract the underlying IaaS offerings to provide its own abstractions. The new layer is responsible for controlling the IaaS below it as well as exposing its own APIs to be controlled by a consumer.

Infrastructure that is managed by software is a key differentiator in the cloud. Software-controlled infrastructure enables infrastructure to scale, and it also plays a role in resiliency, provisioning, and maintainability. The software needs to be aware of the infrastructure’s abstractions and know how to take an abstract resource and implement it in consumable IaaS components accordingly.

These patterns influence more than just how the infrastructure runs. The types of applications that run on cloud native infrastructure and the kinds of people who work on them are different from those in traditional infrastructure.

If cloud native infrastructure looks a lot like a PaaS offering, how can we know what to watch out for when building our own? Let’s quickly describe some areas that may appear like the solution, but don’t provide all aspects of cloud native infrastructure.

What Is Not Cloud Native Infrastructure?

Cloud native infrastructure is not only running infrastructure on a public cloud. Just because you rent server time from someone else does not make your infrastructure cloud native. The processes to manage IaaS are often no different than running a physical data center, and many companies that have migrated existing infrastructure to the cloud have failed to reap the rewards.

Cloud native is not about running applications in containers. When Netflix pioneered cloud native infrastructure, almost all its applications were deployed with virtual machine images, not containers. How you package your applications does not determine whether you get the scalability and benefits of autonomous systems. Even if your applications are automatically built and deployed with a continuous integration and continuous delivery pipeline, it does not mean you are benefiting from infrastructure that can complement API-driven deployments.

It also doesn’t mean you only run a container orchestrator (e.g., Kubernetes and Mesos). Container orchestrators provide many platform features needed in cloud native infrastructure, but not using the features as intended means your applications are merely dynamically scheduled to run on a set of servers. This is a very good first step, but there is still work to be done.

Cloud native is not about microservices or infrastructure as code. Microservices enable faster development cycles on smaller distinct functions, but monolithic applications can have the same features that enable them to be managed effectively by software and can also benefit from cloud native infrastructure.

Infrastructure as code defines and automates your infrastructure in a machine-parsable language or a domain-specific language (DSL). Traditional tools to apply code to infrastructure include configuration management tools (e.g., Chef and Puppet). These tools help greatly in automating tasks and providing consistency, but they fall short in providing the necessary abstractions to describe infrastructure beyond a single server.

Configuration management tools automate one server at a time and depend on humans to tie together the functionality provided by the servers. This positions humans as a potential bottleneck for infrastructure scale. These tools also don’t automate the extra parts of cloud infrastructure (e.g., storage and network) that are needed to make a complete system.

While configuration management tools provide some abstractions for an operating system’s resources (e.g., package managers), they do not abstract away enough of the underlying OS to easily manage it. If an engineer wanted to manage every package and file on a system, it would be a very painstaking process and unique to every configuration variant. Likewise, configuration management that defines no, or incorrect, resources is only consuming system resources and providing no value.

While configuration management tools can help automate parts of infrastructure, they don’t help manage applications better. We will explore how cloud native infrastructure is different by looking at the processes to deploy, manage, test, and operate infrastructure in later chapters, but first we will look at which applications are successful and when you should use cloud native infrastructure.

Cloud Native Applications

Just as the cloud changed the relationship between business and infrastructure, cloud native applications changed the relationship between applications and infrastructure. We need to see what is different about cloud native compared to traditional applications so we can understand their new relationship with infrastructure.

For the purposes of this book, and to have a shared vocabulary, we need to define what we mean when we say 「cloud native application.」 Cloud native is not the same thing as a 12-factor application, even though they may share some similar traits. If you’d like more details about how they are different, we recommend reading Beyond the Twelve-Factor App, by Kevin Hoffman (O’Reilly, 2012).

A cloud native application is engineered to run on a platform and is designed for resiliency, agility, operability, and observability. Resiliency embraces failures instead of trying to prevent them; it takes advantage of the dynamic nature of running on a platform. Agility allows for fast deployments and quick iterations. Operability adds control of application life cycles from inside the application instead of relying on external processes and monitors. Observability provides information to answer questions about application state.

The definition of a cloud native application is still evolving. There are other definitions available from organizations like the CNCF.

Cloud native applications acquire these traits through various methods. Which methods fit often depends on where your applications run and on the processes and culture of the business. The following are common ways to implement the desired characteristics of a cloud native application:

Microservices
Health reporting
Telemetry data
Resiliency
Declarative, not reactive

Microservices

Applications that are managed and deployed as single entities are often called monoliths. Monoliths have a lot of benefits when applications are initially developed. They are easier to understand and allow you to change major functionality without affecting other services.

As complexity of the application grows, the benefits of monoliths diminish. They become harder to understand, and they lose agility because it is harder for engineers to reason about and make changes to the code.

One of the best ways to fight complexity is to separate clearly defined functionality into smaller services and let each service independently iterate. This increases the application’s agility by allowing portions of it to be changed more easily as needed. Each microservice can be managed by separate teams, written in appropriate languages, and be independently scaled as needed.

So long as each service adheres to strong contracts, the application can improve and change quickly. There are of course many other considerations for moving to microservice architecture. Not the least of these is resilient communication, which we address in Appendix A.

We cannot go into all considerations for moving to microservices. Having microservices does not mean you have cloud native infrastructure. If you would like to read more, we suggest Sam Newman’s Building Microservices (O’Reilly, 2015). While microservices are one way to achieve agility with your applications, as we said before, they are not a requirement for cloud native applications.

Health Reporting

No one knows more about what an application needs to run in a healthy state than the developer. For a long time, infrastructure administrators have tried to figure out what 「healthy」 means for applications they are responsible for running. Without knowledge of what actually makes an application healthy, their attempts to monitor and alert when applications are unhealthy are often fragile and incomplete.

To increase the operability of cloud native applications, applications should expose a health check. Developers can implement this as a command or process signal that the application can respond to after performing self-checks, or, more commonly, as a web endpoint provided by the application that returns health status via an HTTP code.
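Below is a minimal sketch of the HTTP variant, using only Python's standard library. Several details are assumptions made for illustration: the `/health` path, the `is_healthy` callable, the 200/503 status choice, and the default port 8080. The point is that the application itself decides what “healthy” means and reports that decision through the status code. The `probe` helper shows the platform side, which looks only at the code.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable


def make_handler(is_healthy: Callable[[], bool]):
    """Build a handler whose /health route reports status via the HTTP code."""

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/health":
                self.send_response(404)
                self.end_headers()
                return
            ok = is_healthy()  # the application's own self-check
            self.send_response(200 if ok else 503)  # 503 signals "unhealthy"
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(b"ok\n" if ok else b"unhealthy\n")

        def log_message(self, format, *args):
            pass  # keep frequent probe requests out of the application log

    return HealthHandler


def serve_health(is_healthy: Callable[[], bool], port: int = 8080) -> HTTPServer:
    """Run the health endpoint in a background thread (port 0 picks a free port)."""
    server = HTTPServer(("127.0.0.1", port), make_handler(is_healthy))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def probe(url: str, timeout_s: float = 1.0) -> int:
    """What a platform does: fetch the URL and look only at the status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
```

For example, `serve_health(lambda: True, port=0)` starts the endpoint on a free port (read it from `server.server_address[1]`), and `probe` against `/health` then returns 200.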

Google Borg Example
One example of health reporting is laid out in Google’s Borg paper:
Almost every task run under Borg contains a built-in HTTP server that publishes information about the health of the task and thousands of performance metrics (e.g., RPC latencies). Borg monitors the health-check URL and restarts tasks that do not respond promptly or return an HTTP error code. Other data is tracked by monitoring tools for dashboards and alerts on service-level objective (SLO) violations.
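The restart behavior described in the quote can be sketched as a single supervision pass. This is not Borg's code. `Task`, its `probe` callable, and the injected `restart` function are hypothetical stand-ins. A probe that returns `None` models a task that did not respond promptly.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Task:
    """Hypothetical handle to a running task."""
    name: str
    probe: Callable[[], Optional[int]]  # HTTP status, or None if no timely reply
    restarts: int = 0


def supervise_once(tasks: List[Task], restart: Callable[[Task], None]) -> List[str]:
    """Restart every task whose health check times out or returns an HTTP error code."""
    restarted = []
    for task in tasks:
        status = task.probe()
        if status is None or status >= 400:
            restart(task)
            task.restarts += 1
            restarted.append(task.name)
    return restarted
```

A real supervisor would call `supervise_once` on a timer. The human only sets the policy (timeout, error codes), and the system applies it.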

Moving health responsibilities into the application makes the application much easier to manage and automate. The application should know if it’s running properly and what it relies on (e.g., access to a database) to provide business value. This means developers will need to work with product managers to define what business function the application serves and to write the tests accordingly.
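One way to sketch this is a set of named self-checks, one per dependency, folded into a single status code. The example stands in an in-memory SQLite database from Python's standard library for “access to a database.” The check names and the 200/503 mapping are assumptions.

```python
import sqlite3
from typing import Callable, Dict, Tuple


def database_check(conn: sqlite3.Connection) -> Callable[[], bool]:
    """Healthy only if a trivial query round-trips through the connection."""
    def check() -> bool:
        try:
            return conn.execute("SELECT 1").fetchone() == (1,)
        except sqlite3.Error:
            return False
    return check


def run_checks(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, Dict[str, bool]]:
    """Run every dependency check; any failure makes the whole app report 503."""
    results = {name: check() for name, check in checks.items()}
    return (200 if all(results.values()) else 503), results
```

Returning the per-check results alongside the code lets the endpoint explain *which* business dependency is missing, not just that something is wrong.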

Examples of applications that provide health checks include ZooKeeper’s ruok command and etcd’s HTTP /health endpoint.

Applications have more than just healthy or unhealthy states. They will go through a startup and shutdown process during which they should report their state through their health check. If the application can let the platform know exactly what state it is in, it will be easier for the platform to know how to operate it.

A good example is when the platform needs to know when the application is available to receive traffic. While the application is starting, it cannot properly handle traffic, and it should present itself as not ready. This additional state will prevent the application from being terminated prematurely, because if health checks fail, the platform may assume the application is not healthy and stop or restart it repeatedly.
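The extra “not ready” state is easiest to see when liveness and readiness are reported separately, which is how Kubernetes splits them into liveness and readiness probes. The sketch below is a hypothetical mapping from lifecycle state to status code, not any platform's API.

```python
from enum import Enum


class State(Enum):
    STARTING = "starting"
    READY = "ready"
    SHUTTING_DOWN = "shutting_down"
    FAILED = "failed"


def liveness_code(state: State) -> int:
    """Alive unless failed, so a slow startup does not trigger a restart."""
    return 503 if state is State.FAILED else 200


def readiness_code(state: State) -> int:
    """Receive traffic only when fully started and not draining."""
    return 200 if state is State.READY else 503
```

During startup and shutdown the platform keeps the instance running but routes no traffic to it. Only `FAILED` should lead to a restart.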

Application health is just one part of being able to automate application life cycles. In addition to knowing if the application is healthy, you need to know if the application is doing any work. That information comes from telemetry data.

Telemetry Data

Telemetry data is the information necessary for making decisions. It’s true that telemetry data can overlap somewhat with health reporting, but they serve different purposes. Health reporting informs us of application life cycle state, while telemetry data informs us of application business objectives.

The metrics you measure are sometimes called service-level indicators (SLIs) or key performance indicators (KPIs). These are application-specific data that allow you to make sure the performance of applications is within a service-level objective (SLO). If you want more information on these terms and how they relate to your application and business needs, we recommend reading Chapter 4 from Site Reliability Engineering (O’Reilly).

Telemetry and metrics are used to answer questions such as:

How many requests per minute does the application receive?
Are there any errors?
What is the application latency?
How long does it take to place an order?

The data is often scraped or pushed to a time series database (e.g., Prometheus or InfluxDB) for aggregation. The only requirement for the telemetry data is that it is formatted for the system that will be gathering the data.
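As one example of data “formatted for the system that will be gathering the data,” Prometheus scrapes a plain-text exposition format. The function below is a simplified, hand-rolled renderer for a single counter. Real applications would normally use a client library instead. The metric name and labels are made up for illustration.

```python
from typing import Dict, Tuple

Labels = Tuple[Tuple[str, str], ...]


def _escape(value: str) -> str:
    """Escape backslashes, quotes, and newlines in label values."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_counter(name: str, help_text: str, samples: Dict[Labels, float]) -> str:
    """Render one counter family in the Prometheus text exposition format."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in sorted(samples.items()):
        label_str = ",".join(f'{k}="{_escape(v)}"' for k, v in labels)
        series = f"{name}{{{label_str}}}" if label_str else name
        lines.append(f"{series} {value}")
    return "\n".join(lines) + "\n"
```

Calling `render_counter("http_requests_total", "Total HTTP requests.", {(("code", "200"),): 3})` yields the `# HELP` and `# TYPE` lines followed by `http_requests_total{code="200"} 3`.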

It is probably best to, at minimum, implement the RED method for metrics, which collects rate, errors, and duration from the application.

Rate: How many requests are received
Errors: How many errors come from the application
Duration: How long it takes to receive a response
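Here is a minimal, in-process sketch of the RED method. A wrapper records one rate/error/duration observation per request. `RedMetrics` and `instrument` are hypothetical names. In practice these numbers would be exported to a time series database rather than kept in a list, and the percentile here is a crude index into the sorted durations.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RedMetrics:
    requests: int = 0                                      # Rate: count per window
    errors: int = 0                                        # Errors
    durations: List[float] = field(default_factory=list)  # Duration, in seconds

    def observe(self, duration_s: float, error: bool) -> None:
        self.requests += 1
        self.errors += int(error)
        self.durations.append(duration_s)

    def rate(self, window_s: float) -> float:
        return self.requests / window_s

    def error_ratio(self) -> float:
        return self.errors / self.requests if self.requests else 0.0

    def percentile(self, q: float) -> float:
        """Crude percentile: index int(q * n) into the sorted durations."""
        if not self.durations:
            return 0.0
        ordered = sorted(self.durations)
        return ordered[min(len(ordered) - 1, int(q * len(ordered)))]


def instrument(metrics: RedMetrics, handler: Callable) -> Callable:
    """Wrap a request handler so every call records one RED observation."""
    def wrapped(*args, **kwargs):
        start = time.perf_counter()
        failed = False
        try:
            return handler(*args, **kwargs)
        except Exception:
            failed = True
            raise
        finally:
            metrics.observe(time.perf_counter() - start, failed)
    return wrapped
```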

Telemetry data should be used for alerting rather than health monitoring. In a dynamic, self-healing environment, we care less about individual application instance life cycles and more about overall application SLOs. Health reporting is still important for automated application management, but should not be used to page engineers.

If 1 instance or 50 instances of an application are unhealthy, we may not care to receive an alert, so long as the business need for the application is being met. Metrics let you know if you are meeting your SLOs, how the application is being used, and what 「normal」 is for your application. Alerting helps you to restore your systems to a known good state.
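To make the “alert on SLOs, not instances” rule concrete, here is a hypothetical paging decision. It aggregates request counts across the whole fleet and compares them with an availability target. Per-instance health is recorded but deliberately ignored, because the platform uses it for restarts, not for paging. `InstanceStats` and the 99.9% target used below are assumptions.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class InstanceStats:
    requests: int
    failures: int
    healthy: bool  # consumed by the platform for restarts, not by alerting


def should_page(instances: Iterable[InstanceStats], slo_target: float) -> bool:
    """Page a human only if the fleet-wide success ratio drops below the SLO."""
    instances = list(instances)
    total = sum(i.requests for i in instances)
    failed = sum(i.failures for i in instances)
    if total == 0:
        return False  # no traffic means no SLO signal; alert on that separately if needed
    return 1 - failed / total < slo_target
```

With 49 healthy instances serving traffic without errors and one unhealthy instance, `should_page(fleet, 0.999)` stays `False`. Ten instances each failing 5 of 1,000 requests (99.5% success) would page.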

If it moves, we track it. Sometimes we’ll draw a graph of something that isn’t moving yet, just in case it decides to make a run for it.
—Ian Malpass, Measure Anything, Measure Everything

Alerting also shouldn’t be confused with logging. Logging is used for debugging, development, and observing patterns. It exposes the internal functionality of your application. Metrics (e.g., error rate) can sometimes be calculated from logs, but doing so requires additional aggregation services (e.g., Elasticsearch) and processing.
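The extra processing involved is easy to underestimate. As a rough illustration, this sketch derives an error rate from access-log lines in plain Python. It assumes Common Log Format-style lines, where the HTTP status follows the quoted request line. An aggregation service would do the same parsing at scale.

```python
import re
from typing import Iterable

# Assumes Common Log Format-style lines: ... "GET /path HTTP/1.1" 503 0
STATUS = re.compile(r'" (\d{3}) ')


def error_rate(lines: Iterable[str]) -> float:
    """Fraction of parsed requests that returned a 5xx status."""
    total = errors = 0
    for line in lines:
        match = STATUS.search(line)
        if not match:
            continue  # skip lines that are not access-log entries
        total += 1
        if match.group(1).startswith("5"):
            errors += 1
    return errors / total if total else 0.0
```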

Resiliency

Once you have telemetry and monitoring data, you need to make sure your applications are resilient to failure. Resiliency used to be the responsibility of the infrastructure, but cloud native applications need to take on some of that work.

Infrastructure was engineered to resist failure. Hardware used to require multiple hard drives, power supplies, and round-the-clock monitoring and part replacements to keep an application available. With cloud native applications, it is the application’s responsibility to embrace failure instead of avoiding it.

In any platform, especially in a cloud, the most important feature above all else is its reliability.
—David Rensin, The ARCHITECHT Show: A crash course from Google on engineering for the cloud

Designing resilient applications could be an entire book in itself. There are two main aspects to resiliency we will consider with cloud native applications: design for failure, and graceful degradation.

Design for failure

The only systems that should never fail are those that keep you alive (e.g., heart implants and brakes). If your services never go down, you are spending too much time engineering them to resist failure and not enough time adding business value. Your SLO determines how much uptime is needed for a service. Any resources you spend to engineer uptime that exceeds the SLO are wasted.
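The link between an SLO and “how much uptime is needed” is simple arithmetic: the allowed downtime is the unavailable fraction multiplied by the period. A small helper, assuming a 30-day period by default:

```python
def downtime_budget_minutes(slo: float, period_days: float = 30) -> float:
    """Minutes of downtime an availability SLO permits over the period."""
    return (1 - slo) * period_days * 24 * 60
```

For example, 99.9% over 30 days allows about 43.2 minutes of downtime, and 99.99% allows about 4.3 minutes. Engineering for less downtime than the budget allows is the waste described above.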

Two values you should measure for every service are your mean time between failures (MTBF) and mean time to recovery (MTTR). Monitoring and metrics allow you to detect if you are meeting your SLOs, but the platform where the applications run is key to keeping your MTBF high and your MTTR low.
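Both values fall out of an incident log. The sketch below assumes each incident is recorded as a (failure start, recovery) pair within a known observation window. MTBF is taken as total operating time divided by the number of failures. MTTR is total repair time divided by the number of failures.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Incident = Tuple[datetime, datetime]  # (failure start, service restored)


def _downtime(incidents: List[Incident]) -> timedelta:
    return sum((end - start for start, end in incidents), timedelta())


def mttr(incidents: List[Incident]) -> timedelta:
    """Mean time to recovery: average outage length."""
    return _downtime(incidents) / len(incidents)


def mtbf(incidents: List[Incident], window_start: datetime, window_end: datetime) -> timedelta:
    """Mean time between failures: operating (up) time per failure."""
    uptime = (window_end - window_start) - _downtime(incidents)
    return uptime / len(incidents)
```

Consider a 100-hour window with two outages of 1 and 3 hours. MTTR is 2 hours, and MTBF is 96 up-hours / 2 failures = 48 hours.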

In any complex system, there will be failures. You can manage some failures in hardware (e.g., RAID and redundant power supplies) and some in infrastructure (e.g., load balancers); but because applications know when they are healthy, they should also try to manage their own failure as best they can.

An application that is designed with expectations of failure will be developed in a more defensive way than one that assumes availability. When failure is inevitable, there will be additional checks, failure modes, and logging built into the application.
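A small example of that defensive style wraps a remote call with bounded retries, exponential backoff, and one log line per failure. The retried exception types, attempt count, delays, and logger name are illustrative assumptions. Retries only make sense for operations that are safe to repeat.

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger("app")  # hypothetical logger name


def call_with_retries(op: Callable[[], T], attempts: int = 3,
                      base_delay_s: float = 0.1,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry transient failures with exponential backoff, then give up loudly."""
    for attempt in range(1, attempts + 1):
        try:
            return op()
        except (TimeoutError, ConnectionError) as exc:
            log.warning("attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                raise  # an explicit failure mode instead of hanging forever
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise ValueError("attempts must be >= 1")
```

The `sleep` parameter is injectable so the backoff schedule (0.1 s, then 0.2 s, and so on) can be checked without actually waiting.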

It is impossible to know every way an application can fail. Developing with the assumption that anything can, and likely will, fail is a pattern of cloud native applications.

The best state for your application to be in is healthy. The second best state is failed. Everything else is nonbinary and difficult to monitor and troubleshoot. Charity Majors, CEO of Honeycomb, points out in her article 「Ops: It’s Everyone’s Job Now」 that 「distributed systems are never up; they exist in a constant state of partially degraded service. Accept failure, design for resiliency, protect and shrink the critical path.」

Cloud native applications should be adaptable no matter what the failure is. They expect failure, so they adjust when it’s detected.

Some failures cannot and should not be designed into applications (e.g., network partitions and availability zone failures). The platform should autonomously handle failure domains that are not integrated into the applications.

Graceful degradation

Cloud native applications need to have a way to handle excessive load, no matter if it’s the application or a dependent service under load. One way to handle load is to degrade gracefully. The Site Reliability Engineering book describes graceful degradation in applications as offering 「responses that are not as accurate as or that contain less data than normal responses, but that are easier to compute」 when under excessive load.

Some aspects of shedding application load are handled by infrastructure. Intelligent load balancing and dynamic scaling can help, but at some point your application may be under more load than it can handle. Cloud native applications need to be aware of this inevitability and react accordingly.
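Inside the application, one common way to react is load shedding. The application caps the number of requests in flight and quickly rejects the excess, rather than letting every request slow down. A minimal sketch follows. The concurrency limit, the 503 response, and the `handle` helper are assumptions.

```python
import threading
from typing import Any, Callable, Tuple


class LoadShedder:
    """Admit at most max_in_flight concurrent requests; reject the rest at once."""

    def __init__(self, max_in_flight: int):
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def try_acquire(self) -> bool:
        return self._slots.acquire(blocking=False)

    def release(self) -> None:
        self._slots.release()


def handle(shedder: LoadShedder, work: Callable[[], Any]) -> Tuple[int, Any]:
    if not shedder.try_acquire():
        return 503, "overloaded, retry later"  # cheap rejection, no queuing
    try:
        return 200, work()
    finally:
        shedder.release()
```

A fast rejection keeps latency predictable for the requests that are admitted. It also gives load balancers and clients a clear signal to back off.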

The point of graceful degradation is to allow applications to always return an answer to a request. This holds both when the application doesn’t have enough local compute resources and when dependent services don’t return information in a timely manner. Services that are dependent on one or many other services should be available to answer requests even if dependent services are not. Returning partial answers, or answers with old information from a local cache, are possible solutions when services are degraded.
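The cache fallback can be sketched as follows. The client remembers the last good answer per key and serves it, marked as degraded, when the dependency fails. With no cached value, it returns a partial (empty) answer instead of an error. `DegradingClient` and the `fetch` callable are hypothetical. A real implementation would also bound the cache and track how stale each entry is.

```python
from typing import Any, Callable, Dict, Optional, Tuple


class DegradingClient:
    def __init__(self, fetch: Callable[[str], Any]):
        self._fetch = fetch                   # call to the dependent service
        self._last_good: Dict[str, Any] = {}  # last successful answer per key

    def get(self, key: str) -> Tuple[Optional[Any], bool]:
        """Return (answer, degraded); timeouts and connection errors never reach the caller."""
        try:
            value = self._fetch(key)
        except (TimeoutError, ConnectionError):
            if key in self._last_good:
                return self._last_good[key], True  # old information from the cache
            return None, True                      # partial answer
        self._last_good[key] = value
        return value, False
```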

While graceful degradation and failure handling should both be implemented in the application, there are multiple layers of the platform that should help. If microservices are adopted, then the network infrastructure becomes a critical component that needs to take an active role in providing application resiliency. For more information on building a resilient network layer, please see Appendix A.

Declarative, Not Reactive

Because cloud native applications are designed to run in a cloud environment, they interact with infrastructure and supporting applications differently than traditional applications do. In a cloud native application, the way to communicate with anything is through the network. Many times network communication is done through RESTful HTTP calls, but it can also be implemented via other interfaces such as remote procedure calls (RPC).

Traditional applications would automate tasks through message queues, files written on shared storage, or local scripts that triggered shell commands. The communication method reacted to an event that happened (e.g., if the user clicks submit, run the submit script) and often required information that existed on the same physical or virtual server.

Serverless
Serverless platforms are cloud native and reactive to events by design. A reason they work so well in a cloud is that they communicate over HTTP APIs, are single-purpose functions, and are declarative in what they call. The platform also helps by making them scalable and accessible from within the cloud.

Reactive communication in traditional applications is often an attempt to build resiliency. If the application wrote a file on disk or into a message queue and then the application died, the result of the message or file could still be completed.

This is not to say technologies like message queues should not be used, but rather that they cannot be relied on for the only layer of resiliency in a dynamic and constantly failing system. Fundamentally, the communication between applications should change in a cloud native environment—not only because there are other methods to build communication resiliency (see Appendix A), but also because it is often more work to replicate traditional communication methods in the cloud.

When applications can trust the resiliency of the communication, they should stop reacting and start declaring. Declarative communication trusts that the network will deliver the message. It also trusts that the application will return a success or an error. This isn’t to say having applications watch for change is not important. Kubernetes’ controllers do exactly that to the API server. However, once change is found, they declare a new state and trust the API server and kubelets to do the necessary thing.
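The controller pattern can be sketched in a few lines. The controller still watches for change, but its only output is a declaration of desired state. A separate agent, playing the role of the kubelet, converges reality toward that declaration. `ApiServer`, `controller`, and `converge` are hypothetical stand-ins, not Kubernetes APIs. The 100-items-per-replica rule is an arbitrary example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ApiServer:
    """Stand-in for a declarative API: it only stores desired state."""
    desired: Dict[str, int] = field(default_factory=dict)

    def declare(self, name: str, replicas: int) -> None:
        self.desired[name] = replicas


def controller(api: ApiServer, name: str, queue_length: int, per_replica: int = 100) -> None:
    """React to an observed change by declaring a new desired replica count."""
    replicas = max(1, -(-queue_length // per_replica))  # ceiling division
    api.declare(name, replicas)


def converge(api: ApiServer, actual: Dict[str, int]) -> Dict[str, int]:
    """Agent side: move actual state to the declared state; return the deltas applied."""
    deltas = {}
    for name, want in api.desired.items():
        delta = want - actual.get(name, 0)
        if delta:
            actual[name] = want
            deltas[name] = delta
    return deltas
```

`converge` compares against declared state rather than replaying events. As a result, running it twice is harmless: the second call finds nothing to do.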

The declarative communication model turns out to be more robust for many reasons. Most importantly, it standardizes a communication model and it moves the functional implementation of how something gets to a desired state away from the application to a remote API or service endpoint. This helps simplify applications and allows them to behave more predictably with each other.

How Do Cloud Native Applications Impact Infrastructure?

Hopefully, you can tell that cloud native applications are different than traditional applications. Cloud native applications do not benefit from running directly on IaaS or being tightly coupled to a server’s operating system. They expect to be run in a dynamic environment with mostly autonomous systems.

Cloud native infrastructure creates a platform on top of IaaS that provides autonomous application management. The platform is built on top of dynamically created infrastructure to abstract away individual servers and promote dynamic resource allocation scheduling.

Automation is not the same thing as autonomy. Automation allows humans to have a bigger impact on the actions they take.
Cloud native is about autonomous systems that do not require humans to make decisions. It still uses automation, but only after deciding the action needed. Only when the system cannot automatically determine the right thing to do should it notify a human.

Applications with these characteristics need a platform that can programmatically monitor, gather metrics, and then react when failures occur. Cloud native applications do not rely on humans to set up ping checks or create syslog rules. They require self-service resources abstracted away from selecting a base operating system or package manager, and they rely on service discovery and robust network communication to provide a feature-rich experience.

Conclusion

The infrastructure required to run cloud native applications is different from that required by traditional applications. Many responsibilities that infrastructure used to handle have moved into the applications.

Cloud native applications simplify their code complexity by decomposing into smaller services. These services provide monitoring, metrics, and resiliency built directly into the application. New tooling is required to automate the management of service proliferation and service life cycles.

The infrastructure is now responsible for holistic resource management, dynamic orchestration, service discovery, and much more. It needs to provide a platform where services don’t rely on individual components, but rather on APIs and autonomous systems. Chapter 2 discusses cloud native infrastructure features in more detail.
