[Translation] Why we switched from Python to Go

https://getstream.io/blog/switched-python-go/

[Chinese version] http://blog.csdn.net/dev_csdn/article/details/78386256

 

Switching to a new language is always a big step, especially when only one of your team members has prior experience with that language. Early this year, we switched Stream’s primary programming language from Python to Go. This post will explain some of the reasons why we decided to leave Python behind and make the switch to Go.

Reasons to Use Go

Reason 1 – Performance

Go is fast!

Go is extremely fast. The performance is similar to that of Java or C++. For our use case, Go is typically 30 times faster than Python. Here’s a small benchmark game comparing Go vs Java.

Reason 2 – Language Performance Matters

For many applications, the programming language is simply the glue between the app and the database. The performance of the language itself usually doesn’t matter much.

Stream, however, is an API provider powering the feed infrastructure for 500 companies and more than 200 million end users. We’ve been optimizing Cassandra, PostgreSQL, Redis, etc. for years, but eventually, you reach the limits of the language you’re using.

Python is a great language but its performance is pretty sluggish for use cases such as serialization/deserialization, ranking and aggregation. We frequently ran into performance issues where Cassandra would take 1ms to retrieve the data and Python would spend the next 10ms turning it into objects.

Reason 3 – Developer Productivity & Not Getting Too Creative

Have a look at this little snippet of Go code from the How I Start Go tutorial. (This is a great tutorial and a good starting point to pick up a bit of Go.)

package main

// Imports the snippet relies on.
import (
	"encoding/json"
	"log"
	"net/http"
)

type openWeatherMap struct{}

func (w openWeatherMap) temperature(city string) (float64, error) {
	resp, err := http.Get("http://api.openweathermap.org/data/2.5/weather?APPID=YOUR_API_KEY&q=" + city)
	if err != nil {
		return 0, err
	}

	defer resp.Body.Close()

	var d struct {
		Main struct {
			Kelvin float64 `json:"temp"`
		} `json:"main"`
	}

	if err := json.NewDecoder(resp.Body).Decode(&d); err != nil {
		return 0, err
	}

	log.Printf("openWeatherMap: %s: %.2f", city, d.Main.Kelvin)
	return d.Main.Kelvin, nil
}

If you’re new to Go, there’s not much that will surprise you when reading that little code snippet. It showcases multiple assignments, data structures, pointers, formatting and a built-in HTTP library.

When I first started programming I always loved using Python’s more advanced features. Python allows you to get pretty creative with the code you’re writing. For instance, you can:

  • Use metaclasses to self-register classes upon code initialization
  • Swap out True and False
  • Add functions to the list of built-in functions
  • Overload operators via magic methods

These features are fun to play around with but, as most programmers will agree, they often make the code harder to understand when reading someone else’s work.

Go forces you to stick to the basics. This makes it very easy to read anyone’s code and immediately understand what’s going on.

Note: How “easy” it is really depends on your use case, of course. If you want to create a basic CRUD API I’d still recommend Django + DRF, or Rails.

Reason 4 – Concurrency & Channels

As a language, Go tries to keep things simple. It doesn’t introduce many new concepts. The focus is on creating a simple language that is incredibly fast and easy to work with. The only area where it does get innovative is goroutines and channels. (To be 100% correct the concept of CSP started in 1977, so this innovation is more of a new approach to an old idea.) Goroutines are Go’s lightweight approach to threading, and channels are the preferred way to communicate between goroutines.

Goroutines are very cheap to create and only take a few KBs of additional memory. Because goroutines are so light, it is possible to have hundreds or even thousands of them running at the same time.
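
To make the “cheap to create” point concrete, here is a minimal sketch that starts a large number of goroutines and waits for all of them. The function name runMany and the count of 10,000 are assumptions chosen for illustration, not something taken from Stream’s code.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runMany starts n goroutines, waits for all of them, and reports how many ran.
func runMany(n int) int64 {
	var (
		wg   sync.WaitGroup
		done int64
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			atomic.AddInt64(&done, 1) // atomic add avoids a data race on the counter
		}()
	}
	wg.Wait() // block until every goroutine has called Done
	return atomic.LoadInt64(&done)
}

func main() {
	// 10,000 is an arbitrary count picked for this illustration.
	fmt.Println(runMany(10000))
}
```

The sync.WaitGroup is what keeps main from exiting before the goroutines have finished.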

You can communicate between goroutines using channels. The Go runtime handles all the complexity. The goroutines and channel-based approach to concurrency makes it very easy to use all available CPU cores and handle concurrent IO – all without complicating development. Compared to Python/Java, running a function on a goroutine requires minimal boilerplate code. You simply prepend the function call with the keyword “go”:

package main

import (
	"fmt"
	"time"
)

func say(s string) {
	for i := 0; i < 5; i++ {
		time.Sleep(100 * time.Millisecond)
		fmt.Println(s)
	}

}

func main() {
	go say("world")
	say("hello")
}

https://tour.golang.org/concurrency/1

Go’s approach to concurrency is very easy to work with. It’s an interesting approach compared to Node where the developer has to pay close attention to how asynchronous code is handled.
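
The example above only starts a goroutine. As a complementary sketch, this is roughly what passing data between goroutines over channels can look like. The helper names square and squareAll are hypothetical and exist only for this illustration.

```go
package main

import "fmt"

// square reads numbers from in, sends their squares to out,
// and closes out once in has been drained.
func square(in <-chan int, out chan<- int) {
	for n := range in {
		out <- n * n
	}
	close(out)
}

// squareAll pushes values through a worker goroutine and collects the results.
// With a single worker and unbuffered channels the input order is preserved.
func squareAll(values []int) []int {
	in := make(chan int)
	out := make(chan int)
	go square(in, out)

	// Feed the input from a second goroutine so sending and receiving overlap.
	go func() {
		for _, v := range values {
			in <- v
		}
		close(in)
	}()

	results := make([]int, 0, len(values))
	for sq := range out {
		results = append(results, sq)
	}
	return results
}

func main() {
	fmt.Println(squareAll([]int{1, 2, 3, 4})) // prints [1 4 9 16]
}
```

Closing a channel is how the sender signals that no more values are coming, which is what lets the range loops terminate.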

Another great aspect of concurrency in Go is the race detector. This makes it easy to figure out if there are any race conditions within your asynchronous code.
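
As a rough sketch of what the race detector catches, compare the two counters below. racyCount is intentionally wrong and is never called by main; safeCount is the fixed version. Both names are made up for this example. The detector is enabled with the -race flag, for example `go run -race main.go` or `go test -race`.

```go
package main

import (
	"fmt"
	"sync"
)

// racyCount increments a shared counter from n goroutines without synchronization.
// This is intentionally wrong: running it under -race reports a data race.
func racyCount(n int) int {
	var wg sync.WaitGroup
	count := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			count++ // unsynchronized read-modify-write
		}()
	}
	wg.Wait()
	return count
}

// safeCount does the same work but guards the counter with a mutex.
func safeCount(n int) int {
	var (
		wg sync.WaitGroup
		mu sync.Mutex
	)
	count := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			count++
			mu.Unlock()
		}()
	}
	wg.Wait()
	return count
}

func main() {
	fmt.Println("safe:", safeCount(1000))
	// Swap in racyCount(1000) and run with -race to see the detector's report.
}
```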

 

Here are a few good resources to get started with Go and channels:

Reason 5 – Fast Compile Time

Our largest micro service written in Go currently takes 6 seconds to compile. Go’s fast compile times are a major productivity win compared to languages like Java and C++ which are famous for sluggish compilation speed. I like sword fighting, but it’s even nicer to get things done while I still remember what the code is supposed to do:

XKCD – Code compiling before Go

Reason 6 – The Ability to Build a Team

First of all, let’s start with the obvious: there are not as many Go developers compared to older languages like C++ and Java. According to StackOverflow, 38% of developers know Java, 19.3% know C++ and only 4.6% know Go. GitHub data shows a similar trend: Go is more widely used than languages such as Erlang, Scala and Elixir, but less popular than Java and C++.

Fortunately, Go is a very simple and easy to learn language. It provides the basic features you need and nothing else. The new concepts it introduces are the “defer” statement and built-in management of concurrency with “go routines” and channels. (For the purists: Go isn’t the first language to implement these concepts, just the first to make them popular.) Any Python, Elixir, C++, Scala or Java dev that joins a team can be effective at Go within a month because of its simplicity.

We’ve found it easier to build a team of Go developers compared to many other languages. If you’re hiring people in competitive ecosystems like Boulder and Amsterdam this is an important benefit.

Reason 7 – Strong Ecosystem

For a team of our size (~20 people) the ecosystem matters. You simply can’t create value for your customers if you have to reinvent every little piece of functionality. Go has great support for the tools we use. Solid libraries were already available for Redis, RabbitMQ, PostgreSQL, Template parsing, Task scheduling, Expression parsing and RocksDB.

Go’s ecosystem is a major win compared to other newer languages like Rust or Elixir. It’s of course not as good as languages like Java, Python or Node, but it’s solid and for many basic needs you’ll find high-quality packages already available.

Reason 8 – Gofmt, Enforced Code Formatting

Let’s start with the obvious question: what is Gofmt? And no, it’s not a swear word. Gofmt is an awesome command-line utility that ships with the Go toolchain and formats your code. In terms of functionality it’s very similar to Python’s autopep8. While the show Silicon Valley portrays otherwise, most of us don’t really like to argue about tabs vs spaces. It’s important that formatting is consistent, but the actual formatting standard doesn’t really matter all that much. Gofmt avoids all of this discussion by having one official way to format your code.
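
To show what “one official way” means in practice, here is a small sketch that formats a messy snippet with the standard library’s go/format package, which applies the canonical gofmt style. The wrapper name formatGo and the sample source are assumptions for illustration. On the command line you would normally just run `gofmt -w` on your files.

```go
package main

import (
	"fmt"
	"go/format"
)

// formatGo formats a Go source snippet in canonical gofmt style.
// It returns an error if the source does not parse.
func formatGo(src string) (string, error) {
	out, err := format.Source([]byte(src))
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	// Deliberately inconsistent spacing and no indentation.
	messy := "package main\nfunc  add(a int,b int) int {\nreturn a+b\n}\n"

	clean, err := formatGo(messy)
	if err != nil {
		fmt.Println("format error:", err)
		return
	}
	fmt.Print(clean)
}
```

Formatting the output a second time leaves it unchanged, which is why formatting discussions tend to disappear from code review.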

Reason 9 – gRPC and Protocol Buffers

Go has first-class support for protocol buffers and gRPC. These two tools work very well together for building microservices which need to communicate via RPC. You only need to write a manifest where you define the RPC calls that can be made and what arguments they take. Both server and client code are then automatically generated from this manifest. The resulting code is fast, has a very small network footprint and is easy to use.

From the same manifest, you can even generate client code for many different languages, such as C++, Java, Python and Ruby. So, no more ambiguous REST endpoints for internal traffic, for which you have to write almost the same client and server code every time.

Disadvantages of Using Golang

Disadvantage 1 – Lack of Frameworks

Go doesn’t have a single dominant framework like Rails for Ruby, Django for Python or Laravel for PHP. This is a topic of heated debate within the Go community, as many people advocate that you shouldn’t use a framework to begin with. I totally agree that this is true for some use cases. However, if someone wants to build a simple CRUD API they will have a much easier time with Django/DRF, Rails, Laravel or Phoenix.

Disadvantage 2 – Error Handling

Go handles errors by simply returning an error from a function and expecting your calling code to handle the error (or to return it up the calling stack). While this approach works, it’s easy to lose track of what went wrong, which makes it hard to provide a meaningful error to your users. The errors package solves this problem by allowing you to add context and a stack trace to your errors.

Another issue is that it’s easy to forget to handle an error by accident. Static analysis tools like errcheck and megacheck are handy to avoid making these mistakes.

While these workarounds work well, they don’t feel quite right. You’d expect proper error handling to be supported by the language.
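
For illustration, here is a minimal sketch of adding context to an error as it travels up the call stack. Note the swap: it uses fmt.Errorf with the %w verb from the standard library (Go 1.13 and later) rather than the errors package mentioned above, so it adds context but no stack trace. The names ErrNotFound, loadFeed and renderFeed are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound is a hypothetical sentinel error used only for this illustration.
var ErrNotFound = errors.New("not found")

// loadFeed is a hypothetical lookup that fails for unknown IDs.
func loadFeed(id string) (string, error) {
	if id != "user:1" {
		return "", ErrNotFound
	}
	return "feed for " + id, nil
}

// renderFeed adds context to the low-level error before returning it to the caller.
func renderFeed(id string) (string, error) {
	feed, err := loadFeed(id)
	if err != nil {
		// %w wraps err so callers can still match it with errors.Is.
		return "", fmt.Errorf("render feed %q: %w", id, err)
	}
	return feed, nil
}

func main() {
	_, err := renderFeed("user:42")
	fmt.Println(err)                         // render feed "user:42": not found
	fmt.Println(errors.Is(err, ErrNotFound)) // true
}
```

The explicit `if err != nil` check at every level is exactly the boilerplate that tools like errcheck help you not to forget.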

Disadvantage 3 – Package Management

Go’s package management is by no means perfect. By default, it doesn’t have a way to specify a specific version of a dependency and there’s no way to create reproducible builds. Python, Node and Ruby all have better systems for package management. However, with the right tools, Go’s package management works quite well.

You can use Dep to manage your dependencies to allow specifying and pinning versions. Apart from that, we’ve contributed an open-source tool called VirtualGo which makes it easier to work on multiple projects written in Go.

Virtual Go

 

Python vs Go

One interesting experiment we conducted was taking our ranked feed functionality in Python and rewriting it in Go. Have a look at this example of a ranking method:

{
	"functions": {
		"simple_gauss": {
			"base": "decay_gauss",
			"scale": "5d",
			"offset": "1d",
			"decay": "0.3"
		},
		"popularity_gauss": {
			"base": "decay_gauss",
			"scale": "100",
			"offset": "5",
			"decay": "0.5"
		}
	},
	"defaults": {
		"popularity": 1
	},
	"score": "simple_gauss(time)*popularity"
}

Both the Python and Go code need to do the following to support this ranking method:

  1. Parse the expression for the score. In this case, we want to turn this string “simple_gauss(time)*popularity” into a function that takes an activity as input and returns a score as output.
  2. Create partial functions based on the JSON config. For example, we want “simple_gauss” to call “decay_gauss” with a scale of 5 days, offset of 1 day and a decay factor of 0.3.
  3. Parse the “defaults” configuration so you have a fallback if a certain field is not defined on an activity.
  4. Use the function from step 1 to score all activities in the feed.
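
Steps 2 to 4 can be sketched in a few lines of Go. This is not Stream’s ranking code: the Activity type and its fields are assumptions, the Gaussian formula is an assumed Elasticsearch-style decay (the score equals the decay factor at a distance of offset plus scale), expression parsing from step 1 is skipped by hard-coding the score, and a popularity of 0 is treated as “not set”.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Activity is a hypothetical feed item; the field names are assumptions.
type Activity struct {
	ID         string
	AgeDays    float64 // time since the activity was created, in days
	Popularity float64 // 0 is treated as "not set" in this sketch
}

// decayGauss returns a partial function with scale, offset and decay fixed (step 2).
// The formula is an assumed Elasticsearch-style Gaussian decay.
func decayGauss(scale, offset, decay float64) func(x float64) float64 {
	sigma2 := -(scale * scale) / (2 * math.Log(decay))
	return func(x float64) float64 {
		d := math.Max(0, math.Abs(x)-offset)
		return math.Exp(-(d * d) / (2 * sigma2))
	}
}

// rank scores activities with simple_gauss(time)*popularity and sorts them
// from highest to lowest score (steps 3 and 4).
func rank(acts []Activity) []Activity {
	simpleGauss := decayGauss(5, 1, 0.3) // "scale": "5d", "offset": "1d", "decay": "0.3"
	const defaultPopularity = 1.0        // from the "defaults" block

	score := func(a Activity) float64 {
		pop := a.Popularity
		if pop == 0 {
			pop = defaultPopularity // fallback when the field is missing
		}
		return simpleGauss(a.AgeDays) * pop
	}

	sorted := append([]Activity(nil), acts...) // copy so the input is left untouched
	sort.SliceStable(sorted, func(i, j int) bool {
		return score(sorted[i]) > score(sorted[j])
	})
	return sorted
}

func main() {
	feed := []Activity{
		{ID: "old-popular", AgeDays: 10, Popularity: 5},
		{ID: "fresh", AgeDays: 0.5},
		{ID: "week-old", AgeDays: 6, Popularity: 2},
	}
	for _, a := range rank(feed) {
		fmt.Println(a.ID)
	}
}
```

Closures make the “partial function” step cheap to express: decayGauss computes the variance once and returns a function that only needs the input value.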

Developing the Python version of the ranking code took roughly 3 days. That includes writing the code, unit tests and documentation. Next, we spent approximately 2 weeks optimizing the code. One of the optimizations was translating the score expression (simple_gauss(time)*popularity) into an abstract syntax tree. We also implemented caching logic which pre-computed the score for certain times in the future.

In contrast, developing the Go version of this code took roughly 4 days. The performance didn’t require any further optimization. So while the initial bit of development was faster in Python, the Go based version ultimately required substantially less work from our team. As an added benefit, the Go code performed roughly 40 times faster than our highly-optimized Python code.

Now, this is just a single example of the performance gains we’ve experienced by switching to Go. It is, of course, comparing apples to oranges:

  • The ranking code was my first project in Go
  • The Go code was built after the Python code, so the use case was better understood
  • The Go library for expression parsing was of exceptional quality

Your mileage will vary. Some other components of our system took substantially more time to build in Go compared to Python. As a general trend, we see that developing Go code takes slightly more effort. However, we spend much less time optimizing the code for performance.

Elixir vs Go – The Runner Up

Another language we evaluated is Elixir. Elixir is built on top of the Erlang virtual machine. It’s a fascinating language and we considered it since one of our team members has a ton of experience with Erlang.

For our use cases, we noticed that Go’s raw performance is much better. Both Go and Elixir will do a great job serving thousands of concurrent requests. However, if you look at individual request performance, Go is substantially faster for our use case. Another reason why we chose Go over Elixir was the ecosystem. For the components we required, Go had more mature libraries whereas, in many cases, the Elixir libraries weren’t ready for production usage. It’s also harder to train/find developers to work with Elixir.

These reasons tipped the balance in favor of Go. The Phoenix framework for Elixir looks awesome though and is definitely worth a look.

Conclusion

Go is a very performant language with great support for concurrency. It is almost as fast as languages like C++ and Java. While it does take a bit more time to build things using Go compared to Python or Ruby, you’ll save a ton of time spent on optimizing the code.

We have a small development team at Stream powering the feeds for over 200 million end users. Go’s combination of a great ecosystem, easy onboarding for new developers, fast performance, solid support for concurrency and a productive programming environment make it a great choice.

Stream still leverages Python for our dashboard, site and machine learning for personalized feeds. We won’t be saying goodbye to Python anytime soon, but going forward all performance-intensive code will be written in Go.

If you want to learn more about Go check out the blog posts listed below. To learn more about Stream, this interactive tutorial is a great starting point.

More Reading about Switching to Golang

Learning Go

Tags: go, golang, grpc, micro services, performance, python, scalability

  • ksandvik

    dep will be part of Go 1.10. https://github.com/golang/dep/wiki/Roadmap

    • Thierry Schellenbach

      that’s awesome

  • shuoli84

    Your code changed a lot and heavy business logic? Python
    Performance is critical and without much c dependency? Go

    • Hasen

      If your code changes a lot, python is a poor choice. Dynamic typing makes refactoring difficult.

      • In that cases we need rely heavily in unit tests and unfortunately
        create good testes covering all cases is not so easy. A good type
        system, like ML family languages, however make this more easy to
        accomplish.

        I like Python, but Dynamic Typing it sucks
        sometimes. But I like to enforce the usage of 3.5+ with [typing
        hint](https://www.python.org/dev/peps/pep-0484/). That way you can run
        OPTIONALLY your code statically typed through
        [mypy](http://mypy-lang.org/) and see whats happens.

  • great post and glad Go is working well in your team, but this line:

    > there’s no way to create reproducible builds

    we have been using https://github.com/FiloSottile/gvt and the vendor folder for over 2 years, our builds are 100% reproducible, checking in dependencies hasn’t been an issue for us.

    • Jonathan

      Folks starting anew should use https://github.com/golang/dep, which is almost the official tool (slated to be official in Go 1.10). 100% reproducible for sure, check in your vendor dependencies too.

  • LàTrinius Washington

    Go is an awesome language, but here are my gripes/wishes:

    1. I wish the garbage collection was optional or could be turned off for programs that allocate all data statically or manage dynamic memory themselves via RAII or other means. Such a feature would eliminate languages like Rust, C, C++ from even being considered on a project.

    2. I wish the Go library ecosystem was as large (or even a tenth as large) and as rich as Perl’s CPAN. Speaking of which, I wish Go had an independent third-party system such as CPAN so that we’re not forced to rely only on Google’s code libraries or random libraries written by individuals and scrounged from around the web.

    If the above two wishes were addressed, Go would become the perfect language.

  • Diego Jancic

    Hey. Great article! Quick question though, how do you do the connection between Python and Go? gRPC and protobuf? Thanks

  • It’s worth noting that Buffalo is pretty far along in providing a Rails/Phoenix/Django experience for building out MVC and APIs in Go:

    https://gobuffalo.io/

    It is not 1.0 yet, which will happen when the lead developer feels that he can provide API stability guarantees, but it is pretty feature-complete. Well worth a look if you are interested in Go, but don’t find “just use the standard library” a satisfactory answer.

  • Paddy3118

    It would be great to get an update post in a year’s time.
