Study Notes: How to Improve Code Readability

Date: 2021-02-20


This article is compiled from an internal talk that taowen gave at DiDi.

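1. Why

For front-line developers, most day-to-day work is piling more code onto an existing project. When the project can no longer bear the weight, we have to find a payoff that justifies refactoring. Since most of our time is spent in front of a monitor reading and writing code, unreadable code is quietly murdering your own life or your colleagues', so it is better to learn the techniques early and work at writing good code from the start ;)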

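2. How

To improve readability, first look at where code actually runs. It runs in two places: the CPU and the human brain. For the CPU, optimizing code means understanding how the CPU works and writing code that fits its characteristics. For the brain, reading code is like being an interpreter that executes the code line by line, so improving readability starts with knowing how the brain works.

Here is what the brain is well suited to and what it is not: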

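What the brain is good at

Name	Image	Description
Object recognition		A machine-learning model may look at countless cat pictures and still fail to recognize a cat reliably; a human brain recognizes cats well after seeing just a few.
Spatial decomposition		Without any labeling, the brain intuitively perceives the distinct objects in a space.
Temporal prediction		Isn't your first impression that this guy is about to get hit by a car?
Temporal memory		As one of the human survival instincts, the brain forms a memory of a place after we have walked through it several times.
Analogical inference		The brain can also reason by analogy; most people would pick C for this question, for example.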

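What the brain is not good at

Name	Image	Example
Abstract concepts that cannot be mapped to real-life experience		Looking at the picture on the left, the brain readily figures out how to clear the level. Swap in the abstract version on the right, where the objects are replaced by black pixels, and we have no idea what we are looking at. Likewise, code littered with variable names such as Z, X, C, V leaves you lost.
Lengthy detective reasoning		The brain is equally bad at cases where it has to recurse (or loop) through every possibility before finding the solution.
Tracking several processes that change at the same time		The brain is a single-threaded CPU; it is not good at drawing one shape with the left hand and another with the right.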

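Code optimization principles

Knowing the brain's strengths and weaknesses, we can adjust how we write code to match. Three principles follow:

Align Models: the data and algorithm models in the code should correspond to the mental models in the reader's head
Shorten Process: keep the "Sherlock Holmes" chain of deduction short, i.e. do not write long stretches of code
Isolate Process: handle one process at a time; do not describe several processes evolving simultaneously

The examples below explain the three principles in detail: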

Align Models

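In code, a model is nothing more than data structures and algorithms. In the brain, the counterpart is a mental model: what a person thinks about an object or an event. The way we talk is the outward expression of our mental models. When writing code, match the nouns in the code to the nouns of the real world so that the reader's brain spends less effort mapping the requirements document to the code. Take the noun "bank account": many variable names could stand for it, such as bankAccount, bank_account, account, BankAccount, BA, bank_acc, item, row, record, model. Pick one name that clearly links to the real-world object and use it consistently.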

Naming techniques

Name a variable for what it actually means. Do not pick an arbitrary name and then quietly do the explaining in a comment.

// bad
var d int // elapsed time in days

// good
var elapsedTimeInDays int // used globally

Name functions as verb + noun, and give your custom types names that say what they are:

// bad
func getThem(theList [][]int) [][]int {
    var list1 [][]int // what is list1? no idea
    for _, x := range theList {
        if x[0] == 4 { // what does 4 mean? no idea
            list1 = append(list1, x)
        }
    }
    return list1
}

// good
type Cell []int // names what the []int represents

func (cell Cell) isFlagged() bool { // explains what 4 means
    return cell[0] == 4
}

func getFlaggedCells(gameBoard []Cell) []Cell { // meaningful names
    var flaggedCells []Cell
    for _, cell := range gameBoard {
        if cell.isFlagged() {
            flaggedCells = append(flaggedCells, cell)
        }
    }
    return flaggedCells
}

Decomposition techniques

Spatial decomposition: the code below is all Page-related logic; look closely and it can be split along the page's spatial parts:

// bad
// ...then... and then... and then...: a flat narration of the whole process
func RenderPage(request *http.Request) map[string]interface{} {
    page := map[string]interface{}{}
    name := request.Form.Get("name")
    page["name"] = name
    urlPathName := strings.ToLower(name)
    urlPathName = regexp.MustCompile(`['.]`).ReplaceAllString(
        urlPathName, "")
    urlPathName = regexp.MustCompile(`[^a-z0-9]+`).ReplaceAllString(
        urlPathName, "-")
    urlPathName = strings.Trim(urlPathName, "-")
    page["url"] = "/biz/" + urlPathName
    page["date_created"] = time.Now().In(time.UTC)
    return page
}

// good
// Decomposed by space, so the reader can concentrate on the one part they care about
var page = map[string]pageItem{
    "name":         pageName,
    "url":          pageUrl,
    "date_created": pageDateCreated,
}

type pageItem func(*http.Request) interface{}

func pageName(request *http.Request) interface{} { // name-related process
    return request.Form.Get("name")
}

func pageUrl(request *http.Request) interface{} { // URL-related process
    name := request.Form.Get("name")
    urlPathName := strings.ToLower(name)
    urlPathName = regexp.MustCompile(`['.]`).ReplaceAllString(
        urlPathName, "")
    urlPathName = regexp.MustCompile(`[^a-z0-9]+`).ReplaceAllString(
        urlPathName, "-")
    urlPathName = strings.Trim(urlPathName, "-")
    return "/biz/" + urlPathName
}

func pageDateCreated(request *http.Request) interface{} { // date-related process
    return time.Now().In(time.UTC)
}
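
The table of per-field functions still needs one place that walks it for each request. A minimal sketch of that step, assuming a hypothetical `renderPage` and a hypothetical `newFormRequest` helper (neither appears in the original); the `date_created` field is left out so that the output is deterministic:

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"regexp"
	"strings"
)

type pageItem func(*http.Request) interface{}

// page maps each field to the one function that knows how to build it.
var page = map[string]pageItem{
	"name": pageName,
	"url":  pageURL,
}

var (
	quotesAndDots = regexp.MustCompile(`['.]`)
	nonAlnumRuns  = regexp.MustCompile(`[^a-z0-9]+`)
)

func pageName(request *http.Request) interface{} {
	return request.Form.Get("name")
}

func pageURL(request *http.Request) interface{} {
	slug := strings.ToLower(request.Form.Get("name"))
	slug = quotesAndDots.ReplaceAllString(slug, "")
	slug = nonAlnumRuns.ReplaceAllString(slug, "-")
	return "/biz/" + strings.Trim(slug, "-")
}

// renderPage (hypothetical) is the only place that walks the whole table.
func renderPage(request *http.Request) map[string]interface{} {
	rendered := make(map[string]interface{}, len(page))
	for field, build := range page {
		rendered[field] = build(request)
	}
	return rendered
}

// newFormRequest (hypothetical) builds a request whose form is already parsed.
func newFormRequest(name string) *http.Request {
	return &http.Request{Form: url.Values{"name": {name}}}
}

func main() {
	rendered := renderPage(newFormRequest("Joe's Cafe"))
	fmt.Println(rendered["name"], rendered["url"]) // prints: Joe's Cafe /biz/joes-cafe
}
```

Adding a field then means adding one function and one map entry; nothing else has to be read or changed.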

Temporal decomposition: the code below mixes computing the bill with printing it; it can be split along the time sequence:

// bad 
func (customer *Customer) statement() string {
    totalAmount := float64(0)
    frequentRenterPoints := 0
    result := "Rental Record for " + customer.Name + "\n"

    for _, rental := range customer.rentals {
        thisAmount := float64(0)
        switch rental.PriceCode {
        case REGULAR:
            thisAmount += 2
        case NEW_RELEASE:
            thisAmount += rental.rent * 2
        case CHILDRENS:
            thisAmount += 1.5
        }
        frequentRenterPoints += 1
        totalAmount += thisAmount
    }
    result += strconv.FormatFloat(totalAmount,'g',10,64) + "\n"
    result += strconv.Itoa(frequentRenterPoints)

    return result
}

// good: the code after decomposing the logic
func statement(custom *Customer) string {
    bill := calcBill(custom)

    statement := bill.print()

    return statement
}

type RentalBill struct {
    rental Rental
    amount float64
}

type Bill struct {
    customer             *Customer
    rentals              []RentalBill
    totalAmount          float64
    frequentRenterPoints int
}

func calcBill(customer *Customer) Bill {
    bill := Bill{customer: customer} // print() reads bill.customer
    for _, rental := range customer.rentals {
        rentalBill := RentalBill{
            rental: rental,
            amount: calcAmount(rental),
        }
        bill.frequentRenterPoints += calcFrequentRenterPoints(rental)
        bill.totalAmount += rentalBill.amount
        bill.rentals = append(bill.rentals, rentalBill)
    }
    return bill
}

func (bill Bill) print() string {
    result := "Rental Record for " + bill.customer.name + "\n"

    // one line per rental: title and amount
    for _, rentalBill := range bill.rentals {
        result += "\t" + rentalBill.rental.movie.title + "\t" +
            strconv.FormatFloat(rentalBill.amount, 'g', 10, 64) + "\n"
    }

    result += "Amount owed is " +
        strconv.FormatFloat(bill.totalAmount, 'g', 10, 64) + "\n"

    result += "You earned " +
        strconv.Itoa(bill.frequentRenterPoints) + " frequent renter points"

    return result
}

func calcAmount(rental Rental) float64 {
    thisAmount := float64(0)
    switch rental.movie.priceCode {
    case REGULAR:
        thisAmount += 2
        if rental.daysRented > 2 {
            thisAmount += (float64(rental.daysRented) - 2) * 1.5
        }
    case NEW_RELEASE:
        thisAmount += float64(rental.daysRented) * 3
    case CHILDRENS:
        thisAmount += 1.5
        if rental.daysRented > 3 {
            thisAmount += (float64(rental.daysRented) - 3) * 1.5
        }
    }
    return thisAmount
}

func calcFrequentRenterPoints(rental Rental) int {
    frequentRenterPoints := 1
    switch rental.movie.priceCode {
    case NEW_RELEASE:
        if rental.daysRented > 1 {
            frequentRenterPoints++
        }
    }
    return frequentRenterPoints
}

Layer decomposition:

// bad
func findSphericalClosest(lat float64, lng float64, locations []Location) *Location {
    var closest *Location
    closestDistance := math.MaxFloat64
    for i := range locations {
        location := &locations[i] // point at the slice element, not the loop variable
        latRad := radians(lat)
        lngRad := radians(lng)
        lat2Rad := radians(location.Lat)
        lng2Rad := radians(location.Lng)
        dist := math.Acos(math.Sin(latRad)*math.Sin(lat2Rad) +
            math.Cos(latRad)*math.Cos(lat2Rad)*
                math.Cos(lng2Rad-lngRad))
        if dist < closestDistance {
            closest = location
            closestDistance = dist
        }
    }
    return closest
}

// good
type Location struct {
}

type compare func(left Location, right Location) int

func min(objects []Location, compare compare) *Location {
    var min *Location
    for i := range objects {
        object := &objects[i] // point at the slice element, not the loop variable
        if min == nil || compare(*object, *min) < 0 {
            min = object
        }
    }
    return min
}

func findSphericalClosest(lat float64, lng float64, locations []Location) *Location {
    isCloser := func(left Location, right Location) int {
        // rand.Int() is a placeholder for each location's distance from (lat, lng)
        leftDistance := rand.Int()
        rightDistance := rand.Int()
        if leftDistance < rightDistance {
            return -1
        } else {
            return 0
        }
    }
    closest := min(locations, isCloser)
    return closest
}
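
The comparator above uses `rand.Int()` as a stand-in for the distance calculation. A minimal sketch of the same layering with the distance filled in, assuming `Location` carries `Lat` and `Lng` in degrees (as the first version implies); the `Name` field and the helpers `radians`, `sphericalDistance`, and `minBy` are hypothetical:

```go
package main

import (
	"fmt"
	"math"
)

// Location is assumed to hold coordinates in degrees.
type Location struct {
	Name     string
	Lat, Lng float64
}

// radians converts degrees to radians.
func radians(deg float64) float64 { return deg * math.Pi / 180 }

// sphericalDistance is the geometry layer: the central angle (in radians)
// between two points, via the spherical law of cosines.
func sphericalDistance(a, b Location) float64 {
	lat1, lat2 := radians(a.Lat), radians(b.Lat)
	dLng := radians(b.Lng - a.Lng)
	cosAngle := math.Sin(lat1)*math.Sin(lat2) +
		math.Cos(lat1)*math.Cos(lat2)*math.Cos(dLng)
	// Clamp to [-1, 1] so rounding error cannot push Acos to NaN.
	return math.Acos(math.Max(-1, math.Min(1, cosAngle)))
}

// minBy is the generic layer: it returns a pointer to the element with the
// smallest key, or nil for an empty slice.
func minBy(items []Location, key func(Location) float64) *Location {
	var best *Location
	bestKey := math.Inf(1)
	for i := range items {
		if k := key(items[i]); best == nil || k < bestKey {
			best, bestKey = &items[i], k
		}
	}
	return best
}

// findSphericalClosest is the business layer: it only states what "closest" means.
func findSphericalClosest(lat, lng float64, locations []Location) *Location {
	origin := Location{Lat: lat, Lng: lng}
	return minBy(locations, func(l Location) float64 {
		return sphericalDistance(origin, l)
	})
}

func main() {
	cities := []Location{
		{"Beijing", 39.90, 116.40},
		{"Shanghai", 31.23, 121.47},
		{"Guangzhou", 23.13, 113.26},
	}
	closest := findSphericalClosest(30.27, 120.15, cities) // a point near Hangzhou
	fmt.Println(closest.Name)                              // prints: Shanghai
}
```

Each layer can be read and tested without the others: the trigonometry never mentions loops, and the search never mentions trigonometry.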

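Comments

Comments should not repeat what the code already says. They should explain how the code's model maps to the mental model and why this model was chosen. The following is a counterexample: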

// bad
/** the name. */
var name string
/** the version. */
var Version string
/** the info. */
var info string

// Find the Node in the given subtree, with the given name, using the given depth.
func FindNodeInSubtree(subTree *Node, name string, depth *int) *Node {
}

The following is a good example:

// Impose a reasonable limit - no human can read that much anyway
const MAX_RSS_SUBSCRIPTIONS = 1000

// Runtime is O(number_tags * average_tag_depth), 
// so watch out for badly nested inputs.
func FixBrokenHTML(HTML string) string {
    // ...
}

Shorten Process

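Shorten Process means shortening the path the brain follows when it "compiles" code. Avoid code that is long and winding like a lab mouse's maze. Long and winding shows up as tracking across expressions, across many lines of a function, across several member functions, across files, across compilation units, or even across code repositories.

The corresponding remedies are introducing variables, splitting functions, returning early, and narrowing variable scope. All of them aim to let the brain catch its breath instead of tracking one thing for too long. Again, some concrete examples:

Examples

In the code below, several compound conditions are combined; you can stare at it until your head spins and still not see when it is true and when it is false.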

// bad
func (rng *Range) overlapsWith(other *Range) bool {
    return (rng.begin >= other.begin && rng.begin < other.end) ||
        (rng.end > other.begin && rng.end <= other.end) ||
        (rng.begin <= other.begin && rng.end >= other.end)
}

Break the cases apart and handle each condition on its own, and the logic becomes clear.

// good
func (rng *Range) overlapsWith(other *Range) bool {
    if other.end <= rng.begin {
        return false // they end before we begin
    }
    if other.begin >= rng.end {
        return false // they begin after we end
    }
    return true // only possibility left: they overlap
}
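
One way to gain confidence that the early-return form means the same thing as the compound condition is to compare the two exhaustively over small inputs. A sketch, assuming half-open ranges `[begin, end)` with `begin < end`, with the first check written as `<=` so that ranges which merely touch do not count as overlapping; the method names and `countDisagreements` are hypothetical:

```go
package main

import "fmt"

// Range is assumed to be half-open: [begin, end) with begin < end.
type Range struct{ begin, end int }

// overlapsCompound is the single compound condition.
func (rng Range) overlapsCompound(other Range) bool {
	return (rng.begin >= other.begin && rng.begin < other.end) ||
		(rng.end > other.begin && rng.end <= other.end) ||
		(rng.begin <= other.begin && rng.end >= other.end)
}

// overlapsEarlyReturn rules out the two non-overlapping cases first.
func (rng Range) overlapsEarlyReturn(other Range) bool {
	if other.end <= rng.begin {
		return false // they end before we begin
	}
	if other.begin >= rng.end {
		return false // they begin after we end
	}
	return true // only possibility left: they overlap
}

// countDisagreements compares both forms on every pair of non-empty ranges
// whose endpoints lie in [0, limit].
func countDisagreements(limit int) int {
	var ranges []Range
	for begin := 0; begin < limit; begin++ {
		for end := begin + 1; end <= limit; end++ {
			ranges = append(ranges, Range{begin, end})
		}
	}
	disagreements := 0
	for _, a := range ranges {
		for _, b := range ranges {
			if a.overlapsCompound(b) != a.overlapsEarlyReturn(b) {
				disagreements++
			}
		}
	}
	return disagreements
}

func main() {
	fmt.Println(countDisagreements(6)) // prints: 0
}
```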

Another example: at first your code may have had a single if ... else .... Then the PM asks for permission control, so you happily nest another if inside the if, the patch is done, you call it a day, and the code ends up like this:

// bad: too many levels of nesting
func handleResult(reply *Reply, userResult int, permissionResult int) {
  if userResult == SUCCESS {
    if permissionResult != SUCCESS {
      reply.WriteErrors("error reading permissions")
      reply.Done()
      return
    }
    reply.WriteErrors("")
  } else {
    reply.WriteErrors("User Result")
  }
  reply.Done()
}

This kind of code is easy to fix: usually, invert the if conditions and return early on the failure cases:

// good
func handleResult(reply *Reply, userResult int, permissionResult int) {
  defer reply.Done()
  if userResult != SUCCESS {
    reply.WriteErrors("User Result")
    return 
  }
  if permissionResult != SUCCESS {
    reply.WriteErrors("error reading permissions")
    return
  }
  reply.WriteErrors("")
}

The problem in the next example is subtler: everything is kept on the MooDriver object.

// bad
type MooDriver struct {
    gradient Gradient
  splines []Spline
}
func (driver *MooDriver) drive(reason string) {
  driver.saturateGradient()
  driver.reticulateSplines()
  driver.diveForMoog(reason)
}

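A better approach is to shrink the shared (global) scope as much as possible and pass values explicitly as context instead.

// good
type ExplicitDriver struct {
}

// pass context explicitly from step to step
func (driver *ExplicitDriver) drive(reason string) {
  gradient := driver.saturateGradient()
  splines := driver.reticulateSplines(gradient)
  driver.diveForMoog(splines, reason)
}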
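
The helper methods are not shown in the fragment above. A fuller sketch of the context-passing version, in which the `Gradient` and `Spline` fields, all method bodies, and the string return values are hypothetical and exist only to make the data flow visible:

```go
package main

import "fmt"

// Gradient and Spline are hypothetical stand-ins for the original types.
type Gradient struct{ saturation float64 }
type Spline struct{ points int }

// ExplicitDriver keeps no intermediate state in its fields.
type ExplicitDriver struct{}

func (driver *ExplicitDriver) saturateGradient() Gradient {
	return Gradient{saturation: 1.0}
}

// reticulateSplines states in its signature that it depends on the gradient.
func (driver *ExplicitDriver) reticulateSplines(gradient Gradient) []Spline {
	return []Spline{{points: int(gradient.saturation * 3)}}
}

// diveForMoog states in its signature that it depends on the splines.
func (driver *ExplicitDriver) diveForMoog(splines []Spline, reason string) string {
	return fmt.Sprintf("diving with %d spline(s): %s", len(splines), reason)
}

// drive shows the whole data flow in three lines.
func (driver *ExplicitDriver) drive(reason string) string {
	gradient := driver.saturateGradient()
	splines := driver.reticulateSplines(gradient)
	return driver.diveForMoog(splines, reason)
}

func main() {
	driver := &ExplicitDriver{}
	fmt.Println(driver.drive("demo")) // prints: diving with 1 spline(s): demo
}
```

With the fields gone, a reader no longer has to check every method to learn which one writes `gradient` or `splines` and which one reads them.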

Isolate Process

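A weakness of the brain is that it cannot track several things at once; following several processes of change "simultaneously" does not fit how it is built. But scattering logic across many places is not brain-friendly either, because the brain has to piece the fragments together before it can see one piece of logic whole. Hence the classic platitude that every computer science student has heard: make your code high in cohesion and low in coupling, and you are golden! -_-||| Ask the person saying it what high cohesion and low coupling actually mean, though, and they may have to think for a while. The examples below work through it.

First, the hand-wavy part: if your code is written like this, it will not be very readable.

In general, we can work from the business scenario to reshape the code into this:

Some examples. The code below is very common; version here means that different client versions need different handling.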

func (query *Query) doQuery() {
  if query.sdQuery != nil {
    query.sdQuery.clearResultSet()
  }
  // version 5.2 control
  if query.sd52 {
    query.sdQuery = sdLoginSession.createQuery(SDQuery.OPEN_FOR_QUERY)
  } else {
    query.sdQuery = sdSession.createQuery(SDQuery.OPEN_FOR_QUERY)
  }
  query.executeQuery()
}

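The problem is that, because of version differences, several flows of logic have been merged together, so the logic forks in the middle. The fix is simple: wrap an adapter, extract the version-specific logic behind an interface, and implement the concrete logic per version.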
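
A minimal sketch of such an adapter. Every name here (`queryFactory`, `loginSessionFactory`, `sessionFactory`, `newQueryFactory`, the `source` field) is hypothetical, and the session objects are reduced to strings so that the example runs on its own:

```go
package main

import "fmt"

// SDQuery is a hypothetical stand-in for the query handle.
type SDQuery struct{ source string }

// queryFactory is the assumed adapter interface: one method, one
// implementation per client version.
type queryFactory interface {
	createQuery() *SDQuery
}

// loginSessionFactory covers version 5.2 clients.
type loginSessionFactory struct{}

func (loginSessionFactory) createQuery() *SDQuery {
	return &SDQuery{source: "sdLoginSession"}
}

// sessionFactory covers every other version.
type sessionFactory struct{}

func (sessionFactory) createQuery() *SDQuery {
	return &SDQuery{source: "sdSession"}
}

// newQueryFactory is the only place that knows about versions.
func newQueryFactory(sd52 bool) queryFactory {
	if sd52 {
		return loginSessionFactory{}
	}
	return sessionFactory{}
}

type Query struct {
	factory queryFactory
	sdQuery *SDQuery
}

// doQuery now reads as one straight process with no version fork.
func (query *Query) doQuery() string {
	query.sdQuery = query.factory.createQuery()
	return "executing via " + query.sdQuery.source
}

func main() {
	query := &Query{factory: newQueryFactory(true)}
	fmt.Println(query.doQuery()) // prints: executing via sdLoginSession
}
```

The version check happens once, when the `Query` is constructed, instead of in the middle of every flow that touches the session.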

Another example: in the code below, product logic that differs by expiry and maturity also causes forks, so the code ends up like this:

// bad
type Loan struct {
    start    time.Time
    expiry   *time.Time
    maturity *time.Time
    rating   int
}

func (loan *Loan) duration() float64 {
    // durations in years, assuming the intent was to divide by 365*24 hours
    const hoursPerYear = 365 * 24
    if loan.expiry == nil {
        return loan.maturity.Sub(loan.start).Hours() / hoursPerYear
    } else if loan.maturity == nil {
        return loan.expiry.Sub(loan.start).Hours() / hoursPerYear
    }
    toExpiry := loan.expiry.Sub(loan.start).Hours()
    fromExpiryToMaturity := loan.maturity.Sub(*loan.expiry).Hours()
    revolverDuration := toExpiry / hoursPerYear
    termDuration := fromExpiryToMaturity / hoursPerYear
    return revolverDuration + termDuration
}

func (loan *Loan) unusedPercentage() float64 {
    if loan.expiry != nil && loan.maturity != nil {
        if loan.rating > 4 {
            return 0.95
        } else {
            return 0.50
        }
    } else if loan.maturity != nil {
        return 1
    } else if loan.expiry != nil {
        if loan.rating > 4 {
            return 0.75
        } else {
            return 0.25
        }
    }
    panic("invalid loan")
}

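The best practice for multiple kinds of product logic is the Strategy pattern, shown in the code below: create a different strategy for each product type behind one interface, then implement the duration and unusedPercentage methods for each.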

// good
type LoanApplication struct {
    expiry   *time.Time
    maturity *time.Time
}

type CapitalStrategy interface {
    duration() float64
    unusedPercentage() float64
}

func createLoanStrategy(loanApplication LoanApplication) CapitalStrategy {
    if loanApplication.expiry != nil && loanApplication.maturity != nil {
        return createRCTL(loanApplication)
    }
    if loanApplication.expiry != nil {
        return createRevolver(loanApplication)
    }
    if loanApplication.maturity != nil {
        return createTermLoan(loanApplication)
    }
    panic("invalid loan application")
}
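
The factory above leaves the concrete strategies out. A sketch of two of them, reusing the percentages from the branching version; it assumes `LoanApplication` also carries `start` and `rating` (the struct above does not), measures durations in 365-day years, omits the combined expiry-plus-maturity product for brevity, and uses hypothetical names `termLoan`, `revolver`, `years`, and `sampleTermLoan`:

```go
package main

import (
	"fmt"
	"time"
)

const hoursPerYear = 365 * 24

// LoanApplication here also carries start and rating (an assumption).
type LoanApplication struct {
	start    time.Time
	expiry   *time.Time
	maturity *time.Time
	rating   int
}

type CapitalStrategy interface {
	duration() float64 // in years
	unusedPercentage() float64
}

func years(from, to time.Time) float64 {
	return to.Sub(from).Hours() / hoursPerYear
}

// termLoan handles applications that have only a maturity date.
type termLoan struct{ app LoanApplication }

func (s termLoan) duration() float64         { return years(s.app.start, *s.app.maturity) }
func (s termLoan) unusedPercentage() float64 { return 1 }

// revolver handles applications that have only an expiry date.
type revolver struct{ app LoanApplication }

func (s revolver) duration() float64 { return years(s.app.start, *s.app.expiry) }
func (s revolver) unusedPercentage() float64 {
	if s.app.rating > 4 {
		return 0.75
	}
	return 0.25
}

// createLoanStrategy is the single place where the product is selected.
func createLoanStrategy(app LoanApplication) CapitalStrategy {
	switch {
	case app.expiry != nil && app.maturity == nil:
		return revolver{app}
	case app.maturity != nil && app.expiry == nil:
		return termLoan{app}
	}
	panic("loan type not covered by this sketch")
}

// sampleTermLoan (hypothetical) matures exactly two 365-day years after it starts.
func sampleTermLoan() LoanApplication {
	start := time.Date(2021, time.January, 1, 0, 0, 0, 0, time.UTC)
	maturity := start.Add(2 * hoursPerYear * time.Hour)
	return LoanApplication{start: start, maturity: &maturity, rating: 5}
}

func main() {
	strategy := createLoanStrategy(sampleTermLoan())
	fmt.Println(strategy.duration(), strategy.unusedPercentage()) // prints: 2 1
}
```

Each strategy reads top to bottom without a single nil check; the forks that used to sit inside `duration` and `unusedPercentage` now live only in the factory.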

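Reality is not that simple, though, because the different things in front of you really do run as multiple processes and threads. Take the product-logic example above: design patterns isolate the execution logic in different places, but as long as the code handles several products, there is still a product-selection step at run time. The logic happens at the same time and in the same place, so it "naturally" ends up written together:

When displaying features, several kinds of information must be shown, which produces concurrent processes
When writing code, the business includes functional and non-functional requirements, as well as normal and error-handling logic
For run-time efficiency, we turn to asynchronous I/O and multiple threads or coroutines
When reusing a flow, version differences and product strategies also produce merged concurrent processes

When several features are mixed together, as in the RenderPage function above, the remedy is not to do everything in one lump: make each piece cohesive on its own, then couple the pieces into one unit.

For several I/O operations running concurrently, coroutines can pull the tangled processes apart: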

// bad: two I/O flows written together
func sendToPlatforms() {
    httpSend("bloomberg", func(err error) {
        if err == nil {
            increaseCounter("bloomberg_sent", func(err error) {
                if err != nil {
                    log("failed to record counter", err)
                }
            })
        } else {
            log("failed to send to bloom berg", err)
        }
    })
    ftpSend("reuters", func(err error) {
        if err == DIRECTORY_NOT_FOUND {
            httpSend("reutersHelp", func(err error) {})
        }
    })
}

For this kind of concurrent I/O, the best solution is to give each task its own function. At run time they execute "simultaneously", but in the code they are separate.

// good: goroutine version
func sendToPlatforms() {
    go sendToBloomberg()
    go sendToReuters()
}

func sendToBloomberg() {
    err := httpSend("bloomberg")
    if err != nil {
        log("failed to send to bloomberg", err)
        return
    }
    err = increaseCounter("bloomberg_sent")
    if err != nil {
        log("failed to record counter", err)
    }
}

func sendToReuters() {
    err := ftpSend("reuters")
    if err == DIRECTORY_NOT_FOUND {
        httpSend("reutersHelp")
    }
}
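
A bare `go` statement returns immediately, so a caller that needs the outcomes has to wait for the goroutines. A sketch using `sync.WaitGroup` from the standard library; `httpSend` and `ftpSend` are hypothetical blocking stubs, and the string results exist only to make the outcome visible:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errDirectoryNotFound = errors.New("directory not found")

// httpSend and ftpSend are hypothetical blocking stubs standing in for real I/O.
func httpSend(target string) error { return nil }
func ftpSend(target string) error  { return errDirectoryNotFound }

// sendToBloomberg is one process, read top to bottom.
func sendToBloomberg() string {
	if err := httpSend("bloomberg"); err != nil {
		return "bloomberg: failed"
	}
	return "bloomberg: sent"
}

// sendToReuters is another process, independent of the first.
func sendToReuters() string {
	err := ftpSend("reuters")
	if errors.Is(err, errDirectoryNotFound) {
		if helpErr := httpSend("reutersHelp"); helpErr != nil {
			return "reuters: help request failed"
		}
		return "reuters: asked for help"
	}
	if err != nil {
		return "reuters: failed"
	}
	return "reuters: sent"
}

// sendToPlatforms runs every task in its own goroutine and waits for all of them.
func sendToPlatforms() []string {
	tasks := []func() string{sendToBloomberg, sendToReuters}
	results := make([]string, len(tasks))
	var wg sync.WaitGroup
	for i, task := range tasks {
		wg.Add(1)
		go func(i int, task func() string) {
			defer wg.Done()
			results[i] = task() // each goroutine writes only its own slot
		}(i, task)
	}
	wg.Wait()
	return results
}

func main() {
	for _, line := range sendToPlatforms() {
		fmt.Println(line)
	}
}
```

The concurrency lives entirely in `sendToPlatforms`; each sender stays a plain sequential function.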

Sometimes logic has to be merged into one process. When buying or selling a product, for example, the request must be checked first:

// bad
func buyProduct(req *http.Request) error {
    err := checkAuth(req)
    if err != nil {
        return err
    }
    // ...
}

func sellProduct(req *http.Request) error {
    err := checkAuth(req)
    if err != nil {
        return err
    }
    // ...
}

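The classic solution for shared logic at the head of a function is a Decorator that handles the permission check on its own and wraps the real logic:

// good: decorator version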
func init() {
    buyProduct = checkAuthDecorator(buyProduct)
    sellProduct = checkAuthDecorator(sellProduct)
}

func checkAuthDecorator(f func(req *http.Request) error) func(req *http.Request) error {
    return func(req *http.Request) error {
        err := checkAuth(req)
        if err != nil {
            return err
        }
        return f(req)
    }
}

var buyProduct = func(req *http.Request) error {
    // ...
}

var sellProduct = func(req *http.Request) error {
    // ...
}

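At this point your code looks like this:

Of course, shared logic does not live only at the head. Think it over and the Strategy and Template patterns do the same kind of thing at other points in the logic.

This brings in a new concept: the signal-to-noise ratio. It is relative: signal is what is useful to me, noise is what is not. Which logic belongs together depends not only on who the reader is but also on what that reader is trying to accomplish at the moment.

Compare the following C++ and Python code: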

void sendMessage(const Message &msg) const {...}

def sendMessage(msg):

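If you are doing business development, the Python code reads as clean and concise; if you are doing performance work, the C++ code clearly gives you more information.

Or take the code below. From a business-logic point of view it looks very clear: it iterates over the books and gets each Publisher.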

for _, book := range books {
  book.getPublisher()
}

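But then you look at the SQL log from production, shown below, and you are stunned: this ORM really does execute the SQL one row at a time. This line of code may set off DB alarms and get your DBA colleague out of bed in the middle of the night to fix the database.

SELECT * FROM Publisher WHERE PublisherId = book.publisher_id
SELECT * FROM Publisher WHERE PublisherId = book.publisher_id
SELECT * FROM Publisher WHERE PublisherId = book.publisher_id
SELECT * FROM Publisher WHERE PublisherId = book.publisher_id
SELECT * FROM Publisher WHERE PublisherId = book.publisher_id

If the code is changed to the following, it becomes much clearer that the loop is loading an entity on every iteration.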

for _, book := range books {
  loadEntity("publisher", book.publisher_id)
}

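To summarize:

First try to give every process its own function; do not merge them into one computation
- Try splitting the UI into components
- Try splitting an order into several documents and tracking each flow independently
- Try expressing concurrent I/O with coroutines rather than callbacks
If several relatively independent things must be handled in one process
- Try copying the code instead of reusing the same process
- Try explicit insertion: state / adapter / strategy / template / visitor / observer
- Try implicit insertion: decorator / AOP
- Improving the signal-to-noise ratio is relative to a specific goal: raising it for one goal lowers it for another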