How to Create a Waterfall Chart in Python

Date: 2019-12-11

Introduction

Waterfall charts are a very useful tool for plotting certain kinds of data. Not surprisingly, we can use Pandas and matplotlib to create a repeatable waterfall chart.

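Before going any further, I want to make clear which type of chart I am referring to. I will be building the 2D waterfall chart described in the Wikipedia article.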

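A typical use for this kind of chart is to show the positive and negative values that "bridge" a starting value and an ending value. For this reason, finance people sometimes call it a bridge. Like the other examples I have used before, this type of plot is not easy to generate in Excel. There are certainly ways to do it, but they are not easy to remember.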

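The key thing to keep in mind about a waterfall chart is that it is essentially a stacked bar chart. The special twist is that it has a blank bottom bar, so the top bar "floats" in the air. So, let's get started.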
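To make the "floating" idea concrete, here is a minimal sketch that uses matplotlib directly rather than the pandas approach followed in the rest of this article. The values and the names `heights` and `bottoms` are made up for illustration; the only library feature it relies on is the `bottom` argument of `Axes.bar`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical values: a starting amount followed by two changes.
labels = ["start", "loss", "gain"]
heights = [100, -30, 20]

# Each bar starts where the previous one ended (the running total so far).
bottoms = [0, 100, 70]

fig, ax = plt.subplots()
bars = ax.bar(labels, heights, bottom=bottoms)

# Where each bar ends: the second hangs down from 100 to 70,
# the third rises from 70 to 90.
tops = [b + h for b, h in zip(bottoms, heights)]
print(tops)
```

The invisible "blank" bar in the pandas version plays the same role as `bottoms` here: it only decides where the visible bar begins.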

Creating the Chart

First, run the standard imports and make sure IPython will display matplotlib plots.

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    %matplotlib inline

Set up the data we want to plot as a waterfall and load it into a DataFrame.

The data needs to begin with your starting value, but you leave out the final total. We will calculate it below.

    index = ['sales', 'returns', 'credit fees', 'rebates', 'late charges', 'shipping']
    data = {'amount': [350000, -30000, -7500, -25000, 95000, -7000]}
    trans = pd.DataFrame(data=data, index=index)

I am using the handy display function from IPython to control more easily what gets displayed.

    from IPython.display import display
    display(trans)

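The biggest trick with a waterfall chart is figuring out what the bottom stacked bar should contain. I learned a lot about this from a discussion on Stack Overflow.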

First, we get the cumulative sum.

    display(trans.amount.cumsum())

    sales           350000
    returns         320000
    credit fees     312500
    rebates         287500
    late charges    382500
    shipping        375500
    Name: amount, dtype: int64

This looks good, but we need to shift the data by one position, so that each row holds the running total before that transaction.

    blank = trans.amount.cumsum().shift(1).fillna(0)
    display(blank)

    sales                0
    returns         350000
    credit fees     320000
    rebates         312500
    late charges    287500
    shipping        382500
    Name: amount, dtype: float64
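As a quick sanity check on the shift, the sketch below rebuilds the same numbers as a plain Series and confirms that every bar, starting at its blank value and spanning its amount, ends at the running total. The variable names `amounts` and `running` are only for this illustration.

```python
import pandas as pd

amounts = pd.Series(
    [350000, -30000, -7500, -25000, 95000, -7000],
    index=["sales", "returns", "credit fees", "rebates", "late charges", "shipping"],
    name="amount",
)

running = amounts.cumsum()          # level reached after each transaction
blank = running.shift(1).fillna(0)  # level reached before each transaction

# Every bar starts at `blank` and spans `amounts`, so it must end at `running`.
assert ((blank + amounts) == running).all()

print(blank["late charges"], running["late charges"])
```

For example, the late charges bar starts at 287,500 and ends at 382,500, which matches the two outputs shown above.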

We need to add a net total to both trans and blank.

    total = trans.sum().amount
    trans.loc["net"] = total
    blank.loc["net"] = total
    display(trans)
    display(blank)

    sales                0
    returns         350000
    credit fees     320000
    rebates         312500
    late charges    287500
    shipping        382500
    net             375500
    Name: amount, dtype: float64
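Two details of this step are easy to miss: the total has to be taken before the "net" row exists, and assigning to a label that is not yet in the index is what appends the row. A small self-contained sketch of both points, using the same data as above:

```python
import pandas as pd

index = ["sales", "returns", "credit fees", "rebates", "late charges", "shipping"]
trans = pd.DataFrame(
    {"amount": [350000, -30000, -7500, -25000, 95000, -7000]}, index=index
)

# Take the total before the "net" row exists, otherwise it would count itself.
total = trans["amount"].sum()

# The net total is also the last value of the running sum.
assert total == trans["amount"].cumsum().iloc[-1]

# Assigning to a label that is not in the index appends a new row.
trans.loc["net"] = total
print(trans.tail(2))
```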

Create the steps we will use to show the changes.

    step = blank.reset_index(drop=True).repeat(3).shift(-1)
    step.iloc[1::3] = np.nan
    display(step)

    0         0
    0       NaN
    0    350000
    1    350000
    1       NaN
    1    320000
    2    320000
    2       NaN
    2    312500
    3    312500
    3       NaN
    3    287500
    4    287500
    4       NaN
    4    382500
    5    382500
    5       NaN
    5    375500
    6    375500
    6       NaN
    6       NaN
    Name: amount, dtype: float64
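The repeat/shift/NaN combination is compact, so here is the same idea on three made-up levels. This is only a sketch of how the pattern behaves, with `levels` as a hypothetical stand-in for the blank series: each level is repeated three times, the series is moved up one slot, and every third-of-three entry is blanked so the plotted line breaks between connectors.

```python
import numpy as np
import pandas as pd

# Hypothetical bar start levels for three bars.
levels = pd.Series([0.0, 10.0, 7.0])

step = levels.repeat(3).shift(-1)  # three copies of each level, moved up one slot
step.iloc[1::3] = np.nan           # a NaN breaks the line between connectors

# (x position, y value) pairs in the order they will be plotted.
pairs = list(zip(step.index, step.values))
print(pairs)
```

Reading the pairs in order, the point at (0, 10) is followed by (1, 10), which draws a horizontal connector from the first bar to the second at the level where the second bar starts. The NaN that follows lifts the pen before the next connector from (1, 7) to (2, 7).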

For the "net" row, we need to make sure the blank value is 0 so that we do not double the stack.

    blank.loc["net"] = 0
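The arithmetic behind that reset is short enough to spell out. A stacked bar is drawn from its bottom value up to bottom plus height, so with the totals from this example:

```python
# Net total taken from the example data in this article.
total = 375500

# A stacked bar is drawn from `bottom` up to `bottom + height`.
top_if_left_alone = total + total  # blank["net"] still equal to the total
top_after_reset = 0 + total        # blank["net"] reset to 0

print(top_if_left_alone, top_after_reset)  # 751000 375500
```

The step series is built before the reset on purpose: it still needs the 375,500 level to draw the last connector into the net bar.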

Then plot it and see what it looks like.

    my_plot = trans.plot(kind='bar', stacked=True, bottom=blank, legend=None,
                         title="2014 Sales Waterfall")
    my_plot.plot(step.index, step.values, 'k')

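That looks pretty good, but let's try formatting the Y axis to make it more readable. To do this, we use FuncFormatter and some Python 2.7+ format syntax to drop the decimals (rounding to whole dollars) and add a thousands separator.

    def money(x, pos):
        'The two args are the value and tick position'
        return "${:,.0f}".format(x)

    from matplotlib.ticker import FuncFormatter
    formatter = FuncFormatter(money)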
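To see what the formatter produces without drawing anything, it can be called directly. The sketch below repeats the `money` function from above and feeds it a couple of sample values; the f-string on the last line is an equivalent spelling that assumes Python 3.6 or later.

```python
from matplotlib.ticker import FuncFormatter


def money(x, pos):
    """Format a tick value as whole dollars; `pos` is passed by matplotlib but unused."""
    return "${:,.0f}".format(x)


formatter = FuncFormatter(money)

print(money(350000, 0))        # $350,000
print(formatter(287500.4, 3))  # $287,500
print(f"${1234567.89:,.0f}")   # $1,234,568
```

Note that the `.0f` format rounds to the nearest whole number rather than cutting the decimals off.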

Then put it all together.

    my_plot = trans.plot(kind='bar', stacked=True, bottom=blank, legend=None,
                         title="2014 Sales Waterfall")
    my_plot.plot(step.index, step.values, 'k')
    my_plot.set_xlabel("Transaction Types")
    my_plot.yaxis.set_major_formatter(formatter)

Full Script

The basic chart works, but I want to add some labels and make a few minor formatting tweaks. Here is my final script:

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from matplotlib.ticker import FuncFormatter

    # Use Python 2.7+ syntax to format currency
    def money(x, pos):
        'The two args are the value and tick position'
        return "${:,.0f}".format(x)
    formatter = FuncFormatter(money)

    # Data to plot. Do not include a total, it will be calculated
    index = ['sales', 'returns', 'credit fees', 'rebates', 'late charges', 'shipping']
    data = {'amount': [350000, -30000, -7500, -25000, 95000, -7000]}

    # Store data and create a blank series to use for the waterfall
    trans = pd.DataFrame(data=data, index=index)
    blank = trans.amount.cumsum().shift(1).fillna(0)

    # Get the net total number for the final element in the waterfall
    total = trans.sum().amount
    trans.loc["net"] = total
    blank.loc["net"] = total

    # The steps graphically show the levels as well as being used for label placement
    step = blank.reset_index(drop=True).repeat(3).shift(-1)
    step.iloc[1::3] = np.nan

    # When plotting the last element, we want to show the full bar,
    # so set the blank to 0
    blank.loc["net"] = 0

    # Plot and label
    my_plot = trans.plot(kind='bar', stacked=True, bottom=blank, legend=None,
                         figsize=(10, 5), title="2014 Sales Waterfall")
    my_plot.plot(step.index, step.values, 'k')
    my_plot.set_xlabel("Transaction Types")

    # Format the axis for dollars
    my_plot.yaxis.set_major_formatter(formatter)

    # Get the y-axis position for the labels
    y_height = trans.amount.cumsum().shift(1).fillna(0)

    # Get an offset so labels don't sit right on top of the bar
    # (named max_amount so the built-in max() is not shadowed)
    max_amount = trans.amount.max()
    neg_offset = max_amount / 25
    pos_offset = max_amount / 50
    plot_offset = int(max_amount / 15)

    # Start label loop
    loop = 0
    for index, row in trans.iterrows():
        # For the last item in the list, we don't want to double count
        if row['amount'] == total:
            y = y_height.iloc[loop]
        else:
            y = y_height.iloc[loop] + row['amount']
        # Determine if we want a neg or pos offset
        if row['amount'] > 0:
            y += pos_offset
        else:
            y -= neg_offset
        my_plot.annotate("{:,.0f}".format(row['amount']), (loop, y), ha="center")
        loop += 1

    # Scale up the y axis so there is room for the labels
    my_plot.set_ylim(0, blank.max() + plot_offset)
    # Rotate the labels
    my_plot.set_xticklabels(trans.index, rotation=0)
    my_plot.get_figure().savefig("waterfall.png", dpi=200, bbox_inches='tight')
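Since the goal stated at the start was a repeatable chart, one possible next step is to wrap the core of the script in a function. The sketch below is only an illustration under a few assumptions: the helper name `waterfall` and its parameters are hypothetical, the value labels from the full script are left out to keep it short, and the `Agg` backend line is there so it runs without a display (drop it in a notebook).

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line inside a notebook
import numpy as np
import pandas as pd
from matplotlib.ticker import FuncFormatter


def waterfall(labels, amounts, title="", net_label="net"):
    """Hypothetical helper: draw a waterfall chart and return the matplotlib Axes."""
    trans = pd.DataFrame({"amount": amounts}, index=labels)
    blank = trans["amount"].cumsum().shift(1).fillna(0)

    # Append the net total to both the visible bars and the blank bases.
    total = trans["amount"].sum()
    trans.loc[net_label] = total
    blank.loc[net_label] = total

    # Connector lines between bars, built before the net base is reset.
    step = blank.reset_index(drop=True).repeat(3).shift(-1)
    step.iloc[1::3] = np.nan
    blank.loc[net_label] = 0  # the net bar is drawn from zero

    ax = trans.plot(kind="bar", stacked=True, bottom=blank,
                    legend=False, rot=0, title=title)
    ax.plot(step.index, step.values, "k")
    ax.yaxis.set_major_formatter(
        FuncFormatter(lambda x, pos: "${:,.0f}".format(x))
    )
    return ax


ax = waterfall(
    ["sales", "returns", "credit fees", "rebates", "late charges", "shipping"],
    [350000, -30000, -7500, -25000, 95000, -7000],
    title="2014 Sales Waterfall",
)
ax.get_figure().savefig("waterfall_sketch.png", dpi=100, bbox_inches="tight")
```

Returning the Axes keeps the helper small and leaves further tweaks, such as axis labels, annotations, or a different output file, to the caller.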