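A Quick Tour of Python Syntax and Setting Up a Machine Learning Development Environment

This article is part of the author's Hands-On Data Science and Machine Learning Handbook for Programmers. For a broader view of how the data science and machine learning body of knowledge fits together, see "2016: My Technology Landscape: Web/ServerSideApplication/MachineLearning" and "A Data Science and Machine Learning Knowledge Map and Resource Collection for Programmers".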

Python

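Python is a high-level, dynamically typed, multi-paradigm programming language. As the saying goes, "Life is short, use Python": its many powerful pieces of syntactic sugar mean that Python code often reads almost like pseudocode. For example, a simple quicksort in Python is far more compact than the equivalent Java: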

def quicksort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]  # floor division keeps the index an integer in Python 2 and 3
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quicksort(left) + middle + quicksort(right)
    
print quicksort([3,6,8,10,1,2,1])
# Prints "[1, 1, 2, 3, 6, 8, 10]"
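
For readers on Python 3 (the version recommended in the installation section), the same algorithm might look like the sketch below. It assumes Python 3, where print is a function and / on integers returns a float, so the pivot index uses //. Like the version above, it returns a new list instead of sorting in place.

```python
def quicksort(arr):
    """Return a new sorted list using the same three-way partition as above."""
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]  # // is floor division, so the index stays an int
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quicksort(left) + middle + quicksort(right)


result = quicksort([3, 6, 8, 10, 1, 2, 1])
print(result)  # [1, 1, 2, 3, 6, 8, 10]
```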

Python Versions

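The biggest problem in the Python community is the version split, something that has always bothered the author's perfectionist streak. At the time of writing, two major versions are in common use: 2.7 and 3.4. Python 3.0 introduced many backward-incompatible changes, so a lot of code written for 2.7 will not run on 3.4. Use the python --version command to check which version you are running. Note that the code samples in this article use Python 2.7 syntax, such as the print statement.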
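
The version can also be inspected from inside a program. The sketch below uses sys.version_info from the standard library; the helper name describe_interpreter is made up for illustration.

```python
import sys


def describe_interpreter():
    """Report which major Python family is running this code."""
    major, minor = sys.version_info[:2]
    family = "Python 3" if major >= 3 else "Python 2"
    return "%s (version %d.%d)" % (family, major, minor)


print(describe_interpreter())
```

A script that relies on features from only one family could call a check like this early and stop with a clear message instead of failing later with a syntax error.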

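Common Conventions

Topic: Notes
Line breaks: A backslash (\) continues a statement on the next line. Python files are organized as modules. Statements end with a newline rather than a semicolon: Python uses line breaks to separate statements and a colon plus indentation to delimit code blocks, whereas C++ and Java use semicolons to separate statements and braces to delimit blocks.
Comments: (a) The # character starts a comment. (b) A string placed at the start of a module, class, or function serves as its documentation (a docstring). (c) Triple-quoted strings can span several lines and are sometimes used as block comments, for example:
    print """
    Usage: thingy [OPTIONS]
        -h            Display this usage message
        -H hostname   Hostname to connect to
    """
Program structure: Python has no subroutines, only functions. Every function returns a value (None if no return statement is executed), and every function definition begins with def.
Strings: Single and double quotes are interchangeable in Python (unlike PHP, where double-quoted strings also interpolate variables). A double-quoted string may contain single quotes, and vice versa.
Arrays: Indices can be negative, counting from the right end, so -1 refers to the last element. Lists can be modified in place, but strings and tuples cannot. String slicing uses a colon notation reminiscent of MATLAB's, although Python indices start at 0.
Functions: Function definitions can be nested inside other functions.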
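
Several of these conventions can be seen together in one short sketch. The names first_plus_last, word, and items are invented for illustration; the code uses only single-argument print calls, so it should behave the same under Python 2.7 and Python 3.

```python
def first_plus_last(values):
    """Return the sum of the first and last elements (this string is the docstring)."""

    def last(seq):
        # Functions can be nested; index -1 refers to the last element.
        return seq[-1]

    total = values[0] + \
        last(values)  # the backslash above continues the statement onto this line
    return total


print(first_plus_last([1, 2, 3]))  # 4

word = "python"
try:
    word[0] = "P"  # strings are immutable, so this raises TypeError
except TypeError:
    word = "P" + word[1:]  # build a new string instead
print(word)  # Python

items = [1, 2, 3]
items[0] = 10  # lists, by contrast, can be modified in place
print(items)  # [10, 2, 3]
```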

Installation: Environment Setup

Conda

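The author recommends Anaconda for setting up the environment, together with Python 3.5; Anaconda can be downloaded from its official website. If you prefer Docker, you can use the anaconda-notebook image instead: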

docker pull rothnic/anaconda-notebook
docker run -p 8888:8888 -i -t rothnic/anaconda-notebook

Once installation has finished, verify it with the following command:

conda --version

With conda installed, we can create a development environment. The create command sets up a new, isolated environment:

conda create --name snowflakes biopython

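This command creates an environment named snowflakes with Biopython installed. To switch to that environment, use the activate command:

  • Linux, OS X: source activate snowflakes

  • Windows: activate snowflakes

We can also specify whether to use Python 2 or Python 3 when creating the environment: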

conda create --name bunnies python=3 astroid babel

After the environments have been created, the info command lists all of them:

conda info --envs
conda environments:

     snowflakes          * /home/username/miniconda/envs/snowflakes
     bunnies               /home/username/miniconda/envs/bunnies
     root                  /home/username/miniconda

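Once you have switched to a particular environment, you can install packages into it:

conda list          # list all packages in the current environment
conda install nltk  # install a new package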
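
After installing a package, one way to confirm from Python that the active environment can see it is to ask the import system directly. This is a minimal sketch, assuming Python 3.4 or later (where importlib.util.find_spec is available); is_installed is a hypothetical helper name, and nltk is used only because it is the package installed above.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level module with this name can be located."""
    return importlib.util.find_spec(package_name) is not None


for name in ["json", "nltk"]:
    status = "available" if is_installed(name) else "missing"
    print("%s: %s" % (name, status))
```

Because find_spec only locates the module without importing it, the check is cheap and has no side effects from the package itself.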

Jupyter Notebook

With Anaconda installed, Jupyter Notebook is available by default. Start it from your working directory:

jupyter notebook

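See Running the Notebook for more details on the available commands.

Basic Data Types

Like other mainstream languages, Python provides a number of basic types, including integers, floats, booleans, and strings.

Numbers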

x = 3
print type(x) # Prints "<type 'int'>"
print x       # Prints "3"
print x + 1   # Addition; prints "4"
print x - 1   # Subtraction; prints "2"
print x * 2   # Multiplication; prints "6"
print x ** 2  # Exponentiation; prints "9"
x += 1
print x  # Prints "4"
x *= 2
print x  # Prints "8"
y = 2.5
print type(y) # Prints "<type 'float'>"
print y, y + 1, y * 2, y ** 2 # Prints "2.5 3.5 5.0 6.25"

Note that Python has no x++ or x-- increment and decrement operators. Python also has built-in types for long integers and complex numbers; see the documentation for details.
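
The sketch below illustrates those points under Python 3, where the separate long type has been merged into int and / between integers produces a float. The variable names are chosen only for illustration.

```python
x = 3
x += 1                # the idiomatic replacement for x++
print(x)              # 4

big = 2 ** 100        # Python 3 integers grow to whatever size is needed
print(big)            # 1267650600228229401496703205376

z = complex(1, 2)     # complex numbers are built in; 1 + 2j is equivalent
print(z * z)          # (-3+4j)

print(7 / 2, 7 // 2)  # 3.5 3  (true division versus floor division)
```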

Booleans

Python provides the usual logical operators, but spells them as English words (and, or, not) rather than symbols such as && and ||.

t = True
f = False
print type(t) # Prints "<type 'bool'>"
print t and f # Logical AND; prints "False"
print t or f  # Logical OR; prints "True"
print not t   # Logical NOT; prints "False"
print t != f  # Logical XOR; prints "True"

Strings

Python has strong support for strings, although you do need to watch out for UTF-8 encoding issues.

hello = 'hello'   # String literals can use single quotes
world = "world"   # or double quotes; it does not matter.
print hello       # Prints "hello"
print len(hello)  # String length; prints "5"
hw = hello + ' ' + world  # String concatenation
print hw  # prints "hello world"
hw12 = '%s %s %d' % (hello, world, 12)  # sprintf style string formatting
print hw12  # prints "hello world 12"

String objects also come with many useful methods, for example:

s = "hello"
print s.capitalize()  # Capitalize a string; prints "Hello"
print s.upper()       # Convert a string to uppercase; prints "HELLO"
print s.rjust(7)      # Right-justify a string, padding with spaces; prints "  hello"
print s.center(7)     # Center a string, padding with spaces; prints " hello "
print s.replace('l', '(ell)')  # Replace all instances of one substring with another;
                               # prints "he(ell)(ell)o"
print '  world '.strip()  # Strip leading and trailing whitespace; prints "world"

The full list of string methods can be found in the documentation.
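
On the UTF-8 point mentioned at the start of this section: in Python 3, text (str) and encoded bytes are distinct types, and converting between them is explicit. A minimal sketch, assuming Python 3 and a UTF-8 encoded source file:

```python
text = "你好, Python"
data = text.encode("utf-8")          # str -> bytes

print(len(text))                     # 10 characters
print(len(data))                     # 14 bytes: each Chinese character takes 3 bytes
print(data.decode("utf-8") == text)  # True: decoding restores the original text

try:
    data.decode("ascii")             # the wrong codec fails loudly
except UnicodeDecodeError:
    print("not valid ASCII")
```

The character count and the byte count differ, which is why code that reads or writes files and network data needs to state the encoding it expects.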

Container Types

Lists

A Python list is the equivalent of an array, except that it can grow dynamically and can hold values of different types.

xs = [3, 1, 2]   # Create a list
print xs, xs[2]  # Prints "[3, 1, 2] 2"
print xs[-1]     # Negative indices count from the end of the list; prints "2"
xs[2] = 'foo'    # Lists can contain elements of different types
print xs         # Prints "[3, 1, 'foo']"
xs.append('bar') # Add a new element to the end of the list
print xs         # Prints "[3, 1, 'foo', 'bar']"
x = xs.pop()     # Remove and return the last element of the list
print x, xs      # Prints "bar [3, 1, 'foo']"

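As before, more details can be found in the documentation.

Slicing

Accessing part of a list is also pleasantly simple: slice notation extracts a sublist in a single expression.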

nums = range(5)    # range is a built-in function that creates a list of integers
print nums         # Prints "[0, 1, 2, 3, 4]"
print nums[2:4]    # Get a slice from index 2 to 4 (exclusive); prints "[2, 3]"
print nums[2:]     # Get a slice from index 2 to the end; prints "[2, 3, 4]"
print nums[:2]     # Get a slice from the start to index 2 (exclusive); prints "[0, 1]"
print nums[:]      # Get a slice of the whole list; prints "[0, 1, 2, 3, 4]"
print nums[:-1]    # Slice indices can be negative; prints "[0, 1, 2, 3]"
nums[2:4] = [8, 9] # Assign a new sublist to a slice
print nums         # Prints "[0, 1, 8, 9, 4]"
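
Two details are worth a short sketch for Python 3 users: range no longer returns a list, and slices accept an optional step. The example assumes Python 3; the name snapshot is illustrative.

```python
nums = list(range(5))  # range() is lazy in Python 3; list() materializes it
print(nums)            # [0, 1, 2, 3, 4]
print(nums[::2])       # every second element; prints [0, 2, 4]
print(nums[::-1])      # a negative step reverses; prints [4, 3, 2, 1, 0]

snapshot = nums[:]     # a full slice makes a shallow copy
snapshot[0] = 99
print(nums[0])         # the original is unchanged; prints 0
```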

Iteration

You can loop over the elements of a list with a basic for loop, like this:

animals = ['cat', 'dog', 'monkey']
for animal in animals:
    print animal
# Prints "cat", "dog", "monkey", each on its own line.

If you also need the index of the current element inside the loop, use the built-in enumerate function:

animals = ['cat', 'dog', 'monkey']
for idx, animal in enumerate(animals):
    print '#%d: %s' % (idx + 1, animal)
# Prints "#1: cat", "#2: dog", "#3: monkey", each on its own line
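
As a small variation, enumerate accepts a start argument, which avoids the idx + 1 arithmetic. The sketch collects the labels in a list (the name labels is illustrative) so they can be checked as well as printed.

```python
animals = ['cat', 'dog', 'monkey']
labels = []
for idx, animal in enumerate(animals, start=1):  # counting begins at 1
    labels.append('#%d: %s' % (idx, animal))

print('\n'.join(labels))
# #1: cat
# #2: dog
# #3: monkey
```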

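Transformations

Programs often need to transform lists. The classic tools are the map, reduce, and filter functions, but Python also offers the very convenient list comprehension syntax. For example, suppose we want to square every element of a list: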

nums = [0, 1, 2, 3, 4]
squares = []
for x in nums:
    squares.append(x ** 2)
print squares   # Prints [0, 1, 4, 9, 16]

This can be shortened to:

nums = [0, 1, 2, 3, 4]
squares = [x ** 2 for x in nums]
print squares   # Prints [0, 1, 4, 9, 16]

List comprehensions can also include a condition:

nums = [0, 1, 2, 3, 4]
even_squares = [x ** 2 for x in nums if x % 2 == 0]
print even_squares  # Prints "[0, 4, 16]"
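
For comparison, here is roughly how the same transformations look with the map, filter, and reduce functions mentioned above. This is a sketch for Python 3, where map and filter return lazy iterators (hence the list() calls) and reduce lives in functools.

```python
from functools import reduce

nums = [0, 1, 2, 3, 4]

squares = list(map(lambda x: x ** 2, nums))       # apply a function to each element
evens = list(filter(lambda x: x % 2 == 0, nums))  # keep elements passing a test
total = reduce(lambda acc, x: acc + x, nums, 0)   # fold the list into one value

print(squares)  # [0, 1, 4, 9, 16]
print(evens)    # [0, 2, 4]
print(total)    # 10

# The comprehension form combines the map and filter steps in one expression.
even_squares = [x ** 2 for x in nums if x % 2 == 0]
print(even_squares)  # [0, 4, 16]
```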

Dictionaries

A Python dictionary is similar to a Map in Java or an Object in JavaScript: a collection of key-value pairs. Basic usage looks like this:

d = {'cat': 'cute', 'dog': 'furry'}  # Create a new dictionary with some data
print d['cat']       # Get an entry from a dictionary; prints "cute"
print 'cat' in d     # Check if a dictionary has a given key; prints "True"
d['fish'] = 'wet'    # Set an entry in a dictionary
print d['fish']      # Prints "wet"
# print d['monkey']  # KeyError: 'monkey' not a key of d
print d.get('monkey', 'N/A')  # Get an element with a default; prints "N/A"
print d.get('fish', 'N/A')    # Get an element with a default; prints "wet"
del d['fish']        # Remove an element from a dictionary
print d.get('fish', 'N/A') # "fish" is no longer a key; prints "N/A"

More details on the syntax can be found in the documentation.

Iteration

Iterating over a dictionary is just as simple:

d = {'person': 2, 'cat': 4, 'spider': 8}
for animal in d:
    legs = d[animal]
    print 'A %s has %d legs' % (animal, legs)
# Prints "A person has 2 legs", "A spider has 8 legs", "A cat has 4 legs"

If you want to access each key together with its value, use the iteritems method:

d = {'person': 2, 'cat': 4, 'spider': 8}
for animal, legs in d.iteritems():
    print 'A %s has %d legs' % (animal, legs)
# Prints "A person has 2 legs", "A spider has 8 legs", "A cat has 4 legs"
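
In Python 3 the iteritems method no longer exists; items() plays the same role. The sketch below assumes Python 3 and wraps the call in sorted() so the output order does not depend on how the dictionary stores its keys. The list lines is there only to make the result easy to inspect.

```python
d = {'person': 2, 'cat': 4, 'spider': 8}

lines = []
for animal, legs in sorted(d.items()):  # items() yields (key, value) pairs
    lines.append('A %s has %d legs' % (animal, legs))

print('\n'.join(lines))
# A cat has 4 legs
# A person has 2 legs
# A spider has 8 legs
```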

Transformations

nums = [0, 1, 2, 3, 4]
even_num_to_square = {x: x ** 2 for x in nums if x % 2 == 0}
print even_num_to_square  # Prints "{0: 0, 2: 4, 4: 16}"

Set

A set is an unordered collection of unique elements:

animals = {'cat', 'dog'}
print 'cat' in animals   # Check if an element is in a set; prints "True"
print 'fish' in animals  # prints "False"
animals.add('fish')      # Add an element to a set
print 'fish' in animals  # Prints "True"
print len(animals)       # Number of elements in a set; prints "3"
animals.add('cat')       # Adding an element that is already in the set does nothing
print len(animals)       # Prints "3"
animals.remove('cat')    # Remove an element from a set
print len(animals)       # Prints "2"

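More details on the syntax can be found in the documentation.

Iteration

Iterating over a set uses the same syntax as iterating over a list. However, because sets are unordered, you cannot rely on the order in which the elements are visited: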

animals = {'cat', 'dog', 'fish'}
for idx, animal in enumerate(animals):
    print '#%d: %s' % (idx + 1, animal)
# Prints "#1: fish", "#2: dog", "#3: cat"
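
When a predictable order matters, one common approach is to sort the set's elements before looping. A minimal sketch (the names ordered and numbered are illustrative):

```python
animals = {'cat', 'dog', 'fish', 'cat'}  # the duplicate 'cat' is stored only once
print(len(animals))                      # 3

ordered = sorted(animals)                # sorted() returns a list in a fixed order
numbered = ['#%d: %s' % (idx, animal) for idx, animal in enumerate(ordered, start=1)]
print(', '.join(numbered))               # #1: cat, #2: dog, #3: fish
```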

Transformations

from math import sqrt
nums = {int(sqrt(x)) for x in range(30)}
print nums  # Prints "set([0, 1, 2, 3, 4, 5])"

Tuples

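A tuple is an immutable, ordered collection of elements. Tuples are much like lists; one important difference is that a tuple can be used as a dictionary key, whereas a list cannot.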

d = {(x, x + 1): x for x in range(10)}  # Create a dictionary with tuple keys
t = (5, 6)       # Create a tuple
print type(t)    # Prints "<type 'tuple'>"
print d[t]       # Prints "5"
print d[(1, 2)]  # Prints "1"
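
The difference shows up as soon as a list is used where a tuple is required. The sketch below assumes Python 3 for the print calls; the names point and grid are made up for illustration.

```python
point = (5, 6)
grid = {point: 'treasure'}       # a tuple is hashable, so it works as a key
print(grid[(5, 6)])              # treasure

try:
    grid[[5, 6]] = 'oops'        # a list is mutable and therefore unhashable
except TypeError as err:
    print('lists cannot be keys:', err)

try:
    point[0] = 7                 # tuples cannot be modified in place
except TypeError as err:
    print('tuples are immutable:', err)

x, y = point                     # tuples unpack into separate names
print(x + y)                     # 11
```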

Functions

Python functions are defined with the def keyword, for example:

def sign(x):
    if x > 0:
        return 'positive'
    elif x < 0:
        return 'negative'
    else:
        return 'zero'

for x in [-1, 0, 1]:
    print sign(x)
# Prints "negative", "zero", "positive"

Functions can also take optional arguments:

def hello(name, loud=False):
    if loud:
        print 'HELLO, %s!' % name.upper()
    else:
        print 'Hello, %s' % name

hello('Bob') # Prints "Hello, Bob"
hello('Fred', loud=True)  # Prints "HELLO, FRED!"
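
A variation on the same idea, sketched for Python 3: returning the greeting instead of printing it leaves the choice of output to the caller, and the optional argument can be passed by keyword or by position. This is one possible design, not a replacement for the version above.

```python
def hello(name, loud=False):
    """Build a greeting; loud defaults to False when the caller omits it."""
    if loud:
        return 'HELLO, %s!' % name.upper()
    return 'Hello, %s' % name


print(hello('Bob'))              # Hello, Bob
print(hello('Fred', loud=True))  # HELLO, FRED!
print(hello('Amy', True))        # HELLO, AMY!
```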

More details on the syntax can be found in the documentation.

Classes

Defining a class in Python is equally straightforward:

class Greeter(object):
    
    # Constructor
    def __init__(self, name):
        self.name = name  # Create an instance variable
        
    # Instance method
    def greet(self, loud=False):
        if loud:
            print 'HELLO, %s!' % self.name.upper()
        else:
            print 'Hello, %s' % self.name
        
g = Greeter('Fred')  # Construct an instance of the Greeter class
g.greet()            # Call an instance method; prints "Hello, Fred"
g.greet(loud=True)   # Call an instance method; prints "HELLO, FRED!"
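
Under Python 3 the explicit object base class is optional, because every class is a new-style class. The sketch below assumes Python 3 and, as a deliberate variation, has greet return the string so the caller decides whether to print it.

```python
class Greeter:
    """Stores a name and builds greetings for it."""

    def __init__(self, name):
        self.name = name  # instance variable set by the constructor

    def greet(self, loud=False):
        if loud:
            return 'HELLO, %s!' % self.name.upper()
        return 'Hello, %s' % self.name


g = Greeter('Fred')
print(g.greet())           # Hello, Fred
print(g.greet(loud=True))  # HELLO, FRED!
```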

See the documentation for more information.
