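This Python syntax overview and machine learning environment setup guide is part of the author's Practical Handbook of Data Science and Machine Learning for Programmers. For a broader view of the data science and machine learning knowledge system, see "2016: My Technology Stack Map: Web/ServerSideApplication/MachineLearning" and "A Knowledge System and Resource Collection of Data Science and Machine Learning for Programmers".
Python is a high-level, dynamically typed, multi-paradigm programming language. As the saying goes, "life is short, use Python": its many powerful pieces of syntactic sugar often make Python code read almost like pseudocode. For example, a simple quicksort in Python is far more compact than its Java counterpart:

    def quicksort(arr):
        if len(arr) <= 1:
            return arr
        pivot = arr[len(arr) // 2]  # integer division, so the index is always an int
        left = [x for x in arr if x < pivot]
        middle = [x for x in arr if x == pivot]
        right = [x for x in arr if x > pivot]
        return quicksort(left) + middle + quicksort(right)

    print quicksort([3,6,8,10,1,2,1])  # Prints "[1, 1, 2, 3, 6, 8, 10]"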
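The biggest problem in the Python community is the version split, which the author, something of a perfectionist, has always found awkward. Two major versions are currently in use: 2.7 and 3.4. Python 3.0 introduced many backward-incompatible changes, so much of the code written for 2.7 does not run under 3.4. You can check which version you are using with the `python --version` command.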
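The same check can be made from inside a script. The following is a minimal sketch in Python 3 syntax; the helper names `describe_python` and `is_python3` are hypothetical and chosen only for illustration, while `sys.version_info` is the standard attribute holding the interpreter version.

```python
import sys

def describe_python(version_info=sys.version_info):
    """Return a short label such as 'Python 3.5' for a version tuple."""
    major, minor = version_info[0], version_info[1]
    return "Python %d.%d" % (major, minor)

def is_python3(version_info=sys.version_info):
    """Return True when the major version is 3 or later."""
    return version_info[0] >= 3

print(describe_python())  # label for the interpreter running this script
print(is_python3())
```

A script that only supports one of the two lines of Python could call `is_python3()` at startup and exit with a clear message instead of failing later on a syntax difference.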
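Topic | Notes |
---|---|
Line breaks | A backslash (`\`) continues the previous line. Python files are organized as modules. Python statements end with a newline rather than a semicolon. Python uses newlines to separate statements, and a colon plus indentation to delimit code blocks. C++ and Java use semicolons to separate statements and braces to delimit code blocks. |
Comments | a. Use the `#` symbol to mark a comment; b. a string placed at the start of a module, class, or function serves as its documentation (docstring); c. triple-quoted strings can span several lines and are often used as block comments or usage text: `print """ Usage: thingy [OPTIONS] -h Display this usage message -H hostname Hostname to connect to """` |
Main flow | Python has no subroutines, only functions. Every function returns a value (`None` when there is no explicit `return`), and every function definition begins with `def`. |
Strings | Single-quoted and double-quoted strings are equivalent in Python; unlike PHP, double quotes do not trigger variable interpolation. A double-quoted string can contain single quotes, and vice versa. |
Arrays | List indices can be negative, counting from the right end, so -1 is the last element. Strings are immutable, so their characters cannot be modified in place (list elements can). String indexing and slicing use a colon-based range notation that resembles MATLAB's. |
Functions | Python functions can be defined nested inside other functions. |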
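Several of the points in the table can be seen together in one short example. This is a minimal sketch in Python 3 syntax; the function names `last_word_upper` and `shout` are hypothetical, introduced only for illustration.

```python
def last_word_upper(sentence):
    """Return the last word of `sentence` in upper case (this string is a docstring)."""

    def shout(word):
        # A nested function: it is only visible inside last_word_upper.
        return word.upper()

    words = sentence.split()
    return shout(words[-1])  # index -1 is the last element

result = last_word_upper("life is short, use python")
print(result)  # PYTHON

# Strings are immutable: assigning to a single character raises TypeError.
text = "hello"
try:
    text[0] = "H"
    modified = True
except TypeError:
    modified = False
print(modified)  # False
```

To obtain a changed string you build a new one, for example with `"H" + text[1:]`.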
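The author recommends Anaconda for setting up the environment, together with Python 3.5; it can be downloaded here. If you are used to working with Docker, see anaconda-notebook:

    docker pull rothnic/anaconda-notebook
    docker run -p 8888:8888 -i -t rothnic/anaconda-notebook

Once installation has finished, you can verify it with the following command:

    conda --version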
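With conda installed, we can create a concrete development environment. This is done mainly with the `create` command, which sets up a new, isolated environment:

    conda create --name snowflakes biopython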
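This command creates an environment named snowflakes with Biopython installed. To switch to that environment, use the `activate` command (recent conda releases use `conda activate snowflakes` on all platforms):

    # Linux, OS X
    source activate snowflakes

    # Windows
    activate snowflakes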
When creating an environment, you can also specify whether it should use Python 2 or Python 3:

    conda create --name bunnies python=3 astroid babel
Once the environments have been created, use the `info` command to list all of them:

    conda info --envs

    conda environments:

        snowflakes          * /home/username/miniconda/envs/snowflakes
        bunnies               /home/username/miniconda/envs/bunnies
        root                  /home/username/miniconda
After switching to a particular environment, you can install its dependencies:

    conda list            # list all packages in the current environment
    conda install nltk    # install a new package
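To confirm from Python which environment a script is running in, something like the following may help. It is a rough sketch in Python 3 syntax and rests on one assumption: that conda's activation scripts export the `CONDA_DEFAULT_ENV` environment variable. The helper name `active_environment` is hypothetical.

```python
import os
import sys

def active_environment(environ=os.environ):
    """Return the active conda environment name, or None if it is not set.

    Assumption: conda's activation step exports CONDA_DEFAULT_ENV.
    """
    return environ.get("CONDA_DEFAULT_ENV")

print(active_environment())  # environment name, or None outside an activated environment
print(sys.prefix)            # installation directory of the running interpreter
```

If `sys.prefix` points inside the `envs/snowflakes` directory, packages installed with `conda install` in that environment are the ones the script will import.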
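Once Anaconda is installed, Jupyter Notebook is available by default; simply start it from your working directory:

    jupyter notebook

See Running the Notebook for more details on the available commands.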
Like other mainstream languages, Python provides a number of basic types, including integers, floats, booleans, and strings.

    x = 3
    print type(x)  # Prints "<type 'int'>"
    print x        # Prints "3"
    print x + 1    # Addition; prints "4"
    print x - 1    # Subtraction; prints "2"
    print x * 2    # Multiplication; prints "6"
    print x ** 2   # Exponentiation; prints "9"
    x += 1
    print x        # Prints "4"
    x *= 2
    print x        # Prints "8"
    y = 2.5
    print type(y)  # Prints "<type 'float'>"
    print y, y + 1, y * 2, y ** 2  # Prints "2.5 3.5 5.0 6.25"
Note, however, that Python has no increment (`x++`) or decrement (`x--`) operators. Python also has built-in support for long integers and complex numbers; see here for details.
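A short illustration of these points, written as a sketch in Python 3 syntax (where the separate long type of Python 2 has been merged into `int`):

```python
big = 2 ** 100          # integers grow as needed instead of overflowing
print(big)              # 1267650600228229401496703205376

z = complex(3, 4)       # the complex number 3 + 4j
print(z.real, z.imag)   # 3.0 4.0
print(abs(z))           # 5.0

x = 3
x += 1                  # the idiomatic replacement for x++
print(x)                # 4
```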
Python provides the usual Boolean operators, but it spells them as English words rather than symbols such as `&&` and `||`.

    t = True
    f = False
    print type(t)  # Prints "<type 'bool'>"
    print t and f  # Logical AND; prints "False"
    print t or f   # Logical OR; prints "True"
    print not t    # Logical NOT; prints "False"
    print t != f   # Logical XOR; prints "True"
Python has good support for strings, although you need to watch out for UTF-8 encoding issues.

    hello = 'hello'    # String literals can use single quotes
    world = "world"    # or double quotes; it does not matter.
    print hello        # Prints "hello"
    print len(hello)   # String length; prints "5"
    hw = hello + ' ' + world  # String concatenation
    print hw           # prints "hello world"
    hw12 = '%s %s %d' % (hello, world, 12)  # sprintf style string formatting
    print hw12         # prints "hello world 12"
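The encoding caveat is easiest to see by converting between text and bytes explicitly. The sketch below uses Python 3 syntax, where `str` holds text and `bytes` holds encoded data; the variable names are illustrative only.

```python
text = "caf\u00e9"               # "cafe" with an accented e: four characters
data = text.encode("utf-8")      # text (str) -> raw bytes
print(len(text))                 # 4
print(len(data))                 # 5, because the accented e takes two bytes in UTF-8
restored = data.decode("utf-8")  # raw bytes -> text (str)
print(restored == text)          # True
```

The character count and the byte count differ as soon as a non-ASCII character appears, which is the usual source of surprises when reading files or network data.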
String objects also come with many useful methods, for example:

    s = "hello"
    print s.capitalize()  # Capitalize a string; prints "Hello"
    print s.upper()       # Convert a string to uppercase; prints "HELLO"
    print s.rjust(7)      # Right-justify a string, padding with spaces; prints "  hello"
    print s.center(7)     # Center a string, padding with spaces; prints " hello "
    print s.replace('l', '(ell)')  # Replace all instances of one substring with another;
                                   # prints "he(ell)(ell)o"
    print '  world '.strip()  # Strip leading and trailing whitespace; prints "world"

You can find the full list of string methods here.
A Python list is the equivalent of an array, except that it can grow dynamically and can hold values of different types.

    xs = [3, 1, 2]    # Create a list
    print xs, xs[2]   # Prints "[3, 1, 2] 2"
    print xs[-1]      # Negative indices count from the end of the list; prints "2"
    xs[2] = 'foo'     # Lists can contain elements of different types
    print xs          # Prints "[3, 1, 'foo']"
    xs.append('bar')  # Add a new element to the end of the list
    print xs          # Prints "[3, 1, 'foo', 'bar']"
    x = xs.pop()      # Remove and return the last element of the list
    print x, xs       # Prints "bar [3, 1, 'foo']"

As before, you can find more details in the documentation.
Accessing lists is also convenient: a simple slicing operator extracts a sublist.

    nums = range(5)    # range is a built-in function that creates a list of integers
    print nums         # Prints "[0, 1, 2, 3, 4]"
    print nums[2:4]    # Get a slice from index 2 to 4 (exclusive); prints "[2, 3]"
    print nums[2:]     # Get a slice from index 2 to the end; prints "[2, 3, 4]"
    print nums[:2]     # Get a slice from the start to index 2 (exclusive); prints "[0, 1]"
    print nums[:]      # Get a slice of the whole list; prints "[0, 1, 2, 3, 4]"
    print nums[:-1]    # Slice indices can be negative; prints "[0, 1, 2, 3]"
    nums[2:4] = [8, 9] # Assign a new sublist to a slice
    print nums         # Prints "[0, 1, 8, 9, 4]"
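Slices also accept an optional third value, the step. The sketch below uses Python 3 syntax, in which `range` no longer returns a list and therefore has to be wrapped in `list()` before it can be printed or assigned to as in the listing above.

```python
nums = list(range(10))  # in Python 3, range() is lazy, so build the list explicitly
evens = nums[::2]       # every second element
print(evens)            # [0, 2, 4, 6, 8]
rev = nums[::-1]        # a negative step walks the list backwards
print(rev)              # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
middle = nums[2:8:3]    # start at 2, stop before 8, step by 3
print(middle)           # [2, 5]
```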
You can loop over the elements of a list with a basic `for` loop, like this:

    animals = ['cat', 'dog', 'monkey']
    for animal in animals:
        print animal
    # Prints "cat", "dog", "monkey", each on its own line.
If you also need the index of each element inside the loop, use the `enumerate` function:

    animals = ['cat', 'dog', 'monkey']
    for idx, animal in enumerate(animals):
        print '#%d: %s' % (idx + 1, animal)
    # Prints "#1: cat", "#2: dog", "#3: monkey", each on its own line
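In programming we frequently need to transform lists. The well-known tools for this are the `map`, `reduce`, and `filter` functions, and Python additionally offers the very convenient list comprehension syntax. Suppose, for example, that we want to square every element of a list:

    nums = [0, 1, 2, 3, 4]
    squares = []
    for x in nums:
        squares.append(x ** 2)
    print squares   # Prints [0, 1, 4, 9, 16]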
This can be shortened to:

    nums = [0, 1, 2, 3, 4]
    squares = [x ** 2 for x in nums]
    print squares   # Prints [0, 1, 4, 9, 16]
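For comparison, the `map`, `filter`, and `reduce` functions mentioned above can express the same kind of transformation. This is a minimal sketch in Python 3 syntax, where `map` and `filter` return lazy iterators (hence the `list()` calls) and `reduce` is imported from `functools`.

```python
from functools import reduce  # a builtin in Python 2, part of functools in Python 3

nums = [0, 1, 2, 3, 4]
squares = list(map(lambda x: x ** 2, nums))       # apply a function to every element
evens = list(filter(lambda x: x % 2 == 0, nums))  # keep elements that pass a test
total = reduce(lambda acc, x: acc + x, nums, 0)   # fold the list into a single value
print(squares)  # [0, 1, 4, 9, 16]
print(evens)    # [0, 2, 4]
print(total)    # 10
```

The list comprehension is usually the more readable choice for the first two cases; `reduce` has no comprehension equivalent, although `sum(nums)` covers this particular fold.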
List comprehensions can also contain conditions:

    nums = [0, 1, 2, 3, 4]
    even_squares = [x ** 2 for x in nums if x % 2 == 0]
    print even_squares  # Prints "[0, 4, 16]"
A Python dictionary is similar to a Map in Java or an Object in JavaScript: it stores key-value pairs. Basic usage looks like this:

    d = {'cat': 'cute', 'dog': 'furry'}  # Create a new dictionary with some data
    print d['cat']       # Get an entry from a dictionary; prints "cute"
    print 'cat' in d     # Check if a dictionary has a given key; prints "True"
    d['fish'] = 'wet'    # Set an entry in a dictionary
    print d['fish']      # Prints "wet"
    # print d['monkey']  # KeyError: 'monkey' not a key of d
    print d.get('monkey', 'N/A')  # Get an element with a default; prints "N/A"
    print d.get('fish', 'N/A')    # Get an element with a default; prints "wet"
    del d['fish']        # Remove an element from a dictionary
    print d.get('fish', 'N/A')    # "fish" is no longer a key; prints "N/A"

More details on the syntax can be found here.
Iterating over the keys of a dictionary is just as simple:

    d = {'person': 2, 'cat': 4, 'spider': 8}
    for animal in d:
        legs = d[animal]
        print 'A %s has %d legs' % (animal, legs)
    # Prints "A person has 2 legs", "A spider has 8 legs", "A cat has 4 legs"
If you want to access each key together with its value, use the `iteritems` method:

    d = {'person': 2, 'cat': 4, 'spider': 8}
    for animal, legs in d.iteritems():
        print 'A %s has %d legs' % (animal, legs)
    # Prints "A person has 2 legs", "A spider has 8 legs", "A cat has 4 legs"
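Note that `iteritems` exists only in Python 2. In Python 3 the method is called `items`, which is also available in Python 2, so the sketch below (written in Python 3 syntax) uses it. Sorting the pairs is an addition made here purely so that the output order is predictable.

```python
d = {'person': 2, 'cat': 4, 'spider': 8}
lines = []
for animal, legs in sorted(d.items()):  # (key, value) pairs, ordered by key
    lines.append('A %s has %d legs' % (animal, legs))
print('\n'.join(lines))
```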
Dictionary comprehensions work like list comprehensions, but build a dictionary:

    nums = [0, 1, 2, 3, 4]
    even_num_to_square = {x: x ** 2 for x in nums if x % 2 == 0}
    print even_num_to_square  # Prints "{0: 0, 2: 4, 4: 16}"
A set is an unordered collection of distinct elements:

    animals = {'cat', 'dog'}
    print 'cat' in animals   # Check if an element is in a set; prints "True"
    print 'fish' in animals  # prints "False"
    animals.add('fish')      # Add an element to a set
    print 'fish' in animals  # Prints "True"
    print len(animals)       # Number of elements in a set; prints "3"
    animals.add('cat')       # Adding an element that is already in the set does nothing
    print len(animals)       # Prints "3"
    animals.remove('cat')    # Remove an element from a set
    print len(animals)       # Prints "2"

More details on the syntax can be found here.
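Iterating over a set uses the same syntax as iterating over a list. Because sets are unordered, however, you cannot rely on the iteration order to predict the order of the elements:

    animals = {'cat', 'dog', 'fish'}
    for idx, animal in enumerate(animals):
        print '#%d: %s' % (idx + 1, animal)
    # Prints "#1: fish", "#2: dog", "#3: cat"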
Like lists and dictionaries, sets can be built with a comprehension:

    from math import sqrt
    nums = {int(sqrt(x)) for x in range(30)}
    print nums  # Prints "set([0, 1, 2, 3, 4, 5])"
A tuple in Python is an immutable, ordered collection of elements. Tuples closely resemble lists; the difference is that a tuple can be used as a dictionary key, whereas a list cannot.

    d = {(x, x + 1): x for x in range(10)}  # Create a dictionary with tuple keys
    t = (5, 6)       # Create a tuple
    print type(t)    # Prints "<type 'tuple'>"
    print d[t]       # Prints "5"
    print d[(1, 2)]  # Prints "1"
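The difference between tuples and lists described above can be checked directly. This is a minimal sketch in Python 3 syntax; the variable names are illustrative only.

```python
point = (1, 2)
grid = {point: 'start'}      # a tuple works as a dictionary key
print(grid[(1, 2)])          # start

try:
    bad = {[1, 2]: 'start'}  # a list is mutable and therefore unhashable
    list_key_ok = True
except TypeError:
    list_key_ok = False
print(list_key_ok)           # False

try:
    point[0] = 5             # tuples cannot be modified in place
    tuple_mutable = True
except TypeError:
    tuple_mutable = False
print(tuple_mutable)         # False
```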
Functions in Python are defined with the `def` keyword, for example:

    def sign(x):
        if x > 0:
            return 'positive'
        elif x < 0:
            return 'negative'
        else:
            return 'zero'

    for x in [-1, 0, 1]:
        print sign(x)
    # Prints "negative", "zero", "positive"
Python functions also support optional arguments:

    def hello(name, loud=False):
        if loud:
            print 'HELLO, %s!' % name.upper()
        else:
            print 'Hello, %s' % name

    hello('Bob')              # Prints "Hello, Bob"
    hello('Fred', loud=True)  # Prints "HELLO, FRED!"

More details on the syntax can be found here.
Defining a class in Python is also straightforward:

    class Greeter(object):

        # Constructor
        def __init__(self, name):
            self.name = name  # Create an instance variable

        # Instance method
        def greet(self, loud=False):
            if loud:
                print 'HELLO, %s!' % self.name.upper()
            else:
                print 'Hello, %s' % self.name

    g = Greeter('Fred')  # Construct an instance of the Greeter class
    g.greet()            # Call an instance method; prints "Hello, Fred"
    g.greet(loud=True)   # Call an instance method; prints "HELLO, FRED!"

See here for more information.