Since English is really the programmer's mother tongue, let me try writing this post in English.
Iteration is the process of stepping through the contents of an iterable object; common iterables include Dict, String, and File objects.
When the iterable is an iterator, iterating over it consumes its contents.
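For example (Python 3 shown; Python 2 behaves the same way), iterating a string yields its characters and iterating a dict yields its keys:

```python
# Iterating common built-in iterables.
s = "hi"
d = {'a': 1, 'b': 2}

print([ch for ch in s])      # a string yields characters -> ['h', 'i']
print(sorted(k for k in d))  # a dict yields its keys -> ['a', 'b']
```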
Functions like sum(), min(), list(), tuple() and the in operator all iterate over their argument, so they consume an iterator: once it is exhausted, it yields nothing more.
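A minimal sketch of that consumption (Python 3 syntax):

```python
# An iterator is used up by a single pass; the underlying list is untouched.
nums = [3, 1, 2]
it = iter(nums)

print(sum(it))    # the first pass consumes the iterator -> 6
print(list(it))   # nothing left -> []
print(2 in nums)  # the list itself can still be iterated -> True
```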
To walk a list by hand, call iter(item_list) to get an iterator, then call next() on it repeatedly; each call returns the next element until the list is exhausted.
Any object that provides __iter__() and next() (spelled __next__() in Python 3) is an iterator.
```python
# in operator
for x in obj:
    # statements

# What's inside
_iter = iter(obj)
while 1:
    try:
        x = _iter.next()
    except StopIteration:
        break
    # statements
```
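As a sketch of that protocol, here is a hypothetical Countdown class (not from the original post) that implements the iterator methods by hand; Python 3 names are used, where next() is spelled __next__():

```python
class Countdown:
    """Iterator yielding n, n-1, ..., 1, written by hand without a generator."""
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        # An iterator simply returns itself from __iter__().
        return self

    def __next__(self):
        if self.n <= 0:
            raise StopIteration    # signals "exhausted" to for-loops, list(), etc.
        value = self.n
        self.n -= 1
        return value

print(list(Countdown(3)))  # -> [3, 2, 1]
```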
A generator is an easier way to write an iterator.
```python
def countdown(n):
    print "Counting down from", n
    while n > 0:
        yield n
        n -= 1

# Note that the two lines below don't actually run countdown until next() is called.
# yield produces n, then suspends the whole function until the next call to next().
>>> x = countdown(10)
>>> x
<generator object at 0x58490>
>>> x.next()
Counting down from 10
10
>>> x.next()
9
...
>>> x.next()
1
# When x returns, the next call to next() raises an exception.
>>> x.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
StopIteration
>>>
```
Here is the Python 3.4 version:
```python
def countdown(n):
    print("Counting down from", n)
    while n > 0:
        yield n
        n -= 1
    return 'exits'

>>> x = countdown(3)
>>> x
<generator object countdown at 0x101bd7288>
>>> next(x)
Counting down from 3
3
>>> next(x)
2
>>> next(x)
1
>>> next(x)
# In Python 3.4 a generator function can also return a value, and that value
# shows up as the message of the StopIteration raised at exhaustion.
# A return with a value inside a generator is a SyntaxError in Python 2.7.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration: exits
```
Variable b below is a generator.
```python
>>> a = [1,2,3,4]
>>> b = (2*x for x in a)
>>> b
<generator object at 0x58760>
>>> for i in b: print i,
...
2 4 6 8
```
When list a is very large, the generator version saves a lot of memory, simply because it never builds a second big list; compare the list comprehension below, which does.
```python
>>> a = [1,2,3,4]
>>> b = [2*x for x in a]
>>> b
[2, 4, 6, 8]
```
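A rough way to see the memory difference (Python 3 shown; exact byte counts vary by platform, and getsizeof only measures the container, which is the point here):

```python
import sys

a = list(range(1000000))
big_list = [2 * x for x in a]   # materializes a million results
lazy_gen = (2 * x for x in a)   # stores only the iteration state

print(sys.getsizeof(big_list))  # megabytes, grows with len(a)
print(sys.getsizeof(lazy_gen))  # a couple hundred bytes, independent of len(a)
```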
Now suppose we have a 1 GB access.log from nginx, and the task is to sum up the sizes of all the responses.
Every line of access.log looks like this:
xx.xx.xx.xx - - [01/Jul/2014:10:06:06 +0800] "GET /share/ajax/?image_id=xxx&user_id=xxx HTTP/1.1" 200 72 "http://www.baidu.com/" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36"
We have two solutions: one implemented with generators, the other a plain for-loop.
```python
import cProfile, pstats, StringIO

def gene():
    with open('access.log', 'r') as f:
        lines = (line.split(' ', 11)[9] for line in f)
        sizes = (int(size) for size in lines if not size == '-')
        print "Generators Result: ", sum(sizes)

pr = cProfile.Profile()
pr.enable()
gene()
pr.disable()
s = StringIO.StringIO()
sortby = 'cumulative'
ps = pstats.Stats(pr, stream=s).sort_stats(sortby)
ps.print_stats()
print s.getvalue()

def loop():
    size_sum = 0
    with open('access.log', 'r') as f:
        for line in f.readlines():
            size = line.split(' ', 11)[9]
            if not size == '-':
                size_sum += int(size)
    print "Forloop Result: ", size_sum

pr = cProfile.Profile()
pr.enable()
loop()
pr.disable()
s = StringIO.StringIO()
sortby = 'cumulative'
ps = pstats.Stats(pr, stream=s).sort_stats(sortby)
ps.print_stats()
print s.getvalue()
```

```
Sh4n3@Macintosh:~% python ger.py
Generators Result:  13678125506
         12481726 function calls in 41.487 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   41.487   41.487 ger.py:3(gene)
        1    1.864    1.864   41.487   41.487 {sum}
  4160297   17.209    0.000   39.623    0.000 ger.py:6(<genexpr>)
  4160713   11.972    0.000   22.414    0.000 ger.py:5(<genexpr>)
  4160712   10.442    0.000   10.442    0.000 {method 'split' of 'str' objects}
        1    0.000    0.000    0.000    0.000 {open}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Forloop Result:  13678125506
         4160716 function calls in 142.672 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   84.979   84.979  142.672  142.672 ger.py:9(loop)
        1   47.609   47.609   47.609   47.609 {method 'readlines' of 'file' objects}
  4160712   10.084    0.000   10.084    0.000 {method 'split' of 'str' objects}
        1    0.000    0.000    0.000    0.000 {open}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
```
So the result shows the generator version is roughly 3.4x faster than the for-loop version, and note that readlines() alone spends 47 seconds slurping the whole 1 GB file into memory before the loop even starts.
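Most of the for-loop's penalty is that readlines() call; iterating the file object directly reads one line at a time, lazily. A Python 3 sketch of the fixed loop (loop_lazy is a name made up here; the field index 9 assumes the nginx log format shown above, where the bracketed timestamp contains one space):

```python
def loop_lazy(path):
    """Sum the response-size field (10th space-separated token) line by line."""
    size_sum = 0
    with open(path) as f:
        for line in f:                     # lazy: the file is never fully loaded
            size = line.split(' ', 11)[9]  # index 9 because "[date +0800]" splits in two
            if size != '-':
                size_sum += int(size)
    return size_sum
```

This keeps the for-loop style but matches the generator version's memory profile, since both now stream the file.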