Settling for Second Best (3): The Dorm Supervisor's Troubles

Date: 2019-12-12


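Picking up where we left off: in "Settling for Second Best (1): The Random Method" the dorm supervisor assigned dorms at random. This time she tries a genetic algorithm.

Sequence encoding and the initial population

The first question in any genetic algorithm is how to encode the genes. In the dorm problem each assignment plan is an individual, and each code in its gene sequence stands for one student. No code may appear twice in the same sequence, because every student is unique. Under that rule a binary encoding is clumsy. A simpler scheme is to use each student's serial number directly as the gene code; this is called sequence encoding.

Sequence encoding, also known as natural-number encoding, uses the natural numbers from 1 to n with no repeats. For example, [1,2,3,4,5,6,7,8,9,10,11,12] is a valid individual: reading left to right, every four codes form one dorm. [1,1,3,3,6,6,8,8,9,10,11,12] is not valid. (The code in this article numbers students from 0 to n-1, because the codes double as list indices.)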
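
The validity rule above can be checked mechanically. The following is a minimal sketch; the helper name is_valid_code is an assumption made for illustration and does not appear in the original code.

```python
def is_valid_code(code, n):
    """True if code is a permutation of 1..n, i.e. each student appears exactly once."""
    return len(code) == n and set(code) == set(range(1, n + 1))

print(is_valid_code([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], 12))  # True
print(is_valid_code([1, 1, 3, 3, 6, 6, 8, 8, 9, 10, 11, 12], 12))  # False
```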

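Randomly pick 1000 individuals as the initial population. Ignoring bed order and dorm order, 12 students can be split into three dorms of four in 12!/((4!)^3 · 3!) = 5775 ways, so 1000 individuals cover at most about 17% of the solution space, roughly one fifth: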

import random

import base_data

POPULATION_SIZE = 1000  # population size

def init_population():
    ''' Build the initial population '''
    population = []
    for _ in range(POPULATION_SIZE):
        # each individual is a random permutation of the student indices
        population.append(base_data.upset())
    return population
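
To see how large the solution space is compared with a population of 1000, the number of distinct dorm assignments can be counted directly. This is a sketch under the assumption that two plans are the same when they put the same students together, regardless of bed order or dorm order; count_assignments is a hypothetical helper, not part of the original code.

```python
from math import factorial

def count_assignments(n_students, dorm_size):
    """Number of ways to split n_students into unlabeled dorms of dorm_size each."""
    n_dorms = n_students // dorm_size
    return factorial(n_students) // (factorial(dorm_size) ** n_dorms * factorial(n_dorms))

total = count_assignments(12, 4)
print(total)                    # 5775
print(round(1000 / total, 3))   # 0.173
```

Because randomly generated individuals can coincide or describe the same plan, the share actually covered by 1000 individuals is an upper bound.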

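Fitness evaluation and selection

Fitness can be evaluated with the cost function cost_fun. Because cost_fun works on solutions rather than gene codes, the gene code must first be converted by solution_adapter: the code [1,2,3,4,5,6,7,8,9,10,11,12] becomes [[1,2,3,4],[5,6,7,8],[9,10,11,12]], which cost_fun can evaluate.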

def solution_adapter(code):
    ''' Convert a gene code into a solution that cost_fun understands '''
    solution = []
    for i in range(0, len(base_data.STUDENTS_NAME), base_data.NUM_PER_DROM):
        solution.append(code[i:i + base_data.NUM_PER_DROM])
    return solution

def fitness_fun(code):
    '''
    Fitness evaluation
    :param code:   sequence-encoded gene code
    :return: a 2-tuple (total cost of the plan, list of per-dorm costs)
    '''
    return base_data.cost_fun(solution_adapter(code))

We again use tournament selection. Since a lower cost means a higher fitness, the individual with the lower value wins each tournament.

def selection(population):
    ''' Selection strategy: tournament selection '''
    pop_next = []  # next generation
    for _ in range(POPULATION_SIZE):
        tour_list = random.choices(population, k=2)  # binary tournament, drawn with replacement
        winner = min(tour_list, key=lambda x: fitness_fun(x)[0])  # lower total cost wins
        pop_next.append(winner)
    return pop_next

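Partially matched crossover and cycle crossover

With dorm codes, single-point and two-point crossover generally fail to produce valid individuals. The figure shows an invalid single-point crossover.

In the child r1' the codes 6 and 8 each appear twice, meaning students 6 and 8 each live in two dorms at once, which is clearly not a valid solution. Conversely, when the crossover does yield a valid individual, the new individual is no different from its parent, so nothing new is produced:

r1 and r1' do not differ at all: the four students in the last dorm have merely swapped beds. We need another route, a different crossover strategy.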
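
Both failure modes can be reproduced in a few lines. This is only an illustrative sketch: the parents r2 and r3 are hypothetical (the parents in the original figures are not available), chosen so that the first child repeats codes 6 and 8 and the second child is valid but describes the same dorms as r1.

```python
from collections import Counter

def single_point_crossover(p1, p2, cut):
    """Head of p1 followed by the tail of p2."""
    return p1[:cut] + p2[cut:]

def duplicates(code):
    """Codes that occur more than once, in ascending order."""
    return sorted(c for c, k in Counter(code).items() if k > 1)

r1 = list(range(1, 13))
r2 = [3, 1, 2, 4, 5, 7, 9, 10, 6, 11, 8, 12]   # hypothetical second parent
child = single_point_crossover(r1, r2, cut=8)
print(child)              # [1, 2, 3, 4, 5, 6, 7, 8, 6, 11, 8, 12]
print(duplicates(child))  # [6, 8]

r3 = [2, 1, 4, 3, 6, 5, 8, 7, 12, 11, 10, 9]   # hypothetical: its tail holds the same four students as r1's
child2 = single_point_crossover(r1, r3, cut=8)
print(child2)             # [1, 2, 3, 4, 5, 6, 7, 8, 12, 11, 10, 9]
```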

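Partially matched crossover

Partially matched crossover (PMC; more commonly abbreviated PMX in the literature) was proposed in 1985 as a refinement of two-point crossover. Its first step is the same as two-point crossover: set two random crossover points in the gene sequence, pick two random individuals as parents, and swap the gene blocks that lie between the crossover points.

Remember the blocks as they were before the swap: r1→[5,6,7,8], r2→[9,11,2,4].

Next, fill the × positions of the new individuals r1' and r2' by inheriting the codes at the same positions in r1 and r2 respectively. If a code to be inherited already appears in the swapped-in block, it is not inherited:

Finally, fill the remaining × positions, in order, with the codes from the remembered pre-swap block, which yields the final r1' and r2':

If a code from the pre-swap block is already in r1', skip it. In the figure, 2 and 6 have to be skipped when replacing the ×:

r1 is the original individual. Its block [2,4,6,9] is swapped with the other individual's corresponding block [2,6,7,8], giving [×,×,×,×,2,6,7,8,×,×,×,×]. After inheriting from r1 we get r1' = [1,3,5,×,2,6,7,8,×,10,11,12]. Of the codes in [2,4,6,9], 2 and 6 are already in r1', so only 4 and 9 can replace the ×, giving r1' = [1,3,5,4,2,6,7,8,9,10,11,12].

The code for partially matched crossover is as follows.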

def crossover_pmc(population):
    ''' Partially matched crossover (PMC) '''

    def code_extends(child, parent, start, end):
        '''
        Inherit codes from the parent
        :param child: child individual
        :param parent: parent individual
        :param start: first position to inherit (inclusive)
        :param end: last position to inherit (exclusive)
        '''
        for i in range(start, end):
            if parent[i] not in child:
                child[i] = parent[i]

    def code_rest(child, cross_code):
        '''
        Fill the undecided positions of the child from the pre-swap block
        :param child: child individual
        :param cross_code: gene block as it was before the swap
        '''
        for i, x in enumerate(child):
            if x != -1:
                continue
            for x_old in cross_code:
                if x_old not in child:
                    child[i] = x_old
                    break

    pop_new = []  # new population
    code_len = len(population[0])  # length of a gene code
    for _ in range(POPULATION_SIZE):
        # pick two random crossover points
        p1, p2 = random.randint(0, code_len - 1), random.randint(0, code_len - 1)
        if p1 > p2:
            p1, p2 = p2, p1
        parent1, parent2 = random.choices(population, k=2)  # pick two random individuals
        cross_code_1 = parent1[p1:p2]  # parent1's block before the swap
        cross_code_2 = parent2[p1:p2]  # block swapped in from parent2
        # build the new individual; -1 marks a position that is not decided yet
        r = [-1] * p1 + cross_code_2 + [-1] * (code_len - p2)
        code_extends(r, parent1, 0, p1)  # inherit the parent's codes before the block
        code_extends(r, parent1, p2, code_len)  # inherit the parent's codes after the block
        code_rest(r, cross_code_1)  # fill what is left from the pre-swap block
        pop_new.append(r)
    return pop_new
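
The worked example from the text can be replayed on a single pair of parents. The sketch below is a compact restatement of the same three steps, not the author's function: pmc_child is a hypothetical helper, and the two parents are assumptions reconstructed from the blocks quoted in the text (the positions of 7 and 8 in r1 and everything outside the block in r2 are guesses).

```python
def pmc_child(parent1, parent2, p1, p2):
    """One child: parent2[p1:p2] is swapped into parent1, then the gaps are repaired."""
    block_old = parent1[p1:p2]   # block before the swap (remembered)
    block_new = parent2[p1:p2]   # block swapped in from the other parent
    child = [-1] * p1 + block_new + [-1] * (len(parent1) - p2)  # -1 marks an undecided position
    # inherit parent1's codes outside the block unless the swapped-in block already holds them
    for i in list(range(p1)) + list(range(p2, len(parent1))):
        if parent1[i] not in block_new:
            child[i] = parent1[i]
    # fill the remaining positions, in order, with the unused codes of the old block
    unused = [c for c in block_old if c not in block_new]
    for i, c in enumerate(child):
        if c == -1:
            child[i] = unused.pop(0)
    return child

r1 = [1, 3, 5, 7, 2, 4, 6, 9, 8, 10, 11, 12]   # assumed: block [2,4,6,9] at positions 4..7
r2 = [1, 3, 4, 5, 2, 6, 7, 8, 9, 10, 11, 12]   # assumed: block [2,6,7,8] at positions 4..7
print(pmc_child(r1, r2, 4, 8))   # [1, 3, 5, 4, 2, 6, 7, 8, 9, 10, 11, 12]
```

The number of undecided positions always equals the number of unused codes from the old block, which is why the final fill step never runs out of codes.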

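Cycle crossover

Cycle crossover (CX) is another crossover strategy suited to sequence encoding. Unlike the other strategies, it does not need crossover points to be chosen in advance.

Suppose there are two parent individuals, r1 and r2. In the example, r1 begins 1,2,3,5 and r2 begins 2,3,5,1.

First take code 0 of r1 as the first code of the child r1':

Code 0 of r2 is 2, so the second code determined in r1' is 2:

The index of 2 in r1 is 1 and r2[1]=3, so the third code determined in r1' is 3:

The index of 3 in r1 is 2 and r2[2]=5, so the fourth code determined in r1' is 5:

The index of 5 in r1 is 3 and r2[3]=1, which equals the first code of r1'. This closes one cycle. Each determined code keeps the position it has in r1, and the codes still undetermined are copied from the corresponding positions of r2.

The code for cycle crossover is as follows: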

def crossover_cx(population):
    ''' Cycle crossover '''
    pop_new = []  # new population
    code_len = len(population[0])  # length of a gene code
    for _ in range(POPULATION_SIZE):
        parent1, parent2 = random.choices(population, k=2)  # pick two random individuals
        r_new = [-1] * code_len  # new individual; -1 marks an undecided position
        r_new[0] = parent1[0]
        i = 0
        while True:  # walk the cycle
            x = parent2[i]
            if r_new[0] == x:
                break
            i = parent1.index(x)
            r_new[i] = x
        # positions still undecided are inherited directly from parent2
        for j, x in enumerate(r_new):
            if x == -1:
                r_new[j] = parent2[j]
        pop_new.append(r_new)
    return pop_new
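
The walk-through can be followed on one small pair of parents. In this sketch cx_child is a hypothetical single-pair helper, and the six-code parents are assumptions: only their first four codes come from the walk-through, the last two are made up so that the child differs from both parents.

```python
def cx_child(parent1, parent2):
    """One cycle-crossover child: the first cycle comes from parent1, the rest from parent2."""
    child = [-1] * len(parent1)   # -1 marks an undecided position
    i = 0
    while child[i] == -1:
        child[i] = parent1[i]             # keep parent1's code at this position
        i = parent1.index(parent2[i])     # follow the cycle through parent2
    return [p2 if c == -1 else c for c, p2 in zip(child, parent2)]

r1 = [1, 2, 3, 5, 4, 6]   # hypothetical parents; the first four codes match the walk-through
r2 = [2, 3, 5, 1, 6, 4]
print(cx_child(r1, r2))   # [1, 2, 3, 5, 6, 4]
```

Every code in the child sits at a position where one of the parents had it, so the result is always a valid permutation.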

Mutation

Mutation for sequence encoding is simple: pick two random mutation points in an individual and swap their codes:

def mutation(population):
    ''' Mutation '''
    code_len = len(population[0])  # length of a gene code
    mp = 0.2  # mutation rate
    for r in population:
        if random.random() < mp:
            # two random mutation points
            p1, p2 = random.randint(0, code_len - 1), random.randint(0, code_len - 1)
            # swap the codes at the two points inside this individual
            r[p1], r[p2] = r[p2], r[p1]
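
A swap inside one individual can never introduce a duplicate, which is why this mutation suits sequence encoding. The sketch below checks that property on a single individual; swap_mutation and its rng parameter are assumptions for illustration, and unlike the function above it returns a copy instead of changing the list in place.

```python
import random

def swap_mutation(code, rng=random):
    """Return a copy of code with two randomly chosen positions swapped."""
    mutated = list(code)
    p1, p2 = rng.randrange(len(code)), rng.randrange(len(code))
    mutated[p1], mutated[p2] = mutated[p2], mutated[p1]
    return mutated

original = list(range(12))
mutated = swap_mutation(original, random.Random(0))
print(sorted(mutated) == original)                                # True: still a permutation
print(sum(a != b for a, b in zip(original, mutated)) in (0, 2))   # True: zero or two positions changed
```

Zero positions change when both random points happen to coincide.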

Assigning dorms

With the groundwork done, we can use the genetic algorithm to assign dorms:

def sum_fitness(population):
    ''' Total cost of the whole population '''
    return sum(fitness_fun(code)[0] for code in population)

def ga():
    ''' Assign dorms with the genetic algorithm '''
    population = init_population()  # build the initial population
    s_fitness = sum_fitness(population)  # total cost of the population
    i = 0
    while i < 10:  # stop after 10 consecutive generations without improvement
        pop_next = selection(population)  # selection
        pop_new = crossover_cx(pop_next)  # crossover
        mutation(pop_new)  # mutation
        s_fitness_new = sum_fitness(pop_new)  # total cost of the new population
        if s_fitness > s_fitness_new:  # lower cost means higher fitness
            s_fitness = s_fitness_new
            i = 0
        else:
            i += 1
        population = pop_new
    # the best individual is the one with the lowest total cost
    return min(population, key=lambda x: fitness_fun(x)[0])

if __name__ == '__main__':
    best = ga()
    solution = solution_adapter(best)
    total_cost, dorms_cost = base_data.cost_fun(solution)
    base_data.print_solution(solution, total_cost, dorms_cost)

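Note the fitness comparison on line 15 (if s_fitness > s_fitness_new). Because a cost function is used here, the lower the population's cost, the higher its fitness and the more it deserves to be kept. One possible result:

As the result shows, the total cost is low and the "unevenness" problem is resolved as well.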

Complete code

base_data.py:

import random
from itertools import combinations

# student survey data; fields: hometown, major, class, get-up time, hobbies,
# and the number (1-5) of the field the student cares about most
STUDENTS = [
    [32, 1, 2, 2, [11, 33, 42], 5],
    [32, 1, 2, 1, [11], 5],
    [41, 1, 2, 5, [21, 22], 4],
    [43, 2, 3, 3, [11, 21], 3],
    [36, 2, 3, 4, [11, 33], 3],
    [44, 2, 3, 4, [41, 42], 2],
    [42, 1, 2, 1, [11, 12], 1],
    [32, 1, 1, 2, [31, 32], 2],
    [61, 1, 1, 3, [51], 3],
    [61, 1, 1, 2, [13], 3],
    [44, 3, 4, 1, [21, 43], 1],
    [22, 3, 4, 4, [22, 43], 2]
]
# student names
STUDENTS_NAME = ['蕾娜', '琪琳', '薔薇', '炙心',
                 '靈犀', '莫伊', '憐風', '語琴',
                 '涼冰', '鶴熙', '瑞萌萌', '何蔚藍']
DROM_SIZE = 3  # number of dorms
NUM_PER_DROM = 4  # students per dorm
MAX_COST = 5  # largest difference within a single dimension

def cost_stu(stu_1, stu_2):
    ''' Difference between stu_1 and stu_2, taken from stu_1's point of view '''
    cost = []  # cost (degree of difference) in each dimension
    cost.append(cost_equal(stu_1[0], stu_2[0]))  # hometown cost
    cost.append(cost_equal(stu_1[1], stu_2[1]))  # major cost
    cost.append(cost_class(stu_1[2], stu_2[2], stu_1[1], stu_2[1]))  # class cost
    cost.append(cost_get_up(stu_1[3], stu_2[3]))  # get-up cost
    cost.append(cost_interest(stu_1[4], stu_2[4]))  # hobby cost
    w_idx_1, w_idx_2 = stu_1[-1] - 1, stu_2[-1] - 1  # index of the dimension each student weights most
    w_cost_1, w_cost_2 = cost[w_idx_1] * 1.5, cost[w_idx_2] * 1.5  # apply the weights
    # do both students care most about the same dimension?
    if w_idx_1 == w_idx_2:
        cost[w_idx_1] = w_cost_1 + w_cost_2
    else:
        cost[w_idx_1], cost[w_idx_2] = w_cost_1, w_cost_2

    return sum(cost)

def cost_equal(d_1, d_2):
    ''' Cost of an equal-or-not comparison '''
    return 0 if d_1 == d_2 else MAX_COST

def cost_class(d_1, d_2, sub_1, sub_2):
    ''' Class cost '''
    if d_1 == d_2:  # same class
        return 0
    elif sub_1 == sub_2:  # different class, same major
        return 1
    else:  # different class, different major
        return MAX_COST

def cost_get_up(d_1, d_2):
    ''' Get-up cost (signed, so it is not symmetric in its arguments) '''
    return 1.2 * (d_2 - d_1)

def cost_interest(d_1, d_2):
    ''' Hobby cost '''
    # a shared hobby means zero distance
    if set(d_1) & set(d_2):
        return 0
    # a shared hobby category (the tens digit) means a distance of 2
    if {t // 10 for t in d_1} & {t // 10 for t in d_2}:
        return 2
    return MAX_COST

def cost_fun(solution):
    '''
    Cost of every dorm in a plan
    :param solution: dorm assignment plan
    :return: total cost of the plan and the list of per-dorm costs
    '''
    droms_cost = []
    for drom in solution:
        # compare the students of one dorm pair by pair
        d_cost = 0
        for a, b in combinations(drom, 2):
            d_cost += cost_stu(STUDENTS[a], STUDENTS[b])
        droms_cost.append(d_cost)
    diff_cost = max(droms_cost) - min(droms_cost)  # gap between the costliest and cheapest dorm
    total_cost = sum(droms_cost) + diff_cost * DROM_SIZE  # total cost of the plan
    return total_cost, droms_cost

def upset():
    ''' Return the student indices in random order '''
    stu_list = list(range(len(STUDENTS_NAME)))
    random.shuffle(stu_list)  # uniform in-place shuffle
    return stu_list

def print_solution(solution, total_cost, dorms_cost):
    ''' Print each dorm with its students and cost, then the total cost '''
    for i, drom in enumerate(solution):
        print('Dorm %d:\t' % i, end='')
        for j in drom:
            print('%-8s' % STUDENTS_NAME[j], end='')
        print('\tcost=%f' % dorms_cost[i])
    print('total=%f' % total_cost)

genetic_optimize.py

import random
import os
import sys
parent_dir_name = os.path.dirname(os.path.realpath(__file__))
sys.path.append(parent_dir_name)  # make base_data importable from this file's directory
import base_data

POPULATION_SIZE = 1000  # population size

def init_population():
    ''' Build the initial population '''
    population = []
    for _ in range(POPULATION_SIZE):
        # each individual is a random permutation of the student indices
        population.append(base_data.upset())
    return population

def solution_adapter(code):
    ''' Convert a gene code into a solution that cost_fun understands '''
    solution = []
    for i in range(0, len(base_data.STUDENTS_NAME), base_data.NUM_PER_DROM):
        solution.append(code[i:i + base_data.NUM_PER_DROM])
    return solution

def fitness_fun(code):
    '''
    Fitness evaluation
    :param code:   sequence-encoded gene code
    :return: a 2-tuple (total cost of the plan, list of per-dorm costs)
    '''
    return base_data.cost_fun(solution_adapter(code))

def selection(population):
    ''' Selection strategy: tournament selection '''
    pop_next = []  # next generation
    for _ in range(POPULATION_SIZE):
        tour_list = random.choices(population, k=2)  # binary tournament, drawn with replacement
        winner = min(tour_list, key=lambda x: fitness_fun(x)[0])  # lower total cost wins
        pop_next.append(winner)
    return pop_next

def crossover_pmc(population):
    ''' Partially matched crossover (PMC) '''

    def code_extends(child, parent, start, end):
        '''
        Inherit codes from the parent
        :param child: child individual
        :param parent: parent individual
        :param start: first position to inherit (inclusive)
        :param end: last position to inherit (exclusive)
        '''
        for i in range(start, end):
            if parent[i] not in child:
                child[i] = parent[i]

    def code_rest(child, cross_code):
        '''
        Fill the undecided positions of the child from the pre-swap block
        :param child: child individual
        :param cross_code: gene block as it was before the swap
        '''
        for i, x in enumerate(child):
            if x != -1:
                continue
            for x_old in cross_code:
                if x_old not in child:
                    child[i] = x_old
                    break

    pop_new = []  # new population
    code_len = len(population[0])  # length of a gene code
    for _ in range(POPULATION_SIZE):
        # pick two random crossover points
        p1, p2 = random.randint(0, code_len - 1), random.randint(0, code_len - 1)
        if p1 > p2:
            p1, p2 = p2, p1
        parent1, parent2 = random.choices(population, k=2)  # pick two random individuals
        cross_code_1 = parent1[p1:p2]  # parent1's block before the swap
        cross_code_2 = parent2[p1:p2]  # block swapped in from parent2
        # build the new individual; -1 marks a position that is not decided yet
        r = [-1] * p1 + cross_code_2 + [-1] * (code_len - p2)
        code_extends(r, parent1, 0, p1)  # inherit the parent's codes before the block
        code_extends(r, parent1, p2, code_len)  # inherit the parent's codes after the block
        code_rest(r, cross_code_1)  # fill what is left from the pre-swap block
        pop_new.append(r)
    return pop_new

def crossover_cx(population):
    ''' Cycle crossover '''
    pop_new = []  # new population
    code_len = len(population[0])  # length of a gene code
    for _ in range(POPULATION_SIZE):
        parent1, parent2 = random.choices(population, k=2)  # pick two random individuals
        r_new = [-1] * code_len  # new individual; -1 marks an undecided position
        r_new[0] = parent1[0]
        i = 0
        while True:  # walk the cycle
            x = parent2[i]
            if r_new[0] == x:
                break
            i = parent1.index(x)
            r_new[i] = x
        # positions still undecided are inherited directly from parent2
        for j, x in enumerate(r_new):
            if x == -1:
                r_new[j] = parent2[j]
        pop_new.append(r_new)
    return pop_new

def mutation(population):
    ''' Mutation '''
    code_len = len(population[0])  # length of a gene code
    mp = 0.2  # mutation rate
    for r in population:
        if random.random() < mp:
            # two random mutation points
            p1, p2 = random.randint(0, code_len - 1), random.randint(0, code_len - 1)
            # swap the codes at the two points inside this individual
            r[p1], r[p2] = r[p2], r[p1]

def sum_fitness(population):
    ''' Total cost of the whole population '''
    return sum(fitness_fun(code)[0] for code in population)

def ga():
    ''' Assign dorms with the genetic algorithm '''
    population = init_population()  # build the initial population
    s_fitness = sum_fitness(population)  # total cost of the population
    i = 0
    while i < 10:  # stop after 10 consecutive generations without improvement
        pop_next = selection(population)  # selection
        pop_new = crossover_cx(pop_next)  # crossover
        mutation(pop_new)  # mutation
        s_fitness_new = sum_fitness(pop_new)  # total cost of the new population
        if s_fitness > s_fitness_new:  # lower cost means higher fitness
            s_fitness = s_fitness_new
            i = 0
        else:
            i += 1
        population = pop_new
    # the best individual is the one with the lowest total cost
    return min(population, key=lambda x: fitness_fun(x)[0])

if __name__ == '__main__':
    best = ga()
    solution = solution_adapter(best)
    total_cost, dorms_cost = base_data.cost_fun(solution)
    base_data.print_solution(solution, total_cost, dorms_cost)

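Author: 我是8位的

Source: http://www.cnblogs.com/bigmonkey

This article is meant for learning, research and sharing. To repost it, please contact the author and credit the author and the source. Non-commercial use only!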
