在結核菌中,以12SNP爲閾值判斷菌株間是否爲近期傳播。利用生物信息學方法從原始測序結果獲得各菌株的SNP文件,經過互相比較獲得兩個文件:(1)各菌株間相互比較的狀況,格式爲:菌株AVS菌株B SNP距離,總行數爲n*(n-1)/2 (2)利用文件1獲得近期傳播的菌株列表接下來要把這些菌株串起來,造成傳播鏈,核心思想是:鏈內的任一菌株與至少1個鏈內其它菌株的SNP距離<=12;鏈內的任一菌株與鏈外全部菌株的距離都>12所以,核心思想也就是「迭代」,並不斷刪除數據,減小計算量import sysimport ref1=open(sys.argv[1]).readlines() #strain distancef2=open(sys.argv[2]).readlines() #distance <= 12 strain listf3=open("rough_transmission_chain","a")sum_chain_list=[i.rstrip().split()[0] for i in f1 if int(i.rstrip().split()[1])<=12]def find_transmission_strain(strain): cycle_strain = [] match_strain = [] True_strain = [i for i in sum_chain_list if i.startswith(strain + "VS") or i.endswith("VS" + strain)] for item in True_strain: temp = re.sub("VS", "", item) sum_chain_list.remove(item) if temp[:len(strain)] == strain: match_strain.append(temp[len(strain):]) cycle_strain.append(temp[len(strain):]) else: match_strain.append(temp[0:len(temp) - len(strain)]) cycle_strain.append(temp[0:len(temp) - len(strain)]) if len(cycle_strain) > 0: for a in range(len(cycle_strain)): for b in range(len(cycle_strain)): try: sum_chain_list.remove(cycle_strain[a] + "VS" + cycle_strain[b]) except ValueError as e: pass for i in cycle_strain: match_strain.extend(find_transmission_strain(i)) return set(match_strain)temp = []for i in f2: i = i.rstrip()if i not in temp: a = list(find_transmission_strain(i)) a.append(i) temp.extend(a) for strain in a: f3.write(strain + "\t") f3.write("\n")else: pass