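Association analysis (association mining) looks for frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction data, relational data, or other information stores. Market basket analysis is the classic example: by discovering links between the different products customers put in their baskets, it reveals their buying habits. For instance, 67% of customers who buy diapers also buy beer. Knowing which products are frequently bought together helps retailers shape their marketing strategy, and the results can be applied to shelf layout, inventory planning, and segmenting customers by purchase pattern.
FPGrowth is an association analysis algorithm proposed by Jiawei Han et al. in 2000. It uses a data structure called the frequent pattern tree (FP-Tree) to speed up the whole association rule mining process. It follows a divide-and-conquer strategy: the database that supplies the frequent itemsets is compressed into an FP-Tree that still retains the itemset association information. It differs from the Apriori algorithm in two main ways: first, it generates no candidate sets; second, it needs only two passes over the database, which greatly improves efficiency.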
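The two passes can be sketched on a tiny database. This is only a minimal illustration, not the project's code: the class FpTreeSketch, its Node type, and the sample transactions are names and data assumed here for the example. Pass one counts item frequencies; pass two drops infrequent items, orders each transaction by descending frequency (ties broken by name), and inserts it as a path into a prefix tree so that shared prefixes are stored once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: builds only the tree, without header links or mining.
class FpTreeSketch {

    static final class Node {
        final String item;
        int count;
        final Map<String, Node> children = new LinkedHashMap<>();
        Node(String item) { this.item = item; }
    }

    // Pass 1: count how many transactions contain each item.
    static Map<String, Integer> countItems(List<List<String>> transactions) {
        Map<String, Integer> freq = new HashMap<>();
        for (List<String> t : transactions) {
            for (String item : new HashSet<>(t)) {
                freq.merge(item, 1, Integer::sum);
            }
        }
        return freq;
    }

    // Pass 2: filter, order, and insert every transaction as a path from the root.
    static Node build(List<List<String>> transactions, int minSupport) {
        Map<String, Integer> freq = countItems(transactions);
        Comparator<String> order = (a, b) -> {
            int byFreq = Integer.compare(freq.get(b), freq.get(a));
            return byFreq != 0 ? byFreq : a.compareTo(b);
        };
        Node root = new Node(null); // virtual root
        for (List<String> t : transactions) {
            List<String> kept = new ArrayList<>();
            for (String item : new HashSet<>(t)) {
                if (freq.get(item) >= minSupport) kept.add(item);
            }
            kept.sort(order);
            Node current = root;
            for (String item : kept) {
                current = current.children.computeIfAbsent(item, Node::new);
                current.count++;
            }
        }
        return root;
    }

    // Number of nodes below the given node.
    static int size(Node node) {
        int n = 0;
        for (Node child : node.children.values()) n += 1 + size(child);
        return n;
    }

    public static void main(String[] args) {
        List<List<String>> db = Arrays.asList(
                Arrays.asList("A", "E", "F", "G"),
                Arrays.asList("A", "F", "G"),
                Arrays.asList("A", "B", "E", "F", "G"),
                Arrays.asList("E", "F", "G"));
        Node root = build(db, 2);
        System.out.println(size(root)); // 5
    }
}
```

With a minimum support count of 2, the 15 item occurrences in the sample collapse into 5 tree nodes, because every transaction shares the prefix F, G.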
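1. Preface
First, a few concepts around frequent itemsets:
Frequent item: an element that occurs frequently across many sets.
Frequent itemset: a set of items that appear together in the transactions often enough for their support to reach a given threshold.
K-itemset: an itemset made up of K items; a frequent K-itemset is one whose support reaches the threshold.
The example below (a transaction database) illustrates support and confidence. Each row is a transaction, a transaction consists of several distinct items, and any combination of items is called an itemset.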
A E F G
A F G
A B E F G
E F G
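Support: the fraction of all transactions that contain the itemset. For example, the itemset {A,F,G} has a support count of 3 and a support of 3/4. An itemset whose support count is at least the threshold minSuport is called a frequent itemset. {F,G} has a support count of 4 and a support of 4/4. {A} has a support count of 3 and a support of 3/4.
Confidence: for a rule X-->Y, the support of the union of X and Y divided by the support of X. For example, the confidence of {F,G}-->{A} is the support count of {A,F,G} divided by the support count of {F,G}, that is 3/4. The confidence of {A}-->{F,G} is the support count of {A,F,G} divided by the support count of {A}, that is 3/3.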
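The numbers above can be checked with a few lines of Java. This is a minimal sketch added for illustration; the class SupportConfidenceDemo and its helper items are hypothetical names, and it counts by brute force rather than with an FP-Tree.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class: brute-force support and confidence on the example database.
class SupportConfidenceDemo {

    static Set<String> items(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    // The four example transactions from the text.
    static final List<Set<String>> TRANSACTIONS = Arrays.asList(
            items("A", "E", "F", "G"),
            items("A", "F", "G"),
            items("A", "B", "E", "F", "G"),
            items("E", "F", "G"));

    // Support count: number of transactions containing every item of the itemset.
    static int supportCount(List<Set<String>> transactions, Set<String> itemset) {
        int count = 0;
        for (Set<String> t : transactions) {
            if (t.containsAll(itemset)) count++;
        }
        return count;
    }

    // Confidence of condition --> result: count(condition plus result) / count(condition).
    static double confidence(List<Set<String>> transactions, Set<String> condition, Set<String> result) {
        Set<String> union = new HashSet<>(condition);
        union.addAll(result);
        return (double) supportCount(transactions, union) / supportCount(transactions, condition);
    }

    public static void main(String[] args) {
        System.out.println(supportCount(TRANSACTIONS, items("A", "F", "G")));           // 3
        System.out.println(confidence(TRANSACTIONS, items("F", "G"), items("A")));      // 0.75
        System.out.println(confidence(TRANSACTIONS, items("A"), items("F", "G")));      // 1.0
    }
}
```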
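In summary, FPGrowth can in theory mine association rules from the frequent itemsets, and the rules can then be filtered by confidence and used for recommendations. In my JavaWeb project, FPGrowth runs over the search history of all users and, combined with the current search, recommends companies the user may be interested in (search records whose confidence meets the threshold) and companies other users have searched for (members of the frequent itemsets other than the current search). This is only my personal view; if anything is wrong, corrections are welcome.
2. Implementation
1) The entity class for a user search record: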
package entity;

/**
 * A user's search history record
 * @author: yjl
 * @date: 2018/5/24
 */
public class TQueryHistory {

    private Integer id;

    private String userAccount; //user account

    private String queryCorpName; //company the user searched for

    public TQueryHistory() {
    }

    public TQueryHistory(String userAccount, String queryCorpName) {
        this.userAccount = userAccount;
        this.queryCorpName = queryCorpName;
    }

    public TQueryHistory(Integer id, String userAccount, String queryCorpName) {
        this.id = id;
        this.userAccount = userAccount;
        this.queryCorpName = queryCorpName;
    }

    public Integer getId() {
        return id;
    }

    public void setId(Integer id) {
        this.id = id;
    }

    public String getUserAccount() {
        return userAccount;
    }

    public void setUserAccount(String userAccount) {
        this.userAccount = userAccount;
    }

    public String getQueryCorpName() {
        return queryCorpName;
    }

    public void setQueryCorpName(String queryCorpName) {
        this.queryCorpName = queryCorpName;
    }

    @Override
    public String toString() {
        return "TQueryHistory{" +
                "id=" + id +
                ", userAccount='" + userAccount + '\'' +
                ", queryCorpName='" + queryCorpName + '\'' +
                '}';
    }
}
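2) Data preparation before FPGrowth mines the rules, similar to the transaction database above. corpName is the company the user is currently searching for. The resulting interestedCorpList and otherSearchCorpList hold the companies the user may be interested in and the companies other users have searched for; if a list comes back too short, it can be topped up using attributes such as the company's industry: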
//Needs java.io.BufferedWriter, java.io.FileWriter and java.util.*; the enclosing method must declare or handle IOException

//Fetch the search records of all users
List<TQueryHistory> allQueryHistory = searchCorpService.getAllQueryHistory();

//Group the searched company names by user account: each account becomes one transaction
Map<String, List<String>> newQueryHistoryMap = new HashMap<>();
for (TQueryHistory tQueryHistory : allQueryHistory) {
    newQueryHistoryMap
            .computeIfAbsent(tQueryHistory.getUserAccount(), account -> new ArrayList<>())
            .add(tQueryHistory.getQueryCorpName());
}

//Write the company names to a file, one transaction per line, and pass the file to FPTree
String outfile = "QueryHistory.txt";
try (BufferedWriter bw = new BufferedWriter(new FileWriter(outfile))) {
    for (List<String> corpNameList : newQueryHistoryMap.values()) {
        bw.write(joinList(corpNameList));
        bw.newLine();
    }
}

//Take the two result lists out of the returned Map
Map<String, List<String>> corpMap = FPTree.introQueryHistory(outfile, corpName);
List<String> interestedCorpList = corpMap.getOrDefault("interestedCorpList", new ArrayList<>());
List<String> otherSearchCorpList = corpMap.getOrDefault("otherSearchCorpList", new ArrayList<>());
//File format rule: the items of one transaction are joined with ","
private static String joinList(List<String> list) {
    if (list == null || list.size() == 0) {
        return "";
    }
    StringBuilder sb = new StringBuilder();
    for (String ele : list) {
        sb.append(ele);
        sb.append(",");
    }
    return sb.substring(0, sb.length() - 1);
}
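As a side note, on Java 8 or later the JDK's String.join gives the same comma-separated line. The sketch below is only an illustration; the class JoinDemo and the method joinWithComma are hypothetical names, and the null check mirrors the guard in joinList.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class showing String.join as an alternative to the hand-written loop.
class JoinDemo {

    // Returns "" for a null or empty list, otherwise the items separated by ",".
    static String joinWithComma(List<String> items) {
        return items == null ? "" : String.join(",", items);
    }

    public static void main(String[] args) {
        System.out.println(joinWithComma(Arrays.asList("Corp A", "Corp B"))); // Corp A,Corp B
        System.out.println(joinWithComma(Collections.<String>emptyList()).isEmpty()); // true
    }
}
```

Either way, a company name that itself contains a comma would later be split into two items when the file is read back, so in that case a different separator or an escaping step would be needed.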
3) The FPStrongAssociationRule class holds the fields of a strong association rule:
package util;

import java.util.List;

public class FPStrongAssociationRule {

    public List<String> condition;

    public String result;

    public int support;

    public double confidence;

}
4) The FPTreeNode class holds the fields of an FP-Tree node:
package util;

import java.util.ArrayList;
import java.util.List;

public class FPTreeNode {

    private String name; //node name
    private int count; //frequency count
    private FPTreeNode parent; //parent node
    private List<FPTreeNode> children; //child nodes
    private FPTreeNode nextHomonym; //next node with the same name (the linked list kept by the header entry)
    private FPTreeNode tail; //last node of that linked list (kept by the header entry)

    public FPTreeNode() {
    }

    public FPTreeNode(String name) {
        this.name = name;
    }

    public String getName() {
        return this.name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public int getCount() {
        return this.count;
    }

    public void setCount(int count) {
        this.count = count;
    }

    public FPTreeNode getParent() {
        return this.parent;
    }

    public void setParent(FPTreeNode parent) {
        this.parent = parent;
    }

    public List<FPTreeNode> getChildren() {
        return this.children;
    }

    public void setChildren(List<FPTreeNode> children) {
        this.children = children;
    }

    public FPTreeNode getNextHomonym() {
        return this.nextHomonym;
    }

    public void setNextHomonym(FPTreeNode nextHomonym) {
        this.nextHomonym = nextHomonym;
    }

    public FPTreeNode getTail() {
        return tail;
    }

    public void setTail(FPTreeNode tail) {
        this.tail = tail;
    }

    //Add a child node
    public void addChild(FPTreeNode child) {
        if (getChildren() == null) {
            List<FPTreeNode> list = new ArrayList<>();
            list.add(child);
            setChildren(list);
        } else {
            getChildren().add(child);
        }
    }

    //Find a child node by name; returns null if there is none
    public FPTreeNode findChild(String name) {
        List<FPTreeNode> children = getChildren();
        if (children != null) {
            for (FPTreeNode child : children) {
                if (child.getName().equals(name)) {
                    return child;
                }
            }
        }
        return null;
    }

    //Increase the count by n
    public void countIncrement(int n) {
        this.count += n;
    }

    @Override
    public String toString() {
        return name;
    }
}
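5) The FPTree class mines the rules with the FPGrowth algorithm. Given the file holding all users' search records and the company currently being searched for, introQueryHistory returns the companies the user may be interested in and the companies other users have searched for, and caps the number of companies in each list: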
package util;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.text.DecimalFormat;
import java.util.*;
import java.util.Map.Entry;

public class FPTree {

    private int minSuport; //minimum support count of a frequent pattern
    private double confident; //minimum confidence of an association rule
    private int totalSize; //total number of transactions
    private Map<List<String>, Integer> frequentMap = new HashMap<>(); //every frequent pattern and its count
    private Set<String> decideAttr = null; //items that may appear as the result of a rule; null means every item may

    public void setMinSuport(int minSuport) {
        this.minSuport = minSuport;
    }

    public void setConfident(double confident) {
        this.confident = confident;
    }

    public void setDecideAttr(Set<String> decideAttr) {
        this.decideAttr = decideAttr;
    }

    /**
     * Split one frequent pattern into candidate rules: each allowed item in turn
     * becomes the result and the remaining items become the condition
     * @return the candidate rules (support and confidence not yet filled in)
     */
    private List<FPStrongAssociationRule> getRules(List<String> list) {
        List<FPStrongAssociationRule> rect = new LinkedList<>();
        if (list.size() > 1) {
            for (int i = 0; i < list.size(); i++) {
                String result = list.get(i);
                //a null decideAttr means every item may be a result
                if (decideAttr == null || decideAttr.contains(result)) {
                    List<String> condition = new ArrayList<>();
                    condition.addAll(list.subList(0, i));
                    condition.addAll(list.subList(i + 1, list.size()));
                    FPStrongAssociationRule rule = new FPStrongAssociationRule();
                    rule.condition = condition;
                    rule.result = result;
                    rect.add(rule);
                }
            }
        }
        return rect;
    }

    /**
     * Read transaction records from several files and set decideAttr to all items seen
     * @return the transactions, or null if no file name was given
     */
    public List<List<String>> readTransRocords(String[] filenames) {
        Set<String> set = new HashSet<>();
        List<List<String>> transaction = null;
        if (filenames.length > 0) {
            transaction = new LinkedList<>();
            for (String filename : filenames) {
                try (BufferedReader br = new BufferedReader(new FileReader(filename))) {
                    String line;
                    //one transaction per line
                    while ((line = br.readLine()) != null) {
                        if (line.trim().length() > 0) {
                            //items are separated by ","
                            String[] str = line.split(",");
                            //duplicate items within one transaction are removed
                            Set<String> record = new HashSet<>();
                            for (String w : str) {
                                record.add(w);
                                set.add(w);
                            }
                            List<String> rl = new ArrayList<>();
                            rl.addAll(record);
                            transaction.add(rl);
                        }
                    }
                } catch (IOException ex) {
                    //System.exit would shut down the whole web application, so the failure is passed to the caller instead
                    throw new UncheckedIOException("Read transaction records failed. " + ex.getMessage(), ex);
                }
            }
        }

        this.setDecideAttr(set);
        return transaction;
    }

    /**
     * Generate all non-empty subsequences of a sequence (order is preserved)
     * @param residualPath the remaining part of the path; it is consumed by this call
     * @param results the subsequences collected so far
     */
    private void combine(LinkedList<FPTreeNode> residualPath, List<List<FPTreeNode>> results) {
        if (residualPath.size() > 0) {
            //if residualPath is too long there are too many combinations and memory runs out
            FPTreeNode head = residualPath.poll();
            List<List<FPTreeNode>> newResults = new ArrayList<>();
            for (List<FPTreeNode> list : results) {
                List<FPTreeNode> listCopy = new ArrayList<>(list);
                newResults.add(listCopy);
            }

            for (List<FPTreeNode> newPath : newResults) {
                newPath.add(head);
            }
            results.addAll(newResults);
            List<FPTreeNode> list = new ArrayList<>();
            list.add(head);
            results.add(list);
            combine(residualPath, results);
        }
    }

    /**
     * Check whether the tree consists of a single branch
     * @param root
     */
    private boolean isSingleBranch(FPTreeNode root) {
        boolean rect = true;
        while (root.getChildren() != null) {
            if (root.getChildren().size() > 1) {
                rect = false;
                break;
            }
            root = root.getChildren().get(0);
        }
        return rect;
    }

    /**
     * Count the frequency of every item in the transaction set
     * @param transRecords
     * @return item name mapped to its count
     */
    private Map<String, Integer> getFrequency(List<List<String>> transRecords) {
        Map<String, Integer> rect = new HashMap<>();
        for (List<String> record : transRecords) {
            for (String item : record) {
                rect.merge(item, 1, Integer::sum);
            }
        }
        return rect;
    }

    /**
     * Build the FPTree from the transaction set and mine it
     * @param transRecords
     */
    public void buildFPTree(List<List<String>> transRecords) {
        totalSize = transRecords.size();
        //count the frequency of every item
        final Map<String, Integer> freqMap = getFrequency(transRecords);
        //order the items of every transaction by F1 (descending frequency);
        //ties are broken by name so that all transactions use the same order
        for (List<String> transRecord : transRecords) {
            Collections.sort(transRecord, (o1, o2) -> {
                int byFreq = freqMap.get(o2) - freqMap.get(o1);
                return byFreq != 0 ? byFreq : o1.compareTo(o2);
            });
        }
        FPGrowth(transRecords, null);
    }

    /**
     * Grow the FP tree recursively to obtain all frequent patterns
     * @param cpb conditional pattern base
     * @param postModel suffix pattern
     */
    private void FPGrowth(List<List<String>> cpb, LinkedList<String> postModel) {
        Map<String, Integer> freqMap = getFrequency(cpb);
        Map<String, FPTreeNode> headers = new HashMap<>();
        for (Entry<String, Integer> entry : freqMap.entrySet()) {
            String name = entry.getKey();
            int cnt = entry.getValue();
            //in every recursion some items may fall below the threshold
            if (cnt >= minSuport) {
                FPTreeNode node = new FPTreeNode(name);
                node.setCount(cnt);
                headers.put(name, node);
            }
        }

        FPTreeNode treeRoot = buildSubTree(cpb, headers);
        //if only the virtual root is left, the recursion ends
        if ((treeRoot.getChildren() == null) || (treeRoot.getChildren().size() == 0)) {
            return;
        }

        //if the tree is a single branch, add "every combination of the path + suffix pattern" to the frequent patterns directly.
        //This shortcut is optional: skipping it and recursing further gives the same result
        if (isSingleBranch(treeRoot)) {
            LinkedList<FPTreeNode> path = new LinkedList<>();
            FPTreeNode currNode = treeRoot;
            while (currNode.getChildren() != null) {
                currNode = currNode.getChildren().get(0);
                path.add(currNode);
            }
            //path must not be too long when calling combine, otherwise OutOfMemory
            if (path.size() <= 20) {
                List<List<FPTreeNode>> results = new ArrayList<>();
                combine(path, results);
                for (List<FPTreeNode> list : results) {
                    int cnt = 0;
                    List<String> rule = new ArrayList<>();
                    for (FPTreeNode node : list) {
                        rule.add(node.getName());
                        cnt = node.getCount(); //cnt ends up as the count of the deepest node in the combination
                    }
                    if (postModel != null) {
                        rule.addAll(postModel);
                    }
                    frequentMap.put(rule, cnt);
                }
                return;
            } else {
                System.err.println("length of path is too long: " + path.size());
            }
        }

        for (FPTreeNode header : headers.values()) {
            List<String> rule = new ArrayList<>();
            rule.add(header.getName());
            if (postModel != null) {
                rule.addAll(postModel);
            }
            //header item + suffix pattern form one frequent pattern (also kept in F1 order); its count is the header item's count
            frequentMap.put(rule, header.getCount());
            //new suffix pattern: header item + previous suffix pattern (keep the F1 order)
            LinkedList<String> newPostPattern = new LinkedList<>();
            newPostPattern.add(header.getName());
            if (postModel != null) {
                newPostPattern.addAll(postModel);
            }
            //new conditional pattern base
            List<List<String>> newCPB;
            newCPB = new LinkedList<>();
            FPTreeNode nextNode = header;
            while ((nextNode = nextNode.getNextHomonym()) != null) {
                int counter = nextNode.getCount();
                //the path from the virtual root (exclusive) to the current node (exclusive) is one entry of the conditional pattern base.
                //Keep the order: parent first, child after, so the more frequent item always comes first
                LinkedList<String> path = new LinkedList<>();
                FPTreeNode parent = nextNode;
                while ((parent = parent.getParent()).getName() != null) {//the virtual root has a null name
                    path.push(parent.getName());//insert at the head
                }
                //the path is added counter times
                while (counter-- > 0) {
                    newCPB.add(path);
                }
            }
            FPGrowth(newCPB, newPostPattern);
        }
    }

    /**
     * Insert all transactions into one FP tree
     * @param transRecords
     * @param headers
     * @return the virtual root of the tree
     */
    private FPTreeNode buildSubTree(List<List<String>> transRecords, final Map<String, FPTreeNode> headers) {
        FPTreeNode root = new FPTreeNode();//virtual root
        for (List<String> transRecord : transRecords) {
            LinkedList<String> record = new LinkedList<>(transRecord);
            //drop items that are not frequent here, so that an infrequent item in the middle
            //of a record cannot split a shared prefix into duplicate sibling nodes
            record.removeIf(item -> !headers.containsKey(item));
            FPTreeNode subTreeRoot = root;
            FPTreeNode tmpRoot;
            if (root.getChildren() != null) {
                //follow the existing branch and add 1 to the count of each node on it
                while (!record.isEmpty()
                        && (tmpRoot = subTreeRoot.findChild(record.peek())) != null) {
                    tmpRoot.countIncrement(1);
                    subTreeRoot = tmpRoot;
                    record.poll();
                }
            }
            //grow new nodes
            addNodes(subTreeRoot, record, headers);
        }
        return root;
    }

    /**
     * Insert a chain of descendants under a node and maintain the header's linked list of same-name nodes
     * @param ancestor
     * @param record
     * @param headers
     */
    private void addNodes(FPTreeNode ancestor, LinkedList<String> record,
                          final Map<String, FPTreeNode> headers) {
        while (!record.isEmpty()) {
            String item = record.poll();
            //an item may only enter the FP tree if its count reaches the minimum support count; all such items are in headers.
            //Each time a new FPTree is built from a conditional pattern base, items below minSuport are left out,
            //which is the real reason FPTree is faster than brute-force enumeration
            if (headers.containsKey(item)) {
                FPTreeNode leafnode = new FPTreeNode(item);
                leafnode.setCount(1);
                leafnode.setParent(ancestor);
                ancestor.addChild(leafnode);

                FPTreeNode header = headers.get(item);
                FPTreeNode tail = header.getTail();
                if (tail != null) {
                    tail.setNextHomonym(leafnode);
                } else {
                    header.setNextHomonym(leafnode);
                }
                header.setTail(leafnode);
                addNodes(leafnode, record, headers);
            }
        }
    }

    /**
     * Get all strong rules
     * @return the rules whose confidence reaches the minimum confidence
     */
    public List<FPStrongAssociationRule> getAssociateRule() {
        assert totalSize > 0;
        List<FPStrongAssociationRule> rect = new ArrayList<>();
        //go through all frequent patterns
        for (Entry<List<String>, Integer> entry : frequentMap.entrySet()) {
            List<String> items = entry.getKey();
            int count1 = entry.getValue();
            //one frequent pattern can produce many association rules
            List<FPStrongAssociationRule> rules = getRules(items);
            //compute the support count and confidence of each rule
            for (FPStrongAssociationRule rule : rules) {
                if (frequentMap.containsKey(rule.condition)) {
                    int count2 = frequentMap.get(rule.condition);
                    double confidence = 1.0 * count1 / count2;
                    if (confidence >= this.confident) {
                        rule.support = count1;
                        rule.confidence = confidence;
                        rect.add(rule);
                    }
                } else {
                    System.err.println(rule.condition + " is not a frequent pattern, however "
                            + items + " is a frequent pattern");
                }
            }
        }
        return rect;
    }

    /**
     * Limit the list to 5 companies
     */
    private static void limitFiveCorp(List<String> corpList) {
        if (corpList.size() > 5) {
            Random randomId = new Random();
            //pick 5 random companies and keep them in their original order
            List<Integer> indexes = new ArrayList<>();
            while (indexes.size() < 5) {
                int index = randomId.nextInt(corpList.size());
                if (!indexes.contains(index)) {
                    indexes.add(index);
                }
            }
            Collections.sort(indexes);
            //copy the entries at the chosen indexes into a new list
            List<String> tempRelationsCorpList = new ArrayList<>();
            for (int index : indexes) {
                tempRelationsCorpList.add(corpList.get(index));
            }
            corpList.clear();
            corpList.addAll(tempRelationsCorpList);
        }
    }

    public static Map<String, List<String>> introQueryHistory(String outfile, String corpName) {
        FPTree fpTree = new FPTree();

        //set the confidence and the support count
        fpTree.setConfident(0.3);
        fpTree.setMinSuport(3);

        List<List<String>> trans = fpTree.readTransRocords(new String[] { outfile });
        //print every transaction, including the first and the last
        for (int i = 0; i < trans.size(); i++) {
            System.out.println("Row " + (i + 1) + ": " + trans.get(i));
        }

        fpTree.buildFPTree(trans);

        List<FPStrongAssociationRule> rules = fpTree.getAssociateRule();
        DecimalFormat dfm = new DecimalFormat("#.##");

        Map<String, String> interestedCorpMap = new HashMap<>(); //related companies to return (companies you may be interested in)
        Map<String, String> otherSearchCorpMap = new HashMap<>(); //related companies to return (companies other people also searched for)
        //use the confidence-filtered rules to find the companies of interest
        for (FPStrongAssociationRule rule : rules) {
            System.out.println(rule.condition + "->" + rule.result + "\t" + dfm.format(rule.support) + "\t" + dfm.format(rule.confidence));
            List<String> corpCondition = rule.condition;
            for (int i = 0; i < corpCondition.size(); i++) {
                if (corpName.equals(corpCondition.get(i))) {
                    interestedCorpMap.put(rule.result, dfm.format(rule.confidence));
                }
            }
            if (corpName.equals(rule.result)) {
                for (int i = 0; i < corpCondition.size(); i++) {
                    if (!corpName.equals(corpCondition.get(i))) {
                        interestedCorpMap.put(corpCondition.get(i), dfm.format(rule.confidence));
                    }
                }
            }
        }

        //use conditions with more than one item to find the companies other people searched for
        for (FPStrongAssociationRule rule : rules) {
            List<String> corpCondition = rule.condition;
            for (int i = 0; i < corpCondition.size(); i++) {
                if (corpName.equals(corpCondition.get(i)) && corpCondition.size() > 1) {
                    for (int j = 0; j < corpCondition.size(); j++) {
                        if (!corpName.equals(corpCondition.get(j))) {
                            otherSearchCorpMap.put(corpCondition.get(j), "0.00");
                        }
                    }
                }
            }
        }

        List<String> interestedCorpList = new ArrayList<>();
        List<String> otherSearchCorpList = new ArrayList<>();
        for (Map.Entry<String, String> entry : interestedCorpMap.entrySet()) {
            interestedCorpList.add(entry.getKey());
        }
        for (Map.Entry<String, String> entry : otherSearchCorpMap.entrySet()) {
            otherSearchCorpList.add(entry.getKey());
        }

        limitFiveCorp(interestedCorpList);
        limitFiveCorp(otherSearchCorpList);

        Map<String, List<String>> corpMap = new HashMap<>();
        corpMap.put("interestedCorpList", interestedCorpList);
        corpMap.put("otherSearchCorpList", otherSearchCorpList);

        return corpMap;
    }

}
Part of the console output (screenshot):
3. Summary
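The code above hands the entire transaction database to FPGrowth. In practice this is not advisable, because memory often cannot hold the whole transaction database; we may need to read the records one at a time from a relational database to build the FP-Tree. Either way, the FP-Tree itself has to live in memory, so what if memory cannot hold it? And FPGrowth is still very time-consuming, so how can it be made faster? The answer: divide and conquer, and parallel computation.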
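One way to picture the divide-and-conquer idea is to shard the work by item. The sketch below is not part of the project and makes several assumptions: the class ConditionalBaseSharding and the method shard are hypothetical names, and the transactions are assumed to be already filtered and ordered by descending frequency. For every item it collects the prefixes that precede it, which is that item's conditional pattern base; each base can then be mined independently, for example by a separate worker.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: split ordered transactions into per-item conditional pattern bases.
class ConditionalBaseSharding {

    // Maps each item to the list of prefixes that come before it in the ordered transactions.
    static Map<String, List<List<String>>> shard(List<List<String>> orderedTransactions) {
        Map<String, List<List<String>>> shards = new TreeMap<>();
        for (List<String> t : orderedTransactions) {
            for (int i = 0; i < t.size(); i++) {
                shards.computeIfAbsent(t.get(i), k -> new ArrayList<>())
                      .add(new ArrayList<>(t.subList(0, i)));
            }
        }
        return shards;
    }

    public static void main(String[] args) {
        // The example database after dropping B and ordering by frequency: F, G, A, E.
        List<List<String>> ordered = Arrays.asList(
                Arrays.asList("F", "G", "A", "E"),
                Arrays.asList("F", "G", "A"),
                Arrays.asList("F", "G", "A", "E"),
                Arrays.asList("F", "G", "E"));
        Map<String, List<List<String>>> shards = shard(ordered);
        System.out.println(shards.get("A")); // [[F, G], [F, G], [F, G]]
        System.out.println(shards.get("E")); // [[F, G, A], [F, G, A], [F, G]]
    }
}
```

Every frequent pattern that ends in A can be found from A's shard alone, so no worker needs the full tree. This only illustrates the partitioning step; a real parallel setup would also have to distribute the first counting pass and merge the results.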
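In practice, association rule mining may not be as useful as people expect. For one thing, the support-confidence framework produces too many rules, and not every rule is useful. For another, rules as clear-cut as the classic "beer and diapers" story are uncommon. Association rule analysis takes skill, and sometimes stricter statistical methods are needed to keep the number of rules under control.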
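One commonly used stricter measure is lift, which divides a rule's confidence by the support of its result; a lift close to 1 means the condition tells us little about the result. Lift is not used anywhere in the project code above, so the sketch below is purely my own illustration, and the class LiftDemo and its method lift are hypothetical names. It reuses the counts from the four-transaction example in the preface.

```java
// Hypothetical sketch: lift as an extra filter on top of support and confidence.
class LiftDemo {

    // lift(X --> Y) = confidence(X --> Y) / support(Y)
    // countXY: transactions containing X and Y; countX: containing X;
    // countY: containing Y; total: number of transactions.
    static double lift(int countXY, int countX, int countY, int total) {
        double confidence = (double) countXY / countX;
        double supportY = (double) countY / total;
        return confidence / supportY;
    }

    public static void main(String[] args) {
        // {F,G} --> {A}: count({A,F,G}) = 3, count({F,G}) = 4, count({A}) = 3, 4 transactions
        System.out.println(lift(3, 4, 3, 4)); // 1.0
        // {A} --> {F,G}: count({A,F,G}) = 3, count({A}) = 3, count({F,G}) = 4
        System.out.println(lift(3, 3, 4, 4)); // 1.0
    }
}
```

Both rules pass a confidence threshold of 0.3 easily, yet their lift is exactly 1: F and G appear in every transaction, so seeing them says nothing new about A, and A says nothing new about them. Filtering on a measure like this is one way to discard such rules.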
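Parts of this article draw on: http://www.cnblogs.com/zhangchaoyang/articles/2198946.html
That concludes this look at applying the FPGrowth association analysis algorithm in a JavaWeb project. The above is only my personal view and is offered for reference.
If you spot any omissions or mistakes, corrections are welcome!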