Applying the FP-Growth Association Analysis Algorithm in a Java Web Project

Association analysis (association mining) looks for frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction data, relational data, or other information sources. A classic example is market basket analysis: by discovering connections between the different products customers place in their shopping baskets, we can analyze their buying habits. For instance, 67% of customers who buy diapers also buy beer. Knowing which products are frequently bought together helps retailers shape their marketing strategies, and the results can be applied to shelf layout, inventory planning, and grouping customers by purchase pattern.

The FP-Growth algorithm is an association analysis algorithm proposed by Jiawei Han et al. in 2000. It uses a data structure called the frequent pattern tree (FP-Tree) to speed up the rule mining process, following a divide-and-conquer strategy: the database of frequent items is compressed into an FP-Tree while the itemset association information is preserved. It differs from Apriori in two main ways: it generates no candidate sets, and it needs only two passes over the database, which greatly improves efficiency.

 

1. Introduction

First, let's understand the following concepts related to frequent itemsets:

Frequent item: an element that appears frequently across multiple sets.

Frequent itemset: when every set in a series contains some of the same elements, those elements form a subset; if that subset meets a given threshold it is a frequent itemset.

K-itemset: an itemset made up of K items.

 

The following example (a transaction database) illustrates support and confidence. Each row is one transaction, a transaction consists of several distinct items, and any combination of items is called an itemset.

A  E  F  G
A  F  G
A  B  E  F  G
E  F  G

Support: how likely an itemset is to appear across all transactions. For example, the itemset {A,F,G} has a support count of 3 and a support of 3/4. An itemset whose support count exceeds the threshold minSuport is a frequent itemset. {F,G} has a support count of 4 and a support of 4/4; {A} has a support count of 3 and a support of 3/4.
Confidence: the support of the union of a frequent itemset and another item, divided by the support of the frequent itemset. For example, the confidence of {F,G} --> {A} is the support count of {A,F,G} divided by the support count of {F,G}, i.e. 3/4; the confidence of {A} --> {F,G} is the support count of {A,F,G} divided by the support count of {A}, i.e. 3/3.
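To make the two measures concrete, here is a small, self-contained Java sketch (not part of the project code below) that computes the support and confidence values for the example transactions above:

import java.util.*;

public class SupportConfidenceDemo {

    // count how many transactions contain every item of the given itemset
    private static int supportCount(List<Set<String>> transactions, Set<String> itemset) {
        int count = 0;
        for (Set<String> t : transactions) {
            if (t.containsAll(itemset)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // the four transactions from the example above
        List<Set<String>> transactions = Arrays.asList(
                new HashSet<>(Arrays.asList("A", "E", "F", "G")),
                new HashSet<>(Arrays.asList("A", "F", "G")),
                new HashSet<>(Arrays.asList("A", "B", "E", "F", "G")),
                new HashSet<>(Arrays.asList("E", "F", "G")));

        Set<String> fg = new HashSet<>(Arrays.asList("F", "G"));
        Set<String> afg = new HashSet<>(Arrays.asList("A", "F", "G"));

        int fgCount = supportCount(transactions, fg);    // 4
        int afgCount = supportCount(transactions, afg);  // 3

        // support of {A,F,G}: fraction of all transactions that contain it -> 3/4
        System.out.println("support({A,F,G}) = " + afgCount + "/" + transactions.size());
        // confidence of {F,G} -> {A}: supportCount({A,F,G}) / supportCount({F,G}) -> 3/4
        System.out.println("confidence({F,G} -> {A}) = " + (1.0 * afgCount / fgCount));
    }
}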


 

In summary, we can in principle use the FP-Growth algorithm to mine rules from the frequent itemsets and then filter them by confidence to drive a recommendation feature. In my Java Web project, FP-Growth runs over all users' search history and, combined with the current search, recommends companies the user may be interested in (search records whose confidence exceeds the threshold) as well as companies other users have searched for (items in the frequent itemsets other than the current search). The above is just my personal view; please feel free to point out any mistakes.

 

2. Implementation

1) The user search history entity class:

package entity;

/**
 * User search history record
 * @author: yjl
 * @date: 2018/5/24
 */
public class TQueryHistory {

    private Integer id;

    private String userAccount;    // user account

    private String queryCorpName;  // company name the user searched for

    public TQueryHistory() {
    }

    public TQueryHistory(String userAccount, String queryCorpName) {
        this.userAccount = userAccount;
        this.queryCorpName = queryCorpName;
    }

    public TQueryHistory(Integer id, String userAccount, String queryCorpName) {
        this.id = id;
        this.userAccount = userAccount;
        this.queryCorpName = queryCorpName;
    }

    public Integer getId() {
        return id;
    }

    public void setId(Integer id) {
        this.id = id;
    }

    public String getUserAccount() {
        return userAccount;
    }

    public void setUserAccount(String userAccount) {
        this.userAccount = userAccount;
    }

    public String getQueryCorpName() {
        return queryCorpName;
    }

    public void setQueryCorpName(String queryCorpName) {
        this.queryCorpName = queryCorpName;
    }


    @Override
    public String toString() {
        return "TQueryHistory{" +
                "id=" + id +
                ", userAccount='" + userAccount + '\'' +
                ", queryCorpName='" + queryCorpName + '\'' +
                '}';
    }
}

 

2) Preparing the data before mining rules with FP-Growth, analogous to the transaction database above. corpName is the company the user is currently searching for. The resulting interestedCorpList and otherSearchCorpList hold, respectively, companies the user may be interested in and companies other users have searched for; if either list is too short it can be padded using attributes such as the company's industry.

//Fetch all users' search history
List<TQueryHistory> allQueryHistory = searchCorpService.getAllQueryHistory();

//Collect the distinct user accounts
Map<String, Integer> accountMap = new HashMap<>();
for(TQueryHistory tQueryHistory: allQueryHistory){
    accountMap.put(tQueryHistory.getUserAccount(),0);
}

//Group the searched company names by account
Map<String,List<String>> newQueryHistoryMap = new HashMap<>();
for(Map.Entry<String,Integer> entry: accountMap.entrySet()){
    String account = entry.getKey();
    List<String> accountTQueryHistoryList = new ArrayList<>();
    for(TQueryHistory tQueryHistory: allQueryHistory){
        if(tQueryHistory.getUserAccount().equals(account)){
            accountTQueryHistoryList.add(tQueryHistory.getQueryCorpName());
        }
    }
    newQueryHistoryMap.put(account,accountTQueryHistoryList);
}

//Write each account's company names to the file (one transaction per line) and hand the file to FPTree
String outfile = "QueryHistory.txt";
BufferedWriter bw = new BufferedWriter(new FileWriter(outfile));
for(Map.Entry<String,List<String>> entry: newQueryHistoryMap.entrySet()){
    List<String> corpNameList = entry.getValue();

    bw.write(joinList(corpNameList));
    bw.newLine();
}
bw.close();

//Put the two result lists from the returned Map into the corresponding collections
Map<String, List<String>> corpMap = FPTree.introQueryHistory(outfile,corpName);
List<String> interestedCorpList = new ArrayList<>();
List<String> otherSearchCorpList = new ArrayList<>();
for(Map.Entry<String,List<String>> entry: corpMap.entrySet()){
    if("interestedCorpList".equals(entry.getKey())){
        interestedCorpList = entry.getValue();
    }
    if("otherSearchCorpList".equals(entry.getKey())){
        otherSearchCorpList = entry.getValue();
    }
}
//Join a list of company names with "," to form one line of the transaction file
private static String joinList(List<String> list) {
    if (list == null || list.size() == 0) {
        return "";
    }
    StringBuilder sb = new StringBuilder();
    for (String ele : list) {
        sb.append(ele);
        sb.append(",");
    }
    return sb.substring(0, sb.length() - 1);
}
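For illustration, QueryHistory.txt ends up as a plain text file with one transaction per line: each line holds the company names one user has searched for, joined by commas. A hypothetical example (the company names here are made up, not real project data) might look like this:

CompanyA,CompanyB,CompanyC
CompanyB,CompanyC,CompanyD
CompanyA,CompanyC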

 

3) The FPStrongAssociationRule class holds the fields of a strong association rule:

package util;

import java.util.List;

public class FPStrongAssociationRule {

    public List<String> condition;   // antecedent (left-hand side) of the rule

    public String result;            // consequent (right-hand side) of the rule

    public int support;              // support count of the full pattern

    public double confidence;        // confidence of the rule

}

 

4) The FPTreeNode class holds the fields of an FP-Tree node:

package util;

import java.util.ArrayList;
import java.util.List;

public class FPTreeNode {

    private String name;                    // node name (item name)
    private int count;                      // frequency count
    private FPTreeNode parent;              // parent node
    private List<FPTreeNode> children;      // child nodes
    private FPTreeNode nextHomonym;         // next node with the same name (the linked list maintained by the header-table entry)
    private FPTreeNode tail;                // last node of that same-name linked list (maintained by the header-table entry)



    public FPTreeNode() {
    }

    public FPTreeNode(String name) {
        this.name = name;
    }

    public String getName() {
        return this.name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public int getCount() {
        return this.count;
    }

    public void setCount(int count) {
        this.count = count;
    }

    public FPTreeNode getParent() {
        return this.parent;
    }

    public void setParent(FPTreeNode parent) {
        this.parent = parent;
    }

    public List<FPTreeNode> getChildren() {
        return this.children;
    }

    public void setChildren(List<FPTreeNode> children) {
        this.children = children;
    }

    public FPTreeNode getNextHomonym() {
        return this.nextHomonym;
    }

    public void setNextHomonym(FPTreeNode nextHomonym) {
        this.nextHomonym = nextHomonym;
    }

    public FPTreeNode getTail() {
        return tail;
    }

    public void setTail(FPTreeNode tail) {
        this.tail = tail;
    }

    // add a child node
    public void addChild(FPTreeNode child) {
        if (getChildren() == null) {
            List<FPTreeNode> list = new ArrayList<>();
            list.add(child);
            setChildren(list);
        } else {
            getChildren().add(child);
        }
    }

    // find a child node by name
    public FPTreeNode findChild(String name) {
        List<FPTreeNode> children = getChildren();
        if (children != null) {
            for (FPTreeNode child : children) {
                if (child.getName().equals(name)) {
                    return child;
                }
            }
        }
        return null;
    }


    public void countIncrement(int n) {
        this.count += n;
    }


    @Override
    public String toString() {
        return name;
    }
}
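To show how the nextHomonym and tail fields are meant to be used, here is a minimal sketch (not from the project) that links two same-named nodes behind a header-table entry, in the same way the addNodes method of the FPTree class below does; walking nextHomonym from the header then visits every node with that name:

import util.FPTreeNode;

public class HeaderChainDemo {
    public static void main(String[] args) {
        FPTreeNode header = new FPTreeNode("F");   // header-table entry for item "F"

        FPTreeNode first = new FPTreeNode("F");    // first "F" node inserted into the tree
        first.setCount(1);
        header.setNextHomonym(first);              // header points at the first same-named node
        header.setTail(first);                     // tail tracks the last node in the chain

        FPTreeNode second = new FPTreeNode("F");   // a later "F" node on another branch
        second.setCount(1);
        header.getTail().setNextHomonym(second);   // append behind the current tail in O(1)
        header.setTail(second);

        // walking the chain visits every "F" node in the tree
        for (FPTreeNode n = header.getNextHomonym(); n != null; n = n.getNextHomonym()) {
            System.out.println(n.getName() + ": " + n.getCount());
        }
    }
}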

 

5) The FPTree class implements rule mining with the FP-Growth algorithm. Given all users' search history and the company currently being searched for, the introQueryHistory method returns the companies the user may be interested in and the companies other users have searched for, and limits the number of companies in each list:

package util;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.text.DecimalFormat;
import java.util.*;
import java.util.Map.Entry;

public class FPTree {

    private int minSuport;      // minimum support count for a frequent pattern
    private double confident;   // minimum confidence for an association rule
    private int totalSize;      // total number of transactions
    private Map<List<String>, Integer> frequentMap = new HashMap<>();  // every frequent pattern and its count
    private Set<String> decideAttr = null; // items that may appear as the consequent of a rule; by default every item may



    public void setMinSuport(int minSuport) {
        this.minSuport = minSuport;
    }

    public void setConfident(double confident) {
        this.confident = confident;
    }

    public void setDecideAttr(Set<String> decideAttr) { this.decideAttr = decideAttr;}



    /**
     * Generate the candidate strong association rules for one frequent pattern
     * @return
     * @Description:
     */
    private List<FPStrongAssociationRule> getRules(List<String> list) {
        List<FPStrongAssociationRule> rect = new LinkedList<>();
        if (list.size() > 1) {
            for (int i = 0; i < list.size(); i++) {
                String result = list.get(i);
                if (decideAttr.contains(result)) {
                    List<String> condition = new ArrayList<>();
                    condition.addAll(list.subList(0, i));
                    condition.addAll(list.subList(i + 1, list.size()));
                    FPStrongAssociationRule rule = new FPStrongAssociationRule();
                    rule.condition = condition;
                    rule.result = result;
                    rect.add(rule);
                }
            }
        }
        return rect;
    }


    /**
     * Read transaction records from one or more files and register every item as a decideAttr
     * @return
     * @Description:
     */
    public List<List<String>> readTransRocords(String[] filenames) {
        Set<String> set = new HashSet<>();
        List<List<String>> transaction = null;
        if (filenames.length > 0) {
            transaction = new LinkedList<>();
            for (String filename : filenames) {
                try {
                    FileReader fr = new FileReader(filename);
                    BufferedReader br = new BufferedReader(fr);
                    try {
                        String line;
                        // one transaction per line
                        while ((line = br.readLine()) != null) {
                            if (line.trim().length() > 0) {
                                // items are separated by ","
                                String[] str = line.split(",");
                                // duplicate items within one transaction must be removed
                                Set<String> record = new HashSet<>();
                                for (String w : str) {
                                    record.add(w);
                                    set.add(w);
                                }
                                List<String> rl = new ArrayList<>();
                                rl.addAll(record);
                                transaction.add(rl);
                            }
                        }
                    } finally {
                        br.close();
                    }
                } catch (IOException ex) {
                    System.out.println("Read transaction records failed." + ex.getMessage());
                    System.exit(1);
                }
            }
        }

        this.setDecideAttr(set);
        return transaction;
    }

    /**
     * Generate all sub-sequences of a sequence (order is preserved)
     * @param residualPath
     * @param results
     */
    private void combine(LinkedList<FPTreeNode> residualPath, List<List<FPTreeNode>> results) {
        if (residualPath.size() > 0) {
            // if residualPath is too long there are too many combinations and memory will be exhausted
            FPTreeNode head = residualPath.poll();
            List<List<FPTreeNode>> newResults = new ArrayList<>();
            for (List<FPTreeNode> list : results) {
                List<FPTreeNode> listCopy = new ArrayList<>(list);
                newResults.add(listCopy);
            }

            for (List<FPTreeNode> newPath : newResults) {
                newPath.add(head);
            }
            results.addAll(newResults);
            List<FPTreeNode> list = new ArrayList<>();
            list.add(head);
            results.add(list);
            combine(residualPath, results);
        }
    }

    /**
     * Check whether the tree is a single branch
     * @param root
     */
    private boolean isSingleBranch(FPTreeNode root) {
        boolean rect = true;
        while (root.getChildren() != null) {
            if (root.getChildren().size() > 1) {
                rect = false;
                break;
            }
            root = root.getChildren().get(0);
        }
        return rect;
    }

    /**
     * Count the frequency of every item in the transaction set
     * @param transRecords
     * @return
     */
    private Map<String, Integer> getFrequency(List<List<String>> transRecords) {
        Map<String, Integer> rect = new HashMap<>();
        for (List<String> record : transRecords) {
            for (String item : record) {
                Integer cnt = rect.get(item);
                if (cnt == null) {
                    cnt = 0;
                }
                rect.put(item, ++cnt);
            }
        }
        return rect;
    }

    /**
     * Build the FP-Tree from the transaction set
     * @param transRecords
     * @Description:
     */
    public void buildFPTree(List<List<String>> transRecords) {
        totalSize = transRecords.size();
        // count the frequency of every item
        final Map<String, Integer> freqMap = getFrequency(transRecords);
        // sort the items of every transaction by descending frequency (F1 order)
        for (List<String> transRecord : transRecords) {
            Collections.sort(transRecord, (o1, o2) -> freqMap.get(o2) - freqMap.get(o1));
        }
        FPGrowth(transRecords, null);
    }

    /**
     * Grow the FP-Tree recursively to obtain all frequent patterns
     * @param cpb  the conditional pattern base
     * @param postModel  the suffix pattern
     */
    private void FPGrowth(List<List<String>> cpb, LinkedList<String> postModel) {
        Map<String, Integer> freqMap = getFrequency(cpb);
        Map<String, FPTreeNode> headers = new HashMap<>();
        for (Entry<String, Integer> entry : freqMap.entrySet()) {
            String name = entry.getKey();
            int cnt = entry.getValue();
            // on every recursion some patterns may fall below the support threshold
            if (cnt >= minSuport) {
                FPTreeNode node = new FPTreeNode(name);
                node.setCount(cnt);
                headers.put(name, node);
            }
        }

        FPTreeNode treeRoot = buildSubTree(cpb,headers);
        // if only the virtual root is left, the recursion ends
        if ((treeRoot.getChildren() == null) || (treeRoot.getChildren().size() == 0)) {
            return;
        }

        // if the tree is a single branch, add "every combination of the path + the suffix pattern" to the frequent pattern
        // set directly. This shortcut is optional: skipping it and going into the next recursion also gives the correct result
        if (isSingleBranch(treeRoot)) {
            LinkedList<FPTreeNode> path = new LinkedList<>();
            FPTreeNode currNode = treeRoot;
            while (currNode.getChildren() != null) {
                currNode = currNode.getChildren().get(0);
                path.add(currNode);
            }
            // path must not be too long when calling combine, otherwise it will run out of memory
            if (path.size() <= 20) {
                List<List<FPTreeNode>> results = new ArrayList<>();
                combine(path, results);
                for (List<FPTreeNode> list : results) {
                    int cnt = 0;
                    List<String> rule = new ArrayList<>();
                    for (FPTreeNode node : list) {
                        rule.add(node.getName());
                        cnt = node.getCount();  // cnt ends up as the count of the deepest node on the path
                    }
                    if (postModel != null) {
                        rule.addAll(postModel);
                    }
                    frequentMap.put(rule, cnt);
                }
                return;
            } else {
                System.err.println("length of path is too long: " + path.size());
            }
        }

        for (FPTreeNode header : headers.values()) {
            List<String> rule = new ArrayList<>();
            rule.add(header.getName());
            if (postModel != null) {
                rule.addAll(postModel);
            }
            // header item + suffix pattern forms one frequent pattern (internally ordered by frequency); its count is the header item's count
            frequentMap.put(rule, header.getCount());
            // new suffix pattern: header item + previous suffix pattern (keep the order, always sorted by frequency)
            LinkedList<String> newPostPattern = new LinkedList<>();
            newPostPattern.add(header.getName());
            if (postModel != null) {
                newPostPattern.addAll(postModel);
            }
            // new conditional pattern base
            List<List<String>> newCPB;
            newCPB = new LinkedList<>();
            FPTreeNode nextNode = header;
            while ((nextNode = nextNode.getNextHomonym()) != null) {
                int counter = nextNode.getCount();
                // take the path from the virtual root (exclusive) to the current node (exclusive), i.e. one conditional
                // pattern base. Keep the order: parent nodes first, children after, so higher-frequency items stay in front
                LinkedList<String> path = new LinkedList<>();
                FPTreeNode parent = nextNode;
                while ((parent = parent.getParent()).getName() != null) { // the virtual root's name is null
                    path.push(parent.getName()); // insert at the head of the list
                }
                // the transaction has to be added counter times
                while (counter-- > 0) {
                    newCPB.add(path);
                }
            }
            FPGrowth(newCPB, newPostPattern);
        }
    }

    /**
     * Insert all transactions into one FP-Tree
     * @param transRecords
     * @param headers
     * @return
     */
    private FPTreeNode buildSubTree(List<List<String>> transRecords,final Map<String, FPTreeNode> headers) {
        FPTreeNode root = new FPTreeNode(); // virtual root node
        for (List<String> transRecord : transRecords) {
            LinkedList<String> record = new LinkedList<>(transRecord);
            FPTreeNode subTreeRoot = root;
            FPTreeNode tmpRoot;
            if (root.getChildren() != null) {
                // follow an existing branch and increment each node's count by 1
                while (!record.isEmpty()
                        && (tmpRoot = subTreeRoot.findChild(record.peek())) != null) {
                    tmpRoot.countIncrement(1);
                    subTreeRoot = tmpRoot;
                    record.poll();
                }
            }
            // grow new nodes
            addNodes(subTreeRoot, record, headers);
        }
        return root;
    }

    /**
     * Insert a chain of descendant nodes under a given node, while maintaining the header table's
     * linked list of same-named nodes
     * @param ancestor
     * @param record
     * @param headers
     */
    private void addNodes(FPTreeNode ancestor, LinkedList<String> record,
                          final Map<String, FPTreeNode> headers) {
        while (!record.isEmpty()) {
            String item = record.poll();
            // an item may only be inserted into the FP-Tree if its frequency reaches the minimum support count;
            // items that reach it are all in headers. On every recursion, when a new FP-Tree is built from the
            // conditional pattern base, items whose frequency is below minSuport are excluded, which is exactly
            // why FP-Tree is faster than brute-force enumeration
            if (headers.containsKey(item)) {
                FPTreeNode leafnode = new FPTreeNode(item);
                leafnode.setCount(1);
                leafnode.setParent(ancestor);
                ancestor.addChild(leafnode);

                FPTreeNode header = headers.get(item);
                FPTreeNode tail = header.getTail();
                if (tail != null) {
                    tail.setNextHomonym(leafnode);
                } else {
                    header.setNextHomonym(leafnode);
                }
                header.setTail(leafnode);
                addNodes(leafnode, record, headers);
            }

        }
    }

    /**
     * Get all strong association rules
     * @return
     */
    public List<FPStrongAssociationRule> getAssociateRule() {
        assert totalSize > 0;
        List<FPStrongAssociationRule> rect = new ArrayList<>();
        // iterate over all frequent patterns
        for (Entry<List<String>, Integer> entry : frequentMap.entrySet()) {
            List<String> items = entry.getKey();
            int count1 = entry.getValue();
            // one frequent pattern can generate many association rules
            List<FPStrongAssociationRule> rules = getRules(items);
            // compute the support and confidence of every rule
            for (FPStrongAssociationRule rule : rules) {
                if (frequentMap.containsKey(rule.condition)) {
                    int count2 = frequentMap.get(rule.condition);
                    double confidence = 1.0 * count1 / count2;
                    if (confidence >= this.confident) {
                        rule.support = count1;
                        rule.confidence = confidence;
                        rect.add(rule);
                    }
                } else {
                    System.err.println(rule.condition + " is not a frequent pattern, however "
                            + items + " is a frequent pattern");
                }
            }
        }
        return rect;
    }

    /**
     * Limit a list to at most 5 companies
     */
    private static void limitFiveCorp(List<String> corpList) {
        if(corpList.size() > 5){
            Random randomId = new Random();
            // keep 5 randomly chosen companies in their original order
            List<Integer> indexes = new ArrayList<>();
            while(indexes.size() < 5) {
                int index = randomId.nextInt(corpList.size());
                if(!indexes.contains(index)) {
                    indexes.add(index);
                }
            }
            Collections.sort(indexes);
            // copy the entries at the chosen indexes into a temporary list
            List<String> tempRelationsCorpList = new ArrayList<>();
            for(int index : indexes) {
                tempRelationsCorpList.add(corpList.get(index));
            }
            corpList.clear();
            corpList.addAll(tempRelationsCorpList);
        }
    }


    public static Map<String, List<String>> introQueryHistory(String outfile,String corpName) {
        FPTree fpTree = new FPTree();

        // set the minimum confidence and minimum support count
        fpTree.setConfident(0.3);
        fpTree.setMinSuport(3);

        List<List<String>> trans = fpTree.readTransRocords(new String[] { outfile });
        for(int i = 1;i < trans.size() - 1;i++){
            System.out.println("Transaction " + i + ": " + trans.get(i));
        }

        fpTree.buildFPTree(trans);

        List<FPStrongAssociationRule> rules = fpTree.getAssociateRule();
        DecimalFormat dfm = new DecimalFormat("#.##");

        Map<String, String> interestedCorpMap = new HashMap<>();  // related companies to return ("companies you may be interested in")
        Map<String, String> otherSearchCorpMap = new HashMap<>(); // related companies to return ("companies other users also searched for")
        // use the rules' confidence to find related companies for the "interested" list
        for (FPStrongAssociationRule rule : rules) {
            System.out.println(rule.condition + "->" + rule.result + "\t" + dfm.format(rule.support) + "\t" + dfm.format(rule.confidence));
            List<String> corpCondition = rule.condition;
            for(int i = 0;i < corpCondition.size();i++){
                if(corpName.equals(corpCondition.get(i))){
                    interestedCorpMap.put(rule.result,dfm.format(rule.confidence));
                }
            }
            if(corpName.equals(rule.result)){
                for(int i = 0;i < corpCondition.size();i++){
                    if(!corpName.equals(corpCondition.get(i))){
                        interestedCorpMap.put(corpCondition.get(i),dfm.format(rule.confidence));
                    }
                }
            }
        }

        // use multi-item antecedents to find related companies for the "other users also searched" list
        for (FPStrongAssociationRule rule : rules) {
            List<String> corpCondition = rule.condition;
            for (int i = 0; i < corpCondition.size(); i++) {
                if (corpName.equals(corpCondition.get(i)) && corpCondition.size() > 1) {
                    for (int j = 0; j < corpCondition.size(); j++) {
                        if (!corpName.equals(corpCondition.get(j))) {
                            otherSearchCorpMap.put(corpCondition.get(j), "0.00");
                        }
                    }
                }
            }
        }


        List<String> interestedCorpList = new ArrayList<>();
        List<String> otherSearchCorpList = new ArrayList<>();
        for(Map.Entry<String,String> entry: interestedCorpMap.entrySet()){
            interestedCorpList.add(entry.getKey());
        }
        for(Map.Entry<String,String> entry: otherSearchCorpMap.entrySet()){
            otherSearchCorpList.add(entry.getKey());
        }

        limitFiveCorp(interestedCorpList);
        limitFiveCorp(otherSearchCorpList);

        Map<String, List<String>> corpMap = new HashMap<>();
        corpMap.put("interestedCorpList",interestedCorpList);
        corpMap.put("otherSearchCorpList",otherSearchCorpList);

        return corpMap;
    }


}
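To try the class on its own, outside the web project, a minimal driver might look like the sketch below. The file name demo.txt and the threshold values are placeholders, not values from the original project; the input file simply needs one comma-separated transaction per line, as described above:

import util.FPStrongAssociationRule;
import util.FPTree;

import java.util.List;

public class FPTreeDemo {
    public static void main(String[] args) {
        FPTree fpTree = new FPTree();
        fpTree.setMinSuport(2);     // minimum support count (placeholder value)
        fpTree.setConfident(0.6);   // minimum confidence (placeholder value)

        // read transactions: one per line, items separated by ","
        List<List<String>> trans = fpTree.readTransRocords(new String[]{"demo.txt"});
        fpTree.buildFPTree(trans);

        // print every strong association rule that passed the thresholds
        for (FPStrongAssociationRule rule : fpTree.getAssociateRule()) {
            System.out.println(rule.condition + " -> " + rule.result
                    + "  support=" + rule.support + "  confidence=" + rule.confidence);
        }
    }
}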

 

 

[Partial screenshot of the console output]

3. Summary

In the code above, the entire transaction database is handed to FPGrowth. In practice this is not feasible, because memory may not hold the whole transaction database; we may need to read records one at a time from a relational database to build the FP-Tree. The FP-Tree itself, however, definitely has to stay in memory, so what if even it does not fit? And FP-Growth is still very time-consuming, so how do we speed it up? The answer: divide and conquer, and parallel computation.

In practice, association rule mining may not be as useful as people expect. On the one hand, the support/confidence framework produces far too many rules, and not every rule is useful. On the other hand, most association rules are not as striking as the classic "beer and diapers" story. Association analysis takes skill, and sometimes stricter statistical methods are needed to keep the proliferation of rules under control.

 

 

Parts of this article drew on: http://www.cnblogs.com/zhangchaoyang/articles/2198946.html

This concludes the application of the FP-Growth association analysis algorithm in a Java Web project. The above is only my personal view and is for reference only.

If there are any omissions or mistakes, please do not hesitate to point them out!
