Learning to use Weka in your Java code
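This post covers two pieces of code: (1) filtering a dataset, and (2) classifying with a J48 decision tree. The example below does not split the dataset. It uses the training set as the test set as well, which goes against standard data-mining practice; the code is only meant as an exercise in using Weka from Java.
Source of the study material: http://weka.wikispaces.com/Use+WEKA+in+your+Java+code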
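Part 1: filtering a dataset
A filter has two different properties: it is either supervised or unsupervised (it does or does not take the class attribute into account), and it is either attribute-based or instance-based (for example, removing an attribute versus removing instances that meet a condition).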
Most filters implement the OptionHandler interface, which means you can set the options via a String array, rather than setting them each manually via set-methods.
For example, if you want to remove the first attribute of a dataset, you need this filter
weka.filters.unsupervised.attribute.Remove
with this option
-R 1
If you have an Instances object, called data, you can create and apply the filter like this:
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;
...
String[] options = new String[2];
options[0] = "-R";                                    // "range"
options[1] = "1";                                     // first attribute
Remove remove = new Remove();                         // new instance of filter
remove.setOptions(options);                           // set options
remove.setInputFormat(data);                          // inform filter about dataset **AFTER** setting options
Instances newData = Filter.useFilter(data, remove);   // apply filter
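The `-R` option takes a 1-based range, whereas index-based methods in Weka's Java API, such as `Instances.setClassIndex`, are 0-based. To make the range syntax concrete, here is a minimal stdlib-only sketch that turns a range string such as `1` or `first-3,last` into 0-based attribute indices. `RangeSketch` and `toIndices` are hypothetical names used only for illustration; this is a simplified re-implementation, not Weka's own `weka.core.Range` class, and it does not model the filter's invert-selection option.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified re-implementation of a Weka-style range string
// (e.g. "1", "2-4", "first-3,last"). This is NOT Weka's weka.core.Range class.
final class RangeSketch {

    // Converts a 1-based range string into sorted, de-duplicated 0-based indices.
    static List<Integer> toIndices(String range, int numAttributes) {
        boolean[] selected = new boolean[numAttributes];
        for (String part : range.split(",")) {
            String p = part.trim();
            int dash = p.indexOf('-');
            int from;
            int to;
            if (dash < 0) {
                from = parseOne(p, numAttributes);
                to = from;
            } else {
                from = parseOne(p.substring(0, dash), numAttributes);
                to = parseOne(p.substring(dash + 1), numAttributes);
            }
            for (int i = from; i <= to; i++) {
                selected[i - 1] = true; // shift from 1-based to 0-based
            }
        }
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < numAttributes; i++) {
            if (selected[i]) {
                result.add(i);
            }
        }
        return result;
    }

    // Resolves "first", "last", or a 1-based number; rejects out-of-range values.
    private static int parseOne(String token, int numAttributes) {
        String t = token.trim();
        int value = t.equalsIgnoreCase("first") ? 1
                  : t.equalsIgnoreCase("last") ? numAttributes
                  : Integer.parseInt(t);
        if (value < 1 || value > numAttributes) {
            throw new IllegalArgumentException("index out of range: " + token);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(toIndices("1", 10));            // prints [0]
        System.out.println(toIndices("first-3,last", 10)); // prints [0, 1, 2, 9]
    }
}
```

With 10 attributes, `last` resolves to 0-based index 9. The program later in this post relies on the same convention: it passes the 1-based number of the last attribute to `-R`, but a 0-based index to `setClassIndex`.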
Part 2: classifying with J48
In case you have a dedicated test set, you can train the classifier and then evaluate it on this test set. In the following example, a J48 is instantiated, trained and then evaluated. Some statistics are printed to stdout:
import weka.core.Instances;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
...
Instances train = ...   // from somewhere
Instances test = ...    // from somewhere
// both sets need their class attribute set, e.g. train.setClassIndex(train.numAttributes() - 1)
// train classifier
Classifier cls = new J48();
cls.buildClassifier(train);
// evaluate classifier and print some statistics
Evaluation eval = new Evaluation(train);
eval.evaluateModel(cls, test);
System.out.println(eval.toSummaryString("\nResults\n======\n", false));
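When no dedicated test set exists, a common alternative to evaluating on the training set (as the later program in this post does) is a random holdout split. The sketch below uses only the Java standard library: it shuffles instance indices with a fixed seed and keeps roughly two thirds for training. The class name `HoldoutSplitSketch`, the 0.66 ratio, and the seed are illustrative assumptions. In Weka itself the same idea is usually expressed with `Instances.randomize(Random)` followed by the copy constructor `new Instances(data, first, toCopy)`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Illustrative random holdout split over instance indices 0..n-1 (stdlib only).
final class HoldoutSplitSketch {

    // Returns two disjoint index lists: get(0) = training indices, get(1) = test indices.
    static List<List<Integer>> split(int numInstances, double trainFraction, long seed) {
        if (trainFraction < 0.0 || trainFraction > 1.0) {
            throw new IllegalArgumentException("trainFraction must be in [0, 1]");
        }
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < numInstances; i++) {
            indices.add(i);
        }
        // A fixed seed makes the split reproducible from run to run.
        Collections.shuffle(indices, new Random(seed));
        int trainSize = (int) Math.round(numInstances * trainFraction);
        List<List<Integer>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(indices.subList(0, trainSize)));
        parts.add(new ArrayList<>(indices.subList(trainSize, numInstances)));
        return parts;
    }

    public static void main(String[] args) {
        List<List<Integer>> parts = split(6497, 0.66, 1L);
        // prints: train = 4288, test = 2209
        System.out.println("train = " + parts.get(0).size() + ", test = " + parts.get(1).size());
    }
}
```

The two index lists would then select the rows that go into `train` and `test` before calling `buildClassifier` and `evaluateModel`.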
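Below is a small example of classification with Weka, followed by the Java code that implements the same process.
Task: design a simple, low-cost sensor that can distinguish red wine from white wine.
Requirements:
The sensor must correctly classify at least 95% of the red and white wine samples. The dataset contains 6497 samples.
Dataset download: www.technologyforge.net/Datasets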
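Experiment steps:
1. Data preprocessing: remove the attribute quality. This experiment does not use wine quality; it only cares about how accurately white and red wines are classified.
Select quality, then click Remove.
2. Run the J48 classifier with its default settings to get a classification result that uses all attributes.
As the figure below shows, the classification accuracy reaches 99.5998%, which is very high.
3. To meet the low-cost requirement, we should use as few attributes as possible while still reaching 95% accuracy. This requires repeated experiments. Trial and error in two directions (adding attributes one at a time, or removing them one at a time) works, and the process is fairly simple.
Attribute selection: the figures show how each attribute affects the classification result. Comparing them, the following two attributes turn out to be the most representative for distinguishing white wine from red wine.
Classification performance: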
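Step 3's trial-and-error search in the "adding attributes" direction amounts to greedy forward selection: start from an empty attribute set, repeatedly add the attribute that raises accuracy the most, and stop once the 95% target is met or no attribute helps. Below is a minimal stdlib sketch under explicit assumptions: `ForwardSelectionSketch` is a hypothetical name, and the `accuracy` function is a placeholder that, in a real run, would train and evaluate J48 on the chosen attribute subset. The toy scorer in `main` is made up and says nothing about the wine data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Greedy forward selection over attribute indices 0..numAttributes-1 (illustrative only).
final class ForwardSelectionSketch {

    // Adds one attribute at a time until accuracy(subset) >= target or no attribute improves it.
    static List<Integer> select(int numAttributes, ToDoubleFunction<List<Integer>> accuracy, double target) {
        List<Integer> chosen = new ArrayList<>();
        double best = accuracy.applyAsDouble(chosen);
        while (best < target && chosen.size() < numAttributes) {
            int bestCandidate = -1;
            double bestScore = best;
            for (int a = 0; a < numAttributes; a++) {
                if (chosen.contains(a)) {
                    continue; // already selected
                }
                List<Integer> trial = new ArrayList<>(chosen);
                trial.add(a);
                double score = accuracy.applyAsDouble(trial);
                if (score > bestScore) {
                    bestScore = score;
                    bestCandidate = a;
                }
            }
            if (bestCandidate < 0) {
                break; // no remaining attribute improves accuracy
            }
            chosen.add(bestCandidate);
            best = bestScore;
        }
        return chosen;
    }

    public static void main(String[] args) {
        // Toy scorer, made up for illustration: only attributes 2 and 5 carry information.
        ToDoubleFunction<List<Integer>> toy = subset ->
                0.50 + (subset.contains(2) ? 0.30 : 0.0) + (subset.contains(5) ? 0.17 : 0.0);
        System.out.println(select(8, toy, 0.95)); // prints [2, 5]
    }
}
```

The "removing attributes" direction (backward elimination) is the mirror image: start from all attributes and repeatedly drop the one whose removal hurts accuracy least, as long as accuracy stays at or above the target.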
Repeat the experiment above in Java.
Java code:
import java.io.BufferedReader;
import java.io.FileReader;

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

public class RWClassifier {

    // Loads an ARFF file and removes its last attribute, which this code assumes is quality.
    public static Instances getFileInstances(String filename) throws Exception {
        Instances data;
        try (BufferedReader reader = new BufferedReader(new FileReader(filename))) {
            data = new Instances(reader);
        }
        int length = data.numAttributes();
        String[] options = new String[2];
        options[0] = "-R";
        options[1] = Integer.toString(length);   // -R is 1-based, so this selects the last attribute
        Remove remove = new Remove();
        remove.setOptions(options);
        remove.setInputFormat(data);              // must come after setOptions
        Instances newData = Filter.useFilter(data, remove);
        return newData;
    }

    public static void main(String[] args) throws Exception {
        Instances instances = getFileInstances("D://Weka_tutorial//WineQuality//RedWhiteWine.arff"); // location of the data file
        // System.out.println(instances);
        // After removing quality, the red/white label is the last attribute (setClassIndex is 0-based).
        instances.setClassIndex(instances.numAttributes() - 1);
        J48 j48 = new J48();
        j48.buildClassifier(instances);
        // Evaluates on the training data itself, as noted at the start of this post.
        Evaluation eval = new Evaluation(instances);
        eval.evaluateModel(j48, instances);
        System.out.println(eval.toSummaryString("\nResults\n====\n", false));
    }
}
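The program above trains and evaluates J48 on the same instances, so its accuracy is a training-set estimate that is likely optimistic. Weka's `Evaluation` class also supports k-fold cross-validation through a call such as `eval.crossValidateModel(j48, instances, 10, new Random(1))`. To show what that involves, here is a stdlib-only sketch of how shuffled instance indices can be dealt into k folds; each fold serves once as the test set while the remaining folds form the training set. `CrossValidationFoldsSketch` is a hypothetical name, and unlike a stratified split this sketch ignores class labels.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Assigns shuffled instance indices to k folds in round-robin order (no stratification).
final class CrossValidationFoldsSketch {

    static List<List<Integer>> folds(int numInstances, int k, long seed) {
        if (k < 2 || k > numInstances) {
            throw new IllegalArgumentException("need 2 <= k <= numInstances");
        }
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < numInstances; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed));
        List<List<Integer>> result = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            result.add(new ArrayList<>());
        }
        // Round-robin assignment keeps fold sizes within one of each other.
        for (int i = 0; i < numInstances; i++) {
            result.get(i % k).add(indices.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Integer>> folds = folds(6497, 10, 1L);
        for (int f = 0; f < folds.size(); f++) {
            // For fold f, a classifier would be trained on the other nine folds and tested on this one.
            System.out.println("fold " + f + ": " + folds.get(f).size() + " test instances");
        }
    }
}
```

Averaging the ten per-fold results gives an estimate in which no instance is ever scored by a model that saw it during training.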
Classification results using all attributes (identical to the results from running the same experiment in Weka directly):
Results
====
Correctly Classified Instances 6471 99.5998 %
Incorrectly Classified Instances 26 0.4002 %
Kappa statistic 0.9892
Mean absolute error 0.0076
Root mean squared error 0.0617
Relative absolute error 2.0491 %
Root relative squared error 14.3154 %
Total Number of Instances 6497
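As a quick check, the summary percentages follow from the instance counts, and the number of correct predictions can be compared with the 95% requirement from the task description:

```latex
\begin{aligned}
\text{accuracy} &= \tfrac{6471}{6497} \approx 99.5998\%, \qquad \text{error} = \tfrac{26}{6497} \approx 0.4002\% \\
\text{required correct} &= \lceil 0.95 \times 6497 \rceil = \lceil 6172.15 \rceil = 6173 \le 6471
\end{aligned}
```

On the training data, then, the all-attribute model clears the requirement by a wide margin.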