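Learning to use Weka in your Java code

This post covers code for two tasks: (1) filtering a dataset, and (2) classifying it with the J48 decision tree. The example below does not split the dataset. It uses the training set as the test set as well, which goes against standard data-mining practice. The code is only meant as practice in using Weka from Java.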
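A filter has two different properties:
- supervised or unsupervised: whether or not it takes the class attribute into account
- attribute- or instance-based: e.g., removing a certain attribute, or removing instances that meet a certain condition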
Most filters implement the OptionHandler interface, which means you can set the options via a String array, rather than setting them each manually via set-methods.
For example, if you want to remove the first attribute of a dataset, you need this filter:
weka.filters.unsupervised.attribute.Remove
with this option:
-R 1
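Since Remove implements OptionHandler, the option array is only one of several ways to configure it. The minimal sketch below assumes Weka 3.8 on the classpath; the class name OptionsDemo is hypothetical. It sets the same range "1" through a String array, through weka.core.Utils.splitOptions on a command-line-style string, and through the setAttributeIndices set-method, and reads each setting back with getAttributeIndices().

```java
import weka.core.Utils;
import weka.filters.unsupervised.attribute.Remove;

// Hypothetical demo class: three equivalent ways to tell Remove to drop attribute 1.
public class OptionsDemo {

    // 1) Options as an explicit String array, as in the text above.
    public static String viaArray() throws Exception {
        Remove remove = new Remove();
        remove.setOptions(new String[] {"-R", "1"});
        return remove.getAttributeIndices();
    }

    // 2) Options parsed from a single command-line-style string.
    public static String viaSplitOptions() throws Exception {
        Remove remove = new Remove();
        remove.setOptions(Utils.splitOptions("-R 1"));
        return remove.getAttributeIndices();
    }

    // 3) The dedicated set-method instead of an option array.
    public static String viaSetter() {
        Remove remove = new Remove();
        remove.setAttributeIndices("1");
        return remove.getAttributeIndices();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(viaArray() + " " + viaSplitOptions() + " " + viaSetter());
    }
}
```

Utils.splitOptions is handy when the options are stored as one string, for example copied from a command line.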
If you have an Instances object, called data, you can create and apply the filter like this:
```java
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;
...
String[] options = new String[2];
options[0] = "-R";                                   // "range"
options[1] = "1";                                    // first attribute
Remove remove = new Remove();                        // new instance of filter
remove.setOptions(options);                          // set options
remove.setInputFormat(data);                         // inform filter about dataset **AFTER** setting options
Instances newData = Filter.useFilter(data, remove);  // apply filter
```
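The fragment above assumes an existing Instances object called data. The self-contained sketch below (again assuming Weka 3.8; the class RemoveDemo and its made-up attributes id, alcohol and pH are hypothetical) builds a two-row dataset in memory and applies the same Remove filter, so the effect can be checked without an ARFF file.

```java
import java.util.ArrayList;

import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

// Hypothetical demo: build a tiny dataset in memory and drop its first attribute.
public class RemoveDemo {

    // Three numeric attributes and two rows; names and values are made up.
    public static Instances toyData() {
        ArrayList<Attribute> atts = new ArrayList<>();
        atts.add(new Attribute("id"));
        atts.add(new Attribute("alcohol"));
        atts.add(new Attribute("pH"));
        Instances data = new Instances("toy", atts, 2);
        data.add(new DenseInstance(1.0, new double[] {1, 9.4, 3.51}));
        data.add(new DenseInstance(1.0, new double[] {2, 9.8, 3.20}));
        return data;
    }

    public static Instances removeFirst(Instances data) throws Exception {
        Remove remove = new Remove();
        remove.setOptions(new String[] {"-R", "1"}); // 1-based range: the first attribute
        remove.setInputFormat(data);                 // must come AFTER setOptions
        return Filter.useFilter(data, remove);       // returns a new Instances object
    }

    public static void main(String[] args) throws Exception {
        System.out.println(removeFirst(toyData()));
    }
}
```

Filter.useFilter returns a new Instances object; the dataset passed in still has all three attributes.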
In case you have a dedicated test set, you can train the classifier and then evaluate it on this test set. In the following example, a J48 is instantiated, trained and then evaluated. Some statistics are printed to stdout:
```java
import weka.core.Instances;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
...
Instances train = ...   // from somewhere
Instances test = ...    // from somewhere
// both sets need their class attribute set, e.g. train.setClassIndex(train.numAttributes() - 1)
// train classifier
Classifier cls = new J48();
cls.buildClassifier(train);
// evaluate classifier and print some statistics
Evaluation eval = new Evaluation(train);
eval.evaluateModel(cls, test);
System.out.println(eval.toSummaryString("\nResults\n======\n", false));
```
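The experiment in this post has no dedicated test set, which is why it ends up evaluating on the training data. A common alternative is to shuffle the data and hold part of it out. The sketch below shows one way to do that, assuming Weka 3.8; the class HoldOutDemo, its synthetic toyData() (class "high" whenever x > 5), the 70/30 ratio and the seed are all illustrative choices, not part of the original.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Random;

import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;

// Hypothetical sketch: without a dedicated test set, hold out part of the data.
public class HoldOutDemo {

    // Synthetic two-class data (made up): class "high" iff x > 5.
    public static Instances toyData() {
        ArrayList<Attribute> atts = new ArrayList<>();
        atts.add(new Attribute("x"));
        atts.add(new Attribute("class", Arrays.asList("low", "high")));
        Instances data = new Instances("toy", atts, 20);
        for (int i = 0; i < 20; i++) {
            double x = i * 0.5;
            double cls = x > 5 ? 1 : 0; // index into the nominal values {low, high}
            data.add(new DenseInstance(1.0, new double[] {x, cls}));
        }
        data.setClassIndex(data.numAttributes() - 1);
        return data;
    }

    // Shuffle a copy, then return {train, test} with trainFraction of the rows in train.
    public static Instances[] split(Instances data, double trainFraction, long seed) {
        Instances copy = new Instances(data);
        copy.randomize(new Random(seed));
        int trainSize = (int) Math.round(copy.numInstances() * trainFraction);
        int testSize = copy.numInstances() - trainSize;
        Instances train = new Instances(copy, 0, trainSize);
        Instances test = new Instances(copy, trainSize, testSize);
        return new Instances[] {train, test};
    }

    public static Evaluation trainAndTest(Instances train, Instances test) throws Exception {
        Classifier cls = new J48();
        cls.buildClassifier(train);          // the model never sees the test rows
        Evaluation eval = new Evaluation(train);
        eval.evaluateModel(cls, test);
        return eval;
    }

    public static void main(String[] args) throws Exception {
        Instances[] parts = split(toyData(), 0.7, 1);
        Evaluation eval = trainAndTest(parts[0], parts[1]);
        System.out.println(eval.toSummaryString("\nResults\n======\n", false));
    }
}
```

Shuffling before splitting matters when the rows are ordered, for example if all red wines come before all white wines in the file.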
Dataset: download from www.technologyforge.net/Datasets
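1. Data preprocessing: remove the attribute quality. This experiment does not need the wine quality; it only looks at how accurately wines are classified as white or red.
2. Run the J48 classifier with its default settings to get a classification result that uses all the remaining attributes.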
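The program below performs step 1 by removing the last attribute by position, which assumes quality is the last column of the ARFF file. A slightly more robust sketch looks the attribute up by name. The class RemoveByName and its sample data are hypothetical, and Weka 3.8 is assumed.

```java
import java.util.ArrayList;

import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

// Hypothetical helper: drop an attribute by its name instead of by its position.
public class RemoveByName {

    public static Instances removeAttribute(Instances data, String name) throws Exception {
        Attribute att = data.attribute(name); // null if no attribute has this name
        if (att == null) {
            throw new IllegalArgumentException("No attribute named " + name);
        }
        Remove remove = new Remove();
        // Attribute.index() is 0-based, while the -R range is 1-based.
        remove.setOptions(new String[] {"-R", Integer.toString(att.index() + 1)});
        remove.setInputFormat(data);
        return Filter.useFilter(data, remove);
    }

    // Made-up two-row dataset with a "quality" column in the middle.
    public static Instances sampleData() {
        ArrayList<Attribute> atts = new ArrayList<>();
        atts.add(new Attribute("alcohol"));
        atts.add(new Attribute("quality"));
        atts.add(new Attribute("pH"));
        Instances data = new Instances("wine", atts, 2);
        data.add(new DenseInstance(1.0, new double[] {9.4, 5, 3.51}));
        data.add(new DenseInstance(1.0, new double[] {9.8, 6, 3.20}));
        return data;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(removeAttribute(sampleData(), "quality"));
    }
}
```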
The Java code is as follows:
```java
import java.io.FileReader;

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

public class RWClassifier {

    // Loads an ARFF file and removes its last attribute (quality, in this dataset).
    public static Instances getFileInstances(String filename) throws Exception {
        Instances data;
        try (FileReader frData = new FileReader(filename)) {   // closes the file when done
            data = new Instances(frData);
        }
        int length = data.numAttributes();
        String[] options = new String[2];
        options[0] = "-R";
        options[1] = Integer.toString(length);   // -R is 1-based, so this is the last attribute
        Remove remove = new Remove();
        remove.setOptions(options);
        remove.setInputFormat(data);
        Instances newData = Filter.useFilter(data, remove);
        return newData;
    }

    public static void main(String[] args) throws Exception {
        Instances instances = getFileInstances("D://Weka_tutorial//WineQuality//RedWhiteWine.arff"); // location of the data file
        // System.out.println(instances);
        instances.setClassIndex(instances.numAttributes() - 1);  // the last remaining attribute is the class (red or white)
        J48 j48 = new J48();
        j48.buildClassifier(instances);
        // Evaluates on the training data itself, so the figures are optimistic.
        Evaluation eval = new Evaluation(instances);
        eval.evaluateModel(j48, instances);
        System.out.println(eval.toSummaryString("\nResults\n====\n", false));
    }
}
```
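Because RWClassifier tests J48 on its own training data, its accuracy figures are optimistic. A minimal variant that uses 10-fold cross-validation instead is sketched below. It assumes Weka 3.8 and that the ARFF path from the article exists on your machine (it can be overridden with the first command-line argument). The class WineCrossValidation is hypothetical; it loads the file with DataSource.read (imported but never used in the original) and drops the final attribute with Weka's range keyword "last".

```java
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

// Hypothetical variant of RWClassifier that estimates accuracy with k-fold cross-validation.
public class WineCrossValidation {

    // Same preprocessing as the article: drop the last attribute (assumed to be quality).
    public static Instances dropLastAttribute(Instances data) throws Exception {
        Remove remove = new Remove();
        remove.setOptions(new String[] {"-R", "last"});
        remove.setInputFormat(data);
        return Filter.useFilter(data, remove);
    }

    // Each fold's model is built on the other k-1 folds and tested on the held-out fold.
    public static Evaluation crossValidate(Instances data, int folds, long seed) throws Exception {
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(new J48(), data, folds, new Random(seed));
        return eval;
    }

    public static void main(String[] args) throws Exception {
        // Default path copied from the article; pass another path as the first argument.
        String path = args.length > 0 ? args[0] : "D://Weka_tutorial//WineQuality//RedWhiteWine.arff";
        Instances data = dropLastAttribute(DataSource.read(path));
        data.setClassIndex(data.numAttributes() - 1);
        Evaluation eval = crossValidate(data, 10, 1);
        System.out.println(eval.toSummaryString("\nResults (10-fold CV)\n====\n", false));
    }
}
```

Every instance is still tested exactly once, but always by a model that did not see it during training.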
Running RWClassifier prints:

```
Correctly Classified Instances        6471               99.5998 %
Incorrectly Classified Instances        26                0.4002 %
Kappa statistic                          0.9892
Mean absolute error                      0.0076
Root mean squared error                  0.0617
Relative absolute error                  2.0491 %
Root relative squared error             14.3154 %
Total Number of Instances             6497
```
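The two percentages in the first lines follow directly from the counts: each one is the count divided by the 6497 instances, times 100. The small stand-alone check below reproduces them (plain Java, no Weka needed; the class name AccuracyCheck is hypothetical).

```java
// Recomputes the first two summary lines: percentage = count / total * 100.
public class AccuracyCheck {

    public static double percent(int count, int total) {
        return 100.0 * count / total;
    }

    public static void main(String[] args) {
        int correct = 6471;
        int incorrect = 26;
        int total = correct + incorrect; // 6497, the "Total Number of Instances" line
        System.out.printf("correct:   %.4f %%%n", percent(correct, total));
        System.out.printf("incorrect: %.4f %%%n", percent(incorrect, total));
    }
}
```

With an English default locale it prints 99.5998 % and 0.4002 %, matching the summary.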