Design a simple, low-cost sensor that can tell red wine from white wine

Learning how to use Weka in your Java code

This covers two parts of the code: 1. filtering a dataset, and 2. classifying with the J48 decision tree. The example below does not split the dataset; it uses the training set as the test set, which goes against data-mining best practice, but the point of this code is only to learn how to use Weka from Java.

The learning material comes from: http://weka.wikispaces.com/Use+WEKA+in+your+Java+code

Part 1

Filter

A filter has two different properties:

  • supervised or unsupervised
    either takes the class attribute into account or not
  • attribute- or instance-based
    e.g., removing a certain attribute or removing instances that meet a certain condition (a short instance-based sketch follows this list)
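
For the instance-based case, a minimal sketch using weka.filters.unsupervised.instance.RemoveRange, which drops the listed rows by position. This assumes an Instances object called data and is not part of the original wiki example:

 import weka.core.Instances;
 import weka.filters.Filter;
 import weka.filters.unsupervised.instance.RemoveRange;
 ...
 RemoveRange rr = new RemoveRange();
 rr.setOptions(new String[]{"-R", "1-10"});       // instance range to remove (1-based)
 rr.setInputFormat(data);                         // set input format after the options
 Instances trimmed = Filter.useFilter(data, rr);  // data minus the first ten rows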


Most filters implement the OptionHandler interface, which means you can set the options via a String array, rather than setting them each manually via set-methods.
For example, if you want to remove the first attribute of a dataset, you need this filter

 weka.filters.unsupervised.attribute.Remove

with this option

 -R 1

If you have an Instances object, called data, you can create and apply the filter like this:

 import weka.core.Instances;
 import weka.filters.Filter;
 import weka.filters.unsupervised.attribute.Remove;
 ...
 String[] options = new String[2];
 options[0] = "-R";                                    // "range"
 options[1] = "1";                                     // first attribute
 Remove remove = new Remove();                         // new instance of filter
 remove.setOptions(options);                           // set options
 remove.setInputFormat(data);                          // inform filter about dataset **AFTER** setting options
 Instances newData = Filter.useFilter(data, remove);   // apply filter
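
A filter can also be attached directly to a learner via weka.classifiers.meta.FilteredClassifier, so the same filtering is applied during both training and prediction. A minimal sketch, again assuming an Instances object called data whose class attribute is not the one being removed:

 import weka.classifiers.meta.FilteredClassifier;
 import weka.classifiers.trees.J48;
 import weka.filters.unsupervised.attribute.Remove;
 ...
 Remove rm = new Remove();
 rm.setAttributeIndices("1");                   // remove the first attribute
 FilteredClassifier fc = new FilteredClassifier();
 fc.setFilter(rm);                              // filter is applied inside the classifier
 fc.setClassifier(new J48());                   // base learner
 fc.buildClassifier(data);                      // train on the internally filtered data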

Part 2

Train/test set

In case you have a dedicated test set, you can train the classifier and then evaluate it on this test set. In the following example, a J48 is instantiated, trained and then evaluated. Some statistics are printed to stdout:

 import weka.core.Instances;
 import weka.classifiers.Classifier;
 import weka.classifiers.Evaluation;
 import weka.classifiers.trees.J48;
 ...
 Instances train = ...   // from somewhere
 Instances test = ...    // from somewhere
 // train classifier
 Classifier cls = new J48();
 cls.buildClassifier(train);
 // evaluate classifier and print some statistics
 Evaluation eval = new Evaluation(train);
 eval.evaluateModel(cls, test);
 System.out.println(eval.toSummaryString("\nResults\n======\n", false));
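
As noted above, evaluating on the training data is too optimistic. If no separate test set is available, Evaluation can also run k-fold cross-validation. A minimal sketch (not part of the original wiki snippet), assuming an Instances object data with its class index already set:

 import java.util.Random;
 import weka.core.Instances;
 import weka.classifiers.Evaluation;
 import weka.classifiers.trees.J48;
 ...
 Evaluation eval = new Evaluation(data);
 eval.crossValidateModel(new J48(), data, 10, new Random(1));   // 10-fold cross-validation
 System.out.println(eval.toSummaryString("\nCV Results\n======\n", false));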

Below is a small example of classification with Weka; the Java code that implements the whole process is attached afterwards.

Design a simple, low-cost sensor that can tell red wine from white wine.

Requirements:

The sensor must correctly separate at least 95% of the red-wine and white-wine samples; the dataset contains 6497 samples.

Dataset download: www.technologyforge.net/Datasets

Experiment steps:

1. Data preprocessing: remove the quality attribute. Wine quality is not needed in this experiment; we only care about the accuracy of separating white wine from red wine.

Select quality -> click Remove.
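
The same preprocessing step can be done in code instead of the Explorer GUI. A small sketch, assuming the ARFF file has already been loaded into an Instances object called data; the index is looked up by attribute name, so it does not depend on the column order:

import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;
...
int qualityIdx = data.attribute("quality").index() + 1;   // Remove expects 1-based indices
Remove remove = new Remove();
remove.setAttributeIndices(Integer.toString(qualityIdx)); // remove only the quality column
remove.setInputFormat(data);                              // call after setting the options
Instances noQuality = Filter.useFilter(data, remove);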


2. Run the J48 classifier with its default settings to get a classification result that uses all attributes.

From the result we can see that the classification accuracy reaches 99.5998%, which is quite high.

3. To meet the low-cost requirement, we want to reach the 95% target with as few attributes as possible. This takes repeated experiments; trial-and-error can proceed in either direction (starting from all attributes and removing them, or starting from none and adding them), and the process is straightforward.

Attribute selection: by plotting the data we can observe how much each attribute separates the two classes; after comparison, the two attributes below turn out to be the most discriminative for telling white wine from red wine.
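
In code, this reduction can be tried with the same Remove filter in inverted mode, which keeps only the listed attributes. The indices 7 and 8 below are placeholders, not the actual attributes picked above; "last" keeps the class attribute:

Remove keep = new Remove();
keep.setAttributeIndices("7,8,last");              // hypothetical indices of the two kept attributes + class
keep.setInvertSelection(true);                     // invert: keep these, drop everything else
keep.setInputFormat(data);
Instances reduced = Filter.useFilter(data, keep);
reduced.setClassIndex(reduced.numAttributes() - 1);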

Classification performance:

Repeat the above experiment in Java.

The Java code is as follows:

import java.io.BufferedReader;
import java.io.FileReader;

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

public class RWClassifier {

    // Loads the ARFF file and removes its last attribute (quality), which is
    // not needed for the red/white classification task.
    public static Instances getFileInstances(String filename) throws Exception {
        BufferedReader reader = new BufferedReader(new FileReader(filename));
        Instances data = new Instances(reader);
        reader.close();

        String[] options = new String[2];
        options[0] = "-R";                                    // attribute range to remove
        options[1] = Integer.toString(data.numAttributes());  // last attribute (quality)
        Remove remove = new Remove();
        remove.setOptions(options);
        remove.setInputFormat(data);                          // set input format AFTER the options
        return Filter.useFilter(data, remove);
    }

    public static void main(String[] args) throws Exception {
        // Location of the data file
        Instances instances = getFileInstances("D://Weka_tutorial//WineQuality//RedWhiteWine.arff");
        // After removing quality, the last attribute (red/white) is the class
        instances.setClassIndex(instances.numAttributes() - 1);

        // Train a J48 decision tree with default settings
        J48 j48 = new J48();
        j48.buildClassifier(instances);

        // Evaluate on the training set itself, matching the Weka GUI run above
        Evaluation eval = new Evaluation(instances);
        eval.evaluateModel(j48, instances);
        System.out.println(eval.toSummaryString("\nResults\n====\n", false));
    }
}

Classification results using all attributes (they match the Weka GUI run exactly):

Results
====

Correctly Classified Instances        6471               99.5998 %
Incorrectly Classified Instances        26                0.4002 %
Kappa statistic                          0.9892
Mean absolute error                      0.0076
Root mean squared error                  0.0617
Relative absolute error                  2.0491 %
Root relative squared error             14.3154 %
Total Number of Instances             6497
