Orange Extension Widget Development (Part 4): Channels and Tokens

Channels and Tokens

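The data sampler widget from our previous installment was rather simple and linear with respect to channels. It was designed to receive a token from one widget, process it, and send the resulting token to another widget through a second channel, just like in the schema below: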

_images/schemawithdatasamplerB.png

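There is, however, more to the handling of channels and tokens. Here we give an overview of the more complex cases. Knowing them will help you build widgets that handle multiple inputs and multiple outputs.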

Multi-Input Channels

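In essence, a "multi-input" channel is a channel that can be connected to the output channels of several other widgets. Data from multiple sources can then be fed into one widget for processing, much like a function that takes several arguments.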

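For instance, suppose we want to build a widget that takes a data set and tests several predictive models on it. The widget must have an input data channel, and we already know how to handle that. What is different is that we want to connect several learner widgets to it, as in the schema below: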

_images/learningcurve.png

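We will see how to define the channels of the learning curve widget and how to manage multiple input tokens. But first, a brief explanation. Learning curves are a way to test machine learning algorithms, in order to determine how their performance depends on the size of the training set. For this, one draws a subset of the data, learns the classifier, and tests it on the remaining data. To do this fairly (following Salzberg, 1997), we perform k-fold cross validation but use only a proportion of the data for training. The output of the widget looks like this: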

_images/learningcurve-output.png
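The procedure described above can be sketched in plain Python with NumPy. The idea is k-fold cross validation in which only a proportion of each training part is used. This is a minimal illustration under stated assumptions, not the widget's implementation. `majority_learner` is a hypothetical stand-in for an Orange learner (it ignores the features), `learning_curve_point` is an illustrative name, and the score is plain classification accuracy.

```python
import numpy as np

def majority_learner(X_train, y_train):
    """Hypothetical stand-in learner: always predicts the most frequent class."""
    values, counts = np.unique(y_train, return_counts=True)
    majority = values[np.argmax(counts)]
    return lambda X_test: np.full(len(X_test), majority)

def learning_curve_point(X, y, learner, proportion, k=10, seed=0):
    """Accuracy from k-fold CV when only `proportion` of each training part is used."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    correct = 0
    for i, test_idx in enumerate(folds):
        # all folds except the i-th one form the training part
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        # keep only the requested share of the training part
        n_train = max(1, int(round(len(train_idx) * proportion)))
        train_idx = rng.permutation(train_idx)[:n_train]
        predict = learner(X[train_idx], y[train_idx])
        correct += np.sum(predict(X[test_idx]) == y[test_idx])
    return correct / len(y)

X = np.zeros((100, 2))                 # toy features, ignored by the stand-in learner
y = np.array([0] * 70 + [1] * 30)      # toy class labels
curve = {p: learning_curve_point(X, y, majority_learner, p) for p in (0.1, 0.5, 1.0)}
```

Each value in `curve` is one point of the learning curve for one learner at one training proportion. Every test instance is scored exactly once, whatever proportion is used for training.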

Now, back to channels and tokens. The input channels of our widget are defined like this:

    inputs = [("Data", Orange.data.Table, "set_dataset"),
              ("Learner", Orange.classification.Learner, "set_learner",
               widget.Multiple + widget.Default)]

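You may have noticed that this is mostly the same as in the widgets we defined before. The exception is widget.Multiple + widget.Default (from the Orange.widgets.widget namespace) in the last item of the input list, which defines the Learner channel. widget.Multiple + widget.Default means that this is a multi-input channel and that it is the default input for its type. If no flag is given, widget.Single is used by default. The widget can then receive input from only one widget, and the channel is not the default one (default channels are discussed later).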

Note: the Default flag is used here for illustration. Since "Learner" is the only channel for the Orange.classification.Learner type, it is also the default.
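Flags such as widget.Multiple and widget.Default can be combined with + because each one occupies its own bit. The sketch below shows the idea with a hypothetical `enum.IntFlag`. The names mirror Orange's, but the numeric values and the `describe` helper are illustrative assumptions, not Orange's actual constants or API.

```python
from enum import IntFlag

class ChannelFlag(IntFlag):
    """Hypothetical channel flags; the values are illustrative, not Orange's."""
    Single = 2
    Multiple = 4
    Default = 8

def describe(flags):
    """Spell out what a combination of channel flags means."""
    kind = "multi-input" if flags & ChannelFlag.Multiple else "single-input"
    default = "default" if flags & ChannelFlag.Default else "non-default"
    return f"{kind}, {default}"

# Each flag is a distinct bit, so adding them gives the same value as OR-ing them.
combined = ChannelFlag.Multiple + ChannelFlag.Default
```

The + form only matches | when no flag is repeated. With the values above, Default + Default gives 16, which is a different bit, while Default | Default stays Default.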

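In Orange, tokens are sent together with the id of the widget that sends them. Declaring a multi-input channel only tells Orange to pass the sending widget's id along with the token. These are the two arguments with which the receiving function is called. For our "Learner" channel, the receiving function is set_learner(), which looks like this: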

    def set_learner(self, learner, id):
        """Set the input learner for channel id."""
        if id in self.learners:
            if learner is None:
                # remove a learner and corresponding results
                del self.learners[id]
                del self.results[id]
                del self.curves[id]
            else:
                # update/replace a learner on a previously connected link
                self.learners[id] = learner
                # invalidate the cross-validation results and curve scores
                # (will be computed/updated in `_update`)
                self.results[id] = None
                self.curves[id] = None
        else:
            if learner is not None:
                self.learners[id] = learner
                # initialize the cross-validation results and curve scores
                # (will be computed/updated in `_update`)
                self.results[id] = None
                self.curves[id] = None

        if len(self.learners):
            self.infob.setText("%d learners on input." % len(self.learners))
        else:
            self.infob.setText("No learners.")

        self.commitBtn.setEnabled(bool(self.learners))

OK, this looks a bit long and complicated. But be patient! A learning curve is not the simplest widget there is.

Part of this function's code manages information specific to this widget's state. To understand the signal handling, let us look at the mechanism. We store learners (objects that learn from data) in an OrderedDict, self.learners, which maps each input id to the input value (the input learner itself). The reason this is an OrderedDict is that the order of the input learners is important: we want to maintain a consistent column order in the table view of the learning curve point scores.

The function above first checks whether the channel id is already in self.learners. If it is, there are two cases. If the received learner is None, the learner and its results and curve points are deleted. Remember that receiving None means the link with the sending widget was removed or closed. Otherwise, the sending widget has sent an updated version of a learner it sent before. In that case the stored learner is replaced, and the cross validation results and curve points for that channel id are invalidated, marking them for update in handleNewSignals(). If the id is not yet known, the learner comes from a widget we have just linked to our learning curve widget. This case is simpler: we add the learner and initialize its results and curve points in the same way, so they are computed on the next update.

The function that handles learners as shown above is the most complicated function in our learning curve widget. In fact, the rest of the widget does some simple GUI management, and calls learning curve routines from testing and performance scoring functions from evaluation.
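The id-keyed bookkeeping that set_learner() performs can be tried outside Orange. The class below is a hypothetical, GUI-free stand-in: the name `LearnerSlots`, its `update` method, and the string "learners" are illustrative only. It keeps learners in an OrderedDict keyed by the sender's id and drops an entry when None arrives. It also resets cached results to None so that they are recomputed later.

```python
from collections import OrderedDict

class LearnerSlots:
    """Hypothetical, GUI-free model of the multi-input bookkeeping."""

    def __init__(self):
        self.learners = OrderedDict()  # link id -> learner, in connection order
        self.results = OrderedDict()   # link id -> cached scores (None = stale)

    def set_learner(self, learner, id):
        if learner is None:
            # None means the link was removed: forget everything for this id
            self.learners.pop(id, None)
            self.results.pop(id, None)
        else:
            # new link, or an updated learner on an existing link;
            # reassigning an existing key keeps its position in the OrderedDict
            self.learners[id] = learner
            self.results[id] = None  # invalidate; recomputed in update()

    def update(self):
        """Stand-in for the widget's `_update`: fill in only the stale results."""
        for id, learner in self.learners.items():
            if self.results[id] is None:
                self.results[id] = f"scores for {learner}"

slots = LearnerSlots()
slots.set_learner("kNN", 1)
slots.set_learner("Naive Bayes", 2)
slots.set_learner("Tree", 3)
slots.set_learner("SVM", 1)    # updated learner on link 1 keeps the first column
slots.set_learner(None, 2)     # link 2 was removed
slots.update()
```

Because the learner on link 1 was replaced in place rather than re-inserted, the column order stays stable for the user.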

Note that in this widget the evaluation (k-fold cross validation) is carried out only once for a given learner, data set and set of evaluation parameters. The scores are then derived from the class probability estimates obtained from that evaluation. This means that switching from one scoring function to another (and displaying the result in the table) takes only a split second. To see the rest of the widget, check out its code.
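The design choice above is to run the evaluation once and then derive every score from the stored class probabilities. It can be illustrated with NumPy. The arrays below are made-up toy data standing in for one evaluation run. Accuracy and the multi-class Brier score (summed over classes) are just two example scoring functions, not necessarily the ones the widget offers.

```python
import numpy as np

# Toy stand-in for one stored evaluation run: true classes and the
# predicted class probabilities (each row sums to 1) for each test instance.
actual = np.array([0, 1, 1, 0])
probs = np.array([[0.8, 0.2],
                  [0.3, 0.7],
                  [0.6, 0.4],
                  [0.1, 0.9]])

def accuracy(actual, probs):
    """Share of instances whose most probable class is the true class."""
    return np.mean(np.argmax(probs, axis=1) == actual)

def brier(actual, probs):
    """Multi-class Brier score: squared error against one-hot targets, summed over classes."""
    onehot = np.eye(probs.shape[1])[actual]
    return np.mean(np.sum((probs - onehot) ** 2, axis=1))

SCORERS = {"Accuracy": accuracy, "Brier": brier}

# Switching scorers only re-reads the cached probabilities; nothing is retrained.
table = {name: score(actual, probs) for name, score in SCORERS.items()}
```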

Using Several Output Channels

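There is nothing really new here. We just need a widget with several output channels of the same type to demonstrate default channels (used below). For this purpose, we modify the data sampler example built earlier so that it sends the sampled data through one channel and all other data through another. The corresponding channel definitions are: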

    outputs = [("Sampled Data", Orange.data.Table),
               ("Other Data", Orange.data.Table)]

We use the third variant of the data sampler widget. The changes are mainly in the functions selection() and commit():

    def selection(self):
        if self.dataset is None:
            return

        n_selected = int(numpy.ceil(len(self.dataset) * self.proportion / 100.))
        indices = numpy.random.permutation(len(self.dataset))
        indices_sample = indices[:n_selected]
        indices_other = indices[n_selected:]
        self.sample = self.dataset[indices_sample]
        self.otherdata = self.dataset[indices_other]
        self.infob.setText('%d sampled instances' % len(self.sample))

    def commit(self):
        self.send("Sampled Data", self.sample)
        self.send("Other Data", self.otherdata)
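The split performed in selection() can be checked on its own with NumPy. This sketch works on a plain array rather than an Orange.data.Table. The function name `split_sample` and the seeded generator are assumptions added for illustration and reproducibility.

```python
import numpy as np

def split_sample(n_rows, proportion, rng):
    """Return (sample, other) row indices, as selection() computes them."""
    n_selected = int(np.ceil(n_rows * proportion / 100.))
    indices = rng.permutation(n_rows)
    return indices[:n_selected], indices[n_selected:]

data = np.arange(20).reshape(10, 2)   # ten toy rows
sample_idx, other_idx = split_sample(len(data), 30, np.random.default_rng(42))
sample, other = data[sample_idx], data[other_idx]
```

Because both index arrays come from a single permutation, every row ends up in exactly one of the two outputs. That is what lets the widget send them on separate channels without overlap.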

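If a widget has more than one output channel of the same type, Orange Canvas opens a window asking the user which channels to connect. So if we connect our Data Sampler (C) widget to a Data Table widget, like this: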

_images/datasampler-totable.png

we get the following window, which asks the user how the channels should be connected:

_images/datasampler-channelquerry.png

Default Channels (When Using Input Channels of the Same Type)

Now, let's say we want to extend our learning curve widget so that it learns the same way as before but, provided that such a data set is given, always tests the learners on the same external data set. That is, besides the training data set, we need another channel of the same type, used for the test data set. Notice, however, that most often we will only provide the training data set. We would therefore not like to be bothered (in Orange Canvas) by a dialog asking which channel to connect to, so the training data set channel will be the default one.

When listing input channels of the same type, the default channel gets a special flag in the channel specification list. So for our new learning curve widget, the channel specification is:

    inputs = [("Data", Orange.data.Table, "set_dataset", widget.Default),
              ("Test Data", Orange.data.Table, "set_testdataset"),
              ("Learner", Orange.classification.Learner, "set_learner",
               widget.Multiple + widget.Default)]

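The "Data" channel, which receives the training data, is a single-token channel and the default one (the widget.Default flag, the fourth item of its tuple). Note that flags can be added (or OR-ed) together, so Default + Multiple is a valid flag. To test that this works, connect a File widget to the learning curve widget. Nothing special happens: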

_images/file-to-learningcurveb.png

The default "Data" (training data) channel is selected automatically, and no dialog asking which channels to connect is opened.
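The Canvas behaviour described in this section can be modelled roughly as below: connect silently when one choice is obvious, otherwise ask the user. This is a simplified, hypothetical sketch, not Orange Canvas's actual algorithm. Channels are plain tuples, `Table` and `Learner` are placeholder classes, `resolve_link` is an illustrative name, and only the default flag on inputs is considered.

```python
class Table:
    """Placeholder for Orange.data.Table in this sketch."""

class Learner:
    """Placeholder for Orange.classification.Learner in this sketch."""

def resolve_link(outputs, inputs):
    """Pick the (output, input) pair to connect, or None if the user must choose.

    outputs: (name, type) tuples of the sending widget
    inputs:  (name, type, is_default) tuples of the receiving widget
    """
    candidates = [(out, inp) for out in outputs for inp in inputs
                  if issubclass(out[1], inp[1])]
    if len(candidates) == 1:
        return candidates[0]
    # several type-compatible pairs: fall back to a unique default input
    defaults = [(out, inp) for out, inp in candidates if inp[2]]
    if len(defaults) == 1:
        return defaults[0]
    return None  # still ambiguous: ask the user in a dialog

learning_curve_inputs = [("Data", Table, True),
                         ("Test Data", Table, False),
                         ("Learner", Learner, True)]
# A File widget has one Table output: the default "Data" input wins silently.
file_link = resolve_link([("Data", Table)], learning_curve_inputs)

# Data Sampler (C) has two Table outputs: the user has to choose.
sampler_link = resolve_link([("Sampled Data", Table), ("Other Data", Table)],
                            [("Data", Table, False)])
```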
