Storm Component Basic Interfaces

IComponent

The IComponent interface is the common interface for all components.

It contains two main methods:

  1. declareOutputFields: declares the output schema for all of this component's streams in the topology.
  2. getComponentConfiguration: declares configuration specific to this component. Only a subset of the "topology.*" configs can be overridden. The component configuration can be further overridden when building the topology with TopologyBuilder.
package org.apache.storm.topology;
import java.io.Serializable;
import java.util.Map;
/**
 * Common methods for all possible components in a topology. This interface is used
 * when defining topologies using the Java API. 
 */
public interface IComponent extends Serializable {
    /**
     * Declare the output schema for all the streams of this topology.
     *
     * @param declarer this is used to declare output stream ids, output fields, and whether or not each output stream is a direct stream
     */
    void declareOutputFields(OutputFieldsDeclarer declarer);
    /**
     * Declare configuration specific to this component. Only a subset of the "topology.*" configs can
     * be overridden. The component configuration can be further overridden when constructing the 
     * topology using {@link TopologyBuilder}
     *
     */
    Map<String, Object> getComponentConfiguration();
}
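To make the second method concrete, here is a minimal, library-free sketch of a per-component configuration override. The class name and the chosen setting are hypothetical illustrations, not Storm APIs; a real component would also implement declareOutputFields against Storm's OutputFieldsDeclarer.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical component: shows how getComponentConfiguration can override
// one "topology.*" setting for this component only.
class TimeoutOverrideComponent {
    public Map<String, Object> getComponentConfiguration() {
        Map<String, Object> conf = new HashMap<>();
        // Per-component override of a "topology.*" config (value is illustrative).
        conf.put("topology.message.timeout.secs", 60);
        return conf;
    }
}
```

Any setting outside the allowed "topology.*" subset would simply be ignored, and TopologyBuilder can still override this value when the topology is assembled.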


ISpout

ISpout is the core interface for implementing spouts. A spout is responsible for feeding messages into the topology for processing. For every tuple emitted by a spout, Storm tracks the (potentially very large) DAG of tuples generated from it. When Storm detects that every tuple in that DAG has been successfully processed, it sends an ack message to the spout.

If a tuple fails to be fully processed within the configured topology timeout, Storm sends a fail message to the spout.

When a spout emits a tuple, it can tag the tuple with a messageId, which can be of any type. When Storm acks or fails a message, it passes the same messageId back to the spout to identify which tuple it refers to. If the spout leaves out the messageId, or sets it to null, Storm will not track the message and the spout will not receive any ack or fail callbacks for it.

Storm executes ack(), fail(), and nextTuple() on the same thread, which means an ISpout implementation does not need to worry about concurrency between these methods. However, it also means the implementation must ensure that nextTuple() is non-blocking; otherwise it could block acks and fails that are pending to be processed.

ISpout contains the following methods:

  1. open: open() is called when a task for this component is initialized within a worker on the cluster; it provides the environment in which the spout executes. The conf parameter is the Storm configuration for this spout, i.e. the configuration provided to the topology merged with the cluster configuration on this machine. The context parameter can be used to get information about this task's place within the topology, including the task id, the component id, and input/output information. The collector parameter is used to emit tuples from this spout; tuples can be emitted at any time, including in the open() and close() methods. The collector is thread-safe and should be saved as an instance variable of the spout object.
  2. close: close() is called when an ISpout is about to be shut down. There is no guarantee that close() will be called, because the supervisor may kill -9 the worker processes on the cluster. The one case where close() is guaranteed to be called is when a topology is killed while running Storm in local mode.
  3. activate: activate() is called when the spout has been activated out of a deactivated mode; nextTuple() will be called on this spout soon. A spout can go from deactivated to activated when the topology is manipulated with the storm client.
  4. deactivate: called when the spout has been deactivated; nextTuple() will not be called while the spout is deactivated. The spout may or may not be reactivated in the future.
  5. nextTuple: when nextTuple() is called, Storm is requesting that the spout emit tuples to the output collector (SpoutOutputCollector). This method should be non-blocking, so if the spout has no tuples to emit it should simply return. nextTuple(), ack(), and fail() are all called in a tight loop on a single thread in the spout task. When there are no tuples to emit, it is courteous to have nextTuple() sleep for a short time (such as one millisecond) so as not to waste too much CPU.
  6. ack: called when Storm has determined that the tuple emitted by this spout with identifier msgId has been fully processed. Typically, ack() takes the message off the queue to prevent it from being replayed.
  7. fail: called when the tuple emitted by this spout with identifier msgId has failed to be fully processed. Typically, fail() puts the message back on the queue to be replayed later.
package org.apache.storm.spout;
import org.apache.storm.task.TopologyContext;
import java.util.Map;
import java.io.Serializable;
/**
 * ISpout is the core interface for implementing spouts. A Spout is responsible
 * for feeding messages into the topology for processing. For every tuple emitted by
 * a spout, Storm will track the (potentially very large) DAG of tuples generated
 * based on a tuple emitted by the spout. When Storm detects that every tuple in
 * that DAG has been successfully processed, it will send an ack message to the Spout.
 *
 * If a tuple fails to be fully processed within the configured timeout for the
 * topology (see {@link org.apache.storm.Config}), Storm will send a fail message to the spout
 * for the message.
 *
 * When a Spout emits a tuple, it can tag the tuple with a message id. The message id
 * can be any type. When Storm acks or fails a message, it will pass back to the
 * spout the same message id to identify which tuple it's referring to. If the spout leaves out
 * the message id, or sets it to null, then Storm will not track the message and the spout
 * will not receive any ack or fail callbacks for the message.
 *
 * Storm executes ack, fail, and nextTuple all on the same thread. This means that an implementor
 * of an ISpout does not need to worry about concurrency issues between those methods. However, it 
 * also means that an implementor must ensure that nextTuple is non-blocking: otherwise 
 * the method could block acks and fails that are pending to be processed.
 */
public interface ISpout extends Serializable {
    /**
     * Called when a task for this component is initialized within a worker on the cluster.
     * It provides the spout with the environment in which the spout executes.
     *
     * This includes the:
     *
     * @param conf The Storm configuration for this spout. This is the configuration provided to the topology merged in with cluster configuration on this machine.
     * @param context This object can be used to get information about this task's place within the topology, including the task id and component id of this task, input and output information, etc.
     * @param collector The collector is used to emit tuples from this spout. Tuples can be emitted at any time, including the open and close methods. The collector is thread-safe and should be saved as an instance variable of this spout object.
     */
    void open(Map<String, Object> conf, TopologyContext context, SpoutOutputCollector collector);
    /**
     * Called when an ISpout is going to be shutdown. There is no guarantee that close
     * will be called, because the supervisor kill -9's worker processes on the cluster.
     *
     * The one context where close is guaranteed to be called is when a topology is
     * killed when running Storm in local mode.
     */
    void close();
    
    /**
     * Called when a spout has been activated out of a deactivated mode.
     * nextTuple will be called on this spout soon. A spout can become activated
     * after having been deactivated when the topology is manipulated using the 
     * `storm` client. 
     */
    void activate();
    
    /**
     * Called when a spout has been deactivated. nextTuple will not be called while
     * a spout is deactivated. The spout may or may not be reactivated in the future.
     */
    void deactivate();
    /**
     * When this method is called, Storm is requesting that the Spout emit tuples to the 
     * output collector. This method should be non-blocking, so if the Spout has no tuples
     * to emit, this method should return. nextTuple, ack, and fail are all called in a tight
     * loop in a single thread in the spout task. When there are no tuples to emit, it is courteous
     * to have nextTuple sleep for a short amount of time (like a single millisecond)
     * so as not to waste too much CPU.
     */
    void nextTuple();
    /**
     * Storm has determined that the tuple emitted by this spout with the msgId identifier
     * has been fully processed. Typically, an implementation of this method will take that
     * message off the queue and prevent it from being replayed.
     */
    void ack(Object msgId);
    /**
     * The tuple emitted by this spout with the msgId identifier has failed to be
     * fully processed. Typically, an implementation of this method will put that
     * message back on the queue to be replayed at a later time.
     */
    void fail(Object msgId);
}
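The ack/fail contract described above can be sketched without Storm at all. The class and method names below are hypothetical stand-ins, not Storm APIs: nextTuple() never blocks, each emitted message is remembered under its messageId until it is acked (dropped) or failed (put back on the queue for replay).

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Library-free sketch of the spout ack/fail semantics (hypothetical names).
class QueueBackedSpoutSketch {
    private final Queue<String> queue = new ArrayDeque<>();
    private final Map<Object, String> pending = new HashMap<>();
    private long nextId = 0;

    void feed(String message) { queue.offer(message); }

    // Returns the emitted message id, or null when there is nothing to emit
    // (a real nextTuple would just return, or sleep ~1 ms to spare the CPU).
    Object nextTuple() {
        String msg = queue.poll();          // non-blocking poll
        if (msg == null) return null;
        Object msgId = nextId++;
        pending.put(msgId, msg);            // remember until acked or failed
        return msgId;
    }

    void ack(Object msgId) { pending.remove(msgId); }   // fully processed: drop it

    void fail(Object msgId) {
        String msg = pending.remove(msgId);
        if (msg != null) queue.offer(msg);  // put back on the queue to replay later
    }
}
```

A message that is never acked or failed simply stays in the pending map; in real Storm it would eventually be failed by the topology timeout.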


IRichSpout

IRichSpout extends both the ISpout and IComponent interfaces.

package org.apache.storm.topology;
import org.apache.storm.spout.ISpout;
/**
 * When writing topologies using Java, {@link IRichBolt} and {@link IRichSpout} are the main interfaces
 * to use to implement components of the topology.
 *
 */
public interface IRichSpout extends ISpout, IComponent {
}


IBolt

IBolt is the core interface for implementing bolts. An IBolt represents a component that takes tuples as input and produces tuples as output. An IBolt can do everything from filtering to joins to functions to aggregations. It does not have to process a tuple immediately and may hold onto tuples to process later.

A bolt's lifecycle is as follows: the IBolt object is created on the client machine, serialized into the topology (using Java serialization), and submitted to the master node of the cluster (Nimbus). The supervisor then launches worker processes, which deserialize the object, call prepare() on it, and then start processing tuples.

If you want to parameterize an IBolt, you should set the parameters through its constructor and save them as instance variables; the instance variables will then be serialized and shipped to every task executing this bolt across the cluster. When defining bolts in Java, you should use the IRichBolt interface, which adds the methods necessary for using the Java TopologyBuilder API.

IBolt contains the following methods:

  1. prepare: prepare() is called when a task for this component is initialized within a worker on the cluster; it provides the environment in which the bolt executes. The topoConf parameter is the Storm configuration for this bolt, i.e. the configuration provided to the topology merged with the cluster configuration on this machine. The context parameter can be used to get information about this task's place within the topology, including the task id, the component id, and input/output information. The collector parameter is used to emit tuples from this bolt; tuples can be emitted at any time, including in the prepare() and cleanup() methods. The collector is thread-safe and should be saved as an instance variable of the bolt object.
  2. execute: execute() processes a single input tuple. The Tuple object carries metadata about which component/stream/task it came from; its values can be accessed with Tuple#getValue(). The IBolt does not have to process the tuple immediately; it is fine to hold onto a tuple and process it later, for example to do an aggregation or a join. Tuples should be emitted using the OutputCollector provided through the prepare() method. All input tuples must be acked or failed at some point using the OutputCollector; otherwise Storm will be unable to determine when tuples coming off the spouts have been completed. The common practice is to ack the input tuple at the end of the execute() method, and IBasicBolt automates this. The input parameter is the input tuple to be processed.
  3. cleanup: cleanup() is called when an IBolt is about to be shut down. There is no guarantee that cleanup() will be called, because the supervisor may kill -9 the worker processes on the cluster. The one case where cleanup() is guaranteed to be called is when a topology is killed while running Storm in local mode.
package org.apache.storm.task;
import org.apache.storm.tuple.Tuple;
import java.util.Map;
import java.io.Serializable;
/**
 * An IBolt represents a component that takes tuples as input and produces tuples
 * as output. An IBolt can do everything from filtering to joining to functions
 * to aggregations. It does not have to process a tuple immediately and may
 * hold onto tuples to process later.
 *
 * A bolt's lifecycle is as follows:
 *
 * IBolt object created on client machine. The IBolt is serialized into the topology
 * (using Java serialization) and submitted to the master machine of the cluster (Nimbus).
 * Nimbus then launches workers which deserialize the object, call prepare on it, and then
 * start processing tuples.
 *
 * If you want to parameterize an IBolt, you should set the parameters through its
 * constructor and save the parameterization state as instance variables (which will
 * then get serialized and shipped to every task executing this bolt across the cluster).
 *
 * When defining bolts in Java, you should use the IRichBolt interface which adds
 * necessary methods for using the Java TopologyBuilder API.
 */
public interface IBolt extends Serializable {
    /**
     * Called when a task for this component is initialized within a worker on the cluster.
     * It provides the bolt with the environment in which the bolt executes.
     *
     * This includes the:
     * 
     * @param topoConf The Storm configuration for this bolt. This is the configuration provided to the topology merged in with cluster configuration on this machine.
     * @param context This object can be used to get information about this task's place within the topology, including the task id and component id of this task, input and output information, etc.
     * @param collector The collector is used to emit tuples from this bolt. Tuples can be emitted at any time, including the prepare and cleanup methods. The collector is thread-safe and should be saved as an instance variable of this bolt object.
     */
    void prepare(Map<String, Object> topoConf, TopologyContext context, OutputCollector collector);
    /**
     * Process a single tuple of input. The Tuple object contains metadata on it
     * about which component/stream/task it came from. The values of the Tuple can
     * be accessed using Tuple#getValue. The IBolt does not have to process the Tuple
     * immediately. It is perfectly fine to hang onto a tuple and process it later
     * (for instance, to do an aggregation or join).
     *
     * Tuples should be emitted using the OutputCollector provided through the prepare method.
     * It is required that all input tuples are acked or failed at some point using the OutputCollector.
     * Otherwise, Storm will be unable to determine when tuples coming off the spouts
     * have been completed.
     *
     * For the common case of acking an input tuple at the end of the execute method,
     * see IBasicBolt which automates this.
     * 
     * @param input The input tuple to be processed.
     */
    void execute(Tuple input);
    /**
     * Called when an IBolt is going to be shutdown. There is no guarantee that cleanup
     * will be called, because the supervisor kill -9's worker processes on the cluster.
     *
     * The one context where cleanup is guaranteed to be called is when a topology
     * is killed when running Storm in local mode.
     */
    void cleanup();
}
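The parameterization advice above can be sketched without the Storm library. All names below are hypothetical, not Storm APIs: the threshold is passed through the constructor and saved in a (serializable) instance variable, while runtime state is rebuilt in a prepare()-like method on each task.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Library-free sketch of a parameterized bolt (hypothetical names).
class LengthFilterBoltSketch implements Serializable {
    private final int minLength;             // constructor parameter: serialized and
                                             // shipped to every task running this bolt
    private transient List<String> emitted;  // runtime state, rebuilt in prepare()

    LengthFilterBoltSketch(int minLength) { this.minLength = minLength; }

    // Stands in for prepare(conf, context, collector): build runtime state here,
    // not in the constructor, since only serialized fields survive shipping.
    void prepare() { emitted = new ArrayList<>(); }

    // Stands in for execute(Tuple): filter, emit, then ack the input.
    void execute(String input) {
        if (input.length() >= minLength) {
            emitted.add(input);              // a real bolt would call collector.emit(...)
        }
        // a real bolt would call collector.ack(input) here
    }

    List<String> emitted() { return emitted; }
}
```

Keeping the collector and other runtime objects out of the serialized state (here, via transient) mirrors why the real interface hands them to prepare() rather than the constructor.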

IRichBolt

IRichBolt extends both the IBolt and IComponent interfaces.

package org.apache.storm.topology;
import org.apache.storm.task.IBolt;
/**
 * When writing topologies using Java, {@link IRichBolt} and {@link IRichSpout} are the main interfaces
 * to use to implement components of the topology.
 *
 */
public interface IRichBolt extends IBolt, IComponent {
}

IBasicBolt

IBasicBolt extends the IComponent interface.

IBasicBolt has the same-named methods as IRichBolt, but IBasicBolt's execute() method handles acking automatically.

package org.apache.storm.topology;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.tuple.Tuple;
import java.util.Map;
public interface IBasicBolt extends IComponent {
    void prepare(Map<String, Object> topoConf, TopologyContext context);
    /**
     * Process the input tuple and optionally emit new tuples based on the input tuple.
     * 
     * All acking is managed for you. Throw a FailedException if you want to fail the tuple.
     */
    void execute(Tuple input, BasicOutputCollector collector);
    void cleanup();
}
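The "acking is managed for you" contract can be sketched library-free. The names below are hypothetical, not Storm APIs, but the control flow is roughly what Storm's BasicBoltExecutor wrapper does around an IBasicBolt: ack the input after execute() returns normally, fail it when execute() throws (in real Storm, a FailedException).

```java
// Library-free sketch of automatic acking around a basic bolt (hypothetical names).
class BasicBoltRunnerSketch {
    // Stand-in for IBasicBolt: user code only processes the input.
    interface BasicBolt { void execute(String input) throws Exception; }

    private int acked = 0;
    private int failed = 0;

    void run(BasicBolt bolt, String input) {
        try {
            bolt.execute(input);
            acked++;              // normal return: input is acked for the bolt
        } catch (Exception e) {
            failed++;             // thrown exception: input is failed instead
        }
    }

    int acked() { return acked; }
    int failed() { return failed; }
}
```

This is why IBasicBolt suits the common "process, emit, ack" pattern, while IRichBolt remains the right choice when a bolt must hold tuples and ack them later itself.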