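In the article "Learning Flink from 0 to 1: Introduction to Data Sources", I introduced Flink data sources and briefly touched on custom data sources. This article covers custom sources in more detail and walks through a demo to make them easier to understand.
Let's first look at a demo in which Flink reads data from a Kafka topic. You need Flink and Kafka installed beforehand.
Start Flink, ZooKeeper, and Kafka.
Good, everything is up and running!
Maven dependencies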
<!--flink java-->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-java</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
</dependency>
<!--logging-->
<dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-log4j12</artifactId>
    <version>1.7.7</version>
    <scope>runtime</scope>
</dependency>
<dependency>
    <groupId>log4j</groupId>
    <artifactId>log4j</artifactId>
    <version>1.2.17</version>
    <scope>runtime</scope>
</dependency>
<!--flink kafka connector-->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka-0.11_${scala.binary.version}</artifactId>
    <version>${flink.version}</version>
</dependency>
<!--alibaba fastjson-->
<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>fastjson</artifactId>
    <version>1.2.51</version>
</dependency>
Entity class: Metric.java
package com.zhisheng.flink.model;

import java.util.Map;

/**
 * Desc:
 * weixin: zhisheng_tian
 * blog: http://www.54tianzhisheng.cn/
 */
public class Metric {
    public String name;
    public long timestamp;
    public Map<String, Object> fields;
    public Map<String, String> tags;

    public Metric() {
    }

    public Metric(String name, long timestamp, Map<String, Object> fields, Map<String, String> tags) {
        this.name = name;
        this.timestamp = timestamp;
        this.fields = fields;
        this.tags = tags;
    }

    @Override
    public String toString() {
        return "Metric{" +
                "name='" + name + '\'' +
                ", timestamp='" + timestamp + '\'' +
                ", fields=" + fields +
                ", tags=" + tags +
                '}';
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public long getTimestamp() {
        return timestamp;
    }

    public void setTimestamp(long timestamp) {
        this.timestamp = timestamp;
    }

    public Map<String, Object> getFields() {
        return fields;
    }

    public void setFields(Map<String, Object> fields) {
        this.fields = fields;
    }

    public Map<String, String> getTags() {
        return tags;
    }

    public void setTags(Map<String, String> tags) {
        this.tags = tags;
    }
}
Utility class for writing data to Kafka: KafkaUtils.java
import com.alibaba.fastjson.JSON;
import com.zhisheng.flink.model.Metric;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

/**
 * Writes data to Kafka.
 * You can use this main method to test it.
 * weixin: zhisheng_tian
 * blog: http://www.54tianzhisheng.cn/
 */
public class KafkaUtils {
    public static final String broker_list = "localhost:9092";
    public static final String topic = "metric"; // Kafka topic; the Flink program must use the same one

    public static void writeToKafka() throws InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", broker_list);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer"); // key serializer
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer"); // value serializer

        // try-with-resources closes the producer, so repeated calls do not leak connections
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            Metric metric = new Metric();
            metric.setTimestamp(System.currentTimeMillis());
            metric.setName("mem");
            Map<String, String> tags = new HashMap<>();
            Map<String, Object> fields = new HashMap<>();

            tags.put("cluster", "zhisheng");
            tags.put("host_ip", "101.147.022.106");

            fields.put("used_percent", 90d);
            fields.put("max", 27244873d);
            fields.put("used", 17244873d);
            fields.put("init", 27244873d);

            metric.setTags(tags);
            metric.setFields(fields);

            String json = JSON.toJSONString(metric);
            ProducerRecord<String, String> record = new ProducerRecord<>(topic, null, null, json);
            producer.send(record);
            System.out.println("Sent data: " + json);

            producer.flush();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // send one record every 300 ms
        while (true) {
            Thread.sleep(300);
            writeToKafka();
        }
    }
}
Run it:
If the console shows output like that marked in the screenshot above, data is being sent to Kafka continuously.
Main.java
package com.zhisheng.flink;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;

import java.util.Properties;

/**
 * Desc:
 * weixin: zhisheng_tian
 * blog: http://www.54tianzhisheng.cn/
 */
public class Main {
    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("zookeeper.connect", "localhost:2181");
        props.put("group.id", "metric-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer"); // key deserializer
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer"); // value deserializer
        props.put("auto.offset.reset", "latest"); // start from the latest offset when no committed offset exists

        DataStreamSource<String> dataStreamSource = env.addSource(new FlinkKafkaConsumer011<>(
                "metric",                 // Kafka topic
                new SimpleStringSchema(), // deserializes each record value into a String
                props)).setParallelism(1);

        dataStreamSource.print(); // print the data read from Kafka to the console

        env.execute("Flink add data source");
    }
}
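Run it:
As you can see, the Flink program's console keeps printing data.
That was the Kafka source that ships with Flink. Next, let's follow its example and write a source that reads data from MySQL.
First, add the MySQL dependency to pom.xml: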
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>5.1.34</version>
</dependency>
Create the database table as follows:
DROP TABLE IF EXISTS `student`;
CREATE TABLE `student` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(25) COLLATE utf8_bin DEFAULT NULL,
  `password` varchar(25) COLLATE utf8_bin DEFAULT NULL,
  `age` int(10) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
Insert data:
INSERT INTO `student` VALUES
  ('1', 'zhisheng01', '123456', '18'),
  ('2', 'zhisheng02', '123', '17'),
  ('3', 'zhisheng03', '1234', '18'),
  ('4', 'zhisheng04', '12345', '16');
COMMIT;
Create the entity class: Student.java
package com.zhisheng.flink.model;

/**
 * Desc:
 * weixin: zhisheng_tian
 * blog: http://www.54tianzhisheng.cn/
 */
public class Student {
    public int id;
    public String name;
    public String password;
    public int age;

    public Student() {
    }

    public Student(int id, String name, String password, int age) {
        this.id = id;
        this.name = name;
        this.password = password;
        this.age = age;
    }

    @Override
    public String toString() {
        return "Student{" +
                "id=" + id +
                ", name='" + name + '\'' +
                ", password='" + password + '\'' +
                ", age=" + age +
                '}';
    }

    public int getId() {
        return id;
    }

    public void setId(int id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public String getPassword() {
        return password;
    }

    public void setPassword(String password) {
        this.password = password;
    }

    public int getAge() {
        return age;
    }

    public void setAge(int age) {
        this.age = age;
    }
}
Create the source class SourceFromMySQL.java. It extends RichSourceFunction and implements its open, close, run, and cancel methods:
package com.zhisheng.flink.source;

import com.zhisheng.flink.model.Student;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

/**
 * Desc:
 * weixin: zhisheng_tian
 * blog: http://www.54tianzhisheng.cn/
 */
public class SourceFromMySQL extends RichSourceFunction<Student> {

    PreparedStatement ps;
    private Connection connection;

    /**
     * Opens the connection in open(), so that run() does not have to
     * create and release a connection itself.
     *
     * @param parameters
     * @throws Exception
     */
    @Override
    public void open(Configuration parameters) throws Exception {
        super.open(parameters);
        connection = getConnection();
        // table name matches the lowercase `student` table created above
        String sql = "select * from student;";
        ps = this.connection.prepareStatement(sql);
    }

    /**
     * Once the program has finished, close the connection and release resources.
     *
     * @throws Exception
     */
    @Override
    public void close() throws Exception {
        super.close();
        // close the statement before the connection it belongs to
        if (ps != null) {
            ps.close();
        }
        if (connection != null) {
            connection.close();
        }
    }

    /**
     * The DataStream calls run() once to fetch the data.
     *
     * @param ctx
     * @throws Exception
     */
    @Override
    public void run(SourceContext<Student> ctx) throws Exception {
        // try-with-resources closes the result set when the loop ends
        try (ResultSet resultSet = ps.executeQuery()) {
            while (resultSet.next()) {
                Student student = new Student(
                        resultSet.getInt("id"),
                        resultSet.getString("name").trim(),
                        resultSet.getString("password").trim(),
                        resultSet.getInt("age"));
                ctx.collect(student);
            }
        }
    }

    @Override
    public void cancel() {
        // nothing to do: run() returns once the result set is exhausted
    }

    private static Connection getConnection() {
        Connection con = null;
        try {
            Class.forName("com.mysql.jdbc.Driver");
            con = DriverManager.getConnection("jdbc:mysql://localhost:3306/test?useUnicode=true&characterEncoding=UTF-8", "root", "root123456");
        } catch (Exception e) {
            System.out.println("-----------mysql get connection has exception , msg = " + e.getMessage());
        }
        return con;
    }
}
Flink program:
package com.zhisheng.flink;

import com.zhisheng.flink.source.SourceFromMySQL;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

/**
 * Desc:
 * weixin: zhisheng_tian
 * blog: http://www.54tianzhisheng.cn/
 */
public class Main2 {
    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.addSource(new SourceFromMySQL()).print();

        env.execute("Flink add data source");
    }
}
Run the Flink program and you will see the student records printed in the console log.
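One caveat about the `run()` method in SourceFromMySQL: the `name` and `password` columns are declared `DEFAULT NULL`, so `resultSet.getString(...)` can return `null`, and calling `.trim()` on that value would throw a `NullPointerException`. A minimal sketch of a null-safe helper follows. The class name `RowValues` and the method `trimOrNull` are hypothetical and not part of the original demo.

```java
public class RowValues {
    /** Trims a column value, passing null through instead of throwing. */
    public static String trimOrNull(String value) {
        return value == null ? null : value.trim();
    }

    public static void main(String[] args) {
        // a padded value is trimmed; a NULL column stays null
        System.out.println(trimOrNull("  zhisheng01 "));
        System.out.println(trimOrNull(null));
    }
}
```

Under that assumption, the source would build each row with `RowValues.trimOrNull(resultSet.getString("name"))` rather than calling `.trim()` directly.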
RichSourceFunction
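As the custom source above shows, the class we extended is RichSourceFunction, so let's take a closer look at it:
It is an abstract class that extends AbstractRichFunction and provides the basic capabilities for implementing a rich SourceFunction. It has three subclasses: two are abstract classes that provide more specific implementations on top of it, and the third is ContinuousFileMonitoringFunction.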
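The demo's `cancel()` is empty, which works because its `run()` returns once the result set is exhausted. A source that keeps polling needs `cancel()` to stop the loop inside `run()`, and the usual way to do that is a `volatile` flag. The sketch below shows the pattern in plain Java so it runs without a Flink cluster. `PollingSource`, its `Consumer<String>` parameter (standing in for `SourceContext`), and the emitted values are all hypothetical stand-ins, not Flink APIs.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

public class PollingSource {
    // volatile so that a cancel() from another thread is visible to run()
    private volatile boolean running = true;

    /** Emits records until cancel() is called (stand-in for run(SourceContext)). */
    public void run(Consumer<String> out) throws InterruptedException {
        int i = 0;
        while (running) {
            out.accept("record-" + i++);
            Thread.sleep(10); // pretend to wait for the next poll
        }
    }

    /** Asks run() to leave its loop (stand-in for cancel()). */
    public void cancel() {
        running = false;
    }

    /** Runs the source on a worker thread, cancels it, and returns what was emitted. */
    public static List<String> demo() throws InterruptedException {
        PollingSource source = new PollingSource();
        List<String> collected = new CopyOnWriteArrayList<>();
        Thread worker = new Thread(() -> {
            try {
                source.run(collected::add);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        Thread.sleep(100); // let it emit a few records
        source.cancel();   // the loop exits after the current iteration
        worker.join(1000);
        if (worker.isAlive()) {
            throw new IllegalStateException("source did not stop");
        }
        return collected;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("emitted " + demo().size() + " records before cancel");
    }
}
```

In a real RichSourceFunction the same idea would apply: `run()` loops while the flag is true and calls `ctx.collect(...)`, and `cancel()` only flips the flag, leaving resource cleanup to `close()`.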
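This article covered how to use the Kafka source in Flink and provided a demo showing how to write a custom source that reads data from MySQL. You can of course read from other systems as well and implement your own data source. Real-world work may be more complex than this demo, so adapt it as needed!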