Recently I have been busy migrating to a new cluster running the latest CDH 6.3.0, and ran into quite a few problems submitting Flink jobs.
Fortunately we have a Cloudera license, so with help from the vendor and friends in the community, the problems got solved much faster (grin).
The setup: CDH 6.3.0 + Flink 1.8.1. Every component of the data platform sits behind Kerberos and LDAP, and since everything has to pass authentication, we chose to submit all jobs uniformly through Oozie.
On top of that, because of Kerberos, each Flink per-job cluster needs its own keytab for fine-grained permission control, since compute resources are divided between departments through YARN resource queues.
But Flink's support for this is still weak: only a single keytab can be configured in the config file, and every job that starts pulls that same keytab and copies it into its own containers.
Even so, our preferred way of submitting Flink jobs is still through Oozie.
Since Oozie has no native support for Flink, the only option is to submit the job with an Oozie shell action.
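For reference, a minimal workflow definition for such a shell action could look roughly like the sketch below. This is an assumption-laden illustration, not our actual workflow: the workflow name and the submit.sh wrapper script are made up, and the <file> elements are what ship the script and jar into the action's working directory.

```xml
<workflow-app name="flink-submit" xmlns="uri:oozie:workflow:0.5">
    <start to="flink-shell"/>
    <action name="flink-shell">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>submit.sh</exec>
            <!-- files listed here are copied into the action's working directory -->
            <file>submit.sh</file>
            <file>flinktest.jar</file>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Flink job submission failed</message>
    </kill>
    <end name="end"/>
</workflow-app>
```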
With Flink set up, I started submitting a job through an Oozie shell action:
#!/bin/bash
flink run -m yarn-cluster flinktest.jar
Immediately: Duang!
flink: command not found
After switching to the absolute path of the command: still Duang!
org.apache.flink.client.deployment.ClusterDeploymentException: Couldn't deploy Yarn session cluster
at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploySessionCluster(AbstractYarnClusterDescriptor.java:387)
at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:259)
at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:213)
at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1050)
at org.apache.flink.client.cli.CliFrontend.lambda$main$11(CliFrontend.java:1126)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
The job could not be scheduled on YARN. This is because Oozie overrides HADOOP_CONF_DIR.
So in the shell script I manually added export HADOOP_CONF_DIR=xxxxx
And then!
It could be submitted!
But!
Sometimes it succeeded and sometimes it failed?! (confused face)
org.apache.flink.runtime.resourcemanager.exceptions.ResourceManagerException: Could not start the ResourceManager akka.tcp://flink@xxxxx:36166/user/resourcemanager
at org.apache.flink.runtime.resourcemanager.ResourceManager.onStart(ResourceManager.java:202)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StoppedState.start(AkkaRpcActor.java:539)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleControlMessage(AkkaRpcActor.java:164)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:142)
at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.onReceive(FencedAkkaRpcActor.java:40)
at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
at akka.actor.ActorCell.invoke(ActorCell.scala:495)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
at akka.dispatch.Mailbox.run(Mailbox.scala:224)
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: org.apache.flink.runtime.resourcemanager.exceptions.ResourceManagerException: Could not start resource manager client.
at org.apache.flink.yarn.YarnResourceManager.initialize(YarnResourceManager.java:250)
at org.apache.flink.runtime.resourcemanager.ResourceManager.startResourceManagerServices(ResourceManager.java:212)
at org.apache.flink.runtime.resourcemanager.ResourceManager.onStart(ResourceManager.java:200)
... 16 more
Caused by: org.apache.hadoop.yarn.exceptions.InvalidApplicationMasterRequestException: Application Master is already regist
So when the ResourceManager went to register the Application Master, it had already been registered? And then some exception followed.
Yet sometimes the submission succeeded, which left me puzzled for a while.
In the end, the cause turned out to be that Oozie overrides many of the cluster's environment variables.
The fix: prefix the flink command in the Oozie script with env -i.
This clears all environment variables, so Oozie runs the shell with the environment of the user logged into YARN.
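The effect of env -i is easy to see in a small standalone demo (the directory name below is made up purely for illustration):

```shell
# Simulate an Oozie-injected variable in the parent environment.
export HADOOP_CONF_DIR=/tmp/oozie-injected-conf

# A normally spawned child process inherits it:
sh -c 'echo "inherited: ${HADOOP_CONF_DIR:-unset}"'

# env -i starts the child with an EMPTY environment, so the
# injected value is gone:
env -i sh -c 'echo "cleared: ${HADOOP_CONF_DIR:-unset}"'
```

Note that env -i also wipes PATH, which is another reason the flink binary must be referenced by absolute path in the script.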
Finally:
#!/bin/bash
env -i /flink run -m yarn-cluster flinktest.jar
The shell action now submitted the Flink job successfully.
But Kerberos was still not solved: a job submitted this way reads the Kerberos settings from flink-conf.yaml on the server, then copies the corresponding keytab into all containers, so every job shares the same one.
That means we cannot have each job use its own keytab and authenticate with its own Kerberos principal.
So I asked around in the community groups, and the approaches people had taken were all over the place:
some share one identity across all jobs; others use CI/CD so that each submitted container image has the Kerberos settings in flink-conf.yaml patched for that job.
But our case is different ~
Since we submit jobs through Oozie it was a bit of a headache, but in the end we solved it too.
Flink reads the Kerberos settings from the default flink-conf.yaml under the FLINK_CONF_DIR path.
So in the Oozie shell script we need to point FLINK_CONF_DIR at the path of our own modified flink-conf.yaml, overriding Flink's default.
We fill in a relative path here, because at run time Oozie copies the submitted files into the job's working directory.
In other words, we can upload the keytab file and the whole conf directory through Oozie, and modify the Kerberos options in conf/flink-conf.yaml:
security.kerberos.login.keytab: .
security.kerberos.login.principal: xxx
The keytab path here is the relative path ./ because Oozie copies the uploaded keytab into the working directory.
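A quick sketch of how the relative FLINK_CONF_DIR resolves against the working directory (simulated here with a temp dir; the keytab name and principal are made-up placeholders):

```shell
# Simulate the Oozie working directory that the uploaded files land in.
workdir=$(mktemp -d)
mkdir -p "$workdir/conf"
cat > "$workdir/conf/flink-conf.yaml" <<'EOF'
security.kerberos.login.keytab: ./myjob.keytab
security.kerberos.login.principal: myjob@EXAMPLE.COM
EOF

cd "$workdir"
# A relative FLINK_CONF_DIR now resolves inside the working directory,
# which is exactly what the Oozie shell action gives us:
export FLINK_CONF_DIR=./conf
cat "$FLINK_CONF_DIR/flink-conf.yaml"   # prints the two settings above
```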
Finally, run the Oozie shell script:
#!/bin/bash
env -i FLINK_CONF_DIR=./conf /flink run -m yarn-cluster ./flinktest.jar
The job now runs under the keytab user we specified.