Deployment: docker + airflow + mysql + LocalExecutor
Use the airflow docker image:
https://hub.docker.com/r/puckel/docker-airflow
Start it with the default sqlite + SequentialExecutor:
$ docker run -d -p 8080:8080 puckel/docker-airflow webserver
Copy airflow.cfg out of the container so it can be modified:
$ docker cp $container_id:/usr/local/airflow/airflow.cfg .
Try mounting the customized airflow.cfg into the container:
-v /usr/local/airflow/airflow.cfg:/usr/local/airflow/airflow.cfg
In that file, point sql_alchemy_conn at mysql and set executor = LocalExecutor.
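The two edits to airflow.cfg can also be scripted. The following is a minimal sketch rather than the method used in this post: the helper name set_airflow_cfg and the connection values in the usage comment (user, password, host mysql-host, database airflow) are hypothetical, and it assumes the copied file has one line each for sql_alchemy_conn and executor (the former possibly commented out).

```shell
# Hypothetical helper: rewrite the two settings in a local copy of airflow.cfg.
# $1: path to airflow.cfg, $2: SQLAlchemy connection string for MySQL.
set_airflow_cfg() {
  cfg="$1"
  conn="$2"
  tmp="$(mktemp)"
  # Replace the existing (possibly commented-out) lines.
  # Assumes the connection string contains no '|', '&' or backslash.
  sed \
    -e "s|^[# ]*sql_alchemy_conn *=.*|sql_alchemy_conn = ${conn}|" \
    -e "s|^executor *=.*|executor = LocalExecutor|" \
    "$cfg" > "$tmp"
  # Write back through the original file so its mode is preserved.
  cat "$tmp" > "$cfg"
  rm -f "$tmp"
}

# Example (placeholder credentials, not from the original post):
# set_airflow_cfg ./airflow.cfg 'mysql://airflow:airflow@mysql-host:3306/airflow'
```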
The container still uses SequentialExecutor:
[2019-02-28 19:37:16,170] {{__init__.py:51}} INFO - Using executor SequentialExecutor
Check the Dockerfile: docker-airflow/Dockerfile
ENTRYPOINT ["/entrypoint.sh"]
CMD ["webserver"] # set default arg for entrypoint
So the script that actually gets launched is entrypoint.sh.
Check entrypoint.sh: docker-airflow/script/entrypoint.sh
: "${AIRFLOW__CORE__EXECUTOR:=${EXECUTOR:-Sequential}Executor}"
...
if [ "$AIRFLOW__CORE__EXECUTOR" != "SequentialExecutor" ]; then
  AIRFLOW__CORE__SQL_ALCHEMY_CONN="postgresql+psycopg2://$POSTGRES_USER:$POSTGRES_PASSWORD@$POSTGRES_HOST:$POSTGRES_PORT/$POSTGRES_DB"
  AIRFLOW__CELERY__RESULT_BACKEND="db+postgresql://$POSTGRES_USER:$POSTGRES_PASSWORD@$POSTGRES_HOST:$POSTGRES_PORT/$POSTGRES_DB"
  wait_for_port "Postgres" "$POSTGRES_HOST" "$POSTGRES_PORT"
fi
...
case "$1" in
  webserver)
    airflow initdb
    if [ "$AIRFLOW__CORE__EXECUTOR" = "LocalExecutor" ]; then
      # With the "Local" executor it should all run in one container.
      airflow scheduler &
    fi
    exec airflow webserver
    ;;
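1) It reads the environment variable EXECUTOR (values such as Sequential or Local) to build the environment variable AIRFLOW__CORE__EXECUTOR;
2) If AIRFLOW__CORE__EXECUTOR is not SequentialExecutor, it overwrites AIRFLOW__CORE__SQL_ALCHEMY_CONN with a postgres URL and waits for postgres (a hard dependency on postgres here);
3) If the start argument is webserver and AIRFLOW__CORE__EXECUTOR=LocalExecutor, it starts the scheduler automatically;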
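Step 1 relies on shell parameter expansion, which is easy to misread. The sketch below reuses the exact expansion line from entrypoint.sh inside a hypothetical helper, resolve_executor, to show what AIRFLOW__CORE__EXECUTOR ends up as when EXECUTOR is unset and when it is set to Local.

```shell
# Reproduce the expansion from entrypoint.sh in a subshell so the caller's
# environment is left untouched. $1 (optional) plays the role of EXECUTOR.
resolve_executor() {
  (
    unset AIRFLOW__CORE__EXECUTOR EXECUTOR
    if [ -n "${1:-}" ]; then
      EXECUTOR="$1"
    fi
    # Same line as in entrypoint.sh: assign only if not already set.
    : "${AIRFLOW__CORE__EXECUTOR:=${EXECUTOR:-Sequential}Executor}"
    printf '%s\n' "$AIRFLOW__CORE__EXECUTOR"
  )
}

resolve_executor         # prints SequentialExecutor (EXECUTOR unset)
resolve_executor Local   # prints LocalExecutor
```

This is why passing -e EXECUTOR=Local to docker run is enough to switch the executor, while leaving EXECUTOR unset falls back to SequentialExecutor.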
Due to Airflow’s automatic environment variable expansion, you can also set the env var AIRFLOW__CORE__* to temporarily overwrite airflow.cfg.
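Because environment variables take precedence over airflow.cfg, and the entrypoint defaults AIRFLOW__CORE__EXECUTOR to SequentialExecutor when EXECUTOR is not set, the executor actually used is still SequentialExecutor even after setting executor=LocalExecutor in airflow.cfg. Copy entrypoint.sh out of the container and modify it: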
$ docker cp $container_id:/entrypoint.sh .
Comment out the following lines:
#if [ "$AIRFLOW__CORE__EXECUTOR" != "SequentialExecutor" ]; then
# AIRFLOW__CORE__SQL_ALCHEMY_CONN="postgresql+psycopg2://$POSTGRES_USER:$POSTGRES_PASSWORD@$POSTGRES_HOST:$POSTGRES_PORT/$POSTGRES_DB"
# AIRFLOW__CELERY__RESULT_BACKEND="db+postgresql://$POSTGRES_USER:$POSTGRES_PASSWORD@$POSTGRES_HOST:$POSTGRES_PORT/$POSTGRES_DB"
# wait_for_port "Postgres" "$POSTGRES_HOST" "$POSTGRES_PORT"
#fi
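Instead of editing by hand, the same block can be commented out with sed. This is only a sketch: the helper name comment_out_postgres_block is hypothetical, and it assumes the copied entrypoint.sh contains the block exactly as quoted above, with the opening if and closing fi unindented.

```shell
# Hypothetical helper: comment out the Postgres block in a local copy of
# entrypoint.sh. $1: path to the copied entrypoint.sh.
comment_out_postgres_block() {
  entrypoint="$1"
  tmp="$(mktemp)"
  # Prefix every line from the opening "if" to the first unindented "fi" with '#'.
  sed '/^if \[ "\$AIRFLOW__CORE__EXECUTOR" != "SequentialExecutor" ]; then$/,/^fi$/ s/^/#/' \
    "$entrypoint" > "$tmp"
  # Write back through the original file so it keeps its executable bit.
  cat "$tmp" > "$entrypoint"
  rm -f "$tmp"
}

# Usage:
# comment_out_postgres_block ./entrypoint.sh
```

Writing the result back with cat rather than mv keeps the file mode copied from the container, which matters because the mounted entrypoint.sh must remain executable.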
Start command:
$ docker run -d -p 8080:8080 -e EXECUTOR=Local -v /usr/local/airflow/airflow.cfg:/usr/local/airflow/airflow.cfg -v /usr/local/airflow/entrypoint.sh:/entrypoint.sh -v /usr/local/airflow/dags:/usr/local/airflow/dags -v /usr/local/airflow/logs:/usr/local/airflow/logs puckel/docker-airflow webserver
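To confirm which executor the container picked up, one option is to look for the "Using executor" log line shown earlier. The sketch below uses a hypothetical helper, executor_from_logs, and assumes the log format matches that earlier line.

```shell
# Hypothetical helper: read container logs on stdin and print the executor
# named in the last "Using executor ..." line.
executor_from_logs() {
  sed -n 's/.*INFO - Using executor \([A-Za-z]*\).*/\1/p' | tail -n 1
}

# Usage (requires a running container):
# docker logs "$container_id" 2>&1 | executor_from_logs
```

If the changes took effect, this should report LocalExecutor rather than SequentialExecutor.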
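Although this setup is a single point of failure, combined with mesos + hdfs nfs it can be made highly available for production use.
Reference: https://github.com/puckel/docker-airflow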