Bump airflow version to 1.10.10 #537
base: master
fix: #535
Hopefully we can see this merged. I opened an issue a couple of days ago (#536), but it seems like this repo is not maintained anymore. Simple fast-forward PRs are not being merged, and there are no new commits... |
Bump |
With the Airflow 1.10.10 release they also support a production Docker image. You can read about it in the official blog. I've been using it instead of this one. So far so good. It would be nice if puckel kept the docker-compose.yml files updated ( |
A production-ready docker-compose with secret handling etc. would be nice. |
@wittfabian, github.com/apache/airflow releases an official production-ready image. This repo is no longer maintained (it seems).
|
Yeah, we use that too. What is missing is guidance on how best to build on the base image to get your own configuration: users, SSL, connections, variables. Using the image is usually only half the battle; the difficult part is not described anywhere. |
I know what you mean. I deployed my stack on a single machine with an external DB as a service to make the deployment as idempotent as possible (holding no data or state, only functionality).
Depending on your needs you might want one kind of deployment or another. For example, if you cannot afford to spend a couple of months learning Kubernetes, or if you are in a big team where anyone should be able to understand the deployment process, I suggest some easier technology (docker compose or docker stack).
If you need routing between several instances of the webserver and SSL, you can use traefik. But if it's not open to the public, or it runs inside a VPN, then SSL is only makeup.
If you need to scale BIG, then I suggest scaling horizontally by adding templated machines running docker compose and a bunch of workers. But if you are in the early stages, you can get away with scaling vertically by simply upping the machine's resources.
For syncing your DAGs there are options too, depending on how you want your deployments done. There are people that:
In terms of local deployment, it depends on the Airflow version you are using. If you choose the stable 1.10.x, you can get away with the pip version to run the whole thing. But if you run 2.0, it's best to run a small docker-compose stack, because web resources are not built and dependencies are not installed. So you see, there are lots of options (even more than I listed). It depends on what you want. |
Since migrating to the official image is being discussed here, I want to add some stuff I figured out today. First of all, here's a discussion on the official docker image with docker-compose examples:
And here's what I had to do to migrate from puckel's image...
Migration to official image
In order to change from the puckel version to the official one, I had to...
I was also upgrading from 1.10 and had exceptions when accessing the web interface. It turned out NULL values in the dag description were the problem:
UPDATE dag
SET description = ''
WHERE description IS NULL;
And here's my docker-compose config using LocalExecutor...
docker-compose.airflow.yml:
version: '2.1'
services:
  airflow:
    # image: apache/airflow:1.10.10
    build:
      context: .
      args:
        - DOCKER_UID=${DOCKER_UID-1000}
      dockerfile: Dockerfile
    restart: always
    environment:
      - AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgres://airflow:${POSTGRES_PW-airflow}@postgres:5432/airflow
      - AIRFLOW__CORE__FERNET_KEY=${AF_FERNET_KEY-GUYoGcG5xdn5K3ysGG3LQzOt3cc0UBOEibEPxugDwas=}
      - AIRFLOW__CORE__EXECUTOR=LocalExecutor
      - AIRFLOW__CORE__AIRFLOW_HOME=/opt/airflow/
      - AIRFLOW__CORE__LOAD_EXAMPLES=False
      - AIRFLOW__CORE__LOAD_DEFAULT_CONNECTIONS=False
      - AIRFLOW__CORE__LOGGING_LEVEL=${AF_LOGGING_LEVEL-info}
    volumes:
      - ../airflow/dags:/opt/airflow/dags:z
      - ../airflow/plugins:/opt/airflow/plugins:z
      - ./volumes/airflow_data_dump:/opt/airflow/data_dump:z
      - ./volumes/airflow_logs:/opt/airflow/logs:z
    healthcheck:
      test: ["CMD-SHELL", "[ -f /opt/airflow/airflow-webserver.pid ]"]
      interval: 30s
      timeout: 30s
      retries: 3
docker-compose.yml:
version: '2.1'
services:
  postgres:
    image: postgres:9.6
    container_name: af_postgres
    environment:
      - POSTGRES_USER=airflow
      - POSTGRES_PASSWORD=${POSTGRES_PW-airflow}
      - POSTGRES_DB=airflow
      - PGDATA=/var/lib/postgresql/data/pgdata
    volumes:
      - ./volumes/postgres_data:/var/lib/postgresql/data/pgdata:Z
    ports:
      - 127.0.0.1:5432:5432
  webserver:
    extends:
      file: docker-compose.airflow.yml
      service: airflow
    container_name: af_webserver
    command: webserver
    depends_on:
      - postgres
    ports:
      - ${DOCKER_PORTS-8080}
    networks:
      - proxy
      - default
    environment:
      # Web Server Config
      - AIRFLOW__WEBSERVER__DAG_DEFAULT_VIEW=graph
      - AIRFLOW__WEBSERVER__HIDE_PAUSED_DAGS_BY_DEFAULT=true
      - AIRFLOW__WEBSERVER__RBAC=true
      # Web Server Performance tweaks
      # 2 * NUM_CPU_CORES + 1
      - AIRFLOW__WEBSERVER__WORKERS=${AF_WORKERS-2}
      # Restart workers every 30min instead of 30seconds
      - AIRFLOW__WEBSERVER__WORKER_REFRESH_INTERVAL=1800
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.airflow.rule=Host(`af.example.com`)"
      - "traefik.http.routers.airflow.middlewares=admin-auth@file"
  scheduler:
    extends:
      file: docker-compose.airflow.yml
      service: airflow
    container_name: af_scheduler
    command: scheduler
    depends_on:
      - postgres
    environment:
      # Performance Tweaks
      # Reduce how often DAGs are reloaded to dramatically reduce CPU use
      - AIRFLOW__SCHEDULER__MIN_FILE_PROCESS_INTERVAL=${AF_MIN_FILE_PROCESS_INTERVAL-60}
      - AIRFLOW__SCHEDULER__MAX_THREADS=${AF_THREADS-1}
networks:
  proxy:
    external: true
Dockerfile:
# Custom Dockerfile
FROM apache/airflow:1.10.10
# Install mssql support & dag dependencies
USER root
RUN apt-get update -yqq \
    && apt-get install -y gcc freetds-dev \
    && apt-get install -y git procps \
    && apt-get install -y vim
RUN pip install apache-airflow[mssql,ssh,s3,slack]
RUN pip install azure-storage-blob sshtunnel google-api-python-client oauth2client \
    && pip install git+https://github.com/infusionsoft/Official-API-Python-Library.git \
    && pip install rocketchat_API
# This fixes permission issues on linux.
# The airflow user should have the same UID as the user running docker on the host system.
# 'make build' adjusts this value automatically
ARG DOCKER_UID
RUN \
    : "${DOCKER_UID:?Build argument DOCKER_UID needs to be set and non-empty. Use 'make build' to set it automatically.}" \
    && usermod -u ${DOCKER_UID} airflow \
    && find / -path /proc -prune -o -user 50000 -exec chown -h airflow {} \; \
    && echo "Set airflow's uid to ${DOCKER_UID}"
USER airflow
Makefile
And here's my Makefile to control the containers:
SERVICE = "scheduler"
TITLE = "airflow containers"
ACCESS = "http://af.example.com"
.PHONY: run
build:
	docker-compose build
run:
	@echo "Starting $(TITLE)"
	docker-compose up -d
	@echo "$(TITLE) running on $(ACCESS)"
runf:
	@echo "Starting $(TITLE)"
	docker-compose up
stop:
	@echo "Stopping $(TITLE)"
	docker-compose down
restart: stop print-newline run
tty:
	docker-compose run --rm --entrypoint='' $(SERVICE) bash
ttyr:
	docker-compose run --rm --entrypoint='' -u root $(SERVICE) bash
attach:
	docker-compose exec $(SERVICE) bash
attachr:
	docker-compose exec -u root $(SERVICE) bash
logs:
	docker-compose logs --tail 50 --follow $(SERVICE)
conf:
	docker-compose config
initdb:
	docker-compose run --rm $(SERVICE) initdb
upgradedb:
	docker-compose run --rm $(SERVICE) upgradedb
print-newline:
	@echo ""
	@echo ""
|
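A side note on the `AIRFLOW__CORE__FERNET_KEY` line in docker-compose.airflow.yml: the fallback after `AF_FERNET_KEY-` is a literal key that is now public in this thread, so anyone reusing the config would probably want to set `AF_FERNET_KEY` to their own value. A Fernet key is 32 random bytes encoded as URL-safe base64, so a minimal sketch using only the standard library could look like this. The function name `generate_fernet_key` is made up for illustration, and writing the output to a `.env` file next to the compose files is an assumption about the setup, not something the original comment describes.

```python
import base64
import os


def generate_fernet_key() -> str:
    """Return a fresh Fernet-compatible key: 32 random bytes, URL-safe base64."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")


if __name__ == "__main__":
    # Print in KEY=value form so the line can be pasted into an env file.
    # AF_FERNET_KEY is the variable name the compose file above substitutes.
    print(f"AF_FERNET_KEY={generate_fernet_key()}")
```

Changing the key on an existing install means connections and variables encrypted with the old key can no longer be decrypted, so this is best done before the first `initdb`.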
Also, in the official repo they are working on a docker-compose config file. Feel free to contribute. |
Hello. How were you able to run the official airflow image? I have made the pull and after that |
You can use an "init task". |
@KimchaC Works for me!
|
@athenawisdoms yes you can adjust the first line in the Makefile: |
Just for reference, we've also been running Airflow split across two repositories: one for the DAGs and the other for the docker-compose infra. |
@gnomeria Is there any resource on how to do that? |
I'm sorry, I'm pretty swamped at the moment and couldn't get it cleaned up.
import git
import subprocess
import time
from flask import Flask, request, abort
from flask import jsonify

app = Flask(__name__)


def rebuild_docker_compose():
    # Rebuild the images, then restart the running containers.
    dc_build = subprocess.Popen("docker-compose build", shell=True)
    dc_build_status = dc_build.wait()
    dc_restart = subprocess.Popen("docker-compose restart", shell=True)
    dc_restart_status = dc_restart.wait()
    return {'build_status': dc_build_status, 'restart_status': dc_restart_status}


@app.route('/trigger-update', methods=['GET'])
def webhook():
    return_status = rebuild_docker_compose()
    print("Return code: {}".format(return_status))
    res = {
        'status': 200,
        'extra': return_status
    }
    return jsonify(res), 200


@app.route('/trigger-update', methods=['POST'])
def webhook_post():
    # Pull the latest DAGs into the checkout mounted by the containers.
    repo = git.Repo('../dags')
    repo.remotes.origin.pull()
    res = {
        'status': 200
    }
    return jsonify(res), 200


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8081)
and run this with a
Some of the steps are a bit manual too, and it lacks some security measures. For the webhook, I think you should do something like https://github.com/adnanh/webhook to run a git pull with a webhook token. Though it's been running fine for almost a year now with that 😃 The airflow-dag repo is a bit more straightforward: it contains only the DAG code. It's also structured with multiple folders and has some shared commons/modules with unit testing with |
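To make the "webhook token" suggestion above a bit more concrete, here is a minimal sketch of a shared-secret check that could sit in front of the two `/trigger-update` handlers. It is not what the commenter runs. The header name `X-Webhook-Token`, the function name `is_authorized`, and the `WEBHOOK_TOKEN` environment variable mentioned in the comments are all hypothetical names chosen for illustration.

```python
import hmac
from typing import Mapping, Optional

# Hypothetical header name; use whatever the sending side can be configured to send.
TOKEN_HEADER = "X-Webhook-Token"


def is_authorized(headers: Mapping[str, str], expected_token: Optional[str]) -> bool:
    """Return True only if the request carries the expected shared secret."""
    # Refuse everything when no secret is configured, rather than allowing all.
    if not expected_token:
        return False
    supplied = headers.get(TOKEN_HEADER, "")
    # compare_digest runs in constant time, which avoids leaking how many
    # leading characters of a guessed token were correct.
    return hmac.compare_digest(supplied.encode("utf-8"), expected_token.encode("utf-8"))


# Possible use inside a Flask handler (assumes `import os` and the Flask
# `request` and `abort` objects that the script above already imports):
#
#     if not is_authorized(request.headers, os.environ.get("WEBHOOK_TOKEN")):
#         abort(403)
```

The check only needs something with a `.get()` method, so it can be tested with a plain dict without starting Flask.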
Hi Uses official airflow image in Puckel's docker compose files. |
Hey. This is really nice. Thanks a lot. |
The easy way is to mount requirement.txt
requirement.txt:
|
Fixes #535 and closes #536, where SQLAlchemy==1.3.16 causes the issue described in https://github.com/apache/airflow/issues/8211