Apache Airflow

Alternatives

Guides

CLI

Installation

pip

export SLUGIFY_USES_TEXT_UNIDECODE=yes

pip3 install 'apache-airflow[postgres]'

Commands

airflow -h

Usage

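# Initialize the metadata database (Airflow 1.10.x syntax; on 2.x use `airflow db init`)
airflow initdb

# Start the web server (listens on port 8080 by default)
airflow webserver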
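Because the database command differs between the 1.10 and 2.x command lines, a small version-aware helper can keep these notes usable against both. This is only a sketch: `airflow_db_init_cmd` is a hypothetical name, not part of Airflow, and it looks at nothing but the major version you pass in.

```shell
# Hypothetical helper: print the metadata-database init command
# for a given Airflow version string (e.g. "1.10.15" or "2.1.3").
airflow_db_init_cmd() {
  major="${1%%.*}"                  # keep everything before the first dot
  case "$major" in
    1) echo "airflow initdb" ;;     # Airflow 1.10.x CLI
    2) echo "airflow db init" ;;    # Airflow 2.x CLI
    *) echo "unsupported Airflow version: $1" >&2
       return 1 ;;                  # other majors are rejected, not guessed
  esac
}
```

For example, `airflow_db_init_cmd 2.1.3` prints `airflow db init`.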

Helm

References

Repository

helm repo add airflow-stable 'https://airflow-helm.github.io/charts'
helm repo update

Install

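# Create a dedicated namespace
kubectl create ns airflow

# Derive a wildcard DNS domain from the cluster IP (used by the ingress hosts below)
export KUBERNETES_IP='<kubernetes-ip>'
export K8S_DOMAIN="${KUBERNETES_IP}.nip.io"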
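nip.io is a wildcard DNS service: any name ending in `<ip>.nip.io` resolves to `<ip>`, which is why the ingress hostnames can be derived from the cluster IP alone. A minimal sketch of that derivation follows; `nip_host` is a hypothetical helper name and the IP in the example is a placeholder, not a value from these notes.

```shell
# Hypothetical helper: build a nip.io hostname for a service prefix.
# $1 = prefix (e.g. "airflow"), $2 = IPv4 address of the ingress.
nip_host() {
  printf '%s.%s.nip.io\n' "$1" "$2"
}

# Example with a placeholder IP:
#   nip_host airflow 192.168.49.2   # prints airflow.192.168.49.2.nip.io
```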

# Install the chart with generated secrets, an admin user, and ingress hosts
helm install airflow airflow-stable/airflow \
  --namespace airflow \
  --version 8.5.2 \
  -f <(cat << EOF
airflow:
  fernetKey: '$(echo -n $(openssl rand -base64 32))'
  webserverSecretKey: '$(echo -n $(openssl rand -base64 32))'

  users:
  - username: admin
    password: admin
    role: Admin
    email: admin@example.com
    firstName: admin
    lastName: admin

  config:
    AIRFLOW__CORE__DEFAULT_TIMEZONE: America/Sao_Paulo
    AIRFLOW__CORE__LOAD_EXAMPLES: 'True'

ingress:
  enabled: true
  web:
    host: airflow.${K8S_DOMAIN}
  flower:
    host: flower.${K8S_DOMAIN}
EOF
)
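The `fernetKey` above is 32 random bytes in standard base64, which can contain `+` and `/`. Fernet keys are documented as URL-safe base64, so a variant that maps those characters to `-` and `_` may be the safer choice. The sketch below assumes `openssl` or, failing that, `/dev/urandom` and `base64` are available; `gen_fernet_key` is a hypothetical name.

```shell
# Hypothetical helper: print a 32-byte key in URL-safe base64 (44 characters).
gen_fernet_key() {
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -base64 32
  else
    head -c 32 /dev/urandom | base64     # fallback when openssl is missing
  fi | tr '+/' '-_' | tr -d '\n'         # URL-safe alphabet, no trailing newline
}
```

Inside the heredoc, `$(gen_fernet_key)` could stand in for the `$(echo -n $(openssl rand -base64 32))` substitution.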

Persistence

Local

helm upgrade airflow airflow-stable/airflow \
  --namespace airflow \
  -f <(yq m <(cat << EOF
logs:
  persistence:
    enabled: true
    size: 1Gi

dags:
  persistence:
    enabled: true
    size: 1Gi
EOF
) <(helm get values airflow --namespace airflow))
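The upgrade above merges the new values with the release's current ones through `yq m` (the yq v3 merge command) and `helm get values`. Helm's own `--reuse-values` flag reuses the last release's values and merges in any overrides, which may give a similar result without yq. The wrapper below is a sketch under that assumption; `airflow_upgrade` is a hypothetical name, and the release, chart, and namespace are the ones used in these notes.

```shell
# Hypothetical wrapper: read extra values (YAML) from stdin and apply
# them on top of the values already stored in the "airflow" release.
airflow_upgrade() {
  overrides="$(mktemp)" || return 1
  cat > "$overrides"                     # stdin -> temporary values file
  helm upgrade airflow airflow-stable/airflow \
    --namespace airflow \
    --reuse-values \
    -f "$overrides"
  status=$?
  rm -f "$overrides"                     # clean up regardless of the outcome
  return "$status"
}

# Usage:
#   airflow_upgrade << EOF
#   logs:
#     persistence:
#       enabled: true
#       size: 1Gi
#   EOF
```

Neither form pins `--version`, so Helm resolves the latest chart in the repository; adding `--version 8.5.2` keeps the upgrade on the chart that was installed.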

Remote (Logging)

Dependencies: MinIO

# Send task logs to the MinIO bucket through the S3 remote-logging settings
helm upgrade airflow airflow-stable/airflow \
  --namespace airflow \
  -f <(yq m <(cat << EOF
airflow:
  config:
    AWS_DEFAULT_REGION: us-east-1
    AIRFLOW__LOGGING__REMOTE_LOGGING: 'True'
    AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER: s3://airflow/logs
    AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID: local_minio

  connections:
    - id: local_minio
      type: s3
      login: minio
      password: minio123
      extra: |-
        {
          "host": "http://minio:9000",
          "region_name": "us-east-1"
        }
EOF
) <(helm get values airflow --namespace airflow))

Prometheus Stack

Dependencies: kube-prometheus (a.k.a. prometheus-stack, previously known as prometheus-operator)

# Inspect which labels Prometheus uses to select ServiceMonitors
kubectl get prometheus \
  -o jsonpath='{.items[*].spec.serviceMonitorSelector}' \
  -n monitoring
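The query above prints the whole `serviceMonitorSelector`; the `serviceMonitor.selector` passed to the chart has to carry labels that this selector matches. To read one label directly, the JSONPath can be narrowed. The sketch assumes the selector uses `matchLabels` with a `release` key, as in the values below (other setups may use different labels or `matchExpressions`); `prom_release_label` is a hypothetical name.

```shell
# Hypothetical helper: print the "release" label that Prometheus expects
# on ServiceMonitors. $1 = namespace (defaults to "monitoring").
prom_release_label() {
  kubectl get prometheus \
    -o jsonpath='{.items[*].spec.serviceMonitorSelector.matchLabels.release}' \
    -n "${1:-monitoring}"
}
```

An empty result suggests the selector is built differently, in which case the full output of the original query is the thing to inspect.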

# Enable the ServiceMonitor and alert rules, labeled to match that selector
helm upgrade airflow airflow-stable/airflow \
  --namespace airflow \
  -f <(yq m <(cat << EOF
serviceMonitor:
  enabled: true
  selector:
    release: prometheus-stack

prometheusRule:
  enabled: true
EOF
) <(helm get values airflow --namespace airflow))

Status

kubectl rollout status deploy/airflow-web \
  -n airflow

Logs

kubectl logs \
  -l 'component=web' \
  -n airflow \
  -f

Issues

Missing SSM Full Access Policy

An error occurred (AccessDeniedException) when calling the GetParameter operation: User: arn:aws:sts::<...>:assumed-role/nodes.<...>.k8s.local/i-<...> is not authorized to perform: ssm:GetParameter on resource: arn:aws:ssm:us-east-1:<...>:parameter/airflow/connections/aws_s3
  1. Go to Identity and Access Management (IAM).
  2. Open the Roles tab and find the role named in the error.
  3. Attach the "AmazonSSMFullAccess" policy.
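The console steps above can also be carried out with the AWS CLI. This is a sketch that assumes the CLI is already configured with credentials allowed to modify IAM roles; the role name is elided in the error message, so it is passed in as an argument, and `attach_ssm_policy` is a hypothetical name.

```shell
# Hypothetical helper: attach the AWS-managed AmazonSSMFullAccess policy
# to the given IAM role. $1 = role name (the part after "assumed-role/").
attach_ssm_policy() {
  role_name="${1:-}"
  if [ -z "$role_name" ]; then
    echo "usage: attach_ssm_policy <role-name>" >&2
    return 2
  fi
  aws iam attach-role-policy \
    --role-name "$role_name" \
    --policy-arn arn:aws:iam::aws:policy/AmazonSSMFullAccess
}
```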

Delete

helm uninstall airflow \
  -n airflow

kubectl delete ns airflow \
  --grace-period=0 \
  --force

Kubernetes

Running

kubectl run -it \
  curl \
  --image docker.io/apache/airflow:2.1.3-python3.8 \
  --command \
  -- /bin/bash