Welcome to Siva's Blog

~-Scribbles by Sivananda Hanumanthu
My experiences and learnings on Technology, Leadership, Domains, Life and on various topics as a reference!
What you can expect here, it could be something on Java, J2EE, Databases, or altogether on a newer Programming language, Software Engineering Best Practices, Software Architecture, SOA, REST, Web Services, Micro Services, APIs, Technical Architecture, Design, Programming, Cloud, Application Security, Artificial Intelligence, Machine Learning, Big data and Analytics, Integrations, Middleware, Continuous Delivery, DevOps, Cyber Security, Application Security, QA/QE, Automations, Emerging Technologies, B2B, B2C, ERP, SCM, PLM, FinTech, IoT, RegTech or any other domain, Tips & Traps, News, Books, Life experiences, Notes, latest trends and many more...

Wednesday, November 3, 2021

When to use Airbyte along with Airflow

 When to use Airbyte along with Airflow?

Airflow shines as a workflow orchestrator. Because Airflow is widely adopted, many data teams also use Airflow transfer and transformation operators to schedule and author their ETL pipelines. Several of those data teams have migrated their ETL pipelines to follow the ELT paradigm. We have seen some of the challenges of building full data replication and incremental loads DAGs with Airflow. More troublesome is that sources and destinations are tightly coupled in Airflow transfer operators. Because of this, it will be hard for Airflow to cover the long-tail of integrations for your business applications. 

One alternative is to keep using Airflow as a scheduler and integrate it with two other open-source projects that are better suited for ELT pipelines, Airbyte for the EL parts and dbt for the T part. Airbyte sources are decoupled from destinations so you can already sync data from 100+ sources (databases,  APIs, ...) to 10+ destinations (databases, data warehouses, data lakes, ...) and remove boilerplate code needed with Airflow. With dbt you can transform data with SQL in your data warehouse and avoid having to handle dependencies between tables in your Airflow DAGs.

References:

Airbyte https://github.com/airbytehq/airbyte

Airflow https://airbyte.io/blog/airflow-etl-pipelines

dbt https://github.com/dbt-labs/dbt-core

dbt implementation at Telegraph https://medium.com/the-telegraph-engineering/dbt-a-new-way-to-handle-data-transformation-at-the-telegraph-868ce3964eb4

No comments:

Post a Comment