These require a separate manual effort, but they are essential for every pipeline.

What to Know About Apache Airflow Before You Get Started

Airflow Is Not a Streaming Data Solution

Airflow “is a batch orchestration workflow platform.” It is not a streaming data solution. Let’s review the most significant caveats when adding Airflow to your data stack.

The scheduling process is fundamentally different between batches and streams:

- Batch jobs (and Airflow) rely on time-based scheduling.
- Streaming pipelines use event-based triggers.

Airflow doesn’t manage event-based jobs. It operates strictly in the context of batch processes: a series of finite tasks with clearly defined start and end points, run at set intervals or when prompted by trigger-based sensors (such as the successful completion of a previous job). Workflows are expected to be mostly static or infrequently changing. In contrast, streaming jobs are endless: you create your pipelines, and then they run constantly, reading events as they emanate from the source. Airflow simply wasn’t built for infinitely running, event-based workflows.
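The contrast between the two scheduling models can be sketched in plain Python. This is an illustration only, not Airflow code: the function names below are hypothetical, chosen to show that a time-based scheduler derives the next run from the clock, while an event-based consumer reacts to each arriving event with no schedule at all.

```python
from datetime import datetime, timedelta

def next_batch_run(last_run, interval):
    # Time-based scheduling: the next run is fixed by the clock,
    # regardless of whether new data has arrived.
    return last_run + interval

def on_event(event, handler):
    # Event-based triggering: processing starts the moment an
    # event arrives; there is no schedule involved.
    return handler(event)

# A daily batch job that last ran at midnight on Jan 1:
print(next_batch_run(datetime(2024, 1, 1), timedelta(days=1)))

# A streaming consumer handling one incoming event:
print(on_event({"order_id": 7}, lambda e: f"processed order {e['order_id']}"))
```

The batch function can tell you every future run time up front; the streaming function cannot, because the triggers live outside the system. That is the gap Airflow’s scheduler was never designed to close.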