Apache Airflow is an open source tool for programmatically authoring, scheduling, and monitoring data pipelines using Python and SQL. Created at Airbnb as an open-source project in 2014, Airflow was brought into the Apache Software Foundation’s Incubator Program in 2016 and announced as a Top-Level Apache Project in 2019. It is used as a data orchestration solution, with over 140 integrations and community support.
HashiCorp Nomad
Score 9.6 out of 10
Nomad, from HashiCorp, is presented as a simple, flexible, and production-grade workload orchestrator that enables organizations to deploy, manage, and scale any application (containerized, legacy, or batch jobs) across multiple regions, on private and public clouds. Nomad's workload support enables an organization to run containerized, non-containerized, and batch applications through a single workflow. Nomad is available as open source or via a supported enterprise plan.
For quickly scanning job status and deep-diving into job issues, details, and flows, Airflow does a good job. No fuss, no muss. The learning curve is low because the UI is very straightforward, and navigating it becomes familiar after spending some time with it. Our requirements are pretty simple: job scheduling, workflows, and monitoring. The jobs we run are >100, but that is still a lot to review and troubleshoot when jobs don't run. So when managing a large number of jobs, Airflow's dated UI can be a bit of a drawback.
Nomad is well suited for organizations that wish to tackle the problem of cloud computing with as little opinion as possible. Where competing tools like Kubernetes embrace the concept of "batteries included," Nomad relies on engineers understanding the missing components and filling them in as necessary. The benefit of Nomad is the ability to build a system out of small pieces, at the cost of more complexity at the system level compared to alternatives.
Nomad only handles one part of a full platform. Expertise and vision are required to implement an entire system that is functional enough for an organization to rely on. This includes other tools to handle things like secrets, service discovery, network routing, etc.
Nomad lags in some modern functionality, such as features for service mesh and OpenTracing. These features are on the tool's roadmap, but there is currently no native support. These paradigms can still be implemented, but they require more expertise outside of Nomad itself.
Nomad is not the leading tool in this space, and as such it risks being left behind by tools with much greater support, such as Kubernetes.
There are a number of reasons to choose Apache Airflow over other similar platforms. Integrations: ready-to-use operators allow you to integrate Airflow with cloud platforms (Google, AWS, Azure, etc.). Apache Airflow helps with backups and other DevOps tasks, such as submitting a Spark job and storing the resulting data on a Hadoop cluster. It also supports machine learning model training, such as triggering a SageMaker job.
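To make the idea of a workflow concrete, here is a minimal sketch in plain Python of a chain of dependent tasks like the ones mentioned above: submit a Spark job, store the result on Hadoop, then trigger a training job. This is not Airflow's API. The task names, the `run_pipeline` helper, and the assumption that training depends on the stored data are all hypothetical, and the lambdas are stand-ins for real operators. It only illustrates the underlying idea that each task runs after the tasks it depends on.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_pipeline(tasks, dependencies):
    """Run every task once, after all of its upstream tasks have finished.

    tasks:        maps a task name to a callable taking a dict of upstream results
    dependencies: maps a task name to the set of task names it depends on
    """
    # static_order() yields each task only after all of its dependencies.
    order = list(TopologicalSorter(dependencies).static_order())
    results = {}
    for name in order:
        upstream = {dep: results[dep] for dep in dependencies.get(name, ())}
        results[name] = tasks[name](upstream)
    return order, results


# Hypothetical stand-ins for real operators; they only return strings.
tasks = {
    "submit_spark_job": lambda upstream: "spark-output",
    "store_on_hadoop": lambda upstream: f"stored:{upstream['submit_spark_job']}",
    "trigger_training": lambda upstream: f"trained-on:{upstream['store_on_hadoop']}",
}

# Assumed dependency chain: Spark job, then storage, then training.
dependencies = {
    "submit_spark_job": set(),
    "store_on_hadoop": {"submit_spark_job"},
    "trigger_training": {"store_on_hadoop"},
}

order, results = run_pipeline(tasks, dependencies)
```

In Airflow itself, the same shape would be expressed as a DAG of operators, with scheduling, retries, and monitoring handled by the platform rather than by a loop like this one.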
Nomad's primary competitor is Kubernetes, specifically its scheduling component. Kubernetes is a much more complete system that handles more than job scheduling, including service discovery, secrets management, and service routing. There is also much larger community support for Kubernetes than for Nomad. One might say Kubernetes is the safer choice between the two. Kubernetes is the complete "operating system" for cloud computing, but it comes with complexities that are Kubernetes-specific. The decision really comes down to a mindset of monolith vs. components. With Kubernetes, I would argue you choose the entire system as a whole. With Nomad, you design your system piece by piece. There is no wrong answer.
Nomad has allowed our organization to deploy quicker and more frequently with a lower failure rate.
Nomad has brought in consistency from an operations perspective.
Nomad's performance allows us to scale infinitely while providing functionality that reduces mean time to repair (canary deploys, versioning, rollbacks, etc).