Introducing Apache Airflow version 2.6.3 support on Amazon MWAA


Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a managed orchestration service for Apache Airflow that makes it easy to set up and operate end-to-end data pipelines in the cloud. Trusted across diverse industries, Amazon MWAA helps organizations like Siemens, ENGIE, and Choice Hotels International enhance and scale their business workflows, while significantly improving security and reducing infrastructure management overhead.

Today, we're announcing the availability of Apache Airflow version 2.6.3 environments. If you're currently running Apache Airflow version 2.x, you can seamlessly upgrade to v2.6.3 using in-place version upgrades, thereby retaining your workflow run history and environment configurations.

In this post, we delve into some of the new features and capabilities of Apache Airflow v2.6.3 and how you can set up or upgrade your Amazon MWAA environment to accommodate this version as you orchestrate your workflows in the cloud at scale.

New feature: Notifiers

Airflow now gives you an efficient way to create reusable and standardized notifications to handle systemic errors and failures. Notifiers introduce a new object in Airflow, designed to be an extensible layer for adding notifications to DAGs. This framework can send messages to external systems when a task instance or an individual DAG run changes its state. You can build notification logic from a new base object and call it directly from your DAG files. The BaseNotifier is an abstract class that provides a basic structure for sending notifications in Airflow using the various on_*_callback hooks. It's intended for providers to extend and customize for their specific needs.

Using this framework, you can build custom notification logic directly within your DAG files. For instance, notifications can be sent via email, Slack, or Amazon Simple Notification Service (Amazon SNS) based on the state of a DAG (on_failure, on_success, and so on). You can also create your own custom notifier that updates an API or posts a file to your storage system of choice.

For details on how to create and use a notifier, refer to Creating a notifier.

New feature: Managing tasks stuck in a queued state

Apache Airflow v2.6.3 brings a significant improvement to address the long-standing issue of tasks getting stuck in the queued state when using the CeleryExecutor. In a typical Apache Airflow workflow, tasks progress through a lifecycle, moving from the scheduled state to the queued state, and eventually to the running state. However, tasks can occasionally remain in the queued state longer than expected due to communication issues among the scheduler, the executor, and the worker. In Amazon MWAA, customers have experienced such tasks being queued for up to 12 hours due to the way it uses the native integration of Amazon Simple Queue Service (Amazon SQS) with CeleryExecutor.

To mitigate this issue, Apache Airflow v2.6.3 introduced a mechanism that checks the Airflow database for tasks that have remained in the queued state beyond a specified timeout, defaulting to 600 seconds. This default can be modified using the environment configuration parameter scheduler.task_queued_timeout. The system then retries such tasks if retries are still available, or fails them otherwise, ensuring that your data pipelines continue to run smoothly.

Notably, this update deprecates the previously used celery.stalled_task_timeout and celery.task_adoption_timeout settings, and consolidates their functionality into a single configuration, scheduler.task_queued_timeout. This enables more effective management of tasks that remain in the queued state. Operators can also configure scheduler.task_queued_timeout_check_interval, which controls how often the system checks for tasks that have stayed in the queued state beyond the defined timeout.

For details on how to use task_queued_timeout, refer to the official Airflow documentation.
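On Amazon MWAA, these Airflow settings can be applied as environment configuration overrides. As a sketch (the environment name and timeout values below are placeholders):

```shell
# Sketch: override the queued-task timeout (in seconds) and its check
# interval on an existing Amazon MWAA environment.
# "my-mwaa-env" and the values shown are placeholders for your own setup.
aws mwaa update-environment \
  --name my-mwaa-env \
  --airflow-configuration-options '{
    "scheduler.task_queued_timeout": "900",
    "scheduler.task_queued_timeout_check_interval": "120"
  }'
```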

New feature: A new continuous timetable and support for continuous schedule

With prior versions of Airflow, to run a DAG continuously in a loop, you had to use the TriggerDagRunOperator to rerun the DAG after the last task finished. With Apache Airflow v2.6.3, you can now run a DAG continuously with a predefined timetable, which simplifies scheduling for continual DAG runs. The new ContinuousTimetable construct creates one continuous DAG run, respecting start_date and end_date, with each new run starting as soon as the previous run has completed, regardless of whether the previous run succeeded or failed. Using a continuous timetable is especially useful when sensors are used to wait for highly irregular events from external data tools.

You can bound the degree of parallelism to ensure that only one DAG run is active at any given time with the max_active_runs parameter:

@dag(
    start_date=datetime(2023, 5, 9),
    schedule="@continuous",
    max_active_runs=1,
    catchup=False,
)

New feature: Trigger the DAG UI extension with flexible user form concept

Prior to Apache Airflow v2.6.3, you could provide parameters in JSON structure via the Airflow UI for custom workflow runs. You had to model, check, and understand the JSON and enter parameters manually, without the option to validate them before triggering a workflow. With Apache Airflow v2.6.3, when you choose Trigger DAG w/ config, a trigger UI form is rendered based on the predefined DAG Params. For your ad hoc, testing, or custom runs, this simplifies the DAG's parameter entry. If the DAG has no parameters defined, a JSON entry mask is shown. The form elements can be defined with the Param class, and its attributes define how a form field is displayed.

For an example DAG, the trigger form is generated from its DAG Params.

Set up a new Apache Airflow v2.6.3 environment

You can set up a new Apache Airflow v2.6.3 environment in your account and preferred Region using the AWS Management Console, API, or AWS Command Line Interface (AWS CLI). If you're adopting infrastructure as code (IaC), you can automate the setup using AWS CloudFormation, the AWS Cloud Development Kit (AWS CDK), or Terraform scripts.
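With the AWS CLI, environment creation might look like the following sketch (every name, ARN, subnet, and security group ID below is a placeholder for your own resources):

```shell
# Sketch: create a new Apache Airflow v2.6.3 environment with the AWS CLI.
# All names, ARNs, and IDs below are placeholders.
aws mwaa create-environment \
  --name my-mwaa-env \
  --airflow-version 2.6.3 \
  --source-bucket-arn arn:aws:s3:::my-mwaa-bucket \
  --dag-s3-path dags \
  --execution-role-arn arn:aws:iam::111122223333:role/my-mwaa-role \
  --network-configuration '{
    "SubnetIds": ["subnet-aaaa1111", "subnet-bbbb2222"],
    "SecurityGroupIds": ["sg-cccc3333"]
  }'
```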

When you have successfully created an Apache Airflow v2.6.3 environment in Amazon MWAA, the following packages are automatically installed on the scheduler and worker nodes along with other provider packages:

apache-airflow-providers-amazon==8.2.0

python==3.10.8

For a complete list of provider packages installed, refer to Apache Airflow provider packages installed on Amazon MWAA environments.

Upgrade from older versions of Apache Airflow to Apache Airflow v2.6.3

You can perform in-place version upgrades of your existing Amazon MWAA environments to update your older Apache Airflow v2.x-based environments to v2.6.3. To learn more about in-place version upgrades, refer to Upgrading the Apache Airflow version or Introducing in-place version upgrades with Amazon MWAA.
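With the AWS CLI, an in-place upgrade reduces to updating the environment's Airflow version (the environment name below is a placeholder):

```shell
# Sketch: upgrade an existing v2.x environment in place to v2.6.3.
# "my-mwaa-env" is a placeholder for your environment name.
aws mwaa update-environment \
  --name my-mwaa-env \
  --airflow-version 2.6.3
```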

Conclusion

In this post, we talked about some of the new features of Apache Airflow v2.6.3 and how you can get started using them in Amazon MWAA. Try out new features like notifiers and continuous timetables, along with the other enhancements, to improve your data orchestration pipelines.

For more details and code examples on Amazon MWAA, visit the Amazon MWAA User Guide and the Amazon MWAA examples GitHub repo.

Apache, Apache Airflow, and Airflow are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.


About the authors

Hernan Garcia is a Senior Solutions Architect at AWS, based out of Amsterdam, who has worked in the financial services industry since 2018. He specializes in application modernization and supports his customers in the adoption of cloud operating models and serverless technologies.

Parnab Basak is a Solutions Architect and a Serverless Specialist at AWS. He specializes in creating new cloud-native solutions using modern software development practices like serverless, DevOps, and analytics. Parnab works closely in the analytics and integration services space, helping customers adopt AWS services for their workflow orchestration needs.

Shubham Mehta is an experienced product manager with over eight years of experience and a proven track record of delivering successful products. In his current role as a Senior Product Manager at AWS, he oversees Amazon Managed Workflows for Apache Airflow (Amazon MWAA) and spearheads Apache Airflow open-source contributions to further enhance the product's functionality.
