Moving data from diverse sources into locations where it can be used for AI is a difficult task. That's where data orchestration technology Apache Airflow fits in.
Today, the Apache Airflow community received its biggest update in years with the debut of the 3.0 release. It marks the project's first major version update in four years. Airflow had, however, been steadily improving throughout the 2.x series, including the 2.9 and 2.10 updates in 2024, which had a heavy focus on AI.
In recent years, data engineers have adopted Apache Airflow as their de facto standard tool. Apache Airflow has become the leading open-source workflow orchestration platform, with more than 3,000 contributors and wide adoption among Fortune 500 companies. There are also several commercial services based on the platform, including Astronomer's Astro, Google Cloud Composer, Amazon Managed Workflows for Apache Airflow (MWAA) and Microsoft Azure Data Factory Managed Airflow.
The need is growing as organizations struggle to coordinate data workflows across disparate systems, clouds and, increasingly, AI workloads. Apache Airflow 3.0 addresses critical enterprise needs with an architectural redesign that could improve how organizations build and deploy data applications.
Vikram Koka, an Apache Airflow PMC (Project Management Committee) member at Astronomer, described Airflow 3 as a fresh start: "This is a complete refactor based on what is needed for the next stage of mission-critical adoption."
Enterprise data complexity has changed data orchestration needs
As businesses increasingly depend on data-driven decisions, the complexity of data workflows has exploded. Organizations now manage intricate pipelines spanning multiple cloud environments, diverse data sources and increasingly sophisticated AI workloads.
Airflow 3.0 emerges as a solution specifically designed to meet these evolving enterprise needs. Unlike previous versions, this release moves away from a monolithic package, introducing a distributed client model that offers flexibility and security. The new architecture enables enterprises to:
- Execute tasks across multiple cloud environments.
- Implement fine-grained security controls.
- Support diverse programming languages.
- Enable true multi-cloud deployments.
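At its core, the abstraction that makes all of this possible is the DAG (directed acyclic graph) of tasks. As a minimal illustrative sketch (plain Python, not Airflow's actual API), an orchestrator runs tasks in dependency order and passes results along the graph:

```python
from graphlib import TopologicalSorter

# Hypothetical three-step pipeline: extract -> transform -> load.
def extract():
    return [3, 1, 2]

def transform(data):
    return sorted(data)

def load(data):
    return f"loaded {len(data)} rows"

# The DAG: each task maps to the set of tasks it depends on.
dag = {"extract": set(), "transform": {"extract"}, "load": {"transform"}}

# Execute tasks in topological (dependency) order, feeding each task
# the output of its upstream task -- the essence of orchestration.
results = {}
for name in TopologicalSorter(dag).static_order():
    if name == "extract":
        results[name] = extract()
    elif name == "transform":
        results[name] = transform(results["extract"])
    else:
        results[name] = load(results["transform"])

print(results["load"])  # -> loaded 3 rows
```

In real Airflow deployments the same dependency graph is declared in DAG files, while the scheduler and workers handle execution, retries and distribution across environments.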
Airflow 3.0's expanded language support is also noteworthy. While previous versions were primarily Python-centric, the new release natively supports multiple programming languages.
Airflow 3.0 is set to support Python, with planned support for Java, TypeScript and Rust. This approach means data engineers can write tasks in their preferred programming language, reducing friction in workflow development and integration.
Event-driven capabilities transform data workflows
Airflow has traditionally excelled at scheduled batch processing, but enterprises increasingly need real-time data processing capabilities. Airflow 3.0 now supports that need.
"A major change in Airflow 3 is what we call event-driven scheduling," Koka said.
Instead of running a data processing job every hour, Airflow now starts the job automatically when a specific data file is uploaded or a specific message appears. This could include data loaded into an Amazon S3 cloud storage bucket or streaming data messages from Apache Kafka.
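The difference can be sketched in plain Python (this is a conceptual illustration, not Airflow's API — in Airflow 3.0 the equivalent would be a DAG triggered by an external event such as a new S3 object or a Kafka message):

```python
from pathlib import Path
import tempfile

processed = []

def pipeline(path: Path):
    # Stand-in for a full data pipeline run on the new file.
    processed.append(path.name)

def on_file_uploaded(path: Path):
    # Event-driven: the pipeline fires the moment the upload lands,
    # rather than waiting for the next hourly scheduled run.
    pipeline(path)

with tempfile.TemporaryDirectory() as d:
    new_file = Path(d) / "orders.csv"
    new_file.write_text("order_id,amount\n1,9.99\n")
    on_file_uploaded(new_file)  # triggered by the arrival event itself

print(processed)  # -> ['orders.csv']
```

With a schedule-based run, the same file could sit unprocessed for up to a full interval; the event callback removes that wait entirely.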
The event-driven scheduling capability addresses a critical gap between traditional ETL [extract, transform and load] tools and stream processing frameworks such as Apache Flink or Apache Spark Structured Streaming, allowing organizations to use a single orchestration layer for both scheduled and event-triggered workflows.
Airflow speeds up enterprise AI inference and compound AI
Event-driven data orchestration will also help Airflow support fast inference execution.
As an example, Koka detailed a real-time inference use case in professional services such as legal time tracking. In this scenario, Airflow can help collect raw data from sources such as calendars, emails and documents. A large language model (LLM) can then convert the unstructured information into structured data. Another pretrained model can then analyze the structured time-tracking data, determine whether the work is billable, and assign the appropriate billing codes and rates.
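The multi-step workflow described above can be sketched as a chain of stages, each feeding the next. This is a hedged illustration with stub functions standing in for the real models (no actual LLM or classifier calls; all names are hypothetical):

```python
def collect_raw_events():
    # Stand-in for pulling raw entries from calendars, email, documents.
    return ["Call with Acme re: contract, 30 min", "Team lunch, 60 min"]

def llm_structure(entry: str) -> dict:
    # Stand-in for an LLM turning unstructured text into structured fields.
    desc, minutes = entry.rsplit(",", 1)
    return {"description": desc.strip(),
            "minutes": int(minutes.strip().split()[0])}

def classify_billable(record: dict) -> dict:
    # Stand-in for a pretrained model deciding whether work is billable.
    record["billable"] = "lunch" not in record["description"].lower()
    return record

def assign_billing_code(record: dict) -> dict:
    # Stand-in for a final model assigning billing codes to billable work.
    record["code"] = "CLIENT-001" if record["billable"] else None
    return record

# Each stage's output is the next stage's input -- the kind of
# multi-step pipeline an orchestrator like Airflow coordinates.
records = [assign_billing_code(classify_billable(llm_structure(e)))
           for e in collect_raw_events()]
print(records)
```

In production, each stage would be a separate orchestrated task, so failures can be retried per step rather than rerunning the whole chain.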
Koka referred to this approach as a compound AI system: a workflow that chains together different AI models to complete complex tasks efficiently and intelligently. Airflow 3.0's event-driven architecture makes this type of real-time, multi-step inference process possible across a variety of enterprise use cases.
Compound AI is an approach defined by Berkeley AI research in 2024, and it is slightly different from agentic AI. Koka explained that while agentic AI allows autonomous AI decision-making, compound AI has more predictable, well-defined workflows, which makes it more reliable for business use cases.
Playing ball with Airflow: how the Texas Rangers benefit
Among Airflow's many users is the Texas Rangers Major League Baseball team.
Oliver Dykstra, a full-stack data engineer at the Texas Rangers Baseball Club, told VentureBeat that the team uses Airflow, hosted on Astronomer's Astro platform, as the "nerve center" of its baseball data operations. He noted that all player development, contracts, analytics and game data are orchestrated through Airflow.
"We're looking forward to upgrading to Airflow 3 and its event-based scheduling, observability and data lineage capabilities," Dykstra said. "As we already rely on Airflow to manage our critical AI/ML pipelines, the added efficiency and reliability of Airflow 3 will help increase trust and resiliency of these data products within our entire organization."
What this means for enterprise AI adoption
For technical decision-makers evaluating data orchestration strategies, Airflow 3.0 offers actionable benefits that can be implemented in stages.
The first step is to evaluate which current data workflows would benefit from the new event-driven capabilities. Organizations can identify data pipelines that currently run on scheduled jobs but could be managed more efficiently with event-based triggers. This shift can significantly reduce processing latency while eliminating wasteful polling.
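A back-of-envelope way to size that latency benefit: assuming data arrivals are spread uniformly across a scheduling interval, a job that runs every `interval` minutes adds about `interval / 2` minutes of waiting on average, while an event trigger fires on arrival:

```python
def avg_wait_minutes(interval_minutes: float) -> float:
    # Average scheduling delay under the uniform-arrival assumption:
    # data waits anywhere from 0 to a full interval, so interval/2 on average.
    return interval_minutes / 2

# Tighter schedules shrink the wait but multiply polling runs;
# an event trigger drops the scheduling wait to roughly zero.
for interval in (60, 15, 5):
    print(f"every {interval} min -> avg wait {avg_wait_minutes(interval)} min")
```

This is only a rough model, but it makes the trade-off concrete: cutting latency by polling more often multiplies mostly-empty runs, whereas an event trigger gets both low latency and no wasted runs.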
Next, technology leaders should evaluate their development environments to determine whether Airflow's new language support could help consolidate fragmented orchestration tooling. Teams that maintain separate orchestration tools for different language environments can begin planning a migration strategy to simplify their technology stack.
For enterprises at the forefront of AI implementation, Airflow 3.0 represents a significant infrastructure component that can address a key challenge in AI adoption: orchestrating complex, multi-stage AI workflows at enterprise scale. The platform's ability to coordinate compound AI systems could help organizations move toward company-wide AI deployment with appropriate governance, security and reliability.