Apache Beam is a Top-Level Project

January 11, 2017

Unified programming model for batch and streaming Big Data processing, handling data of any scale, and providing portability across multiple execution engines and environments.

The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today that Apache® Beam™ has graduated from the Apache Incubator to become a Top-Level Project (TLP), signifying that the project's community and products have been well-governed under the ASF's meritocratic process and principles.

Apache Beam is a unified programming model for both batch and streaming data processing. It includes software development kits in Java and Python for defining the data processing pipelines, as well as runners to execute them on several execution engines, including Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow.

"Graduation is an exciting milestone for Apache Beam," said Davor Bonaci, Vice President of Apache Beam. "Becoming a top-level project is a recognition of the amazing growth of the Apache Beam community, both in terms of size and diversity. Together we are pushing forward the state of the art in distributed data processing and, at the same time, enhancing the ability to interconnect additional storage/messaging systems and execution engines."

The technology behind Apache Beam evolved in large part from Google's internal work on data processing, tracing its roots all the way back to Google's initial MapReduce system and its fundamental changes to the science of distributed data processing. It also reflects modern advances in data processing, embodied in Google's FlumeJava and MillWheel systems, and culminating with the unified programming model of Google Cloud Dataflow, which became the heart of Apache Beam.

This unified programming model can easily and intuitively express data processing pipelines for everything from simple batch-based data ingestion to complex event-time-based stream processing.

The abstractions in the model are designed to support efficient parallel execution, while also cleanly separating the user's processing logic from details of the underlying engine.

Raising the level of abstraction allows a single Apache Beam pipeline to run, without modification, on multiple execution engines. This portability across diverse execution engines is just one of many extensibility points that let Apache Beam integrate with the broader Apache and Big Data ecosystems. Besides runners, developers can already easily add support for additional IO connectors, libraries of transformations, SDKs, and even domain-specific extensions.
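
The separation between processing logic and execution engine described above can be illustrated with a toy sketch in plain Python. This is not the Apache Beam API: the names PIPELINE, run_batch, and run_chunked are hypothetical and exist only for this illustration. The sketch assumes a pipeline is simply an ordered list of steps, and shows two stand-in "engines" executing that same definition, unmodified, with identical results.

```python
# Toy illustration only: this is NOT the Apache Beam API.
# All names below are hypothetical and chosen for this sketch.
from collections import Counter
from typing import Callable, Iterable, List

# In this sketch, a "pipeline" is just an ordered list of steps.
Step = Callable[[Iterable], Iterable]


def split_words(lines):
    """Split each line into lowercase words."""
    for line in lines:
        yield from line.lower().split()


def drop_short(words):
    """Keep only words longer than three characters."""
    return (w for w in words if len(w) > 3)


# The user's processing logic, defined once, with no engine details.
PIPELINE: List[Step] = [split_words, drop_short]


def apply_steps(data, steps):
    """Feed the data through every step in order."""
    for step in steps:
        data = step(data)
    return data


def run_batch(steps, lines):
    """Stand-in 'batch engine': process the whole input at once."""
    return Counter(apply_steps(lines, steps))


def run_chunked(steps, lines, chunk_size=1):
    """Stand-in 'streaming engine': process small chunks, merge the counts."""
    total = Counter()
    for i in range(0, len(lines), chunk_size):
        total.update(apply_steps(lines[i:i + chunk_size], steps))
    return total


LINES = [
    "Beam unifies batch and streaming",
    "streaming and batch share one model",
]

if __name__ == "__main__":
    # The same pipeline definition yields the same word counts on both engines.
    assert run_batch(PIPELINE, LINES) == run_chunked(PIPELINE, LINES)
    print(run_batch(PIPELINE, LINES))
```

In a real Beam program the engine is typically selected through pipeline options rather than by calling a different function, but the idea is the same: the pipeline definition stays unchanged while the runner varies.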

"Apache Beam helps us make stream processing accessible to a broad audience of data engineers, by offering an API which is comprehensive, easy to reason about and at the same time fully decoupled from the underlying execution engine," said Assaf Pinhasi, Director of Big Data Platform at PayPal. "Our data engineers can now focus on what they do best – i.e. express their processing pipelines easily, and not have to worry about how these get translated to the complex underlying engine they run on."

"The graduation of Apache Beam as a top-level project is a great achievement and, in the fast-paced Big Data world we live in, recognition of the importance of a unified, portable, and extensible abstraction framework to build complex batch and streaming data processing pipelines," said Laurent Bride, Chief Technology Officer at Talend. "Customers don't like to be locked-in, so they will appreciate the runtime flexibility Apache Beam provides. With four mature runners already available and I'm sure more to come, Beam represents the future and will be a key element of Talend's strategic technology stack moving forward."

"We applaud the Apache Beam working group for its success in creating a unified and consistent platform for building portable data processing pipelines," said Fausto Ibarra, Director of Product Management, Google Cloud Platform. "We believe that we all have a responsibility to share what we're learning, and we are proud and delighted to witness the successful collaboration to build not only a powerful programming model for processing data from bounded and unbounded sources, but also a portability layer for running pipelines on many processing engines, including Apache Spark, Apache Flink, Apache Apex, and Google Cloud Dataflow. Apache Beam's graduation to Top Level Project is a well-deserved recognition for the individuals and companies who contributed to the project."

"Apache Beam represents a principled approach for analyzing data streams, simplifying a range of complex data processing concepts and providing developers with a flexible, straightforward model," said Kostas Tzoumas, Co-founder and Chief Executive Officer at data Artisans. "The Apache Flink community wrote one of the first Beam runners, and those of us at data Artisans have been contributing to the Beam project since its inception."

"The Apache Beam community has quickly adopted the Apache Way and been very welcoming to new contributors and ideas. It also encourages communication across other projects that collaborate under the Beam umbrella," said Thomas Weise, Vice President of Apache Apex, and Chief Technology Officer/Co-Founder of Atrato. "Beam helps the wider ecosystem by establishing common terminology and well thought through concepts that reflect in multiple runners and even the native API of the underlying engines."

"In my work at Apache, I have rarely seen an incubating project build a community as well as the Apache Beam project has done," said Ted Dunning, Vice President of Apache Incubator, and Chief Application Architect at MapR Technologies. "The way that they have been able to complement and enhance other streaming data projects is really a credit to everyone involved."

"We'd like to invite you to consider joining us on this exciting ride, whether as a user or a contributor, as we work towards our first release with API stability," added Bonaci. "If you'd like to try out Apache Beam today, check out the latest 0.4.0 release. We welcome contribution and participation from anyone through our mailing lists, issue tracker, pull requests, and events."

Catch Apache Beam in action at numerous face-to-face meetups and conferences, including Apache: Big Data North America 2017, DataWorks Summit and Hadoop Summit Munich 2017, and Strata + Hadoop World San Jose and London 2017.

Availability and Oversight
Apache Beam software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases.
