Apache Beam is a Top-Level Project
Apache Beam is a unified programming model for batch and streaming Big
Data processing, handling data of any scale and providing portability
across multiple execution engines and environments.
The Apache Software Foundation (ASF), the all-volunteer developers,
stewards, and incubators of more than 350 Open Source projects and
initiatives, announced today that Apache® Beam™ has graduated from the
Apache Incubator to become a Top-Level Project (TLP), signifying that
the project's community and products have been well-governed under the
ASF's meritocratic process and principles.
Apache Beam is a unified programming model for both batch and streaming
data processing. It includes software development kits in Java and
Python for defining the data processing pipelines, as well as runners to
execute them on several execution engines, including Apache Apex, Apache
Flink, Apache Spark, and Google Cloud Dataflow.
"Graduation is an exciting milestone for Apache Beam," said Davor Bonaci,
Vice President of Apache Beam. "Becoming a top-level project is a
recognition of the amazing growth of the Apache Beam community, both in
terms of size and diversity. Together we are pushing forward the state
of the art in distributed data processing and, at the same time,
enhancing the ability to interconnect additional storage/messaging
systems and execution engines."
The technology behind Apache Beam evolved in large part from Google's
internal work on data processing, tracing its roots all the way back to
Google's initial MapReduce system and its fundamental changes to the
science of distributed data processing. It also reflects modern advances
in data processing, embodied in Google's FlumeJava and MillWheel
systems, and culminates in the unified programming model of Google
Cloud Dataflow, which became the heart of Apache Beam.
This unified programming model can easily and intuitively express data
processing pipelines for everything from simple batch-based data
ingestion to complex event-time-based stream processing.
The abstractions in the model are designed to support efficient
parallel execution, while also cleanly separating the user's processing
logic from details of the underlying engine.
Raising the level of abstraction allows a single Apache Beam pipeline to
run, without modification, on multiple execution engines. This
portability across diverse execution engines is just one of many
extensibility points that let Apache Beam integrate with the broader
Apache and Big Data ecosystems. Besides runners, developers can already
easily add support for additional IO connectors, libraries of
transformations, SDKs, and even domain-specific extensions.
"Apache Beam helps us make stream processing accessible to a broad
audience of data engineers, by offering an API which is comprehensive,
easy to reason about and at the same time fully decoupled from the
underlying execution engine," said Assaf Pinhasi, Director of Big Data
Platform at PayPal. "Our data engineers can now focus on what they do
best – i.e. express their processing pipelines easily, and not have to
worry about how these get translated to the complex underlying engine
they run on."
"The graduation of Apache Beam as a top-level project is a great
achievement and, in the fast-paced Big Data world we live in,
recognition of the importance of a unified, portable, and extensible
abstraction framework to build complex batch and streaming data
processing pipelines," said Laurent Bride, Chief Technology Officer at
Talend. "Customers don't like to be locked-in, so they will appreciate
the runtime flexibility Apache Beam provides. With four mature runners
already available and I'm sure more to come, Beam represents the future
and will be a key element of Talend's strategic technology stack moving
forward."
"We applaud the Apache Beam working group for its success in creating a
unified and consistent platform for building portable data processing
pipelines," said Fausto Ibarra, Director of Product Management, Google
Cloud Platform. "We believe that we all have a responsibility to share
what we're learning, and we are proud and delighted to witness the
successful collaboration to build not only a powerful programming model
for processing data from bounded and unbounded sources, but also a
portability layer for running pipelines on many processing engines,
including Apache Spark, Apache Flink, Apache Apex, and Google Cloud
Dataflow. Apache Beam's graduation to Top-Level Project is a
well-deserved recognition for the individuals and companies who
contributed to the project."
"Apache Beam represents a principled approach for analyzing data
streams, simplifying a range of complex data processing concepts and
providing developers with a flexible, straightforward model," said
Kostas Tzoumas, Co-founder and Chief Executive Officer at data Artisans.
"The Apache Flink community wrote one of the first Beam runners, and
those of us at data Artisans have been contributing to the Beam project
since its inception."
"The Apache Beam community has quickly adopted the Apache Way and been
very welcoming to new contributors and ideas. It also encourages
communication across other projects that collaborate under the Beam
umbrella," said Thomas Weise, Vice President of Apache Apex, and Chief
Technology Officer/Co-Founder of Atrato. "Beam helps the wider ecosystem
by establishing common terminology and well-thought-through concepts
that are reflected in multiple runners and even the native API of the
[...]"
"[...] my work at Apache, I have rarely seen an incubating project build a
community as well as the Apache Beam project has done," said Ted
Dunning, Vice President of Apache Incubator, and Chief Application
Architect at MapR Technologies. "The way that they have been able to
complement and enhance other streaming data projects is really a credit
to everyone involved."
"We'd like to invite you to consider joining us on this exciting ride,
whether as a user or a contributor, as we work towards our first release
with API stability," added Bonaci. "If you'd like to try out Apache Beam
today, check out the latest 0.4.0 release. We welcome contribution and
participation from anyone through our mailing lists, issue tracker, pull
requests, and events."
Catch Apache Beam in action at numerous face-to-face meetups and
conferences, including Apache: Big Data North America 2017, DataWorks
Summit and Hadoop Summit Munich 2017, Strata + Hadoop World San Jose, and
[...].
Availability and Oversight
Apache Beam software is released under the Apache License v2.0 and is
overseen by a self-selected team of active contributors to the project.
A Project Management Committee (PMC) guides the Project's day-to-day
operations, including community development and product releases.