Apache Impala Is a Top-Level Project

November 28, 2017

Apache Impala has graduated from the Apache Incubator to become a Top-Level Project (TLP), signifying that the project's community and products have been well-governed under the ASF's meritocratic process and principles.

Apache Impala is a modern, high-performance analytic database for Apache Hadoop. Its massively parallel processing (MPP) SQL query engine lets users run analytical queries, through SQL or business intelligence tools, on data stored on-premises (in HDFS or Apache Kudu) or in cloud object storage, without having to migrate data sets into specialized systems or proprietary formats.

"The Impala project has grown a lot since we entered incubation in December 2015," said Jim Apple, Vice President of Apache Impala. "With the help of our mentors and the Incubator, we have grown as a community and adopted the Apache Way, all while the Impala contributors have helped make Impala more stable and performant."

In addition to using the same unified storage platform as other Hadoop components, Impala also uses the same metadata, SQL syntax (Apache Hive SQL), ODBC driver, and user interface (Impala query UI in Hue) as Hive. This provides a familiar and unified platform for real-time or batch-oriented queries.

Impala provides:

A familiar SQL interface that data scientists and analysts already know;

The ability to query high volumes of data (Big Data) in Apache Hadoop;

Distributed queries in a cluster environment, for convenient scaling and to make use of cost-effective commodity hardware;

The ability to share data files between different components with no copy or export/import step; for example, to write with Apache Pig, transform with Hive and query with Impala. Impala can read from and write to Hive tables, enabling simple data interchange using Impala for analytics on Hive-produced data; and

A single system for big data processing and analytics, so customers can avoid costly modeling and ETL just for analytics.

Impala was inspired by Google's F1 database, which also separates query processing from storage management. Impala was originally released in 2012 and entered the Apache Incubator in December 2015. The project has had four releases during its incubation process.

"In 2011, we started development of Impala in order to make state-of-the-art SQL analytics available to the user community as open-source technology," said Marcel Kornacker, original founder of the Impala project. "The graduation to an Apache Top-Level Project is a recognition of the exceptional developer community that stands behind this project."

Apache Impala is deployed across a number of industries such as financial services, healthcare, and telecommunications, and is in use at companies that include Caterpillar, Cox Automotive, Jobrapido, Marketing Associates, the New York Stock Exchange, phData, and Quest Diagnostics. In addition, Impala is shipped by Cloudera, MapR, and Oracle.

"Apache Impala is our interactive SQL tool of choice. Over 30 phData customers have it deployed to production," said Brock Noland, Chief Architect at phData. "Combined with Apache Kudu for real-time storage, Impala has made architecting IoT and Data Warehousing use-cases dead simple. We can deploy more production use-cases with fewer people, delivering increased value to our customers. We're excited to see Impala graduate to a top-level project and look forward to contributing to its success."

"We use Apache Impala to boost performance of our SQL queries against our data lake," said Matteo Coloberti, Head of Analytics at Jobrapido. "Impala is an incredible service that gives us impressive performance on queries."

"We used to distribute Microsoft Excel reports to clients every one or two days but now they can search on their own by customer, sales deal, or even service type," said Andy Frey, CTO of Marketing Associates. "Apache Impala is used to query millions of rows to identify specific records that match the clients' criteria. We've even given clients a 'Query Hadoop' option that allows them to create simple SQL statements and query Hadoop directly via Impala. We're able to offer a faster, richer, and more accurate selection of services without the labor or latency concerns that we used to have."

"The Apache Impala community is growing, and we welcome new contributors to join in our efforts in our code, documentation, issue tracker, and discussion forums," added Apple.
