from 15:40 to 16:20
Apache Hive is the most commonly used SQL interface for Hadoop, and one of its most frequent uses is in data warehousing applications. To meet customers' warehousing requirements, Hive must scale to petabytes of data, provide the SQL that users need, and respond in interactive time. In February 2016 the Hive community released Hive 2.0, which includes significant new features and performance improvements. These include:

- LLAP, a daemon layer that enables sub-second response times.
- HBase as an option for storing Hive's metadata, which speeds up metadata access and reduces query planning time.
- Better support for ingesting data at high speed from streaming inputs such as Apache Flume and Apache Storm.
- Improvements to Hive execution on Spark and Tez.
- Tighter integration with Apache Calcite for better cost-based optimization.

This talk will cover the use cases these changes enable and the architectural changes made in Hive to build these features. It will also share performance test results showing how these improvements speed up Hive.