Big Data Spain

15th ~ 16th OCT 2015 MADRID, SPAIN #BDS15


THANK YOU FOR AN AMAZING CONFERENCE!


THE 4th EDITION OF BIG DATA SPAIN IN OCTOBER 2015 WAS A RESOUNDING SUCCESS.

AUTOMATING BIG DATA BENCHMARKING AND PERFORMANCE ANALYSIS WITH ALOJA'S OPEN SOURCE TOOLS

Thursday 15th

from 18:30 to 19:15

Room 19

Workshop

The Automating Big Data Benchmarking and Performance Analysis workshop gives hands-on experience with the different aspects of getting the most value out of Big Data infrastructures using ALOJA's open source tools. ALOJA (http://aloja.bsc.es) is a research initiative from the Barcelona Supercomputing Center (BSC) and Microsoft Research to explore new cost-effective hardware architectures and applications for Big Data. ALOJA's main goal is to better understand the performance, and therefore the costs, of running different Big Data applications. It also aims to automate Knowledge Discovery (KD) from system behavior, producing insights that can optimize and guide the development of efficient Big Data applications and data centers.

During its first year, ALOJA's benchmarking efforts have produced the largest public Hadoop benchmark repository, with over 50,000 runs. The searchable repository covers different Hadoop applications, software configurations, data sizes, and more than 100 different hardware deployment options. The studied deployments include several physical server and VM types, cluster sizes, and network and disk setups, spanning Cloud services (IaaS and PaaS) and, for comparison, on-premise hardware including commodity, low-power, and up-scale. Along with the repository, ALOJA provides open source Web Analytics and Machine Learning based tools for the analysis and characterization of results. The Web tools offer both a fine-grained view of individual runs and a high-level view of aggregate results, together with Predictive Analytics estimates and configuration recommendations.

Using a combination of slides and an online demo, the workshop will first guide Big Data practitioners through the benchmark repository, where users can quickly search for previously run benchmarks that resemble their own infrastructures. In this way, users can find the best configurations without the need for time-consuming benchmarking.

The workshop will then cover the following topics:

  • Cluster definition and automated deployment on local (Vagrant clusters) and cloud environments
  • Automating and orchestrating OS, Hadoop, and JVM configuration across clusters
  • Benchmark selection and iteration of configurations
  • Metrics collection, results gathering, and importing
  • Advanced data views for aggregate results with filters

The workshop will end with an overview of the Predictive Analytics features, briefly presenting how to model Hadoop applications and predict expected execution times, with some use cases for participants. It will also show how we leverage the generated models to provide recommendations for both software and hardware configurations. Finally, the talk will present how we use the predictive features to guide new benchmarking efforts across a search space of millions of configuration options, each of which can take hours to execute, saving both time and cost.
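As a rough illustration of why prediction helps with such a search space, the sketch below enumerates a small configuration grid, estimates the cost of benchmarking it exhaustively, and uses a simple nearest-neighbour estimate to predict run times and pick the next configuration to measure. Everything in it is an assumption made for illustration: the option names and values, the helper functions, and the nearest-neighbour model are hypothetical and are not ALOJA's actual parameters, API, or Machine Learning models.

```python
from itertools import product
from math import prod

# Hypothetical configuration options (illustrative only, not ALOJA's real parameter set).
OPTIONS = {
    "disk": ["HDD", "SSD", "remote"],
    "network": ["1GbE", "10GbE", "InfiniBand"],
    "mappers": [2, 4, 8, 16],
    "block_size_mb": [64, 128, 256],
    "compression": ["none", "snappy", "gzip"],
    "cluster_size": [4, 8, 16, 32],
}


def search_space_size(options):
    """Number of distinct configurations in the full grid."""
    return prod(len(values) for values in options.values())


def iter_configurations(options):
    """Yield every configuration in the grid as a dict."""
    keys = list(options)
    for values in product(*(options[k] for k in keys)):
        yield dict(zip(keys, values))


def exhaustive_hours(options, hours_per_run):
    """Total time needed to benchmark every configuration once."""
    return search_space_size(options) * hours_per_run


def hamming(a, b):
    """Number of options on which two configurations differ."""
    return sum(a[k] != b[k] for k in a)


def predict_hours(config, measured):
    """Estimate run time as the mean over the closest measured configurations.

    `measured` is a list of (configuration, hours) pairs from real runs.
    """
    best = min(hamming(config, c) for c, _ in measured)
    nearest = [h for c, h in measured if hamming(config, c) == best]
    return sum(nearest) / len(nearest)


def next_to_benchmark(options, measured):
    """Pick the configuration farthest from anything measured so far."""
    return max(
        iter_configurations(options),
        key=lambda cfg: min(hamming(cfg, c) for c, _ in measured),
    )
```

Even this small hypothetical grid has 1,296 combinations, so at one hour per run an exhaustive sweep would need 54 days of cluster time. Measuring only the configurations the model is least able to estimate, and predicting the rest, is one simple way to cut that cost.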

For more information on the project, publications, and past results, please refer to: http://aloja.bsc.es/publications
Source code and instructions are available at: https://github.com/Aloja/aloja

Nicolas Poggi

Supercomputing Center, R&D