From 18:00 to 18:45
The big data ecosystem is thriving. Driven by the productivity of the open source approach and by market demand, the community is constantly developing new tools, plugins and features to improve and simplify every aspect of data science. Unfortunately, the life of system administrators is becoming increasingly complicated. From choosing the right architecture to installing, configuring and testing the tools, setting up the system takes a great deal of time. The burden is even heavier when the infrastructure has strict high-availability and security requirements.
For the enterprise, this means committing considerable resources to maintaining systems that require dedicated Unix sysadmin support.
At Keedio we like simplicity, and in this workshop we will demonstrate how you can easily deploy, within minutes, a full big data stack that is highly available and secure.
By harnessing the power of Vagrant and Apache Ambari, we will deploy in the cloud a big data architecture comprising injection, batch, speed and serving layers.
In this architecture, the injection layer is responsible for collecting data from different sources and channelling it into the other layers of the infrastructure; it can also be used to filter or enrich the streaming data. In this deployment, it will include a multi-tiered, load-balanced network of Apache Flume agents connected to Apache Kafka.
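As a rough illustration of how such a layer behaves, the sketch below models a two-tier network in plain Python rather than with real Flume or Kafka: edge agents filter and enrich events, a round-robin balancer assigns each event to a collector agent, and the collectors append events to a partitioned topic standing in for Kafka. The function names, the round-robin policy, the partition count and the CRC32 key hashing are all assumptions made for this example, not Flume or Kafka APIs.

```python
from itertools import cycle
from zlib import crc32

# Hypothetical two-tier ingestion model (not Flume/Kafka code): edge agents
# filter and enrich events, a round-robin balancer picks a collector for each
# event, and collectors publish to a partitioned "topic" (a list of lists).

NUM_PARTITIONS = 3  # assumed partition count for the illustrative topic


def enrich(event, source):
    """Filter/enrich step: drop events with an empty payload, tag the rest."""
    if not event.get("payload"):
        return None
    return {**event, "source": source}


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Same key -> same partition (CRC32 here; Kafka uses its own hash)."""
    return crc32(key.encode("utf-8")) % num_partitions


def run_pipeline(events_by_source, num_collectors=2):
    """Route every surviving event through a collector into the topic."""
    topic = [[] for _ in range(NUM_PARTITIONS)]
    collector_load = [0] * num_collectors
    balancer = cycle(range(num_collectors))  # round-robin load balancing
    for source, events in events_by_source.items():
        for event in events:
            enriched = enrich(event, source)
            if enriched is None:
                continue  # filtered out at the edge tier
            collector_load[next(balancer)] += 1
            topic[partition_for(enriched["key"])].append(enriched)
    return topic, collector_load


if __name__ == "__main__":
    events = {
        "web": [{"key": "user1", "payload": "click"}, {"key": "user2", "payload": ""}],
        "app": [{"key": "user1", "payload": "open"}, {"key": "user3", "payload": "close"}],
    }
    topic, load = run_pipeline(events)
    print(load)  # [2, 1]
```

Keying events by user keeps each user's events in a single partition, and therefore in arrival order, which is the same guarantee a keyed Kafka topic provides within a partition.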
The batch layer stores the data redundantly and is used by the serving layer for analyses that aim at completeness. It includes a highly available installation of Apache Hadoop 2.x, together with Hive and Oozie, the latter of which makes it possible to create elaborate data processing workflows.
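The key property of the batch layer is that its view is recomputed from the whole, append-only master dataset, which is what makes it complete. In the deployment described here that work is done by Hadoop jobs, Hive queries and Oozie workflows; the plain Python sketch below only mimics the idea, assuming events are hypothetical `(timestamp, key)` pairs and the view is a simple count per key.

```python
from collections import Counter

# Hypothetical batch layer: the master dataset is an append-only list of
# (timestamp, key) events, and every batch run recomputes the view from
# all of it, trading latency for completeness.


def append_events(master, new_events):
    """Append-only: returns a new list; existing records are never modified."""
    return master + list(new_events)


def compute_batch_view(master, cutoff):
    """Count events per key over every record with timestamp <= cutoff."""
    return Counter(key for ts, key in master if ts <= cutoff)


if __name__ == "__main__":
    master = append_events([], [(1, "a"), (2, "b"), (3, "a")])
    view = compute_batch_view(master, cutoff=3)
    print(dict(view))  # {'a': 2, 'b': 1}
```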
The speed layer collects data directly from the injection layer and is used for near-real-time analysis of the streaming data. For this we use Apache Storm and Apache Spark.
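Where the batch layer recomputes everything, the speed layer updates its view incrementally as each event arrives, and only for events the last batch run has not yet covered. The sketch below assumes a hypothetical `SpeedLayer` class and the same `(timestamp, key)` event shape; in the actual deployment this logic would live in Storm topologies or Spark jobs.

```python
from collections import Counter

# Hypothetical speed layer: it only tracks events newer than the last batch
# cutoff and updates its real-time view in O(1) per event.


class SpeedLayer:
    def __init__(self, batch_cutoff):
        self.batch_cutoff = batch_cutoff
        self.recent = []               # events not yet absorbed by a batch run
        self.realtime_view = Counter()

    def on_event(self, ts, key):
        """Incremental update; events already covered by the batch view are skipped."""
        if ts > self.batch_cutoff:
            self.recent.append((ts, key))
            self.realtime_view[key] += 1

    def expire(self, new_cutoff):
        """Once a batch run covers everything up to new_cutoff, discard those events."""
        self.batch_cutoff = new_cutoff
        self.recent = [(ts, k) for ts, k in self.recent if ts > new_cutoff]
        self.realtime_view = Counter(k for _, k in self.recent)


if __name__ == "__main__":
    speed = SpeedLayer(batch_cutoff=3)
    for ts, key in [(2, "a"), (4, "a"), (5, "b"), (6, "a")]:
        speed.on_event(ts, key)
    print(dict(speed.realtime_view))  # {'a': 2, 'b': 1}
    speed.expire(5)
    print(dict(speed.realtime_view))  # {'a': 1}
```

Expiring events after each batch run keeps the speed layer's state small, since it only ever has to hold the data that arrived since the last batch cutoff.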
The serving layer collects the data processed by the speed and batch layers and makes it available for ad hoc queries and views. Elasticsearch and Kibana will be installed as part of this deployment.
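Conceptually, a query against the serving layer combines the complete but slightly stale batch view with the fresh but partial real-time view. In this deployment those views would be stored in Elasticsearch and explored through Kibana; the sketch below uses plain dictionaries instead, and the `query` and `merged_view` helpers are hypothetical.

```python
# Hypothetical serving-layer merge: views are plain dicts mapping key -> count.


def query(key, batch_view, realtime_view):
    """Answer a count query from both views; missing keys count as zero."""
    return batch_view.get(key, 0) + realtime_view.get(key, 0)


def merged_view(batch_view, realtime_view):
    """Materialize the combined view for every key seen by either layer."""
    keys = set(batch_view) | set(realtime_view)
    return {k: query(k, batch_view, realtime_view) for k in sorted(keys)}


if __name__ == "__main__":
    batch = {"a": 2, "b": 1}
    realtime = {"a": 1, "c": 4}
    print(merged_view(batch, realtime))  # {'a': 3, 'b': 1, 'c': 4}
```

Merging by addition works here because counts are additive and the two views cover disjoint time ranges; non-additive metrics (distinct counts, medians) would need a different merge strategy.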
Once the initial installation is complete, we will secure the components of the infrastructure, using Ambari as the orchestrator and FreeIPA as the identity manager and Kerberos KDC.
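One convention that matters once Hadoop services are kerberized is the shape of service principals, `primary/instance@REALM`, where Hadoop configuration can use the `_HOST` placeholder for the instance and have it replaced with each node's fully qualified domain name. The sketch below is a hypothetical helper that mimics that substitution; it also lower-cases the host name, which to our understanding matches Hadoop's behaviour. The realm `EXAMPLE.COM` and the host names are placeholders.

```python
import re

# Hypothetical helper mimicking Hadoop-style "_HOST" expansion in Kerberos
# service principals of the form primary/instance@REALM.

PRINCIPAL_RE = re.compile(r"^([^/@]+)/([^/@]+)@([^/@]+)$")


def resolve_principal(template, fqdn):
    """Expand a 'primary/_HOST@REALM' template for the given host."""
    match = PRINCIPAL_RE.match(template)
    if not match:
        raise ValueError(f"not a service principal: {template!r}")
    primary, instance, realm = match.groups()
    if instance == "_HOST":
        instance = fqdn.lower()  # host part is normalized to lower case
    return f"{primary}/{instance}@{realm}"


if __name__ == "__main__":
    print(resolve_principal("nn/_HOST@EXAMPLE.COM", "Master1.Example.com"))
    # nn/master1.example.com@EXAMPLE.COM
```

Using a single templated principal per service keeps the configuration identical across nodes, while each node still authenticates with its own host-specific keytab entry.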
To simplify the user experience, we will include an installation of Hue, which provides a unified interface to most of the installed software. All the software components used in this installation are open source, and many have been contributed to the community by Keedio developers.