Big Data Spain

15th ~ 16th OCT 2015 MADRID, SPAIN #BDS15




THE 4th EDITION OF BIG DATA SPAIN IN OCT 2015 WAS A RESOUNDING SUCCESS.

DOCUMENT MODEL FOR HIGH SPEED SPARK PROCESSING

Friday 16th

from 15:30 to 16:15

Room 19


Technical

Modern architectures are moving away from a "one size fits all" approach. We are well aware that we need to use the best tool for each job. Given the large selection of options available today, chances are that you will end up managing data with MongoDB for your operational workload and with Spark for your high-speed data processing needs.


Description: When we model documents or data structures, there are key aspects that need to be examined not only for functional and architectural purposes, but also to account for the distribution of data across nodes, streaming capabilities, aggregation and query options, and how we integrate data processing software such as Spark, which can benefit from subtle but substantial model changes. A clear example is the choice between embedding and referencing documents, and its implications for high-speed processing.
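To make the embedding-versus-referencing trade-off concrete, here is a minimal sketch in plain Python dictionaries; the collection shapes and field names are hypothetical, invented for illustration rather than taken from the talk.

```python
# Two ways to model an order and its line items in MongoDB.
# Field names here are hypothetical, for illustration only.

# 1) Embedding: the order document carries its line items inline.
#    A single read fetches everything; ideal when items are always
#    accessed together with their order.
order_embedded = {
    "_id": 1001,
    "customer": "ACME",
    "items": [
        {"sku": "A-1", "qty": 2, "price": 9.50},
        {"sku": "B-7", "qty": 1, "price": 24.00},
    ],
}

# 2) Referencing: line items live in their own collection and point
#    back to the order. Documents stay small and uniform, and items
#    can be scanned or partitioned independently -- a shape that is
#    often friendlier to a parallel engine such as Spark.
order_referenced = {"_id": 1001, "customer": "ACME"}
items_referenced = [
    {"order_id": 1001, "sku": "A-1", "qty": 2, "price": 9.50},
    {"order_id": 1001, "sku": "B-7", "qty": 1, "price": 24.00},
]

# Either shape yields the same order total.
total_embedded = sum(i["qty"] * i["price"] for i in order_embedded["items"])
total_referenced = sum(
    i["qty"] * i["price"] for i in items_referenced if i["order_id"] == 1001
)
```

Both models hold the same information; what changes is how many documents a given workload must touch, and how evenly that work can be split across processing nodes.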

Over the course of this talk we will detail the benefits of a good document model for the operational workload, as well as the types of transformations we should incorporate in our document model to suit the high-speed processing capabilities of Spark.
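One such transformation, sketched below with hypothetical shapes and names (not taken from the talk), is unwinding embedded line items into flat, uniform records, the kind of record a parallel engine like Spark distributes most naturally:

```python
# Hypothetical transformation: flatten an embedded order model into
# one standalone record per line item.
def flatten_orders(orders):
    """Yield a flat record for each embedded line item."""
    for order in orders:
        for item in order.get("items", []):
            yield {
                "order_id": order["_id"],
                "customer": order["customer"],
                "sku": item["sku"],
                "qty": item["qty"],
            }

orders = [
    {"_id": 1, "customer": "ACME", "items": [{"sku": "A-1", "qty": 2}]},
    {"_id": 2, "customer": "Initech", "items": [{"sku": "B-7", "qty": 1},
                                                {"sku": "C-3", "qty": 5}]},
]

# flat now holds one uniform record per line item, ready to be
# partitioned and processed in parallel.
flat = list(flatten_orders(orders))
```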

We will look into the different options for connecting these two systems, how to model according to different workloads, which operators we need to be aware of for top performance, and what kind of design and architecture we should put in place to make sure that all of these systems work well together.
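One operator concern of this kind is predicate pushdown: applying a filter inside MongoDB (for example with a `find()` query or an aggregation `$match` stage) so that only matching documents are ever shipped to Spark. A minimal simulation in plain Python, standing in for a live cluster:

```python
# Simulated MongoDB collection: 1000 documents, 10% with status "A".
collection = [{"_id": i, "status": "A" if i % 10 == 0 else "B"}
              for i in range(1000)]

def predicate(doc):
    # The filter we ultimately want, i.e. {"status": "A"} in MongoDB terms.
    return doc["status"] == "A"

# Without pushdown: the whole collection crosses the wire and Spark
# filters it afterwards.
shipped_naive = list(collection)

# With pushdown: the filter runs inside MongoDB and only matching
# documents are shipped to the processing engine.
shipped_pushed = [doc for doc in collection if predicate(doc)]
```

In this toy case the pushed-down read moves an order of magnitude less data; on a real cluster the saving also includes serialization and network cost.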

We will also showcase different libraries that enable the integration between Spark and MongoDB, such as the MongoDB Hadoop Connector, the Stratio Connector and the native MongoDB Spark Connector.

By the end of the talk, attendees should have an understanding of:

  • How to connect their MongoDB clusters with Spark
  • Which use cases show a net benefit for connecting these two systems
  • What kind of architecture design should be considered for making the most of Spark + MongoDB
  • How documents can be modeled for better performance, both operationally and when processing the data sets stored in MongoDB

The talk is suitable for:

  • Developers that want to understand how to leverage Spark
  • Architects that want to integrate their existing MongoDB cluster and have real-time, high-speed processing needs
  • Data scientists that know Spark, are experimenting with it, and want to integrate MongoDB as their persistence layer

Norberto Leite

Technical Evangelist, MongoDB