from 15:30 pm to 16:15 pm
Modern architectures are moving away from a "one size fits all" approach. We are well aware that we need to use the best tools for the job. Given the large selection of options available today, chances are that you will end up managing data in MongoDB for your operational workload and with Spark for your high speed data processing needs.
Description: When we model documents or data structures there are some key aspects that need to be examined not only for functional and architectural purposes but also to take into consideration the distribution of data nodes, streaming capabilities, aggregation and queryability options and how we can integrate the different data processing software, like Spark, that can benefit from subtle but substantial model changes. A clear example is when embedding or referencing documents and their implications on high speed processing.
Over the course of this talk we will detail the benefits of a good document model for the operational workload. As well as what type of transformations we should incorporate in our document model to adjust for the high speed processing capabilities of Spark.
We will look into the different options that we have to connect these two different systems, how to model according to different workloads, what kind of operators we need to be aware of for top performance and what kind of design and architectures we should put in place to make sure that all of these systems work well together.
Over the course of the talk we will showcase different libraries that enable the integration between spark and MongoDB, such as MongoDB Hadoop Connector, Stratio Connector and MongoDB Spark Native Connector.
By the end of the talk I expect the attendees to have an understanding of:
The talk is suitable for: