Big Data Spain

17th ~ 18th NOV 2016 MADRID, SPAIN #BDS16

Why Apache Flink is better than Spark

Friday 18th

from 12:55 to 13:35

Theatre 18



Life does not happen in batches. Reality is a continuous stream of events, so many of the scenarios and systems we want to analyze (logs, machine data, web clicks, sensors, IoT, GPS, social networking, etc.) happen in real time. Stream processing is a key current trend in Big Data technologies, as it opens up new business opportunities in domains such as fraud detection, user behavior analysis, and custom monitoring. This emerging processing paradigm is the focus of Apache Flink, an open source, scalable, distributed processing engine for massive data with a clear focus on streaming data. Apache Flink is definitely the most suitable technology for event processing requirements, with remarkable advantages over its competitors (Spark Streaming, Storm, Samza, Apex, etc.). A common question is: is Flink better than Spark? In this talk we will use specific examples to show why Apache Flink is the more appropriate technology for stream processing, comparing the requirements of streaming scenarios against the features of both technologies. We will discuss topics including event-at-a-time versus micro-batch processing, window aggregation models (time, element count, sessions), the handling of time in streams, and application versioning. In conclusion, we will summarize the similarities and differences between the two technologies and provide some advice on deciding which is the most appropriate for a given use case.


Rubén Casado

Accenture Digital, Big Data Manager