
Data Lakes for Financial Entities


Wednesday 14th


14:05 – 14:45


Theatre 20


Keywords defining the session:

- Data Lake

- Financial Entities

- Big Data

Takeaway points of the session:

- Attendees will learn about the different architecture and technology alternatives for designing and developing a Data Lake solution.

- They will also learn about implementing a Data Lake in a way that provides quick results.


During the past year, we have been conducting market research on the current state of big data architectures in financial entities across three geographies: the USA, Europe, and Latin America. The outcome has been quite surprising: even though most financial entities are immersed in a data lake project, there are no relevant success stories.

Big Data solutions can create tremendous competitive advantages for a financial entity, from smoothing regulatory reporting to understanding customer behavior to building a better global management framework. Data Lake architectures are the best fit for a financial entity's environment.

Most banks take the wrong approach to implementing Big Data solutions. These are the most common mistakes:
– Not understanding the benefits, which leads to little support and involvement from the corporation.
– Focusing on the raw layer and the ingestion process, which leads to data swamps.
– Trying to do it all at once, which leads to extremely long projects without quick wins.
– Approaching a new technology without the necessary know-how.

The talk is meant to provide a quick but deep overview of Data Lake architectures for banking and other financial institutions. It will help attendees understand the different architectures and technologies that may be applied, the best way to approach an implementation project so that it delivers quick results, and how to get the best out of these technologies once they are in place.

The talk is divided into 5 parts:
1. Data Lake Architectures
Data Lakes don’t have a single, globally accepted architecture. Instead, there are many ways to implement the different layers, ingestion processes, and so on.

We will go over the different layers of the Data Lake and their uses:
– Common or usual layers, such as the raw layer and the curation layer.
– Optional layers, with close attention to the metadata layer.
– Advanced architectures, such as a speed layer, and where machine learning and AI fit in the architecture.
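As a rough illustration of how data moves between these layers, the sketch below (plain Python, with hypothetical paths and field names, not any particular bank's schema) models a record being promoted from the raw layer to the curation layer while the metadata layer records lineage:

```python
from datetime import date

# Hypothetical layer locations; a real lake would use HDFS or S3 URIs.
LAYERS = {"raw": "/lake/raw", "curated": "/lake/curated"}

def curate(raw_record: dict) -> dict:
    """Promote a raw record to the curation layer:
    normalize types, drop unknown fields, attach lineage metadata."""
    curated = {
        "account_id": str(raw_record["account_id"]).strip(),
        "amount": round(float(raw_record["amount"]), 2),
        "currency": raw_record.get("currency", "EUR").upper(),
    }
    # Metadata-layer concern: record where the data came from and when.
    curated["_lineage"] = {
        "source_layer": LAYERS["raw"],
        "curated_on": date.today().isoformat(),
    }
    return curated

raw = {"account_id": " 42 ", "amount": "1234.567", "junk": None}
print(curate(raw)["amount"])  # 1234.57
```

The point of the sketch is the separation of concerns: the raw layer keeps the data as delivered, the curation step owns normalization, and lineage metadata travels with the record rather than living in a side system.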

2. Technologies
We will discuss the various solutions currently on the market, going into detail and weighing the benefits of each option:
– Cloud vs. on-premises infrastructures, analyzing the tools provided by the main vendors: AWS and Google Cloud.
– Serverless infrastructures.
– Alternatives for each of the different layers: HDFS, S3, Spark, Cassandra, MongoDB, Neo4j, Tableau, Qlik.

3. Development Methodologies
We won’t go into much detail about the methodology options. However, we will see how agile methodologies can benefit the implementation process, as well as how to develop using short waterfalls that deliver quick results.
Banks usually struggle with agile methodologies and tend to apply long waterfall approaches. This part of the talk centers on how to apply waterfall methodologies based on building blocks for a faster return on investment.

4. Priorities
Banks have many diverse, heterogeneous systems, which makes them a perfect scenario for Big Data architectures but makes implementations extremely hard.
We will go over what the main priorities should be:
– Having information reconciled with the general ledger (GL).
– Having common management information (MIS, ALM).
– Adding “persons” information (Credit Risk).
– Adding CRM, Marketing, and other relevant information.
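To make the first priority concrete, here is a minimal sketch (plain Python, with made-up account IDs and balances) of reconciling lake-side transaction totals against GL balances; in practice this would run over the curated layer with Spark rather than in-memory lists:

```python
from collections import defaultdict

# Hypothetical data: transactions landed in the lake vs. GL balances.
lake_transactions = [
    ("ACC-1", 100.0), ("ACC-1", -25.0),
    ("ACC-2", 50.0),
]
gl_balances = {"ACC-1": 75.0, "ACC-2": 49.0}

def reconcile(transactions, balances, tolerance=0.01):
    """Sum lake transactions per account and flag accounts whose
    total drifts from the GL balance by more than `tolerance`."""
    totals = defaultdict(float)
    for account, amount in transactions:
        totals[account] += amount
    return {acc: (totals[acc], bal)
            for acc, bal in balances.items()
            if abs(totals[acc] - bal) > tolerance}

print(reconcile(lake_transactions, gl_balances))
# {'ACC-2': (50.0, 49.0)}  -> ACC-1 reconciles; ACC-2 is off by 1.0
```

Breaks like ACC-2's are exactly what makes GL reconciliation the first priority: until the lake agrees with the ledger, no downstream MIS or risk figure built on it can be trusted.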

5. How to take advantage of a data lake solution once it’s up and running
We will examine the best way to make the most of the system: how data scientists within each department can deliver immediate results, and how to set up a team to implement continuous requirements.