Building a Modern Data Pipeline with Google Cloud
Stream analytics has emerged as a simpler, faster alternative to batch ETL for getting maximum value from user-interaction events and application and machine logs.
Ingesting, processing, and analyzing these data streams quickly and efficiently is critical for use cases such as fraud detection, clickstream analysis, and online recommendations. For these, Google Cloud offers an integrated, open stream analytics solution that is easy to adopt, scale, and manage. This session explores the services behind that solution, which draw on Google's 15+ years of experience tackling data problems at scale, and includes a hands-on lab where attendees will deploy, configure, and monitor a streaming pipeline.
Once your streaming data processing pipelines are deployed, GCP's serverless approach removes operational overhead, with performance, scaling, availability, security, and compliance handled automatically.
Integration with Stackdriver, GCP’s unified logging and monitoring solution, lets you monitor and troubleshoot your pipelines as they are running. Rich visualization, logging, and advanced alerting help you identify and respond to potential issues.
In this session you will learn how to:
· Provision and consume storage buckets in Google Cloud Storage.
· Provision a test project with Firebase, set up an example application to generate data, and deploy the application code in production.
· Ingest streaming events in real time from anywhere in the world with Cloud Pub/Sub, powered by Google's unique, high-speed private network.
· Process the streams with Cloud Dataflow to ensure reliable, exactly-once, low-latency data transformation.
· Stream the transformed data into BigQuery, the cloud-native data warehousing service, for immediate analysis via SQL or popular visualization tools.
· Export the data into an open format like Avro for further exploration.
· Monitor your pipeline through Stackdriver.
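As a rough illustration of the transform step in the list above, the shaping that Cloud Dataflow applies before a BigQuery streaming insert can be sketched in plain Python. This is a sketch only: the field names (user_id, page, ts) and table schema are hypothetical, not taken from the lab materials.

```python
import json
from datetime import datetime, timezone

def event_to_row(raw: bytes) -> dict:
    """Shape a raw clickstream event, as delivered by Cloud Pub/Sub,
    into a flat row suitable for a BigQuery streaming insert.

    Field names (user_id, page, ts) are illustrative assumptions.
    """
    event = json.loads(raw.decode("utf-8"))
    return {
        "user_id": event["user_id"],
        "page": event["page"],
        # Normalize the client epoch timestamp to UTC ISO-8601,
        # matching BigQuery's TIMESTAMP column type.
        "event_time": datetime.fromtimestamp(
            event["ts"], tz=timezone.utc
        ).isoformat(),
    }

# One event as it might arrive on the subscription:
row = event_to_row(b'{"user_id": "u42", "page": "/checkout", "ts": 1500000000}')
```

In the lab, a transform like this would run inside a Dataflow pipeline rather than as a standalone function, but the per-event logic is the same.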
Requirements: a laptop with the Google Chrome browser installed.
Nature of the training
Data pipelines are traditionally challenging to build due to:
· Data volume, variety and velocity
· Exploration and processing speed
· Scalability, storage and access requirements
· Data exploration, insight extraction and visualization limitations
In this session we will build an end-to-end pipeline to ingest streaming analytics events coming from an app, using several Google Cloud Platform services such as Cloud Functions, Cloud Pub/Sub, Cloud Dataflow, and BigQuery, to discover the potential of Google's serverless data platform and learn how to architect and deploy a modern data pipeline.
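The Cloud Functions end of that pipeline can be sketched as follows. A Pub/Sub-triggered background function receives the message with its payload base64-encoded under the "data" key; the event fields below (action, item) are illustrative assumptions, not the lab's actual schema.

```python
import base64
import json

def process_event(event: dict, context=None) -> dict:
    """Sketch of a Pub/Sub-triggered Cloud Function (Python runtime).

    Background functions receive the Pub/Sub message with its payload
    base64-encoded under the 'data' key; the decoded record would then
    flow on to Dataflow and BigQuery.
    """
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    return payload

# Simulated trigger envelope, as Pub/Sub would deliver it:
message = {
    "data": base64.b64encode(b'{"action": "click", "item": "sku-123"}').decode("ascii")
}
decoded = process_event(message)
```

Deployed for real, the function signature and trigger wiring come from the Cloud Functions runtime; here the envelope is simulated so the decoding logic can be seen in isolation.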
Big Data Spain will issue the certificate for this course to prove subject matter competency.
· Data engineers
· Cloud architects
Bio of the instructor - Israel Herraiz
Israel Herraiz is a Strategic Cloud Engineer at Google. He has worked in different data science roles at BBVA Data & Analytics and Amadeus. He holds a PhD in Computer Science from Universidad Rey Juan Carlos (2008) and has been a visiting researcher at universities in Europe, Canada and the United States. In a prior life, he was an assistant professor at Universidad Politécnica de Madrid, where he carried out research applying data science to the study of software development and the phenomenon of open source.