Cloud Data Platform Engineering with Azure Databricks
Description
The Cloud Data Platform Engineering with Azure Databricks course is designed to give participants a comprehensive understanding of how to leverage Azure Databricks to build scalable and efficient data platforms. Throughout this course, participants gain hands-on experience in designing, implementing, and managing data engineering workflows using Azure Databricks. From data ingestion and transformation to advanced analytics and machine learning, participants explore the key features and best practices of Azure Databricks for building robust data platforms.
The course follows a hands-on approach: participants engage in live coding sessions and receive all course materials upon completion. Although the course is taught on the Microsoft Azure cloud platform with Databricks, the engineering principles apply equally to other cloud platforms and tooling.
Target audience
This course is designed for professionals who work with data daily and want to understand what it takes to build a data platform in the cloud. It is a good fit for:
- data engineers and data architects working on on-premises systems who want to learn about cloud platforms
- software developers working in the data domain
- BI developers and data scientists who want to learn more about the full stack of data systems
Prerequisites:
- Proficiency in Python programming language and SQL
- Basic understanding of data engineering principles and technologies
- Basic understanding of DevOps principles
Topics covered
Topic 1: Introduction to Azure Databricks
- Overview of Azure Databricks and its role in data platform engineering
- Understanding the architecture and components of Azure Databricks
- The Databricks data platform (medallion) architecture: Bronze, Silver, and Gold layers
- Exploring the Azure Databricks workspace and notebooks
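To give a flavour of the medallion architecture covered in this topic, the sketch below models Bronze, Silver, and Gold layers with plain Python structures. This is purely illustrative: on Databricks each layer would typically be a Delta table, and the field names here are invented for the example.

```python
# Conceptual sketch of the Bronze -> Silver -> Gold flow.
# Bronze holds raw records as ingested; Silver is cleaned; Gold is business-ready.

def to_silver(bronze_rows):
    """Clean raw (bronze) records: drop rows without an id, normalise names."""
    return [
        {"id": r["id"], "name": r["name"].strip().title(), "amount": r["amount"]}
        for r in bronze_rows
        if r.get("id") is not None
    ]

def to_gold(silver_rows):
    """Aggregate cleaned records into a business-level (gold) summary."""
    totals = {}
    for r in silver_rows:
        totals[r["name"]] = totals.get(r["name"], 0) + r["amount"]
    return totals

bronze = [
    {"id": 1, "name": "  alice ", "amount": 10},
    {"id": None, "name": "corrupt", "amount": 99},  # dropped in the silver layer
    {"id": 2, "name": "ALICE", "amount": 5},
]
print(to_gold(to_silver(bronze)))  # {'Alice': 15}
```

In the course itself, each of these steps maps onto Spark jobs writing Delta tables rather than in-memory lists.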
Topic 2: Data Ingestion
- Orchestration: Azure Data Factory or Databricks
- Overview of data ingestion techniques with Azure Databricks
- Ingestion with Databricks or Azure Data Factory
- Integrating Azure Databricks with data sources like Azure Storage, Azure Data Lake, and more
- Configuring incremental (delta) loads
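The incremental (delta) loads listed above are commonly built around a high-water mark: each run picks up only rows modified since the previous run. A minimal sketch, with illustrative column names:

```python
from datetime import datetime

def incremental_load(source_rows, last_watermark):
    """Select only rows modified after the previous high-water mark and
    return them together with the new watermark for the next run."""
    new_rows = [r for r in source_rows if r["modified"] > last_watermark]
    new_watermark = max((r["modified"] for r in new_rows), default=last_watermark)
    return new_rows, new_watermark

rows = [
    {"id": 1, "modified": datetime(2024, 1, 1)},
    {"id": 2, "modified": datetime(2024, 2, 1)},
]
loaded, wm = incremental_load(rows, datetime(2024, 1, 15))
print(len(loaded), wm)  # 1 2024-02-01 00:00:00
```

On Databricks, the same idea is usually realised with Delta Lake and a stored watermark or change feed rather than a Python list comprehension.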
Topic 3: Data Engineering Workflows
- Designing and implementing end-to-end data engineering workflows with Azure Databricks
- Building scalable ETL (Extract, Transform, Load) processes
- A discussion about different data source characteristics
- A discussion about the difference between data cleaning and applying business logic
- Performing data preparation and transformation using Spark DataFrame API
- Handling slowly changing dimensions
- ACID transactions and the delta log
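One distinction this topic discusses is the difference between data cleaning and applying business logic. The hypothetical sketch below keeps the two in separate functions; the column names and the 21% VAT rate are illustrative assumptions, not course material.

```python
def clean(rows):
    """Technical cleaning: cast types and drop duplicate orders.
    No business rules live in this layer."""
    seen, cleaned = set(), []
    for r in rows:
        if r["order_id"] not in seen:
            seen.add(r["order_id"])
            cleaned.append({"order_id": r["order_id"], "amount": float(r["amount"])})
    return cleaned

def apply_business_logic(rows, vat_rate=0.21):
    """Business rule, kept separate from cleaning: derive the gross amount.
    The 21% VAT rate is an illustrative assumption."""
    return [{**r, "amount_incl_vat": round(r["amount"] * (1 + vat_rate), 2)}
            for r in rows]

raw = [{"order_id": 1, "amount": "100"}, {"order_id": 1, "amount": "100"}]
print(apply_business_logic(clean(raw)))
# [{'order_id': 1, 'amount': 100.0, 'amount_incl_vat': 121.0}]
```

Keeping the two concerns separate makes each step independently testable and easier to reuse across pipelines.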
Topic 4: Scheduling and monitoring
- Managing data pipelines and scheduling jobs in Azure Databricks
- Monitoring data pipelines using Azure Log Analytics and Logic Apps
- Strategies for resolving schema drift
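One simple strategy for resolving schema drift is to align incoming records to the expected schema and surface anything unexpected for review. A minimal sketch, with invented column names:

```python
def resolve_schema_drift(rows, expected_columns):
    """Align incoming records to the expected schema: fill missing columns
    with None and report any unexpected (drifted) columns for review."""
    drifted, aligned = set(), []
    for r in rows:
        drifted |= set(r) - set(expected_columns)
        aligned.append({c: r.get(c) for c in expected_columns})
    return aligned, sorted(drifted)

rows = [{"id": 1, "name": "a", "new_col": 42}]
aligned, drifted = resolve_schema_drift(rows, ["id", "name", "email"])
print(aligned, drifted)
# [{'id': 1, 'name': 'a', 'email': None}] ['new_col']
```

Whether to drop, quarantine, or automatically add drifted columns is a design decision the course examines; this sketch only detects and reports them.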
Topic 5: Packaging your pipelines, Deployment and Integration
- Deployment strategies for Azure Databricks workspaces
- Integrating Azure Databricks with other Azure services like Azure Data Factory, Azure Synapse Analytics, etc.
- Continuous integration and deployment (CI/CD) workflows with Azure DevOps and Azure Databricks
- Unit tests for your solutions
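Unit-testing pipeline code mostly means isolating transformations into pure functions that can be exercised without a cluster. A hypothetical example in pytest style (the function and field names are invented):

```python
def add_full_name(row):
    """Transformation under test: derive full_name from first/last name."""
    return {**row, "full_name": f"{row['first_name']} {row['last_name']}"}

def test_add_full_name():
    result = add_full_name({"first_name": "Ada", "last_name": "Lovelace"})
    assert result["full_name"] == "Ada Lovelace"

# With pytest, tests are collected and run automatically; we call it here
# directly so the sketch is self-contained.
test_add_full_name()
print("ok")  # prints: ok
```

In a CI/CD workflow with Azure DevOps, such tests run on every commit before the pipeline code is deployed to a workspace.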
Topic 6: Delivering analytics to the business
- Dimensional modelling in Spark SQL
- Leveraging type 2 slowly changing dimensions (SCD2) in your gold layer
- Presenting your data using Power BI
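The SCD2 pattern mentioned above keeps history by closing the current version of a row when a tracked attribute changes and appending a new current version. A plain-Python sketch with illustrative column names (on Databricks this logic is typically expressed as a Delta Lake MERGE):

```python
from datetime import date

def scd2_upsert(dimension, incoming, today):
    """Type 2 upsert: close the current version when a tracked attribute
    (here: city) changes, then append a new current version."""
    current = {r["key"]: r for r in dimension if r["is_current"]}
    for row in incoming:
        old = current.get(row["key"])
        if old is not None and old["city"] == row["city"]:
            continue  # unchanged: keep the current version as-is
        if old is not None:
            old["valid_to"], old["is_current"] = today, False
        dimension.append({**row, "valid_from": today,
                          "valid_to": None, "is_current": True})
    return dimension

dim = [{"key": 1, "city": "Leiden", "valid_from": date(2023, 1, 1),
        "valid_to": None, "is_current": True}]
dim = scd2_upsert(dim, [{"key": 1, "city": "Utrecht"}], date(2024, 1, 1))
print(len(dim), dim[0]["is_current"], dim[1]["city"])  # 2 False Utrecht
```

The `valid_from`/`valid_to`/`is_current` columns let BI tools such as Power BI report both the current state and the state at any point in history.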
Note: The course outline provided above is a general guideline and can be customized or expanded based on specific requirements or audience needs.
No (suitable) date available? Or do you want to schedule this training as an in-company training? Contact us!
About the trainer
Bram is a seasoned data professional and open-source software enthusiast with over 15 years of industry experience as a cloud platform engineer, systems architect, data scientist, data engineer, and data systems specialist on platforms such as the Microsoft ecosystem and Oracle SQL servers. Bram is driven to build high-performance ETL pipelines and to deliver business value from data.
In his role as lead engineer, Bram has educated and guided junior data engineers and scientists on their journey to becoming well-respected data professionals.
FAQ
We are currently planning new trainings. Do you want to be kept up to date on new training dates? Sign up via this form.
The training course will be held at our office in Leiden, Dellaertweg 9-E, next to Leiden Central Station. Parking is available in the surrounding parking garages within walking distance of the office. If several colleagues want to attend, you can also request a custom training at your own location. Contact us for the possibilities.
The training will be held between 9:00-17:00, but exact details will be communicated well before the start of the training. Lunch and drinks are included.
The training can be given in Dutch or English, depending on the language of the participants.
You will need to bring your own laptop with the necessary development environment set up to participate in the coding exercises and projects.
Participants will have access to our Slack community, where they can stay in touch with each other and seek clarifications or assistance with any questions that arise after the training.
If you find yourself unable to attend the course after registering, don't worry! We understand that unforeseen circumstances can arise. Up to 14 days before the training starts, you can get a refund. After that, you have the option to reschedule your participation to another course date. To reschedule, please reach out to academy@fresh-minds.nl. Kindly note that rescheduling is subject to availability and the terms and conditions of our rescheduling policy.