Cloud Data Platform Engineering with Azure Databricks
Description
The Cloud Data Platform Engineering with Azure Databricks course is designed to give participants a comprehensive understanding of how to leverage Azure Databricks to build scalable and efficient data platforms. Throughout this course, participants gain hands-on experience in designing, implementing, and managing data engineering workflows using Azure Databricks. From data ingestion and transformation to advanced analytics and machine learning, participants explore the key features and best practices of Azure Databricks for building robust data platforms.
The course follows a hands-on approach: participants engage in live coding sessions and receive all course materials upon completion. Although the course is taught on the Microsoft Azure cloud platform with Databricks, the engineering principles apply equally to other cloud platforms and tooling.
Target audience
This course is designed for professionals who work with data daily and want to understand what it takes to build a data platform in the cloud. It is a good fit for:
- data engineers and data architects working on on-premises systems who want to learn about cloud platforms
- software developers working in the data domain
- BI developers and data scientists who want to learn more about the full stack of data systems
Prerequisites:
- Proficiency in Python programming language and SQL
- Basic understanding of data engineering principles and technologies
- Basic understanding of DevOps principles
Topics covered
Topic 1: Introduction to Azure Databricks
- Overview of Azure Databricks and its role in data platform engineering
- Understanding the architecture and components of Azure Databricks
- The Databricks data platform (medallion) architecture: Bronze, Silver, and Gold layers
- Exploring the Azure Databricks workspace and notebooks
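To give a flavour of the medallion architecture covered in this topic, the sketch below models Bronze, Silver, and Gold layers with plain Python structures. This is purely illustrative: on Databricks each layer would typically be a Delta table, and the field names here are invented for the example.

```python
# Conceptual sketch of the Bronze -> Silver -> Gold flow.
# Bronze holds raw records as ingested; Silver is cleaned; Gold is business-ready.

def to_silver(bronze_rows):
    """Clean raw (bronze) records: drop rows without an id, normalise names."""
    return [
        {"id": r["id"], "name": r["name"].strip().title(), "amount": r["amount"]}
        for r in bronze_rows
        if r.get("id") is not None
    ]

def to_gold(silver_rows):
    """Aggregate cleaned records into a business-level (gold) summary."""
    totals = {}
    for r in silver_rows:
        totals[r["name"]] = totals.get(r["name"], 0) + r["amount"]
    return totals

bronze = [
    {"id": 1, "name": "  alice ", "amount": 10},
    {"id": None, "name": "corrupt", "amount": 99},  # dropped in the silver layer
    {"id": 2, "name": "ALICE", "amount": 5},
]
print(to_gold(to_silver(bronze)))  # {'Alice': 15}
```

In the course itself, each of these steps maps onto Spark jobs writing Delta tables rather than in-memory lists.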
Topic 2: Data Ingestion
- Orchestration: Azure Data Factory or Databricks
- Overview of data ingestion techniques with Azure Databricks
- Ingestion with Databricks or Azure Data Factory
- Integrating Azure Databricks with data sources like Azure Storage, Azure Data Lake, and more
- Configuring incremental (delta) loads
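The incremental (delta) loads listed above are commonly built around a high-water mark: each run picks up only rows modified since the previous run. A minimal sketch, with illustrative column names:

```python
from datetime import datetime

def incremental_load(source_rows, last_watermark):
    """Select only rows modified after the previous high-water mark and
    return them together with the new watermark for the next run."""
    new_rows = [r for r in source_rows if r["modified"] > last_watermark]
    new_watermark = max((r["modified"] for r in new_rows), default=last_watermark)
    return new_rows, new_watermark

rows = [
    {"id": 1, "modified": datetime(2024, 1, 1)},
    {"id": 2, "modified": datetime(2024, 2, 1)},
]
loaded, wm = incremental_load(rows, datetime(2024, 1, 15))
print(len(loaded), wm)  # 1 2024-02-01 00:00:00
```

On Databricks, the same idea is usually realised with Delta Lake and a stored watermark or change feed rather than a Python list comprehension.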
Topic 3: Data Engineering Workflows
- Designing and implementing end-to-end data engineering workflows with Azure Databricks
- Building scalable ETL (Extract, Transform, Load) processes
- A discussion about different data source characteristics
- A discussion about the difference between data cleaning and applying business logic
- Performing data preparation and transformation using Spark DataFrame API
- Handling slowly changing dimensions
- ACID transactions and the delta log
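One distinction this topic discusses is the difference between data cleaning and applying business logic. The hypothetical sketch below keeps the two in separate functions; the column names and the 21% VAT rate are illustrative assumptions, not course material.

```python
def clean(rows):
    """Technical cleaning: cast types and drop duplicate orders.
    No business rules live in this layer."""
    seen, cleaned = set(), []
    for r in rows:
        if r["order_id"] not in seen:
            seen.add(r["order_id"])
            cleaned.append({"order_id": r["order_id"], "amount": float(r["amount"])})
    return cleaned

def apply_business_logic(rows, vat_rate=0.21):
    """Business rule, kept separate from cleaning: derive the gross amount.
    The 21% VAT rate is an illustrative assumption."""
    return [{**r, "amount_incl_vat": round(r["amount"] * (1 + vat_rate), 2)}
            for r in rows]

raw = [{"order_id": 1, "amount": "100"}, {"order_id": 1, "amount": "100"}]
print(apply_business_logic(clean(raw)))
# [{'order_id': 1, 'amount': 100.0, 'amount_incl_vat': 121.0}]
```

Keeping the two concerns separate makes each step independently testable and easier to reuse across pipelines.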
Topic 4: Scheduling and monitoring
- Managing data pipelines and scheduling jobs in Azure Databricks
- Monitoring data pipelines using Azure Log Analytics and Logic Apps
- Strategies for resolving schema drift
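One simple strategy for resolving schema drift is to align incoming records to the expected schema and surface anything unexpected for review. A minimal sketch, with invented column names:

```python
def resolve_schema_drift(rows, expected_columns):
    """Align incoming records to the expected schema: fill missing columns
    with None and report any unexpected (drifted) columns for review."""
    drifted, aligned = set(), []
    for r in rows:
        drifted |= set(r) - set(expected_columns)
        aligned.append({c: r.get(c) for c in expected_columns})
    return aligned, sorted(drifted)

rows = [{"id": 1, "name": "a", "new_col": 42}]
aligned, drifted = resolve_schema_drift(rows, ["id", "name", "email"])
print(aligned, drifted)
# [{'id': 1, 'name': 'a', 'email': None}] ['new_col']
```

Whether to drop, quarantine, or automatically add drifted columns is a design decision the course examines; this sketch only detects and reports them.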
Topic 5: Packaging your pipelines, Deployment and Integration
- Deployment strategies for Azure Databricks workspaces
- Integrating Azure Databricks with other Azure services like Azure Data Factory, Azure Synapse Analytics, etc.
- Continuous integration and deployment (CI/CD) workflows with Azure DevOps and Azure Databricks
- Unit tests for your solutions
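Unit-testing pipeline code mostly means isolating transformations into pure functions that can be exercised without a cluster. A hypothetical example in pytest style (the function and field names are invented):

```python
def add_full_name(row):
    """Transformation under test: derive full_name from first/last name."""
    return {**row, "full_name": f"{row['first_name']} {row['last_name']}"}

def test_add_full_name():
    result = add_full_name({"first_name": "Ada", "last_name": "Lovelace"})
    assert result["full_name"] == "Ada Lovelace"

# With pytest, tests are collected and run automatically; we call it here
# directly so the sketch is self-contained.
test_add_full_name()
print("ok")  # prints: ok
```

In a CI/CD workflow with Azure DevOps, such tests run on every commit before the pipeline code is deployed to a workspace.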
Topic 6: Delivering analytics to the business
- Dimensional modelling in Spark SQL
- Leveraging type 2 slowly changing dimensions (SCD2) in your gold layer
- Presenting your data using Power BI
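The SCD2 pattern mentioned above keeps history by closing the current version of a row when a tracked attribute changes and appending a new current version. A plain-Python sketch with illustrative column names (on Databricks this logic is typically expressed as a Delta Lake MERGE):

```python
from datetime import date

def scd2_upsert(dimension, incoming, today):
    """Type 2 upsert: close the current version when a tracked attribute
    (here: city) changes, then append a new current version."""
    current = {r["key"]: r for r in dimension if r["is_current"]}
    for row in incoming:
        old = current.get(row["key"])
        if old is not None and old["city"] == row["city"]:
            continue  # unchanged: keep the current version as-is
        if old is not None:
            old["valid_to"], old["is_current"] = today, False
        dimension.append({**row, "valid_from": today,
                          "valid_to": None, "is_current": True})
    return dimension

dim = [{"key": 1, "city": "Leiden", "valid_from": date(2023, 1, 1),
        "valid_to": None, "is_current": True}]
dim = scd2_upsert(dim, [{"key": 1, "city": "Utrecht"}], date(2024, 1, 1))
print(len(dim), dim[0]["is_current"], dim[1]["city"])  # 2 False Utrecht
```

The `valid_from`/`valid_to`/`is_current` columns let BI tools such as Power BI report both the current state and the state at any point in history.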
Note: The course outline provided above is a general guideline and can be customized or expanded based on specific requirements or audience needs.
No (suitable) date available? Or do you want to schedule this training as an in-company training? Contact us!
About the trainer
Bram is a seasoned data professional and open-source software enthusiast with over 15 years of industry experience as a cloud platform engineer, systems architect, data scientist, data engineer, and data systems specialist on platforms such as the Microsoft ecosystem and Oracle SQL servers. Bram is driven to build high-performance ETL pipelines and to deliver business value from data.
In his role as lead engineer, Bram has educated and guided junior data engineers and scientists on their journey to becoming well-respected data professionals.
FAQ
We are currently planning new trainings. Do you want to be kept up to date on new training dates? Sign up via this form.
The training course will be held at our office in Leiden, Dellaertweg 9-E, next to Leiden Central Station. Parking is available in the surrounding parking garages within walking distance of the office. If several colleagues want to attend, you can also request a custom training at your own location. Contact us for the possibilities.
The training will be held between 9:00-17:00, but exact details will be communicated well before the start of the training. Lunch and drinks are included.
The training can be given in Dutch or English, depending on the language of the participants.
You will need to bring your own laptop with the necessary development environment set up to participate in the coding exercises and projects.
Participants will have access to our Slack community, where they can stay in touch with each other and seek clarifications or assistance with any questions that arise after the training.
If you find yourself unable to attend the course after registering, don't worry! We understand that unforeseen circumstances can arise. Up to 14 days before the training starts, you can get a refund. After that, you have the option to reschedule your participation to another course date. To reschedule, please reach out to academy@fresh-minds.nl. Kindly note that rescheduling is subject to availability and the terms and conditions of our rescheduling policy.