Basics To Advanced: Azure Synapse Analytics Hands-On Project

Published 8/2023
MP4 | Video: h264, 1920x1080 | Audio: AAC, 44.1 kHz
Language: English | Size: 6.86 GB | Duration: 18h 40m

Build a complete project using only Azure Synapse Analytics, focused on PySpark and including Delta Lake and Spark optimizations

What you'll learn

Understand Azure Synapse Analytics services through practical work

Gain a complete, basic-to-advanced understanding of Azure Synapse Analytics

Gain hands-on experience in applying Spark optimization techniques to real-world scenarios, achieving faster insights.

Understand 50+ most commonly used PySpark Transformations

Acquire a comprehensive library of 45+ PySpark notebooks for data cleansing, enrichment, and transformation.

Hands-on learning on building a modern data warehouse using Azure Synapse

Explore the capabilities of Spark Pools and their role in processing large-scale data workloads

Understand how Python is used in Data Engineering

Understand and transform data with Serverless SQL pool

Understand the principles and advantages of Delta Lake as a reliable data storage and management solution.

Learn how Spark evolved and grew

Gain insight into the services you need to know to pass DP-203

Create and configure a Serverless SQL pool

Create external data sources, external file formats, and external tables in Serverless SQL pool

Configure Spark Pools and understand how they work

Understand the Integration of Power BI with Azure Synapse Analytics

Create and work with Dedicated SQL pool on a high level

Optimize your PySpark with Spark Optimization techniques

Learn the history of data processing before Spark

Implement the incremental UPSERT using Delta Lake

Understand and implement versioning in Delta Lake

Use MSSpark Utils and understand what its utilities are for

Learn how to mount a data lake to Synapse notebooks

Requirements

No Azure Synapse Analytics experience needed. You will learn everything you need.

Basics of Python programming

Basics of SQL language

Description

Are you ready to revolutionize your data analytics skills? Look no further. Welcome to our comprehensive course, where you'll delve deep into the world of Azure Synapse Analytics with PySpark and emerge equipped with the tools to excel in modern data analysis.

Unlock the Power of Azure Synapse Analytics! 18.5+ HOURS OF IN-DEPTH LEARNING CONTENT!

In this course we will be learning about:

Serverless SQL Pool - Perform flexible querying of structured data and initial data exploration.

Spark Pools - Dive into advanced data processing and analytics with the power of Apache Spark.

Spark SQL - Seamlessly query structured data using Spark's SQL capabilities.

MSSpark Utils - Leverage MSSpark Utilities for enhanced Spark functionality in Synapse.

50+ PySpark Transformations - Harness over 50 PySpark transformations to manipulate and refine your data.

Dedicated SQL Pool - Report data efficiently to Power BI.

Integrating Power BI with Azure Synapse Analytics - Seamlessly connect Power BI for enriched data visualization and insights.

Delta Lake and its features - Integrate Delta Lake for reliable, ACID-compliant data.

Spark Optimization Techniques - Employ optimization techniques to enhance Spark processing speed and efficiency.

You will also learn how Python is helpful in data analysis. Our project-based approach ensures hands-on learning, giving you the practical experience needed to conquer real-world data challenges.

While this course does not focus entirely on certification, it also gives you the practical understanding of the Azure Synapse Analytics service that is needed to pass DP-203 - "Microsoft Certified Azure Data Engineer" and DP-500 - "Designing and Implementing Enterprise-Scale Analytics Solutions Using Microsoft Azure and Microsoft Power BI".

Join me in mastering Azure Synapse Analytics!

Overview

Section 1: Introduction

Lecture 1 Introduction

Lecture 2 Project Architecture

Lecture 3 Course Slides

Section 2: Origin of Azure Synapse Analytics

Lecture 4 Section Introduction

Lecture 5 Need of separate Analytical system

Lecture 6 OLAP vs OLTP

Lecture 7 A typical Datawarehouse

Lecture 8 Datalake Introduction

Lecture 9 Modern datawarehouse and its problem

Lecture 10 The solution - Azure Synapse Analytics and its Components

Lecture 11 Azure Synapse Analytics - A Single stop solution

Lecture 12 Section Summary

Section 3: Environment Setup

Lecture 13 Section Introduction

Lecture 14 Creating a resource group in Azure

Lecture 15 Create Azure Synapse Analytics Service

Lecture 16 Exploring Azure Synapse Analytics

Lecture 17 Understanding the dataset

Section 4: Serverless SQL Pool

Lecture 18 Section Introduction

Lecture 19 Serverless SQL Pool - Introduction

Lecture 20 Serverless SQL Pool - Architecture

Lecture 21 Serverless SQL Pool - Benefits and Pricing

Lecture 22 Uploading files into Azure Datalake Storage

Lecture 23 Initial Data Exploration

Lecture 24 How to import SQL scripts or ipynb notebooks to Azure Synapse

Lecture 25 Fixing the Collation warning

Lecture 26 Creating External datasource

Lecture 27 Creating database scoped credential Using SAS

Lecture 28 Creating Database scoped cred using MI

Lecture 29 Deleting existing data sources for cleanup

Lecture 30 Creating an external file format - Demo

Lecture 31 Creating an External File Format - Practical

Lecture 32 Creating External DataSource for Refined container

Lecture 33 Creating an External Table

Lecture 34 End of section

Section 5: History and Data processing before Spark

Lecture 35 Section Introduction

Lecture 36 Big Data Approach

Lecture 37 Understanding Hadoop Yarn- Cluster Manager

Lecture 38 Understanding Hadoop - HDFS

Lecture 39 Understanding Hadoop - MapReduce Distributed Computing

Section 6: Emergence of Spark

Lecture 40 Section Introduction

Lecture 41 Drawbacks of MapReduce Framework

Lecture 42 Emergence of Spark

Section 7: Spark Core Concepts

Lecture 43 Section Introduction

Lecture 44 Spark EcoSystem

Lecture 45 Difference between Hadoop & Spark

Lecture 46 Spark Architecture

Lecture 47 Creating a Spark Pool & its benefits

Lecture 48 RDD Overview

Lecture 49 Functions Lambda, Map and Filter - Overview

Lecture 50 Understanding RDD in practical

Lecture 51 RDD- Lazy loading - Transformations and Actions

Lecture 52 What is RDD Lineage

Lecture 53 RDD - Word count program - Demo

Lecture 54 RDD - Word count - PySpark Program - Practical

Lecture 55 Optimization - ReduceByKey vs GroupByKey Explanation

Lecture 56 RDD - Understanding about Jobs in spark Practical

Lecture 57 RDD - Understanding Narrow and Wide Transformations

Lecture 58 RDD- Understanding Stages - Practical

Lecture 59 RDD- Understanding Tasks Practical

Lecture 60 Understand DAG , RDD Lineage and Differences

Lecture 61 Spark Higher level APIs Intro

Lecture 62 Synapse Notebook - Creating dataframes practical

Section 8: PySpark Transformation 1 - Select and Filter functions

Lecture 63 Introduction for PySpark Transformations

Lecture 64 Walkthrough on Notebook , Markdown cells

Lecture 65 Using Free Databricks Community Edition to practise and Save Costs

Lecture 66 Display and show Functions

Lecture 67 Stop Spark Session when not in use

Lecture 68 Select and SelectExpr

Lecture 69 Filter Function

Lecture 70 Organizing notebooks into a folder

Section 9: PySpark Transformation 2 - Handling Nulls, Duplicates and aggregation

Lecture 71 Understanding fillna and na.fill

Lecture 72 Identifying duplicates using Aggregations

Lecture 73 Handling Duplicates using dropna

Lecture 74 Organising notebooks into a folder

Lecture 75 Transformations summary of this section

Section 10: PySpark Transformation 3 - Data Transformation and Manipulation

Lecture 76 withColumn to Create or Update columns

Lecture 77 Transforming and updating column withColumnRenamed

Section 11: PySpark 4 - Synapse Spark - MSSparkUtils

Lecture 78 What is MSSpark Utilities

Lecture 79 MSSpark Utils - Env utils

Lecture 80 What is mount point

Lecture 81 Creating and accessing mount point in Notebook

Lecture 82 All File System Utils

Lecture 83 Notebook Utils - Exit command

Lecture 84 Creating another spark pool

Lecture 85 Procedure to increase vCores request (optional)

Lecture 86 Calling notebook from another notebook

Lecture 87 Calling notebook from another using runtime parameters

Lecture 88 Magic commands

Lecture 89 Attaching two notebooks to a single spark pool

Lecture 90 Accessing Mount points from another notebook

Section 12: PySpark 5 - Synapse - Spark SQL

Lecture 91 Accessing data using Temporary Views - Practical

Lecture 92 Lake Database - Overview

Lecture 93 Understanding and creating database in Lake Database

Lecture 94 Using Spark SQL in notebook

Lecture 95 Managed vs External tables in Spark

Lecture 96 Metadata sharing between Spark pool and Serverless SQL Pool

Lecture 97 Deleting unwanted folders

Section 13: PySpark Transformation 6 - Join Transformations

Lecture 98 Uploading required files for Joins

Lecture 99 Python notebooks till Union

Lecture 100 Inner join

Lecture 101 Left Join

Lecture 102 Right Join

Lecture 103 Full outer join

Lecture 104 Left Semi Join

Lecture 105 Left anti and Cross Join

Lecture 106 Union Operation

Lecture 107 Performing Join Transformation on Project Dataset

Lecture 108 Summary of Transformations performed

Section 14: PySpark Transformation 7 - String Manipulation and sorting

Lecture 109 Replace function to change spaces

Lecture 110 PySpark Notebook for this section

Lecture 111 Split and concat functions

Lecture 112 Order by and sort

Lecture 113 Section Summary

Section 15: PySpark Transformation 8 - Window Functions

Lecture 114 Row number function

Lecture 115 PySpark Notebook used in this section

Lecture 116 Rank Function

Lecture 117 Dense Rank function

Section 16: PySpark Transformation 9 - Conversions and Pivoting

Lecture 118 Conversion using cast function

Lecture 119 PySpark Notebook needed for casting and pivoting lectures

Lecture 120 Pivot function

Lecture 121 Unpivot using stack function

Lecture 122 Using to_date to convert date column

Section 17: PySpark Transformation 10 - Schema definition and Management

Lecture 123 PySpark Notebook used in this lecture

Lecture 124 StructType and StructField - Demo

Lecture 125 Implementing explicit schema with StructType and StructField

Section 18: PySpark Transformation 11 - UDFs

Lecture 126 User Defined Functions - Demo

Lecture 127 Implementing UDFs in Notebook

Lecture 128 Writing transformed data to Processed container

Section 19: Dedicated SQL Pool

Lecture 129 Dedicated SQL pool - Demo

Lecture 130 Dedicated SQL Pool Architecture

Lecture 131 How distribution takes place based on DWU

Lecture 132 Factors to consider when choosing dedicated SQL pool

Lecture 133 Creating Dedicated SQL pool in Synapse

Lecture 134 Ways to copy data into Dedicated SQL Pool

Lecture 135 Copy command to copy to dedicated SQL pool

Lecture 136 Clustered Columnstore Index (optional)

Lecture 137 Types of Distributions or Sharding patterns

Lecture 138 Using Pipeline to Copy to dedicated SQL Pool

Section 20: Reporting data to Power BI

Lecture 139 Section Introduction

Lecture 140 Installing Power BI Desktop

Lecture 141 Creating report from Power BI Desktop

Lecture 142 Creating new user in Azure AD for creating workspace (if using personal account)

Lecture 143 Creating a shared workspace in Power BI

Lecture 144 Publishing report to Shared Workspace

Lecture 145 Accessing Power BI from Azure Synapse Analytics

Lecture 146 Download Power BI .pbix file from here

Lecture 147 Creating Dataset and report from Synapse Analytics

Lecture 148 Concluding the Power BI Section

Lecture 149 Summary and end of project implementation

Section 21: Spark - Optimisation Techniques

Lecture 150 Optimisation Section Intro

Lecture 151 Uploading required files for Optimisation

Lecture 152 Spark Optimisation levels

Lecture 153 Avoid using Collect function

Lecture 154 Making notebook into particular folder

Lecture 155 Avoid InferSchema

Lecture 156 Use Cache Persist 1 - Understanding Serialization and DeSerialization

Lecture 157 Use Cache Persist 2 - How cache or persist will work - Demo

Lecture 158 Use Cache Persist 3 - Understanding cache practically

Lecture 159 Use Cache Persist 4 - Persist - What is persist and different storage levels

Lecture 160 Use Cache Persist - Notebook for persist with all storage levels

Lecture 161 Use Cache Persist 5 - Persist - MEMORY_ONLY

Lecture 162 Use Cache Persist 6 - Persist - MEMORY AND DISK

Lecture 163 Use Cache Persist 7 - Persist - MEMORY_ONLY_SER (Scala Only)

Lecture 164 Use Cache Persist 8 - Persist - MEMORY_AND_DISK_SER ( Scala Only)

Lecture 165 Use Cache Persist 9 - Persist - DISK ONLY

Lecture 166 Use Cache Persist 10 - Persist - OFF HEAP (Scala Only)

Lecture 167 Use Cache Persist 11 - Persist - MEMORY_ONLY_2 (PySpark only)

Lecture 168 Use Partitioning 1 - Understanding partitioning - Demo

Lecture 169 Use Partitioning 2 - Understand partitioning - Practical

Lecture 170 Repartition and coalesce 1 - Understanding repartition and coalesce - Demo

Lecture 171 Repartition and coalesce 2 - Understanding repartition and coalesce - Practical

Lecture 172 Broadcast variables 1 - Understanding broadcast variables - Demo

Lecture 173 Broadcast variables 2 - Implementing broadcast variables in notebook

Lecture 174 Use Kryo Serializer

Section 22: Delta Lake

Lecture 175 Section Introduction

Lecture 176 Drawbacks of ADLS

Lecture 177 What is Delta lake

Lecture 178 Lakehouse Architecture

Lecture 179 Uploading required file for Delta lake

Lecture 180 Problems with Azure Datalake - Practical

Lecture 181 Creating a Delta lake

Lecture 182 Understanding Delta format

Lecture 183 Contents of Transaction Log or Delta log file - Practical

Lecture 184 Contents of a transaction log demo

Lecture 185 Creating delta table by Path using SQL

Lecture 186 Creating delta table in Metastore using Pyspark and SQL

Lecture 187 Schema Enforcement - Files required for Understanding Schema Enforcement

Lecture 188 What is schema enforcement - Demo

Lecture 189 Schema Enforcement - Practical

Lecture 190 Schema Evolution - Practical

Lecture 191 Versioning and Time Travel

Lecture 192 Vacuum command

Lecture 193 Convert to Delta command

Lecture 194 Checkpoints in delta log

Lecture 195 Optimize command - Demo

Lecture 196 Optimize command - Practical

Lecture 197 Applying UPSERT using MERGE Command

Section 23: Conclusion

Lecture 198 Course Conclusion

Lecture 199 Bonus Lecture

Who this course is for

Beginners who want to step into the world of Data Engineers

Professional Data Engineers who want to advance their data analysis skills

Students who are keen to learn Data Analytics

Data Engineers who want to learn data warehousing in Cloud using Azure Synapse Analytics