Data engineers who need to hit the ground running will use this book to build skills in Azure Data Factory v2 (ADF). The tutorial-first approach to ADF taken in this book gets you working from the first chapter, explaining key ideas naturally as you encounter them. From creating your first data factory to building complex, metadata-driven nested pipelines, the book guides you through essential concepts in Microsoft’s cloud-based ETL/ELT platform. It introduces components indispensable for the movement and transformation of data in the cloud. Then it demonstrates the tools necessary to orchestrate, monitor, and manage those components.
This edition, updated for 2024, includes the latest developments to the Azure Data Factory service:
- Enhancements to existing pipeline activities such as Execute Pipeline, along with the introduction of new activities such as Script, and activities designed specifically to interact with Azure Synapse Analytics.
- Improvements to flow control provided by activity deactivation and the Fail activity.
- The introduction of reusable data flow components such as user-defined functions and flowlets.
- Extensions to integration runtime capabilities including Managed VNet support.
- The ability to trigger pipelines in response to custom events.
- Tools for implementing boilerplate processes such as change data capture and metadata-driven data copying.
What You Will Learn
- Create pipelines, activities, datasets, and linked services
- Build reusable components using variables, parameters, and expressions
- Move data into and around Azure services automatically
- Transform data natively using ADF data flows and Power Query data wrangling
- Master flow-of-control and triggers for tightly orchestrated pipeline execution
- Publish and monitor pipelines easily and with confidence
Who This Book Is For
Data engineers and ETL developers taking their first steps in Azure Data Factory, SQL Server Integration Services users making the transition toward doing ETL in Microsoft’s Azure cloud, and SQL Server database administrators involved in data warehousing and ETL operations.
Table of Contents
1. Creating an Azure Data Factory Instance
2. Your First Pipeline
3. The Copy Data Activity
4. Expressions
5. Parameters
6. Controlling Flow
7. Data Flows
8. Integration Runtimes
9. Power Query in ADF
10. Publishing to ADF
11. Triggers
12. Change Monitoring
13. Tools and Other Services
About the Author
Richard Swinbank is a data engineer and Microsoft Data Platform MVP. He specializes in building and automating analytics platforms using Microsoft technologies from the SQL Server stack to the Azure cloud. He is a fervent advocate of DataOps, with a technical focus on bringing automation to both analytics development and operations. An active member of the data community and keen knowledge-sharer, Richard is a volunteer, organizer, speaker, blogger, open source contributor, and author. He holds a PhD in computer science from the University of Birmingham (UK).