A much-needed guide to implementing new technology in workspaces
From experts in the field comes Machine Learning Upgrade: A Data Scientist’s Guide to MLOps, LLMs, and ML Infrastructure, a book that provides data scientists and managers with best practices at the intersection of management, large language models (LLMs), machine learning, and data science. This groundbreaking book will change the way you view the data science pipeline. The authors provide an introduction to modern machine learning, showing you how it can be viewed as a holistic, end-to-end system, not just a shiny new gadget in an otherwise unchanged operational structure. By adopting a data-centric view of the world, you can begin to see unstructured data and LLMs as the foundation upon which you can build countless applications and business solutions. This book explores a whole world of decision making that hasn’t been codified yet, enabling you to forge the future using emerging best practices.
- Gain an understanding of the intersection between large language models and unstructured data
- Follow the process of building an LLM-powered application while leveraging MLOps techniques such as data versioning and experiment tracking
- Discover best practices for training, fine-tuning, and evaluating LLMs
- Integrate LLM applications within larger systems, monitor their performance, and retrain them on new data
This book is indispensable for data professionals and business leaders looking to understand LLMs and the entire data science pipeline.
Table of contents
Introduction ix
1 A Gentle Introduction to Modern Machine Learning 1
Data Science Is Diverging from Business Intelligence 3
From CRISP-DM to Modern, Multicomponent ML Systems 4
The Emergence of LLMs Has Increased ML’s Power and Complexity 7
What You Can Expect from This Book 9
2 An End-to-End Approach 11
Components of a YouTube Search Agent 13
Principles of a Production Machine Learning System 16
Observability 19
Reproducibility 19
Interoperability 20
Scalability 21
Improvability 22
A Note on Tools 23
3 A Data-Centric View 25
The Emergence of Foundation Models 25
The Role of Off-the-Shelf Components 27
The Data-Driven Approach 28
A Note on Data Ethics 28
Building the Dataset 30
Working with Vector Databases 34
Data Versioning and Management 50
Getting Started with Data Versioning 53
Knowing “Just Enough” Engineering 57
4 Standing Up Your LLM 61
Selecting Your LLM 61
What Type of Inference Do I Need to Perform? 65
How Open-Ended Is This Task? 66
What Are the Privacy Concerns for This Data? 66
How Much Will This Model Cost? 67
Experiment Management with LLMs 68
LLM Inference 74
Basics of Prompt Engineering 74
In-Context Learning 77
Intermediary Computation 85
Augmented Generation 89
Agentic Techniques 94
Optimizing LLM Inference with Experiment Management 102
Fine-Tuning LLMs 111
When to Fine-Tune an LLM 112
Quantization, QLoRA, and Parameter Efficient Fine-Tuning 113
Wrapping Things Up 121
5 Putting Together an Application 123
Prototyping with Gradio 125
Creating Graphics with Plotnine 128
Adding the Author Selector 137
Adding a Logo 138
Adding a Tab 139
Adding a Title and Subtitle 140
Changing the Color of the Buttons 140
Click to Download Button 141
Putting It All Together 141
Deploying Models as APIs 144
Implementing an API with FastAPI 146
Implementing Uvicorn 148
Monitoring an LLM 149
Dockerizing Your Service 151
Deploying Your Own LLM 154
Wrapping Things Up 159
6 Rounding Out the ML Life Cycle 161
Deploying a Simple Random Forest Model 161
An Introduction to Model Monitoring 167
Model Monitoring with Evidently AI 175
Building a Model Monitoring System 176
Final Thoughts on Monitoring 187
7 Review of Best Practices 189
Step 1: Understand the Problem 189
Step 2: Model Selection and Training 190
Step 3: Deploy and Maintain 192
Step 4: Collaborate and Communicate 196
Emerging Trends in LLMs 197
Next Steps in Learning 199
Appendix: Additional LLM Example 201
Index 209
About the authors
Kristen Kehrer has been providing innovative and practical statistical modeling solutions since 2010. In 2018, she achieved recognition as a LinkedIn Top Voice in Data Science & Analytics. Kristen is also the founder of Data Moves Me, LLC.
Caleb Kaiser is a Full Stack Engineer at Comet. Caleb was previously on the Founding Team at Cortex Labs. Caleb also worked at Scribe Media on the Author Platform Team.