Organizations can make data science a repeatable, predictable tool that business professionals use to get more value from their data.
Enterprise data and AI projects are often scattershot, underbaked, siloed, and not adaptable to predictable business changes. As a result, the vast majority fail. These expensive quagmires can be avoided, and this book explains precisely how.
Data science is emerging as a hands-on tool not just for data scientists but for business professionals as well. Managers, directors, IT leaders, and analysts must expand their use of data science capabilities for the organization to stay competitive. Smarter Data Science helps them deliver enterprise-grade data projects and reach their AI goals. It serves as a guide to building a robust and comprehensive information architecture program that enables sustainable and scalable AI deployments.
When an organization manages its data effectively, its data science program becomes a fully scalable function that’s both prescriptive and repeatable. With an understanding of data science principles, practitioners are also empowered to lead their organizations in establishing and deploying viable AI. They employ the tools of machine learning, deep learning, and AI to extract greater value from data for the benefit of the enterprise.
By following a ladder framework that promotes prescriptive capabilities, organizations can make data science accessible to a range of team members, democratizing data science throughout the organization. Companies that collect, organize, and analyze data can move forward to additional data science achievements:
- Improving time-to-value with infused AI models for common use cases
- Optimizing knowledge work and business processes
- Utilizing AI-based business intelligence and data visualization
- Establishing a data topology to support general or highly specialized needs
- Successfully completing AI projects in a predictable manner
- Coordinating the use of AI from any compute node, from inner edges to outer edges: cloud, fog, and mist computing
When they climb the ladder presented in this book, businesspeople and data scientists alike will be able to improve and foster repeatable capabilities. They will have the knowledge to maximize their AI and data assets for the benefit of their organizations.
Table of Contents
Foreword for Smarter Data Science
Epigraph
Preamble
Chapter 1 Climbing the AI Ladder
Readying Data for AI
Technology Focus Areas
Taking the Ladder Rung by Rung
Constantly Adapt to Retain Organizational Relevance
Data-Based Reasoning is Part and Parcel in the Modern Business
Toward the AI-Centric Organization
Summary
Chapter 2 Framing Part I: Considerations for Organizations Using AI
Data-Driven Decision-Making
Using Interrogatives to Gain Insight
The Trust Matrix
The Importance of Metrics and Human Insight
Democratizing Data and Data Science
Aye, a Prerequisite: Organizing Data Must Be a Forethought
Preventing Design Pitfalls
Facilitating the Winds of Change: How Organized Data Facilitates Reaction Time
Quae Quaestio (Question Everything)
Summary
Chapter 3 Framing Part II: Considerations for Working with Data and AI
Personalizing the Data Experience for Every User
Context Counts: Choosing the Right Way to Display Data
Ethnography: Improving Understanding Through Specialized Data
Data Governance and Data Quality
The Value of Decomposing Data
Providing Structure Through Data Governance
Curating Data for Training
Additional Considerations for Creating Value
Ontologies: A Means for Encapsulating Knowledge
Fairness, Trust, and Transparency in AI Outcomes
Accessible, Accurate, Curated, and Organized
Summary
Chapter 4 A Look Back on Analytics: More Than One Hammer
Been Here Before: Reviewing the Enterprise Data Warehouse
Drawbacks of the Traditional Data Warehouse
Paradigm Shift
Modern Analytical Environments: The Data Lake
By Contrast
Indigenous Data
Attributes of Difference
Elements of the Data Lake
The New Normal: Big Data is Now Normal Data
Liberation from the Rigidity of a Single Data Model
Streaming Data
Suitable Tools for the Task
Easier Accessibility
Reducing Costs
Scalability
Data Management and Data Governance for AI
Schema-on-Read vs. Schema-on-Write
Summary
Chapter 5 A Look Forward on Analytics: Not Everything Can Be a Nail
A Need for Organization
The Staging Zone
The Raw Zone
The Discovery and Exploration Zone
The Aligned Zone
The Harmonized Zone
The Curated Zone
Data Topologies
Zone Map
Data Pipelines
Data Topography
Expanding, Adding, Moving, and Removing Zones
Enabling the Zones
Ingestion
Data Governance
Data Storage and Retention
Data Processing
Data Access
Management and Monitoring
Metadata
Summary
Chapter 6 Addressing Operational Disciplines on the AI Ladder
A Passage of Time
Create
Stability
Barriers
Complexity
Execute
Ingestion
Visibility
Compliance
Operate
Quality
Reliance
Reusability
The xOps Trifecta: DevOps/MLOps, DataOps, and AIOps
DevOps/MLOps
DataOps
AIOps
Summary
Chapter 7 Maximizing the Use of Your Data: Being Value Driven
Toward a Value Chain
Chaining Through Correlation
Enabling Action
Expanding the Means to Act
Curation
Data Governance
Integrated Data Management
Onboarding
Organizing
Cataloging
Metadata
Preparing
Provisioning
Multi-Tenancy
Summary
Chapter 8 Valuing Data with Statistical Analysis and Enabling Meaningful Access
Deriving Value: Managing Data as an Asset
An Inexact Science
Accessibility to Data: Not All Users are Equal
Providing Self-Service to Data
Access: The Importance of Adding Controls
Ranking Datasets Using a Bottom-Up Approach for Data Governance
How Various Industries Use Data and AI
Benefiting from Statistics
Summary
Chapter 9 Constructing for the Long-Term
The Need to Change Habits: Avoiding Hard-Coding
Overloading
Locked In
Ownership and Decomposition
Design to Avoid Change
Extending the Value of Data Through AI
Polyglot Persistence
Benefiting from Data Literacy
Understanding a Topic
Skillsets
It’s All Metadata
The Right Data, in the Right Context, with the Right Interface
Summary
Chapter 10 A Journey’s End: An IA for AI
Development Efforts for AI
Essential Elements: Cloud-Based Computing, Data, and Analytics
Intersections: Compute Capacity and Storage Capacity
Analytic Intensity
Interoperability Across the Elements
Data Pipeline Flight Paths: Preflight, Inflight, Postflight
Data Management for the Data Puddle, Data Pond, and Data Lake
Driving Action: Context, Content, and Decision-Makers
Keep It Simple
The Silo is Dead; Long Live the Silo
Taxonomy: Organizing Data Zones
Capabilities for an Open Platform
Summary
Appendix Glossary of Terms
Index
About the Authors
NEAL FISHMAN is a Distinguished Engineer and CTO of Data-Based Pathology at IBM. He is an IBM-certified Senior IT Architect and Open Group Distinguished Chief Architect.
COLE STRYKER is a journalist based in Los Angeles. He is the author of Epic Win for Anonymous and Hacking the Future.