Malcolm Atkinson & Rob Baxter 
The Data Bonanza [EPUB ebook] 
Improving Knowledge Discovery in Science, Engineering, and Business

Complete guidance for mastering the tools and techniques of the digital revolution


With the digital revolution opening up tremendous opportunities in many fields, there is a growing need for skilled professionals who can develop data-intensive systems and extract information and knowledge from them. This book frames for the first time a new systematic approach for tackling the challenges of data-intensive computing, providing decision makers and technical experts alike with practical tools for dealing with our exploding data collections.


Emphasizing data-intensive thinking and interdisciplinary collaboration, The Data Bonanza: Improving Knowledge Discovery in Science, Engineering, and Business examines the essential components of knowledge discovery, surveys many of the current research efforts worldwide, and points to new areas for innovation. Complete with a wealth of examples and DISPEL-based methods demonstrating how to gain more from data in real-world systems, the book:



  • Outlines the concepts and rationale for implementing data-intensive computing in organizations

  • Covers, from the ground up, problem-solving strategies for data analysis in a data-rich world

  • Introduces techniques for data-intensive engineering using the Data-Intensive Systems Process Engineering Language (DISPEL)

  • Features in-depth case studies in customer relations, environmental hazards, seismology, and more

  • Showcases successful applications in areas ranging from astronomy and the humanities to transport engineering

  • Includes sample program snippets throughout the text as well as additional materials on a companion website


The Data Bonanza is a must-have guide for information strategists, data analysts, and engineers in business, research, and government, and for anyone wishing to be on the cutting edge of data mining, machine learning, databases, distributed systems, or large-scale computing.

Table of Contents

CONTRIBUTORS
FOREWORD
PREFACE
THE EDITORS


PART I STRATEGIES FOR SUCCESS IN THE DIGITAL-DATA REVOLUTION

1. The Digital-Data Challenge
Malcolm Atkinson and Mark Parsons
1.1 The Digital Revolution
1.2 Changing How We Think and Behave
1.3 Moving Adroitly in this Fast-Changing Field
1.4 Digital-Data Challenges Exist Everywhere
1.5 Changing How We Work
1.6 Divide and Conquer Offers the Solution
1.7 Engineering Data-to-Knowledge Highways

2. The Digital-Data Revolution
Malcolm Atkinson
2.1 Data, Information, and Knowledge
2.2 Increasing Volumes and Diversity of Data
2.3 Changing the Ways We Work with Data

3. The Data-Intensive Survival Guide
Malcolm Atkinson
3.1 Introduction: Challenges and Strategy
3.2 Three Categories of Expert
3.3 The Data-Intensive Architecture
3.4 An Operational Data-Intensive System
3.5 Introducing DISPEL
3.6 A Simple DISPEL Example
3.7 Supporting Data-Intensive Experts
3.8 DISPEL in the Context of Contemporary Systems
3.9 Datascopes
3.10 Ramps for Incremental Engagement
3.11 Readers’ Guide to the Rest of This Book

4. Data-Intensive Thinking with DISPEL
Malcolm Atkinson
4.1 Processing Elements
4.2 Connections
4.3 Data Streams and Structure
4.4 Functions
4.5 The Three-Level Type System
4.6 Registry, Libraries, and Descriptions
4.7 Achieving Data-Intensive Performance
4.8 Reliability and Control
4.9 The Data-to-Knowledge Highway


PART II DATA-INTENSIVE KNOWLEDGE DISCOVERY

5. Data-Intensive Analysis
Oscar Corcho and Jano van Hemert
5.1 Knowledge Discovery in Telco Inc.
5.2 Understanding Customers to Prevent Churn
5.3 Preventing Churn Across Multiple Companies
5.4 Understanding Customers by Combining Heterogeneous Public and Private Data
5.5 Conclusions

6. Problem Solving in Data-Intensive Knowledge Discovery
Oscar Corcho and Jano van Hemert
6.1 The Conventional Life Cycle of Knowledge Discovery
6.2 Knowledge Discovery Over Heterogeneous Data Sources
6.3 Knowledge Discovery from Private and Public, Structured and Nonstructured Data
6.4 Conclusions

7. Data-Intensive Components and Usage Patterns
Oscar Corcho
7.1 Data Source Access and Transformation Components
7.2 Data Integration Components
7.3 Data Preparation and Processing Components
7.4 Data-Mining Components
7.5 Visualization and Knowledge Delivery Components

8. Sharing and Reuse in Knowledge Discovery
Oscar Corcho
8.1 Strategies for Sharing and Reuse
8.2 Data Analysis Ontologies for Data Analysis Experts
8.3 Generic Ontologies for Metadata Generation
8.4 Domain Ontologies for Domain Experts
8.5 Conclusions


PART III DATA-INTENSIVE ENGINEERING

9. Platforms for Data-Intensive Analysis
David Snelling
9.1 The Hourglass Reprise
9.2 The Motivation for a Platform
9.3 Realization

10. Definition of the DISPEL Language
Paul Martin and Gagarine Yaikhom
10.1 A Simple Example
10.2 Processing Elements
10.3 Data Streams
10.4 Type System
10.5 Registration
10.6 Packaging
10.7 Workflow Submission
10.8 Examples of DISPEL
10.9 Summary

11. DISPEL Development
Adrian Mouat and David Snelling
11.1 The Development Landscape
11.2 Data-Intensive Workbenches
11.3 Data-Intensive Component Libraries
11.4 Summary

12. DISPEL Enactment
Chee Sun Liew, Amrey Krause, and David Snelling
12.1 Overview of DISPEL Enactment
12.2 DISPEL Language Processing
12.3 DISPEL Optimization
12.4 DISPEL Deployment
12.5 DISPEL Execution and Control


PART IV DATA-INTENSIVE APPLICATION EXPERIENCE

13. The Application Foundations of DISPEL
Rob Baxter
13.1 Characteristics of Data-Intensive Applications
13.2 Evaluating Application Performance
13.3 Reviewing the Data-Intensive Strategy

14. Analytical Platform for Customer Relationship Management
Maciej Jarka and Mark Parsons
14.1 Data Analysis in the Telecoms Business
14.2 Analytical Customer Relationship Management
14.3 Scenario 1: Churn Prediction
14.4 Scenario 2: Cross Selling
14.5 Exploiting the Models and Rules
14.6 Summary: Lessons Learned

15. Environmental Risk Management
Ladislav Hluchy, Ondrej Habala, Viet Tran, and Branislav Simo
15.1 Environmental Modeling
15.2 Cascading Simulation Models
15.3 Environmental Data Sources and Their Management
15.4 Scenario 1: ORAVA
15.5 Scenario 2: RADAR
15.6 Scenario 3: SVP
15.7 New Technologies for Environmental Data Mining
15.8 Summary: Lessons Learned

16. Analyzing Gene Expression Imaging Data in Developmental Biology
Liangxiu Han, Jano van Hemert, Ian Overton, Paolo Besana, and Richard Baldock
16.1 Understanding Biological Function
16.2 Gene Image Annotation
16.3 Automated Annotation of Gene Expression Images
16.4 Exploitation and Future Work
16.5 Summary

17. Data-Intensive Seismology: Research Horizons
Michelle Galea, Andreas Rietbrock, Alessandro Spinuso, and Luca Trani
17.1 Introduction
17.2 Seismic Ambient Noise Processing
17.3 Solution Implementation
17.4 Evaluation
17.5 Further Work
17.6 Conclusions


PART V DATA-INTENSIVE BEACONS OF SUCCESS

18. Data-Intensive Methods in Astronomy
Thomas D. Kitching, Robert G. Mann, Laura E. Valkonen, Mark S. Holliman, Alastair Hume, and Keith T. Noddle
18.1 Introduction
18.2 The Virtual Observatory
18.3 Data-Intensive Photometric Classification of Quasars
18.4 Probing the Dark Universe with Weak Gravitational Lensing
18.5 Future Research Issues
18.6 Conclusions

19. The World at One’s Fingertips: Interactive Interpretation of Environmental Data
Jon Blower, Keith Haines, and Alastair Gemmell
19.1 Introduction
19.2 The Current State of the Art
19.3 The Technical Landscape
19.4 Interactive Visualization
19.5 From Visualization to Intercomparison
19.6 Future Development: The Environmental Cloud
19.7 Conclusions

20. Data-Driven Research in the Humanities—the DARIAH Research Infrastructure
Andreas Aschenbrenner, Tobias Blanke, Christiane Fritze, and Wolfgang Pempe
20.1 Introduction
20.2 The Tradition of Digital Humanities
20.3 Humanities Research Data
20.4 Use Case
20.5 Conclusion and Future Development

21. Analysis of Large and Complex Engineering and Transport Data
Jim Austin
21.1 Introduction
21.2 Applications and Challenges
21.3 The Methods Used
21.4 Future Developments
21.5 Conclusions
References

22. Estimating Species Distributions—Across Space, Through Time, and with Features of the Environment
Steve Kelling, Daniel Fink, Wesley Hochachka, Ken Rosenberg, Robert Cook, Theodoros Damoulas, Claudio Silva, and William Michener
22.1 Introduction
22.2 Data Discovery, Access, and Synthesis
22.3 Model Development
22.4 Managing Computational Requirements
22.5 Exploring and Visualizing Model Results
22.6 Analysis Results
22.7 Conclusion


PART VI THE DATA-INTENSIVE FUTURE

23. Data-Intensive Trends
Malcolm Atkinson and Paolo Besana
23.1 Reprise
23.2 Data-Intensive Applications

24. Data-Rich Futures
Malcolm Atkinson
24.1 Future Data Infrastructure
24.2 Future Data Economy
24.3 Future Data Society and Professionalism
References

Appendix A: Glossary
Michelle Galea and Malcolm Atkinson

Appendix B: DISPEL Reference Manual
Paul Martin

Appendix C: Component Definitions
Malcolm Atkinson and Chee Sun Liew

INDEX

About the author

MALCOLM ATKINSON, PhD, is Professor of e-Science in the School of Informatics at the University of Edinburgh in Scotland. He is also Data-Intensive Research Group leader, Director of the e-Science Institute, IT architect for the ADMIRE and VERCE EU projects, and UK e-Science Envoy. Professor Atkinson has led research projects for several decades and has served on many advisory bodies.
Language English ● Format EPUB ● ISBN 9781118540305 ● File size 9.0 MB ● Editors Malcolm Atkinson & Rob Baxter ● Publisher John Wiley & Sons ● Country US ● Published 2013 ● Edition 1 ● Downloadable 24 months ● Currency EUR ● ID 2657736 ● Copy protection none
