This book examines the recent trend of extending data dependencies to rich data types in order to address the variety and veracity issues of big data. Readers will be guided through the full range of rich data types to which data dependencies have been successfully applied, including categorical data with equality relationships, heterogeneous data with similarity relationships, numerical data with order relationships, sequential data with timestamps, and graph data with complicated structures. Beyond the equality relationships considered in functional dependencies (FDs), the text also discusses the constraints on ordering and similarity relationships captured by novel classes of data dependencies. In addition to presenting these data dependency notations, the book investigates the extension relationships between data dependencies, such as conditional functional dependencies (CFDs), which extend conventional FDs. The result is a family tree of extensions, mostly rooted in FDs, that helps illuminate the expressive power of the various data dependencies. Moreover, because data dependencies are unlikely to be specified manually in the traditional way given the huge volume and high variety of big data, the book points to work on discovering dependencies from data. It further outlines applications of the extended data dependencies, in particular in data quality practice. Altogether, this book provides a comprehensive guide that helps readers select data dependencies with sufficient expressive power and reasonable discovery cost for their applications. Finally, the book concludes with several directions for future studies on emerging data.
Table of Contents
Introduction.- Categorical Data.- Heterogeneous Data.- Ordered Data.- Temporal Data.- Graph Data.- Conclusions and Directions.- Index of Data Dependencies.- References.
About the Authors
Shaoxu Song is an Associate Professor in the School of Software at Tsinghua University in Beijing, China. His research interests include data quality and data integration. He has published more than 50 papers in top conferences and journals such as SIGMOD, VLDB, ICDE, ACM TODS, VLDBJ, and IEEE TKDE. He served as a Vice Program Chair for the 2022 IEEE International Conference on Big Data (IEEE Big Data 2022) and received the Distinguished Reviewer award from VLDB 2019 and an Outstanding Reviewer award from CIKM 2017.
Lei Chen is a Chaired Professor in the Department of Computer Science and Engineering at the Hong Kong University of Science and Technology and the Director of the HKUST Big Data Institute. He received the SIGMOD Test-of-Time Award in 2015 and served as the Program Committee Co-Chair of VLDB 2019 and ICDE 2023. He is currently the Editor-in-Chief of both the VLDB Journal and IEEE Transactions on Knowledge and Data Engineering (TKDE). He is an IEEE Fellow and an ACM Distinguished Scientist.