This book provides a principled, data-driven framework that progressively constructs, enriches, and applies taxonomies without relying on massive amounts of human-annotated data. Traditionally, domain-specific taxonomies have been built through extensive manual curation, which is time-consuming and costly. In today's information era, people are inundated with vast amounts of text data. Despite their usefulness, taxonomies have not yet been exploited to their full potential because of the heavy curation needed to create and maintain them. To bridge this gap, the authors discuss automated taxonomy discovery and exploration, with an emphasis on label-efficient machine learning methods and their real-world applications. A taxonomy organizes entities and concepts into a hierarchical structure. Taxonomies are ubiquitous in daily life, ranging from product taxonomies used by online retailers and topic taxonomies deployed by news outlets and social media to scientific taxonomies maintained by digital libraries across various domains. When properly analyzed, these taxonomies can play a vital role in science, engineering, business intelligence, policy design, e-commerce, and more. Intuitive examples are used throughout, enabling readers to grasp concepts more easily.
Table of Contents
Introduction; Concept Set Expansion; Taxonomy Construction; Taxonomy Enrichment; Taxonomy-Guided Classification; Conclusions.
About the Authors
Jiaming Shen, Ph.D., is a Research Scientist at Google Research working on data mining and natural language processing. His research aims to develop automated methods for mining knowledge from text data without excessive human annotation. He received his Ph.D. from the University of Illinois at Urbana-Champaign and his B.S. from Shanghai Jiao Tong University. He has received several fellowships and scholarships for his research, including a Brian Totty Graduate Fellowship and a Yunni & Maxine Pao Memorial Fellowship.
Jiawei Han, Ph.D., is a Michael Aiken Chair Professor at the University of Illinois at Urbana-Champaign. His research areas encompass data mining, text mining, data warehousing, and information network analysis, with over 800 research publications. He is a Fellow of both the ACM and the IEEE and has received numerous prominent awards, including the ACM SIGKDD Innovation Award (2004) and the IEEE Computer Society W. Wallace McDowell Award (2009).