This book lays out a path leading from the linguistic and cognitive basics, to classical rule-based and machine learning algorithms, to today’s state-of-the-art approaches, which use advanced empirically grounded techniques, automatic knowledge acquisition, and refined linguistic modeling to make a real difference in real-world applications. Anaphora resolution and coreference resolution both refer to the process of linking textual phrases (and, consequently, the information attached to them), within as well as across sentence boundaries, to the same discourse referent.
The book offers an overview of recent research advances, focusing on practical, operational approaches and their applications. In part I (Background), it provides a general introduction, which succinctly summarizes the linguistic, cognitive, and computational foundations of anaphora processing and the key classical rule- and machine-learning-based anaphora resolution algorithms. Acknowledging the central importance of shared resources, part II (Resources) covers annotated corpora, formal evaluation, preprocessing technology, and off-the-shelf anaphora resolution systems. Part III (Algorithms) provides a thorough description of state-of-the-art anaphora resolution algorithms, covering enhanced machine learning methods as well as techniques for accomplishing important subtasks such as mention detection and acquisition of relevant knowledge. Part IV (Applications) deals with a selection of important anaphora and coreference resolution applications, discussing particular scenarios in diverse domains and distilling a best-practice model for systematically approaching new application cases. In the concluding part V (Outlook), based on a survey conducted among the contributing authors, the prospects of the research field of anaphora processing are discussed, and promising new areas of interdisciplinary cooperation and emerging application scenarios are identified.
Given the book’s design, it can be used both as an accompanying text for advanced lectures in computational linguistics, natural language engineering, and computer science, and as a reference work for research and independent study. It addresses an audience that includes academic researchers, university lecturers, postgraduate students, advanced undergraduate students, industrial researchers, and software engineers.
Table of Contents
Preface
1. Introduction
Part I Background
2. Linguistic and Cognitive Evidence About Anaphora
3. Early Approaches to Anaphora Resolution: Theoretically Inspired and Heuristic-Based
Part II Resources
4. Annotated Corpora and Annotation Tools
5. Evaluation Metrics
6. Evaluation Campaigns
7. Preprocessing Technology
8. Off-the-shelf Tools
Part III Algorithms
9. The Mention-Pair Model
10. Advanced Machine Learning Models for Coreference Resolution
11. Integer Linear Programming for Coreference Resolution
12. Extracting Anaphoric Agreement Properties from Corpora
13. Detecting Non-reference and Non-anaphoricity
14. Using Lexical and Encyclopedic Knowledge
Part IV Applications
15. Coreference Applications to Summarization
16. Towards a Procedure Model for Developing Anaphora Processing Applications
Part V Outlook
17. Challenges and Directions of Further Research
Index
About the authors
Massimo Poesio is a cognitive scientist with a primary interest in computational linguistics and further interests in psycholinguistics and neuroscience. His research includes the development of computational models of semantic and discourse interpretation (in particular, anaphora resolution); the creation of corpora of anaphorically annotated data (he pioneered the use of games-with-a-purpose for computational linguistics with the development of Phrase Detectives, http://www.phrasedetectives.org); the study of commonsense knowledge using a combination of methods from computational linguistics and from neuroscience; and the application of text analytics methods to real-life problems, such as deception detection and the identification of reports of human rights violations in social media.
Roland Stuckardt works as a consultant, research & development manager, and scientific researcher in the fields of computational linguistics and natural language processing. He studied computer science and economics at Goethe University Frankfurt. During his work at the German National Research Center for Information Technology (GMD) Darmstadt, he specialized in text analysis, parsing, discourse semantics, and robust anaphor resolution. He received his PhD from Goethe University for his research on computer-based text content analysis in the social sciences. His research interests and main fields of work include anaphora processing, information extraction, media content monitoring, innovative natural language processing applications in general, and computer chess.
Yannick Versley is a group leader in the Leibniz-Science Campus ‘Empirical Linguistics and Computational Language Modeling’, a collaboration between the Institute for German Language (IDS) in Mannheim and the Institute for Computational Linguistics at the University of Heidelberg. He studied Computer Science, Physics and Mathematics in Hamburg before completing a PhD in Tübingen on the coreference resolution of definite noun phrases in German newspaper text. In his subsequent positions in Rovereto/Trento, Tübingen, and Heidelberg, he has worked on a number of topics including statistical parsing, coreference resolution, discourse relations, and distributional semantics, with particular attention to German.