This book addresses the problems that are encountered, and solutions that have been proposed, when we aim to identify people and to reconstruct populations under conditions where information is scarce, ambiguous, fuzzy and sometimes erroneous.
The process from handwritten registers to a reconstructed digitized population consists of three major phases, reflected in the three main sections of this book. The first phase involves transcribing and digitizing the data while structuring the information in a meaningful and efficient way. In the second phase, records that refer to the same person or group of persons are identified by a process of linkage. In the third and final phase, the information on an individual is combined into a reconstruction of their life course.
The studies and examples in this book originate from a range of countries, each with its own cultural and administrative characteristics, and cover sources from medieval charters through historical censuses and vital registration to the modern issue of privacy preservation. Despite the diverse places and times addressed, they all study fundamental issues in modeling the reasoning behind population reconstruction, and the possibilities and limitations of information technology in supporting this process.
It is thus not a single discipline that is involved in such an endeavor. Historians, social scientists, and linguists represent the humanities through their knowledge of the complexity of the past, the limitations of sources, and the possible interpretations of information. The availability of big data from digitized archives and the need for complex analyses to identify individuals call for the involvement of computer scientists. With contributions from all these fields, often in direct cooperation, this book is at the heart of the digital humanities, and will hopefully offer a source of inspiration for future investigations.
Table of contents
Part I Data quality: cleaning and standardization
1 The Danish Demographic Database – principles and methods for cleaning and standardization of data
2 Dutch historical toponyms in the Semantic Web
3 Automatic methods for coding historical occupation descriptions to standard classifications
4 Learning name variants from inexact high-confidence matches
Part II Record linkage and validation
5 Advanced record linkage methods and privacy aspects for population reconstruction – a survey and case studies
6 Reconstructing historical populations from genealogical data files
7 Multi-source entity resolution for genealogical data
8 Record linkage in the Historical Population Registry for Norway
9 Record linkage in Medieval and early modern texts
Part III Life course reconstruction
10 Reconstructing lifespans through historical marriage records of Barcelona from the 16th and 17th centuries
11 Dancing with dirty data: Problems in the extraction of life-course evidence from historical censuses
12 Using the Canadian censuses of 1852 and 1881 for automatic data linkage: a case study of intergenerational social mobility
13 Introducing ‘movers’ into community reconstructions: linking civil registers of vital events to local and national census data: a Scottish experiment
14 Linking strategies for building a life course dataset from Australian convict records; Founders & Survivors: Australian Life Courses in Historical Context, 1803-1920
About the authors
Gerrit Bloothooft is a researcher at the Utrecht Institute of Linguistics, The Netherlands. His research interests cover eHumanities across a wide range, from language and speech technology and onomastics to historical record linkage. He is a fellow of two institutes of the Dutch Royal Academy of Sciences.
Peter Christen is an associate professor in the Research School of Computer Science at the Australian National University in Canberra, Australia. His research interests are data mining, with a focus on data matching and privacy-preserving data sharing and mining, and he is the author of the Springer book ‘Data Matching – Concepts and Techniques for Record Linkage, Entity Resolution, and Duplicate Detection’ (2012).
Kees Mandemakers is a senior research fellow at the International Institute of Social History, where he directs the Historical Sample of the Netherlands. He holds the endowed chair for Large Historical Databases at the Erasmus School of History, Culture and Communication of the Erasmus University Rotterdam and is President of the International Commission for Historical Demography. His research interests are the methodology of large historical databases, social stratification and mobility, and social history.
Marijn Schraagen is a researcher at the Digital Humanities Lab, Utrecht University, The Netherlands. His research interests range from artificial intelligence and data mining to language technology and psycholinguistics. As an eHumanities researcher he is involved in record linkage and historical population reconstruction.