Online communities generate massive volumes of natural language data, and the social sciences are still learning how best to use this new information and the technology available for analyzing it.
Text Mining brings together a broad range of contemporary qualitative and quantitative methods to provide strategic and practical guidance on analyzing large text collections. This accessible book, written by a sociologist and a computer scientist, surveys the fast-changing landscape of data sources, programming languages, software packages, and methods of analysis available today. Suitable for novice and experienced researchers alike, the book will help readers use text mining techniques more efficiently and productively.
Contents
Part I: Digital Texts, Digital Social Science
1. Social Science and the Digital Text Revolution
Learning Objectives
Introduction
History of Text Analysis
Risk and Rewards of Text Mining for the Social Sciences
Social Data from Digital Environments
Theory and Metatheory
Ethics of Text Mining
Organization of This Volume
2. Research Design Strategies
Learning Objectives
Introduction
Levels of Analysis
Strategies for Document Selection and Sampling
Types of Inferential Logic
Approaches to Research Design
Part II: Text Mining Fundamentals
3. Web Crawling and Scraping
Learning Objectives
Introduction
Web Statistics
Web Crawling
Web Scraping
Software for Web Crawling and Scraping
4. Lexical Resources
Learning Objectives
Introduction
WordNet
Roget's Thesaurus
Linguistic Inquiry and Word Count
General Inquirer
Wikipedia
Downloadable Lexical Resources and APIs
5. Basic Text Processing
Learning Objectives
Introduction
Tokenization
Stopword Removal
Stemming and Lemmatization
Text Statistics
Language Models
Other Text Processing
Software for Text Processing
6. Supervised Learning
Learning Objectives
Feature Representation and Weighting
Supervised Learning Algorithms
Evaluation of Supervised Learning
Software for Supervised Learning
Part III: Text Analysis Methods from the Humanities and Social Sciences
7. Thematic Analysis, QDAS, and Visualization
Learning Objectives
Thematic Analysis
Qualitative Data Analysis Software
Visualization Tools
8. Narrative Analysis
Learning Objectives
Introduction
Conceptual Foundations
Mixed Methods of Narrative Analysis
Automated Approaches to Narrative Analysis
Future Directions
Specialized Software for Narrative Analysis
9. Metaphor Analysis
Learning Objectives
Introduction
Theoretical Foundations
Qualitative Metaphor Analysis
Mixed Methods of Metaphor Analysis
Automated Metaphor Identification Methods
Software for Metaphor Analysis
Part IV: Text Mining Methods from Computer Science
10. Word and Text Relatedness
Learning Objectives
Introduction
Theoretical Foundations
Corpus-based and Knowledge-based Measures of Relatedness
Software and Datasets for Word and Text Relatedness
Further Reading
11. Text Classification
Learning Objectives
Introduction
Applications of Text Classification
Representing Texts for Supervised Text Classification
Text Classification Algorithms
Bootstrapping in Text Classification
Evaluation of Text Classification
Software and Datasets for Text Classification
12. Information Extraction
Learning Objectives
Introduction
Entity Extraction
Relation Extraction
Web Information Extraction
Template Filling
Software and Datasets for Information Extraction and Text Mining
13. Information Retrieval
Learning Objectives
Introduction
Theoretical Foundations
Components of an Information Retrieval System
Information Retrieval Models
The Vector-Space Model
Evaluation of Information Retrieval Models
Web-Based Information Retrieval
Software and Datasets for Information Retrieval
14. Sentiment Analysis
Learning Objectives
Introduction
Theoretical Foundations
Lexicons
Corpora
Tools
Future Directions
Software and Datasets for Word and Text Relatedness
15. Topic Models
Learning Objectives
Introduction
Digital Humanities
Political Science
Sociology
Software for Topic Modeling
Part V: Conclusions
16. Text Mining, Text Analysis, and the Future of Social Science
Introduction
Social and Computer Science Collaboration
About the Author
Rada Mihalcea is a professor of computer science and engineering at the University of Michigan. Her research interests are in computational linguistics, with a focus on lexical semantics, multilingual natural language processing, and computational social sciences. She serves or has served on the editorial boards of the following journals: Computational Linguistics, Language Resources and Evaluation, Natural Language Engineering, Research on Language and Computation, IEEE Transactions on Affective Computing, and Transactions of the Association for Computational Linguistics. She was a general chair for the Conference of the North American Chapter of the Association for Computational Linguistics (NAACL, 2015) and a program cochair for the Conference of the Association for Computational Linguistics (2011) and the Conference on Empirical Methods in Natural Language Processing (2009). She is the recipient of a National Science Foundation CAREER award (2008) and a Presidential Early Career Award for Scientists and Engineers (2009). In 2013, she was made an honorary citizen of her hometown of Cluj-Napoca, Romania.