Community detection in social networks is an important but challenging problem. This book develops a new technique for finding communities that uses structural similarity and attribute similarity simultaneously, weighting them in a principled way. The results outperform existing techniques across a wide range of measures, and so advance the state of the art in community detection.

Many existing community detection techniques base similarity either on the structural connections among social-network users or on the overlap among the attributes of each user. Either choice discards useful information. Some attempts have been made to use both structural and attribute similarity, but with limited success.

We first build a large real-world dataset of user profiles by crawling Instagram. We then compute the similarity between pairs of users based on four qualitatively different profile properties: similarity of language used in posts, similarity of hashtags used (which requires extraction of content from them), similarity of images displayed (which requires extraction of what each image is ‘about’), and the explicit connections formed when one user follows another. Each of these single-modality similarities is converted into a graph. The graphs share a common node set (the users) but have different sets of weighted edges. They are then combined into a single larger graph by joining the multiple nodes that represent the same user in a clique, with edge weights derived from a lazy random walk view of the individual graphs. This larger graph can then be embedded in a geometric space using spectral techniques. In the embedding, distance corresponds to dissimilarity, so geometric clustering techniques can be used to find communities.

The resulting communities are evaluated against the entire range of current techniques, and outperform all of them. Topic modelling is also applied to the clusters to show that they genuinely represent users with similar interests. These communities can form the basis for applications such as online marketing or key influence selection.
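The pipeline described above (one graph per modality, copies of each user joined in a clique, spectral embedding, geometric clustering) can be illustrated on toy data. The following is a minimal sketch under stated assumptions, not the book's implementation: all function names are hypothetical, the inter-layer weight formula is only a placeholder for the lazy-random-walk derivation (which is not given here), averaging the copies of each user is one possible way to obtain a single point per user, and a plain k-means stands in for whichever geometric clustering technique is preferred.

```python
import numpy as np

def combine_layers(layers):
    """Stack L single-modality graphs (n x n symmetric weighted adjacency
    matrices) into one (L*n x L*n) graph; copies of a user form a clique."""
    L, n = len(layers), layers[0].shape[0]
    big = np.zeros((L * n, L * n))
    degrees = [W.sum(axis=1) for W in layers]
    idx = np.arange(n)
    for a, W in enumerate(layers):
        big[a * n:(a + 1) * n, a * n:(a + 1) * n] = W
    for a in range(L):
        for b in range(a + 1, L):
            # ASSUMPTION: placeholder weight chosen so that a walker leaves
            # its layer roughly half the time; not the book's formula.
            w = (degrees[a] + degrees[b]) / (2.0 * (L - 1))
            big[a * n + idx, b * n + idx] = w
            big[b * n + idx, a * n + idx] = w
    return big

def spectral_embedding(W, dim):
    """Embed nodes using eigenvectors of the normalized graph Laplacian.
    Assumes the graph is connected, so only the first eigenvector is trivial."""
    deg = W.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    lap = np.eye(len(W)) - inv_sqrt[:, None] * W * inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)  # eigenvalues in ascending order
    return inv_sqrt[:, None] * vecs[:, 1:dim + 1]

def kmeans(points, k, iters=50):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    for _ in range(k - 1):
        d = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

def planted(n_per, strong, weak):
    """Toy similarity graph with two planted groups of n_per users each."""
    n = 2 * n_per
    W = np.full((n, n), weak)
    W[:n_per, :n_per] = strong
    W[n_per:, n_per:] = strong
    np.fill_diagonal(W, 0.0)
    return W

# Two toy modalities (say, follows and hashtags) over the same 8 users.
n_users = 8
layers = [planted(4, 1.0, 0.05), planted(4, 0.8, 0.1)]
big = combine_layers(layers)
emb = spectral_embedding(big, dim=1)
# ASSUMPTION: one point per user, obtained by averaging that user's copies.
users = emb.reshape(len(layers), n_users, -1).mean(axis=0)
labels = kmeans(users, k=2)
print(labels)
```

On real data the per-modality matrices would come from the language, hashtag, image, and follower similarities, and the embedding dimension and number of clusters would need to be chosen from the data rather than fixed in advance.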
Contents
Chapter 1: Introduction.- Chapter 2: Background.- Chapter 3: Building blocks.- Chapter 4: Social network data.- Chapter 5: Methodology.- Chapter 6: Results and validation.- Chapter 7: Conclusions.
About the authors
Mosab Alfaqeeh is a doctoral graduate of the School of Computing at Queen’s University. He works as a software developer.
David Skillicorn has worked extensively in adversarial data analytics, including the use of natural language processing and social network analysis. His work has applications in intelligence, policing, counterterrorism, and cybersecurity. He is the author of two hundred papers and several books, most recently ‘Cyberspace, Data Analytics, and Policing’ (Taylor and Francis).