Learn how to apply rough-fuzzy computing techniques to solve problems in bioinformatics and medical image processing
Emphasizing applications in bioinformatics and medical image processing, this text offers a clear framework that enables readers to take advantage of the latest rough-fuzzy computing techniques to build working pattern recognition models. The authors explain step by step how to integrate rough sets with fuzzy sets in order to best manage the uncertainties in mining large data sets. Chapters are logically organized according to the major phases of pattern recognition systems development, making it easier to master such tasks as classification, clustering, and feature selection.
Rough-Fuzzy Pattern Recognition examines the important underlying theory as well as algorithms and applications, helping readers see the connections between theory and practice. The first chapter provides an introduction to pattern recognition and data mining, including the key challenges of working with high-dimensional, real-life data sets. Next, the authors explore such topics and issues as:
- Soft computing in pattern recognition and data mining
- A mathematical framework for generalized rough sets, incorporating the concept of fuzziness in defining the granules as well as the set
- Selection of non-redundant and relevant features of real-valued data sets
- Selection of the minimum set of basis strings with maximum information for amino acid sequence analysis
- Segmentation of brain MR images for visualization of human tissues
Numerous examples and case studies help readers better understand how pattern recognition models are developed and used in practice. This text, covering the latest findings as well as directions for future research, is recommended for both students and practitioners working in systems design, pattern recognition, image analysis, data mining, bioinformatics, soft computing, and computational intelligence.
Table of Contents
Foreword
Preface
About the Authors
1 Introduction to Pattern Recognition and Data Mining
1.1 Introduction
1.2 Pattern Recognition
1.2.1 Data Acquisition
1.2.2 Feature Selection
1.2.3 Classification and Clustering
1.3 Data Mining
1.3.1 Tasks, Tools, and Applications
1.3.2 Pattern Recognition Perspective
1.4 Relevance of Soft Computing
1.5 Scope and Organization of the Book
References
2 Rough-Fuzzy Hybridization and Granular Computing
2.1 Introduction
2.2 Fuzzy Sets
2.3 Rough Sets
2.4 Emergence of Rough-Fuzzy Computing
2.4.1 Granular Computing
2.4.2 Computational Theory of Perception and f-Granulation
2.4.3 Rough-Fuzzy Computing
2.5 Generalized Rough Sets
2.6 Entropy Measures
2.7 Conclusion and Discussion
References
3 Rough-Fuzzy Clustering: Generalized c-Means Algorithm
3.1 Introduction
3.2 Existing c-Means Algorithms
3.2.1 Hard c-Means
3.2.2 Fuzzy c-Means
3.2.3 Possibilistic c-Means
3.2.4 Rough c-Means
3.3 Rough-Fuzzy-Possibilistic c-Means
3.3.1 Objective Function
3.3.2 Cluster Prototypes
3.3.3 Fundamental Properties
3.3.4 Convergence Condition
3.3.5 Details of the Algorithm
3.3.6 Selection of Parameters
3.4 Generalization of Existing c-Means Algorithms
3.4.1 RFCM: Rough-Fuzzy c-Means
3.4.2 RPCM: Rough-Possibilistic c-Means
3.4.3 RCM: Rough c-Means
3.4.4 FPCM: Fuzzy-Possibilistic c-Means
3.4.5 FCM: Fuzzy c-Means
3.4.6 PCM: Possibilistic c-Means
3.4.7 HCM: Hard c-Means
3.5 Quantitative Indices for Rough-Fuzzy Clustering
3.5.1 Average Accuracy, α Index
3.5.2 Average Roughness, ϱ Index
3.5.3 Accuracy of Approximation, α⋆ Index
3.5.4 Quality of Approximation, γ Index
3.6 Performance Analysis
3.6.1 Quantitative Indices
3.6.2 Synthetic Data Set: X32
3.6.3 Benchmark Data Sets
3.7 Conclusion and Discussion
References
4 Rough-Fuzzy Granulation and Pattern Classification
4.1 Introduction
4.2 Pattern Classification Model
4.2.1 Class-Dependent Fuzzy Granulation
4.2.2 Rough-Set-Based Feature Selection
4.3 Quantitative Measures
4.3.1 Dispersion Measure
4.3.2 Classification Accuracy, Precision, and Recall
4.3.3 κ Coefficient
4.3.4 β Index
4.4 Description of Data Sets
4.4.1 Completely Labeled Data Sets
4.4.2 Partially Labeled Data Sets
4.5 Experimental Results
4.5.1 Statistical Significance Test
4.5.2 Class Prediction Methods
4.5.3 Performance on Completely Labeled Data
4.5.4 Performance on Partially Labeled Data
4.6 Conclusion and Discussion
References
5 Fuzzy-Rough Feature Selection using f-Information Measures
5.1 Introduction
5.2 Fuzzy-Rough Sets
5.3 Information Measure on Fuzzy Approximation Spaces
5.3.1 Fuzzy Equivalence Partition Matrix and Entropy
5.3.2 Mutual Information
5.4 f-Information and Fuzzy Approximation Spaces
5.4.1 V-Information
5.4.2 Iα-Information
5.4.3 Mα-Information
5.4.4 χα-Information
5.4.5 Hellinger Integral
5.4.6 Renyi Distance
5.5 f-Information for Feature Selection
5.5.1 Feature Selection Using f-Information
5.5.2 Computational Complexity
5.5.3 Fuzzy Equivalence Classes
5.6 Quantitative Measures
5.6.1 Fuzzy-Rough-Set-Based Quantitative Indices
5.6.2 Existing Feature Evaluation Indices
5.7 Experimental Results
5.7.1 Description of Data Sets
5.7.2 Illustrative Example
5.7.3 Effectiveness of the FEPM-Based Method
5.7.4 Optimum Value of Weight Parameter β
5.7.5 Optimum Value of Multiplicative Parameter η
5.7.6 Performance of Different f-Information Measures
5.7.7 Comparative Performance of Different Algorithms
5.8 Conclusion and Discussion
References
6 Rough Fuzzy c-Medoids and Amino Acid Sequence Analysis
6.1 Introduction
6.2 Bio-Basis Function and String Selection Methods
6.2.1 Bio-Basis Function
6.2.2 Selection of Bio-Basis Strings Using Mutual Information
6.2.3 Selection of Bio-Basis Strings Using Fisher Ratio
6.3 Fuzzy-Possibilistic c-Medoids Algorithm
6.3.1 Hard c-Medoids
6.3.2 Fuzzy c-Medoids
6.3.3 Possibilistic c-Medoids
6.3.4 Fuzzy-Possibilistic c-Medoids
6.4 Rough-Fuzzy c-Medoids Algorithm
6.4.1 Rough c-Medoids
6.4.2 Rough-Fuzzy c-Medoids
6.5 Relational Clustering for Bio-Basis String Selection
6.6 Quantitative Measures
6.6.1 Using Homology Alignment Score
6.6.2 Using Mutual Information
6.7 Experimental Results
6.7.1 Description of Data Sets
6.7.2 Illustrative Example
6.7.3 Performance Analysis
6.8 Conclusion and Discussion
References
7 Clustering Functionally Similar Genes from Microarray Data
7.1 Introduction
7.2 Clustering Gene Expression Data
7.2.1 k-Means Algorithm
7.2.2 Self-Organizing Map
7.2.3 Hierarchical Clustering
7.2.4 Graph-Theoretical Approach
7.2.5 Model-Based Clustering
7.2.6 Density-Based Hierarchical Approach
7.2.7 Fuzzy Clustering
7.2.8 Rough-Fuzzy Clustering
7.3 Quantitative and Qualitative Analysis
7.3.1 Silhouette Index
7.3.2 Eisen and Cluster Profile Plots
7.3.3 Z Score
7.3.4 Gene-Ontology-Based Analysis
7.4 Description of Data Sets
7.4.1 Fifteen Yeast Data
7.4.2 Yeast Sporulation
7.4.3 Auble Data
7.4.4 Cho et al. Data
7.4.5 Reduced Cell Cycle Data
7.5 Experimental Results
7.5.1 Performance Analysis of Rough-Fuzzy c-Means
7.5.2 Comparative Analysis of Different c-Means
7.5.3 Biological Significance Analysis
7.5.4 Comparative Analysis of Different Algorithms
7.5.5 Performance Analysis of Rough-Fuzzy-Possibilistic c-Means
7.6 Conclusion and Discussion
References
8 Selection of Discriminative Genes from Microarray Data
8.1 Introduction
8.2 Evaluation Criteria for Gene Selection
8.2.1 Statistical Tests
8.2.2 Euclidean Distance
8.2.3 Pearson’s Correlation
8.2.4 Mutual Information
8.2.5 f-Information Measures
8.3 Approximation of Density Function
8.3.1 Discretization
8.3.2 Parzen Window Density Estimator
8.3.3 Fuzzy Equivalence Partition Matrix
8.4 Gene Selection using Information Measures
8.5 Experimental Results
8.5.1 Support Vector Machine
8.5.2 Gene Expression Data Sets
8.5.3 Performance Analysis of the FEPM
8.5.4 Comparative Performance Analysis
8.6 Conclusion and Discussion
References
9 Segmentation of Brain Magnetic Resonance Images
9.1 Introduction
9.2 Pixel Classification of Brain MR Images
9.2.1 Performance on Real Brain MR Images
9.2.2 Performance on Simulated Brain MR Images
9.3 Segmentation of Brain MR Images
9.3.1 Feature Extraction
9.3.2 Selection of Initial Prototypes
9.4 Experimental Results
9.4.1 Illustrative Example
9.4.2 Importance of Homogeneity and Edge Value
9.4.3 Importance of Discriminant Analysis-Based Initialization
9.4.4 Comparative Performance Analysis
9.5 Conclusion and Discussion
References
Index
About the Authors
PRADIPTA MAJI, PHD, is Assistant Professor in the Machine Intelligence Unit of the Indian Statistical Institute. His research explores pattern recognition, bioinformatics, medical image processing, cellular automata, and soft computing.
SANKAR K. PAL, PHD, is Director and Distinguished Scientist of the Indian Statistical Institute. He is also a J. C. Bose Fellow of the Government of India. Dr. Pal founded both the Machine Intelligence Unit and the Center for Soft Computing Research at the Indian Statistical Institute. He is a Fellow of the IEEE, IAPR, IFSA, TWAS, and Indian National Science Academy.