A Statistical Approach to Genetic Epidemiology
After studying statistics and mathematics at the University of Munich and obtaining his doctoral degree from the University of Dortmund, Andreas Ziegler received the Johann-Peter-Süssmilch-Medal of the German Association for Medical Informatics, Biometry and Epidemiology for his post-doctoral work on “Model Free Linkage Analysis of Quantitative Traits” in 1999. In 2004, he was one of the recipients of the Fritz-Linder-Forum-Award from the German Association for Surgery.
Contents
Foreword to the First Edition
Foreword to the Second Edition
Preface
Acknowledgments
1 Molecular Genetics
1.1 Genetic information
1.1.1 Location of genetic information
1.1.2 Interpretation of genetic information
1.1.3 Translation of genetic information
1.2 Transmission of genetic information
1.3 Variations in genetic information
1.3.1 Individual differences in genetic information
1.3.2 Detection of variations
1.3.3 Probability for detection of variations
1.4 Problems
2 Formal Genetics
2.1 Mendel and his laws
2.2 Segregation patterns
2.2.1 Autosomal dominant inheritance
2.2.2 Autosomal recessive inheritance
2.2.3 X-chromosomal dominant inheritance
2.2.4 X-chromosomal recessive inheritance
2.2.5 Y-chromosomal inheritance
2.3 Complications of Mendelian segregation
2.3.1 Variable penetrance and expression
2.3.2 Age-dependent penetrance
2.3.3 Imprinting
2.3.4 Phenotypic and genotypic heterogeneity
2.3.5 Complex diseases
2.4 Hardy–Weinberg law
2.5 Problems
3 Genetic Markers
3.1 Properties of genetic markers
3.2 Types of genetic markers
3.2.1 Short tandem repeats (STRs)
3.2.2 Single nucleotide polymorphisms (SNPs)
3.3 Genotyping methods for SNPs
3.3.1 Restriction fragment length polymorphism analysis
3.3.2 Real-time polymerase chain reaction
3.3.3 Matrix assisted laser desorption/ionization time of flight genotyping
3.3.4 Chip-based genotyping
3.3.5 Choice of genotyping method
3.4 Problems
4 Data Quality
4.1 Pedigree errors
4.2 Genotyping errors in pedigrees
4.2.1 Frequency of genotyping errors
4.2.2 Reasons for genotyping errors
4.2.3 Mendel checks
4.2.4 Checks for double recombinants
4.3 Genotyping errors and Hardy–Weinberg equilibrium (HWE)
4.3.1 Causes of deviations from HWE
4.3.2 Tests for deviation from HWE for SNPs
4.3.3 Tests for deviation from HWE for STRs
4.3.4 Measures for deviation from HWE
4.3.5 Tests for compatibility with HWE for SNPs
4.4 Quality control in high-throughput studies
4.4.1 Sample quality control
4.4.2 SNP quality control
4.5 Cluster plot checks and internal validity
4.5.1 Cluster compactness measures
4.5.2 Cluster connectedness measures
4.5.3 Cluster separation measures
4.5.4 Genotype stability measures
4.5.5 Combinations of criteria
4.6 Problems
5 Genetic Map Distances
5.1 Physical distance
5.2 Map distance
5.2.1 Distance
5.2.2 Specific map functions
5.2.3 Correspondence between physical distance and map distance
5.2.4 Multilocus feasibility
5.3 Linkage disequilibrium distance
5.4 Problems
6 Family Studies
6.1 Family history method and family study method
6.2 Familial correlations and recurrence risks
6.2.1 Familial resemblance
6.2.2 Recurrence risk ratios
6.3 Heritability
6.3.1 The simple Falconer model
6.3.2 The general Falconer model
6.3.3 Kinship coefficient and Jacquard’s Δ7 coefficient
6.4 Twin and adoption studies
6.4.1 Twin studies
6.4.2 Adoption studies
6.5 Critique on investigating familial resemblance
6.6 Segregation analysis
6.7 Problems
7 Model-Based Linkage Analysis
7.1 Linkage analysis between two genetic markers
7.1.1 Linkage analysis in phase-known pedigrees
7.1.2 Linkage analysis in phase-unknown pedigrees
7.1.3 Linkage analysis in pedigrees with missing genotypes
7.2 Linkage analysis between a genetic marker and a disease
7.2.1 Linkage analysis between a genetic marker and a disease in phase-known pedigrees
7.2.2 Linkage analysis between a genetic marker and a disease in general cases
7.2.3 Gain in information by genotyping additional individuals; power calculations
7.3 Significance levels in linkage analysis
7.4 Problems
8 Model-Free Linkage Analysis
8.1 The principle of similarity
8.2 Mathematical foundation of affected sib-pair analysis
8.3 Common tests for affected sib-pair analysis
8.3.1 The maximum LOD score and the triangle test
8.3.2 Score- and Wald–type 1 degree of freedom tests
8.3.3 Affected sib-pair tests using alleles shared identical by state
8.4 Properties of affected sib-pair tests
8.5 Sample size and power calculations for affected sib-pair studies
8.5.1 Functional relation between identical by descent probabilities and recurrence risk ratios
8.5.2 Sample size and power calculations for the mean test using recurrence risk ratios
8.6 Extensions to multiple marker loci
8.7 Extension to large sibships
8.8 Extension to large pedigrees
8.9 Extensions of the affected sib-pair approach
8.9.1 Covariates in affected sib-pair analyses
8.9.2 Multiple disease loci in affected sib-pair analyses
8.9.3 Estimating the position of the disease locus in affected sib-pair analyses
8.9.4 Typing unaffected relatives in sib-pair analyses
8.10 Problems
9 Quantitative Traits
9.1 Quantitative versus qualitative traits
9.2 The Haseman–Elston method
9.2.1 The expected squared phenotypic difference at the trait locus
9.2.2 The expected squared phenotypic difference at the marker locus
9.3 Extensions of the Haseman–Elston method
9.3.1 Double squared trait difference
9.3.2 Extension to large sibships
9.3.3 Haseman–Elston revisited and the new Haseman–Elston method
9.3.4 Power and sample size calculations
9.4 Variance components models
9.4.1 The univariate variance components model
9.4.2 The multivariate variance components model
9.5 Random sib-pairs, extreme probands and extreme sib-pairs
9.6 Empirical determination of p-values
9.7 Problems
10 Fundamental Concepts of Association Analyses
10.1 Introduction to association
10.1.1 Principles of association
10.1.2 Study designs for association
10.2 Linkage disequilibrium
10.2.1 Allelic linkage disequilibrium
10.2.2 Genotypic linkage disequilibrium
10.2.3 Extent of linkage disequilibrium
10.3 Problems
11 Association Analysis in Unrelated Individuals
11.1 Selection of cases and controls
11.2 Tests, estimates, and a comparison
11.2.1 Association tests
11.2.2 Choice of a test in applications
11.2.3 Effect measures
11.2.4 Selection of the genetic model
11.2.5 Association tests for the X chromosome
11.3 Sample size calculation
11.4 Population stratification
11.4.1 Testing for population stratification
11.4.2 Structured association
11.4.3 Genomic control
11.4.4 Comparison of structured association and genomic control
11.4.5 Principal components analysis
11.5 Gene-gene and gene-environment interaction
11.5.1 Classical examples for gene-gene and gene-environment interaction
11.5.2 Coat color in the Labrador retriever
11.5.3 Concepts of interaction
11.5.4 Statistical testing of gene-environment interactions
11.5.5 Statistical testing of gene-gene interactions
11.5.6 Multifactor dimensionality reduction
11.6 Problems
12 Family-based Association Analysis
12.1 Haplotype relative risk
12.2 Transmission disequilibrium test (TDT)
12.3 Risk estimates for trio data
12.4 Sample size and power calculations for the TDT
12.5 Alternative test statistics
12.6 TDT for multiallelic markers
12.6.1 Test of single alleles
12.6.2 Global test statistics
12.7 TDT type tests for different family structures
12.7.1 TDT type tests for missing parental data
12.7.2 TDT type tests for sibship data
12.7.3 TDT type tests for extended pedigrees
12.8 Association analysis for quantitative traits
12.9 Problems
13 Haplotypes in Association Analyses
13.1 Reasons for studying haplotypes
13.2 Inference of haplotypes
13.2.1 Algorithms for haplotype assignment
13.2.2 Algorithms for estimating haplotype probabilities
13.3 Association tests using haplotypes
13.4 Haplotype blocks and tagging SNPs
13.4.1 Selection of markers by haplotypes or linkage disequilibrium
13.4.2 Evaluation of marker selection approaches
13.5 Problems
14 Genome-wide Association (GWA) Studies
14.1 Design options in GWA studies
14.2 Genotype imputation
14.2.1 Imputation algorithms
14.2.2 Quality of imputation
14.3 Statistical analysis of GWA studies
14.4 Multiple testing
14.4.1 Region-wide multiple testing adjustment by simulation
14.4.2 Genome-wide multiple testing adjustment by simulation
14.4.3 Multiple testing adjustment by effective number of tests
14.5 Analysis of accumulating GWA data
14.5.1 Multistage designs for GWA studies
14.5.2 Replication in GWA studies
14.5.3 Meta-analysis of GWA studies
14.6 Clinical impact of a GWA study
14.6.1 Evaluation of a genetic predictive test
14.6.2 Clinical validity of a single genetic marker
14.6.3 Clinical validity of multiple genetic markers
14.7 Outlook
14.8 Problems
Appendix
Algorithms Used in Linkage Analyses
A.1 The Elston–Stewart algorithm
A.1.1 The fundamental ideas of the Elston–Stewart algorithm
A.1.2 The Elston–Stewart algorithm for a trait and a linked marker locus
A.2 The Lander–Green algorithm
A.2.1 The inheritance vector at a single genetic marker
A.2.2 The inheritance distribution given all genetic markers
A.3 The Cardon–Fulker algorithm
A.4 Problem
Solutions
References
Index
About the authors
Andreas Ziegler is head of the Institute for Medical Biometry and Statistics at the University Clinic Schleswig-Holstein in Lübeck, an acknowledged center of excellence for genetic epidemiological methods. He is currently President of the German Region of the International Biometric Society.
Inke R. König studied psychology at the universities of Marburg (Germany), as a scholar of the German National Academic Foundation, and Dundee (Scotland), with a grant from the German Academic Exchange Service (DAAD). She has done research at the Institute of Medical Biometry and Epidemiology in Marburg and, since 2001, at the Institute of Medical Biometry and Statistics in Lübeck. In 2004, she became vice director of the latter and also received the Fritz-Linder-Forum-Award from the German Association for Surgery. Besides holding the certificate “Biometrics in Medicine”, she has taught since 1998 as a lecturer in biomathematics, behavioural genetics, clinical epidemiology, genetic epidemiology, and evidence-based medicine.
Friedrich Pahlke is a Diplom-Informatiker (Dipl.-Inf.) at the Institute for Medical Biometry and Statistics at the University Clinic Schleswig-Holstein in Lübeck. He created the e-learning course that is optionally available with the book.