Open Access Databases and Datasets for Drug Discovery
Timely resource discussing the future of data-driven drug discovery and the growing number of open-source databases
With an overview of 90 freely accessible databases and datasets on all aspects of drug design, development, and discovery, Open Access Databases and Datasets for Drug Discovery is a comprehensive guide to the vast amount of “free data” available to today’s pharmaceutical researchers. The book analyzes how applicable open-source data are to drug discovery and development, and evaluates their usefulness compared with commercially available tools.
The most relevant databases for small molecules, drugs and druglike substances, ligand design, protein 3D structures (both experimental and calculated), and human drug targets are described in depth, including practical examples of how to access and work with the data. The first part is focused on databases for small molecules, followed by databases for macromolecular targets and diseases. The final part shows how to integrate various open-source tools into the academic and industrial drug discovery and development process.
Contributed to and edited by experts with long-standing experience in the field, Open Access Databases and Datasets for Drug Discovery includes information on:
- An extensive listing of open access databases and datasets for computer-aided drug design
- PubChem as a chemical database for drug discovery, DrugBank Online, and bioisosteric replacement for drug discovery supported by the SwissBioisostere database
- The Protein Data Bank (PDB) and macromolecular structure data supporting computer-aided drug design, and the SWISS-MODEL repository of 3D protein structures and models
- PDB-REDO in computational aided drug design (CADD), and using Pharos/TCRD for discovering druggable targets
Unmatched in scope and thoroughly reviewing small and large open data sources relevant for rational drug design, Open Access Databases and Datasets for Drug Discovery is an essential reference for medicinal and pharmaceutical chemists, and for any scientist involved in drug discovery and development.
Table of Contents
Series Editors Preface xiii
Raimund Mannhold – A Personal Obituary from the Series Editors xvii
A Personal Foreword xxi
1 Open Access Databases and Datasets for Computer-Aided Drug Design. A Short List Used in the Molecular Modelling Group of the SIB 1
Antoine Daina, María José Ojeda-Montes, Maiia E. Bragina, Alessandro Cuozzo, Ute F. Röhrig, Marta A.S. Perez, and Vincent Zoete
References 30
Part I Small Molecules 39
2 PubChem: A Large-Scale Public Chemical Database for Drug Discovery 41
Sunghwan Kim and Evan E. Bolton
2.1 Introduction 41
2.2 Data Content and Organization 42
2.3 Tools and Services 45
2.3.1 PubChem Search 45
2.3.2 Summary Pages 48
2.3.3 Literature Knowledge Panel 49
2.3.4 2D and 3D Neighbors 50
2.3.5 Classification Browser 51
2.3.6 Identifier Exchange Service 52
2.3.7 Programmatic Access 52
2.3.8 PubChem FTP Site and PubChem RDF 53
2.4 Drug- and Lead-Likeness of PubChem Compounds 54
2.5 Bioactivity Data in PubChem 56
2.6 Comparison with Other Databases 57
2.7 Use of PubChem Data for Drug Discovery 58
2.8 Summary 59
Acknowledgments 60
References 60
3 DrugBank Online: A How-to Guide 67
Christen M. Klinger, Jordan Cox, Denise So, Teira Stauth, Michael Wilson, Alex Wilson, and Craig Knox
3.1 Introduction 67
3.2 DrugBank 68
3.2.1 Overview of DrugBank 68
3.2.2 DrugBank Datasets 69
3.2.2.1 Drug Cards: An Overview and Navigation Guide 70
3.2.2.2 Identification 70
3.2.2.3 Pharmacology 71
3.2.2.4 Categories 73
3.2.2.5 Properties 73
3.2.2.6 Targets, Enzymes, Carriers, and Transporters 73
3.2.2.7 References 77
3.3 Protocols 77
3.3.1 General Workflows 77
3.3.1.1 Using DrugBank Online’s Search Functionality 77
3.3.1.2 Using DrugBank Online’s Advanced Search Functionality 80
3.3.1.3 Browsing Drugs Using DrugBank Online’s Drug Categories 83
3.3.2 Identifying Chemicals and Relevant Sequences 86
3.3.2.1 Searching Using Chemical Structure Search 86
3.3.2.2 Using Sequence Search to Find Similar Targets 89
3.3.3 Extracting DrugBank Datasets for ML 93
3.4 Research Using DrugBank 94
3.5 Discussion and Conclusions 95
References 96
4 Bioisosteric Replacement for Drug Discovery Supported by the SwissBioisostere Database 101
Antoine Daina, Alessandro Cuozzo, Marta A.S. Perez, and Vincent Zoete
4.1 Introduction 101
4.1.1 Concept of Isosterism and Bioisosterism 101
4.1.2 Classical vs. Non-classical Bioisostere and Further Molecular Replacements 102
4.1.3 Bioisosteric Replacement in Drug Discovery 105
4.2 Construction and Dissemination of SwissBioisostere 106
4.2.1 Intention and Requirements 106
4.2.2 Bioactivity Data 107
4.2.3 Nonsupervised Matched Molecular Pair Analysis 108
4.2.4 Database 108
4.2.5 Web Interface 109
4.3 Content of SwissBioisostere 111
4.3.1 Global Content 111
4.3.2 Biological and Chemical Contexts 112
4.3.3 Fragment Shape Diversity 113
4.4 Usage of SwissBioisostere 115
4.4.1 Website Usage 115
4.4.2 Most Frequent Requests 117
4.4.3 Examples Related to Drug Discovery 117
4.4.3.1 Use Cases 117
4.4.3.2 Replacing Unwanted Chemical Groups 118
4.4.3.3 Optimization of Passive Absorption and Blood–Brain Barrier Diffusion 122
4.4.3.4 Reduction of Flexibility 124
4.4.3.5 Reduction of Aromaticity/Escape from Flatland 128
4.5 Conclusive Remarks 133
Acknowledgment 133
References 133
Part II Macromolecular Targets and Diseases 139
5 The Protein Data Bank (PDB) and Macromolecular Structure Data Supporting Computer-Aided Drug Design 141
David Armstrong, John Berrisford, Preeti Choudhary, Lukas Pravda, James Tolchard, Mihaly Varadi, and Sameer Velankar
5.1 Introduction 141
5.2 Small Molecule Data in Protein Data Bank (PDB) Entries 142
5.2.1 What Data are in the PDB Archive? 142
5.2.2 Definition of Small Molecules in OneDep 145
5.3 Small Molecule Dictionaries 146
5.3.1 wwPDB Chemical Component Dictionary (CCD) 146
5.3.2 The Peptide Reference Dictionary 147
5.4 Additional Ligand Annotations in the PDB Archive 148
5.4.1 Linkage Information 148
5.4.2 Carbohydrates 149
5.5 Validation of Ligands in the Worldwide Protein Data Bank (wwPDB) 150
5.5.1 Various Criteria and Software Used for Validating Ligand in Validation Reports 150
5.5.2 Identification of Ligand of Interest (LOI) 151
5.5.3 Geometric and Conformational Validation 152
5.5.4 Ligand Fit to Experimental Electron Density Validation 152
5.5.5 Accessing wwPDB Validation Reports from PDBe Entry Pages 154
5.5.6 Other Planned Improvements to Enhance Ligand Validation 154
5.6 PDBe Tools for Ligand Analysis 155
5.6.1 Ligand Interactions 155
5.6.1.1 Classifying Ligand Interactions 155
5.6.1.2 Data Availability 156
5.6.2 Ligand Environment Component 156
5.6.3 Chemistry Process and FTP 158
5.6.4 PDBeChem Pages 158
5.7 Ligand-Related Annotations in the PDBe-KB 158
5.7.1 Introduction to PDBe-KB 158
5.7.2 Data Access Mechanisms for Ligand-Related Annotations 160
5.7.3 Ligand-Related Annotations on the Aggregated Views of Proteins 162
5.8 Case Study: Using PDB Data to Support Drug Discovery 164
5.9 Conclusions and Outlook 165
5.9.1 Upcoming Features and Improvements 166
References 167
6 The SWISS-MODEL Repository of 3D Protein Structures and Models 175
Xavier Robin, Andrew Mark Waterhouse, Stefan Bienert, Gabriel Studer, Leila T. Alexander, Gerardo Tauriello, Torsten Schwede, and Joana Pereira
6.1 Introduction 175
6.2 SMR Database Content and Model Providers 176
6.2.1 PDB 177
6.2.2 SWISS-MODEL 177
6.2.3 AlphaFold Database 179
6.2.4 ModelArchive 180
6.3 Protein Feature Annotation and Cross-References to Computational Resources 181
6.3.1 Structural Features, Ligands, and Oligomers 181
6.3.2 SWISS-MODEL associated tools 182
6.3.3 Web and API Access 183
6.4 Quality Estimates and Benchmarking 188
6.5 Binding Site Conformational States 189
6.6 SMR and Computer-Aided Structure-based Drug Design 190
6.7 Conclusion and Outlook 191
References 193
7 PDB-REDO in Computational-Aided Drug Design (CADD) 201
Ida de Vries, Anastassis Perrakis, and Robbie P. Joosten
7.1 History and Concepts 201
7.1.1 X-ray Structure Models 201
7.1.2 PDB-REDO Development 202
7.1.2.1 First Uniformity 203
7.1.2.2 Automatic Rebuilding of Protein Backbone and Side Chains 203
7.1.2.3 Automated Model Completion Approaches 204
7.1.2.4 Systematic Integration of Structural Knowledge 205
7.1.2.5 Overview of PDB-REDO Pipeline 205
7.2 Structure Improvements by PDB-REDO 206
7.2.1 Parametrization and Rebuilding Effects on Small Molecule Ligands 206
7.2.1.1 Re-refinement Improves Ligand Conformation 206
7.2.1.2 Side Chain Rebuilding Improves Ligand Binding Sites 207
7.2.1.3 Histidine Flip and Improved Ligand Parameterization 208
7.2.2 Building of Protein Loops and Ligands into Protein Structure Models 210
7.2.2.1 Loop Building Completes a Binding Site Region 210
7.2.2.2 Loop Building Results in Improved Binding Sites 211
7.2.2.3 Building new Compounds into Density 212
7.2.3 Nucleic Acid Improvements by PDB-REDO 213
7.2.4 Glycoprotein Structure Model Rebuilding 214
7.2.5 Metal Binding Sites 214
7.2.6 Limitations of the PDB-REDO Databank 216
7.3 Access the PDB-REDO Databank and Metadata 218
7.3.1 Downloading and Inspecting Individual PDB-REDO Entries 218
7.3.2 Data Available in PDB-REDO Entries 220
7.3.3 Usage of the Uniform and FAIR Validation Data 220
7.3.4 Creating Datasets from the PDB-REDO Databank 222
7.3.5 Submitting Structure Models to the PDB-REDO Pipeline 223
7.4 Conclusions 223
Acknowledgments and Funding 224
List of Abbreviations and Symbols 224
References 225
8 Pharos and TCRD: Informatics Tools for Illuminating Dark Targets 231
Keith J. Kelleher, Timothy K. Sheils, Stephen L. Mathias, Dac-Trung Nguyen, Vishal Siramshetty, Ajay Pillai, Jeremy J. Yang, Cristian G. Bologa, Jeremy S. Edwards, Tudor I. Oprea, and Ewy Mathé
8.1 Introduction 231
8.2 Methods 233
8.2.1 Data Organization 233
8.2.1.1 Target Alignment 234
8.2.1.2 Disease Alignment 234
8.2.1.3 Ligand Alignment 234
8.2.1.4 Data and UI Updates 235
8.2.2 Programmatic Access and Data Download 235
8.2.3 UI Organization 235
8.2.3.1 List Pages 236
8.2.3.2 Details Pages 236
8.2.3.3 Search 238
8.2.3.4 Tutorials 240
8.2.4 Analysis Methods Within Pharos 240
8.2.4.1 Searching for Ligands 240
8.2.4.2 Finding Targets by Amino Acid Sequence 241
8.2.4.3 Finding Targets with Similar Annotations 241
8.2.4.4 Finding Targets with Predicted Activity 241
8.2.4.5 Enrichment Scores for Filter Values 241
8.3 Use Cases 242
8.3.1 Hypothesizing the Role of a Dark Target 242
8.3.1.1 Primary Documentation 242
8.3.1.2 List Analysis 247
8.3.1.3 Downloading Data 251
8.3.1.4 Variations on this Use Case 251
8.3.2 Characterizing a Novel Chemical Compound 251
8.3.2.1 Finding Predicted Targets 252
8.3.2.2 Analyzing Similar Ligands 254
8.3.2.3 Ligand Details Pages 256
8.3.2.4 Variations on this Use Case 257
8.3.3 Investigating Diseases 260
8.4 Discussion 262
Funding 264
References 264
Part III Users’ Points of View 269
9 Mining for Bioactive Molecules in Open Databases 271
Guillem Macip, Júlia Mestres-Truyol, Pol Garcia-Segura, Bryan Saldivar-Espinoza, Santiago Garcia-Vallvé, and Gerard Pujadas
9.1 Introduction 271
9.2 Main Tools for Virtual Screening 272
9.2.1 ADMET and PAINS Filtering 272
9.2.2 Protein–Ligand Docking 274
9.2.3 Pharmacophore Search 275
9.2.4 Shape/Electrostatic Similarity 276
9.2.5 Protein-Structure Databases 277
9.2.6 The Protein Data Bank 278
9.2.7 The PDB-REDO Databank 278
9.2.8 The SWISS-MODEL Repository 279
9.2.9 The AlphaFold Protein Structure Database 279
9.3 Validating Binding Site and Ligand Coordinates in Three-Dimensional Protein Complexes 280
9.4 Databases for Searching New Drugs 281
9.4.1 COCONUT 281
9.4.2 GDBs 282
9.4.3 ZINC20 282
9.5 Databases of Bioactive Molecules 282
9.5.1 The BindingDB Database 283
9.5.2 PubChem 283
9.5.3 ChEMBL 284
9.6 Databases of Inactive/Decoy Molecules 285
9.6.1 Collecting Experimentally Inactive Compounds from PubChem 285
9.6.2 Collecting Presumed Inactive Compounds from Decoy Databases 285
9.6.3 Building Custom-Based Decoy Sets 286
9.7 Main Metrics for Evaluating the Success of a Virtual Screening 286
9.8 Concluding Remarks 288
References 289
10 Open Access Databases – An Industrial View 299
Michael Przewosny
10.1 Academic vs. Industrial Research 299
10.2 Scaffold-Hopping 310
10.3 Virtual-Screening 311
Abbreviations 312
References 313
Index 317
About the Authors
Antoine Daina is a Senior Scientist at the Molecular Modelling Group of the SIB Swiss Institute of Bioinformatics in charge of methodological developments in the SwissDrugDesign program.
Michael Przewosny has over 20 years of experience in pharmaceutical research and drug discovery, having worked as a laboratory manager at several pharmaceutical companies.
Vincent Zoete is a Group Leader at the Molecular Modelling Group of the SIB Swiss Institute of Bioinformatics and an Associate Professor at the University of Lausanne, Department of Oncology UNIL-CHUV, Ludwig Institute for Cancer Research.