Chemoinformatics is, broadly, a scientific discipline encompassing the design, creation, organization, management, retrieval, analysis, dissemination, visualization and use of chemical information. It is distinct from other computational molecular modeling approaches in that it uses unique representations of chemical structures in the form of multiple chemical descriptors; has its own metrics for defining the similarity and diversity of chemical compound libraries; and applies a wide array of statistical, data mining and machine learning techniques to very large collections of chemical compounds in order to establish robust relationships between chemical structure and physical or biological properties. Chemoinformatics addresses a broad range of problems in chemistry and biology; however, its best-known applications have arguably been in drug discovery, where chemoinformatics tools have played a central role in the analysis and interpretation of structure-property data collected by means of modern high-throughput screening.

Early stages of modern drug discovery often involve screening small molecules for their effects on a selected protein target or a model of a biological pathway. In the past fifteen years, innovative technologies that enable rapid synthesis and high-throughput screening of large libraries of compounds have been adopted in almost all major pharmaceutical and biotech companies. As a result, there has been a huge increase in the number of compounds routinely available for quickly screening novel drug candidates against new targets and pathways. In contrast, such technologies have rarely been available to the academic research community, which has limited its ability to conduct large-scale chemical genetics or chemical genomics research. However, the landscape of publicly available experimental data collection methods for chemoinformatics has changed dramatically in very recent years.
The term ‘virtual screening’ is commonly associated with methodologies that rely on explicit knowledge of the three-dimensional structure of the target protein to identify potential bioactive compounds. Traditional docking protocols and scoring functions rely on explicitly defined three-dimensional coordinates and standard definitions of atom types for both receptors and ligands. Although reasonably accurate in many cases, conventional structure-based virtual screening approaches are relatively computationally inefficient, which has precluded their use for screening very large compound collections. Significant progress has been achieved over many years of research in developing structure-based virtual screening approaches. This book is the first monograph that summarizes innovative applications of efficient chemoinformatics approaches towards the goal of screening large chemical libraries. The focus on virtual screening expands chemoinformatics beyond its traditional boundaries as a synthetic and data-analytical area of research towards its recognition as a predictive and decision-support scientific discipline.
The approaches discussed by the contributors to the monograph rely on chemoinformatics concepts such as:
- representation of molecules using multiple descriptors of chemical structures
- advanced chemical similarity calculations in multidimensional descriptor spaces
- the use of advanced machine learning and data mining approaches for building quantitative and predictive structure-activity models
- the use of chemoinformatics methodologies for the analysis of drug-likeness and property prediction
- the emerging trend of combining chemoinformatics and bioinformatics concepts in structure-based drug discovery

The chapters of the book are organized in the logical flow that a typical chemoinformatics project would follow: from structure representation and comparison, to data analysis and model building, to applications of structure-property relationship models for hit identification and chemical library design. It opens with an overview of modern methods of compound library design, followed by a chapter devoted to molecular similarity analysis. Four sections describe virtual screening based on the use of molecular fragments, 2D pharmacophores and 3D pharmacophores. The application of fuzzy pharmacophores to library design is the subject of the next chapter, followed by a chapter dealing with QSAR studies based on local molecular parameters. Probabilistic approaches based on 2D descriptors for the assessment of biological activities are also described, together with an overview of modern methods and software for ADME prediction. The book ends with a chapter describing a new approach of coding receptor binding sites and their respective ligands in multidimensional chemical descriptor space, which affords an interesting and efficient alternative to traditional docking and screening techniques.
Ligand-based approaches, which are the focus of this work, are more computationally efficient than structure-based virtual screening, yet there are very few books on modern developments in this field. The focus on extending the experience accumulated in traditional areas of chemoinformatics research, such as Quantitative Structure-Activity Relationships (QSAR) or chemical similarity searching, towards virtual screening makes this monograph essential reading for researchers in the area of computer-aided drug discovery. Moreover, owing to their generic data-analytical focus, chemoinformatics approaches will find growing application in multiple areas of chemical and biological research such as synthesis planning, nanotechnology, proteomics, physical and analytical chemistry and chemical genomics.
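As an illustration of why ligand-based similarity searching is computationally cheap, the following minimal Python sketch ranks a small library against a query compound using the Tanimoto coefficient on fragment sets. It is not taken from the book: the fragment labels, the compound names and the helper names `tanimoto` and `rank_library` are hypothetical, and real applications would use descriptors generated by a chemoinformatics toolkit rather than hand-written sets.

```python
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Tanimoto coefficient: shared fragments / fragments present in either set."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty descriptions
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_library(
    query: FrozenSet[str],
    library: Dict[str, FrozenSet[str]],
    threshold: float = 0.0,
) -> List[Tuple[str, float]]:
    """Score every library compound against the query, keep those at or
    above the threshold, and sort them by decreasing similarity."""
    scored = [(name, tanimoto(query, frags)) for name, frags in library.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: each compound is described by a set of fragment labels.
query = frozenset({"C-C", "C-O", "C=O", "O-H"})
library = {
    "cmpd_A": frozenset({"C-C", "C-O", "C=O", "O-H"}),
    "cmpd_B": frozenset({"C-C", "C-N", "N-H"}),
    "cmpd_C": frozenset({"C-C", "C-O", "O-H"}),
}

ranked = rank_library(query, library, threshold=0.3)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

Each comparison needs only set intersections, with no three-dimensional coordinates or docking poses, which is the kind of efficiency that makes screening very large libraries feasible.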
Contents
Preface
1 – Fragment Descriptors in SAR/QSAR/QSPR studies, molecular similarity analysis and in virtual screening; Introduction; Historical survey; Main characteristics of Fragment Descriptors; Types of Fragments; Simple Fixed Types; WLN and SMILES Fragments; Atom-Centered Fragments; Bond-Centered Fragments; Maximum Common Substructures; Atom Pairs and Topological Multiplets; Substituents and Molecular Frameworks; Basic Subgraphs; Mined Subgraphs; Random Subgraphs; Library Subgraphs; Fragments describing supramolecular systems and chemical reactions; Storage of fragments’ information; Fragment’s Connectivity; Generic Graphs; Labeling Atoms; Application in Virtual Screening and In Silico Design; Filtering; Similarity Search; SAR Classification (Probabilistic) Models; QSAR/QSPR Regression Models; In Silico Design; Limitations of Fragment Descriptors; Conclusion
2 – Topological Pharmacophores; Introduction; 3D pharmacophore models and descriptors; Topological pharmacophores; Topological pharmacophores from 2D-alignments; Topological pharmacophores from 2D pharmacophore fingerprints; Topological index-based ‘pharmacophores’?; Topological pharmacophores from 2D-alignments; Topological pharmacophores from pharmacophore fingerprints; Topological pharmacophore pair fingerprints; Topological pharmacophore triplets; Similarity searching with pharmacophore fingerprints – Technical Issues; Similarity searching with pharmacophore fingerprints – Some Examples; Machine-learning of Topological Pharmacophores from Fingerprints; Topological index-based ‘pharmacophores’?; Conclusions
3 – Pharmacophore-based Virtual Screening in Drug Discovery; Introduction; Virtual Screening Methods; Chemical Feature-based Pharmacophores; The Term ‘3D Pharmacophore’; Feature Definitions and Pharmacophore Representation; Hydrogen bonding interactions; Lipophilic areas; Aromatic interactions; Charge-transfer interactions; Customization and definition of new features; Current super-positioning techniques for aligning 3D pharmacophores and molecules; Generation and Use of Pharmacophore Models; Ligand-based Pharmacophore Modeling; Structure-based Pharmacophore Modeling; Inclusion of Shape Information; Qualitative vs. Quantitative Pharmacophore Models; Validation of Models for Virtual Screening; Application of Pharmacophore Models in Virtual Screening; Pharmacophore Models as Part of a Multi-Step Screening Approach; Antitarget and ADME(T) Screening Using Pharmacophores; Pharmacophore Models for Activity Profiling and Parallel Virtual Screening; Pharmacophore Method Extensions and Comparisons to Other Virtual Screening Methods; Topological Fingerprints; Shape-based Virtual Screening; Docking Methods; Pharmacophore Constraints Used in Docking; Further Reading; Summary and Conclusion
4 – Molecular Similarity Analysis in Virtual Screening; Ligand-Based Virtual Screening; Foundations of Molecular Similarity Analysis; Molecular Similarity and Chemical Spaces; Similarity Measures; Activity Landscapes; Analyzing the Nature of Structure-Activity Relationships; Relationships between different SARs; SARs and target-ligand interactions; Qualitative SAR characterization; Quantitative SAR characterization; Implications for molecular similarity analysis and virtual screening; Strengths and Limitations of Similarity Methods; Conclusion and Future Perspectives
5 – Molecular Field Topology Analysis in drug design and virtual screening; Introduction: local molecular parameters in QSAR, drug design and virtual screening; Supergraph-based QSAR models; Rationale and history; Molecular Field Topology Analysis (MFTA); General principles; Local molecular descriptors: facets of ligand-biotarget interaction; Construction of molecular supergraph; Formation of descriptor matrix; Statistical analysis; Applicability control; From MFTA model to drug design and virtual screening; MFTA models in biotarget and drug action analysis; MFTA models in virtual screening; MFTA-based virtual screening of compound databases; MFTA-based virtual screening of generated structure libraries; Conclusion
6 – Probabilistic approaches in activity prediction; Introduction; Biological Activity; Dose-Effect Relationships; Experimental Data; Probabilistic Ligand-Based Virtual Screening Methods; Preparation of Training Sets; Creation of Evaluation Sets; Mathematical Approaches; Evaluation of Prediction Accuracy; Single-Targeted vs. Multi-Targeted Virtual Screening; PASS Approach; Biological Activities Predicted by PASS; Chemical Structure Description in PASS; SAR Base; Algorithm of Activity Spectrum Estimation; Interpretation of Prediction Results; Selection of the Most Prospective Compounds; Conclusions
7 – Fragment-based de novo design of druglike molecules; Introduction; From Molecules to Fragments; From Fragments to Molecules; Scoring the Design; Conclusions and Outlook
8 – Early ADME/T predictions: a toy or a tool?; Introduction; Which properties are important for early drug discovery?; Physico-chemical profiling; Lipophilicity; Solubility; Data availability and accuracy; Models; Why models don’t work: the challenge of the Applicability Domain; AD based on similarity in the descriptor space; AD based on similarity in the property-based space; How reliable are predictions of physico-chemical properties?; Available Data for ADME/T biological properties; Absorption; Data; Models; Distribution; Data; Models; The usefulness of ADME/T models is limited by available data; Conclusions
9 – Compound Library Design – Principles and Applications; Introduction to Compound Library Design; Methods for Compound Library Design; Design for Specific Biological Activities; Similarity Guided Design of Targeted Libraries; Diversity Based Design of General Screening Libraries; Pharmacophore Guided Design of Focused Compound Libraries; QSAR Based Targeted Library Design; Protein Structure Based Methods for Compound Library Design; Design for Developability or Drug-likeness; Rule & Alert Based Approaches; QSAR Based ADMET Models; Undesirable Functionality Filters; Design for Multiple Objectives and Targets Simultaneously; Concluding Remarks
10 – Integrated Chemo- and Bioinformatics Approaches to Virtual Screening; Introduction; Availability of large compound collections for virtual screening; NIH Molecular Libraries Roadmap Initiative and the PubChem database; Other chemical databases in public domain; Structure based virtual screening; Major methodologies; Challenges and limitations of current approaches; The implementation of cheminformatics concepts in structure based virtual screening; Predictive QSAR models as virtual screening tools; Critical Importance of model validation; Applicability domains and QSAR model acceptability criteria; Predictive QSAR modeling workflow; Examples of application; Structure based chemical descriptors of protein ligand interface: the EnTESS method; Derivation of the EnTESS descriptors; Validation of the EnTESS descriptors for binding affinity prediction; Structure based cheminformatics approach to virtual screening: the CoLiBRI method; The representation of three-dimensional active sites in multidimensional chemistry space; The mapping between chemistry spaces of active sites and ligands; Summary and Conclusions
About the authors
Alexandre Varnek is Professor in Theoretical Chemistry at the Louis Pasteur University (ULP), France, Head of the Laboratory of Chemoinformatics, and Director of Master Courses on Chemoinformatics at the Faculty of Chemistry, ULP. He has 30 years of experience in the fields of molecular modelling and chemoinformatics and more than 80 publications, including a monograph. His current research projects include the development of new approaches and software tools for in silico design of new compounds.

Alexander Tropsha is Head of the Laboratory for Molecular Modeling at the School of Pharmacy, University of North Carolina, Chapel Hill, USA, as well as Professor and Chair of the Division of Medicinal Chemistry and Natural Products at the School of Pharmacy. His research interests include Computer-Aided Drug Design, Chemoinformatics, and Structural Bioinformatics. He has authored or co-authored over 110 peer-reviewed research papers and book chapters.