This volume provides an overview of the field of Astrostatistics, understood as the sub-discipline dedicated to the statistical analysis of astronomical data. It presents examples of how the various methodologies now available are applied to current open issues in astronomical research. The technical aspects of the scientific analysis of the upcoming petabyte-scale databases are emphasized, given the importance that scalable Knowledge Discovery techniques will have for the full exploitation of these databases.
Based on the 2011 Astrostatistics and Data Mining in Large Astronomical Databases conference and school, this volume gathers examples of the work by leading authors in the areas of Astrophysics and Statistics, including a significant contribution from the various teams that prepared for the processing and analysis of the Gaia data.
Table of contents
’Science with Gaia: how will we deal with a complex billion-source catalogue and data archive?’ by Anthony Brown (Leiden University, Netherlands).- ’Recent Advances in cosmological Bayesian model comparison’ by Roberto Trotta (University College London, UK).- ’The Art of Data Science’ by Matthew Graham (Center for Advanced Computing Research, California Institute of Technology, USA).- ’Astronomical Surveys: from SDSS to LSST’ by Robert Lupton (Princeton University, USA).- ’Exoplanet demography, quasar target selection, and probabilistic redshift estimation: Hierarchical models for density estimation, classification, and regression.’ by David Hogg (New York University, USA).- ’Learning to disentangle Exoplanet signals from correlated noise’ by Suzanne Aigrain (Oxford University, UK).- ’Astroinformatics and data mining: how to cope with the data tsunami’ by Giuseppe Longo (Federico II University, Italy).- Advanced statistical techniques for the processing of astronomical data: time series, images, low number statistics for high energy photons, heteroskedastic data, non-detections.- Challenges in the data mining of astronomical databases: the class imbalance in training sets or how to define priors, robust preprocessing for supervised/unsupervised classification, robust inference with heterogeneous datasets, how to combine observations, models, priors, etc. in a training/test set, error propagation.- The challenge of petabyte size databases: scalability, parallel computing, accuracy.- Geometric data organization, sky indexing for efficient data retrieval, intelligent access to petabyte size databases.- Knowledge Discovery in astronomical archives: outlier detection, new object types, parametric inference, model fitting and model selection, etc.- Combining the classical domain knowledge approach with machine learning techniques.- Global approaches for global datasets. The Galaxy zoo and the Universe zoo.- The Virtual Observatories, Data Mining and Astrostatistics: software, standards, protocols.