This book thoroughly explains how to use cloud computing technologies to run High-Performance Computing (HPC) applications. Besides presenting the motivation for moving HPC applications to the cloud, it covers both essential and advanced issues on this topic, such as deploying HPC applications and infrastructures, designing cloud-friendly HPC applications, and optimizing a provisioned cloud infrastructure to run this family of applications. The book also describes best practices for keeping HPC applications running in the cloud by employing fault tolerance techniques and avoiding resource wastage.
To give practical meaning to the topics covered, the book presents case studies in which HPC applications used in relevant scientific areas, such as Bioinformatics and the Oil and Gas industry, were moved to the cloud. It also discusses how to train deep learning models in the cloud, elucidating the key components and aspects necessary to train these models via the different types of services offered by cloud providers.
Despite the vast bibliography on cloud computing and HPC, to the best of our knowledge, no existing manuscript has comprehensively covered these topics and discussed the steps, methods, and strategies to execute HPC applications in clouds. Therefore, we believe this title is useful for IT professionals, students, and researchers interested in cutting-edge technologies, concepts, and insights focusing on the use of cloud technologies to run HPC applications.
Table of contents
Chapter 1. Why move HPC applications to the Cloud?
Part I. Foundations
Chapter 2. What is Cloud Computing?
Chapter 3. What do HPC applications look like?
Part II. Running HPC Applications in Cloud
Chapter 4. Deploying and Configuring Infrastructure
Chapter 5. Executing Traditional HPC Application Code in Cloud with Containerized Job Schedulers
Chapter 6. Designing Cloud-friendly HPC Applications
Chapter 7. Exploiting Hardware Accelerators in Clouds
Part III. Cost and Performance Optimizations
Chapter 8. Optimizing Infrastructure for MPI Applications
Chapter 9. Harnessing Low-Cost Virtual Machines on the Spot
Chapter 10. Ensuring Application Continuity with Fault Tolerance Techniques
Chapter 11. Avoiding Resource Wastage
Part IV. Application Study Cases
Chapter 12. Biological Sequence Comparison on Cloud-based GPU Environment
Chapter 13. Oil & Gas Reservoir Simulation in the Cloud
Chapter 14. Cost effective deep learning on the cloud
Appendix A. Deploying an HPC cluster on AWS
Appendix B. Configuring a cloud-deployed HPC cluster
About the authors
Edson Borin: Prof. Edson Borin is an associate professor at the Institute of Computing at the University of Campinas (Unicamp) and has been working there since 2010. Prior to joining Unicamp, he was a researcher at Intel Labs in California, where he developed dynamic compilation techniques to improve next-generation HW/SW co-designed microprocessors. He also used the microcode compression algorithms he had developed in his Ph.D. thesis to enhance the manufacturing process of Intel microprocessors, earning four divisional recognition awards. At Unicamp, Prof. Borin applies his expertise in modern computer architecture and compilers to optimize the performance and cost of scientific and engineering computing. He leads the Discovery laboratory, which is supported by government agencies such as Fapesp, CNPq, and Capes, international technology companies like Intel, AMD, Samsung, Motorola, and Cadence/Tensilica, and major Brazilian corporations such as Petrobras. Several of his research works have been particularly geared towards optimizing the execution of seismic-processing and deep-learning applications on cloud infrastructure. In addition to his research contributions, Prof. Borin has authored eight patents, a technical book on assembly programming, and over 100 papers in international conferences and journals. He has supervised over 22 doctoral and master’s students, many of whom have received recognition for their exceptional theses, dissertations, and papers.
Lúcia Maria A. Drummond: Prof. Lúcia Drummond obtained her D.Sc. in Systems Engineering and Computer Science from the Federal University of Rio de Janeiro, Brazil, in 1994, where she was part of the group that developed the first Brazilian parallel computer. She has been in the Department of Computer Science of the Fluminense Federal University (UFF) since 1989, where she is now a Full Professor. She currently teaches in the undergraduate and graduate programs, advising a number of master's and doctoral students. She is a Level 1 Researcher at CNPq (a Brazilian Research Agency), with more than 100 publications in journals and proceedings of national and international conferences. Her research interests are parallel and distributed computing, including theory and applications. She has been invited to give talks at Université Paris-Sud, École de Mines, Université d’Avignon et des Pays du Vaucluse, and Université Sorbonne in France, where she has also co-advised Ph.D. students.
Jean-Luc Gaudiot: Prof. Jean-Luc Gaudiot received the Diplôme d’Ingénieur from the École Supérieure d’Ingénieurs en Electronique et Electrotechnique, Paris, France in 1976 and the M.S. and Ph.D. degrees in Computer Science from the University of California, Los Angeles in 1977 and 1982, respectively. He is currently Distinguished Professor in the Electrical Engineering and Computer Science Department at the University of California, Irvine, where he was department Chair from 2003 to 2009. Prior to joining UCI in January 2002, he had been a Professor of Electrical Engineering at the University of Southern California since 1982, where he served as Director of the Computer Engineering Division for three years. He has also designed distributed microprocessor systems at Teledyne Controls, Santa Monica, California (1979-1980) and performed research in innovative architectures at the TRW Technology Research Center, El Segundo, California (1980-1982). He frequently acts as a consultant to companies that design high-performance computer architectures and has served as an expert witness in patent infringement and product liability cases. His research interests include programmability of parallel systems, hardware computer security, and design of Autonomous Driving Systems. He has published nearly 300 journal and conference papers. His research has been sponsored by NSF, DoE, and DARPA, as well as a number of industrial organizations. From 2006 to 2009, he was the first Editor-in-Chief of the IEEE Computer Architecture Letters, a new publication of the IEEE Computer Society, which he helped found to facilitate short, fast-turnaround publication of fundamental ideas in the Computer Architecture domain. From 1999 to 2002, he was the Editor-in-Chief of the IEEE Transactions on Computers. In June 2001, he was elected chair of the IEEE Technical Committee on Computer Architecture and re-elected in June 2003 for a second two-year term.
In 2009, he was elected to the Board of Governors of the IEEE Computer Society for a three-year term. He was the Chair of the IEEE Computer Society Publications Board Transactions Operations Committee (2010-2011), the Chair of the IEEE Computer Society Publications Board Magazines Operations Committee in 2012, the IEEE Computer Society Vice President, Educational Activities Board in 2013, and the 2014-2015 IEEE Computer Society Vice President, Publications Board. He served as the 2017 IEEE Computer Society President. Dr. Gaudiot is a member of AAAS, ACM, and IEEE. He has also chaired the IFIP Working Group 10.3 (Concurrent Systems). He was co-General Chairman of the 1992 International Symposium on Computer Architecture, Program Committee Chairman of the 1993 IFIP Working Conference on Architectures and Compilation Techniques for Fine and Medium Grain Parallelism, the 1993 IEEE Symposium on Parallel and Distributed Processing (Systems Track), the 1995 Parallel Architectures and Compilation Techniques Conference (PACT '95), the High Performance Computer Architecture conference in 1999 (HPCA-5), and the 2005 International Parallel and Distributed Processing Symposium. In 1999, he became a Fellow of the IEEE, “For Contributions to the Programmability and Reliability of Dataflow Architectures.” He was elevated to the rank of AAAS Fellow in 2007, “For Distinguished Contributions to the Design and Analysis of Highly Efficient Multiprocessor and Memory System Architectures.”
Alba Melo: Prof. Alba Cristina Magalhaes Alves de Melo obtained her Ph.D. degree in Computer Science from the Institut National Polytechnique de Grenoble (INPG), France, in 1996. In 2008, she did a postdoc at the University of Ottawa, Canada; in 2011, she was invited as Guest Scientist at Université Paris-Sud, France; and in 2013 she did a sabbatical at the Universitat Politècnica de Catalunya, Spain. Since 1997, she has worked at the Department of Computer Science at the University of Brasilia (UnB), Brazil, where she is now a Full Professor. She is also a CNPq Research Fellow level 1D in Brazil. She was the Coordinator of the Graduate Program in Informatics at UnB for several years (2000-2002, 2004-2006, 2008, 2010, 2014) and she coordinated international collaboration projects with the Universitat Politècnica de Catalunya, Spain (2012, 2014-2016) and with the University of Ottawa, Canada (2012-2015). In 2016, she received the Brazilian Capes Award for “Advisor of the Best Ph.D. Thesis in Computer Science”. Her research interests are High Performance Computing, Bioinformatics, and Cloud Computing. She has advised 2 postdocs, 4 Ph.D. theses, and 22 M.Sc. dissertations. Currently, she advises 4 Ph.D. students and 2 M.Sc. students. She is a Senior Member of the IEEE Society and a Member of the Brazilian Computer Society. She has given invited talks at Universität Karlsruhe, Germany, Université Paris-Sud, France, Universitat Politècnica de Catalunya, Spain, University of Ottawa, Canada, and Universidad de Chile, Chile. She currently has 91 papers listed at DBLP.
Maicon Melo Alves: Dr. Maicon Melo Alves obtained his D.Sc. degree in Computer Science from the Fluminense Federal University (UFF), Brazil, in 2018, and received his M.Sc. degree in Computer Science from Rio de Janeiro Federal University (UFRJ), Brazil, in 2012. He received the best paper award (2015) and an honorable mention for his D.Sc. thesis (2019) at WSCAD, the foremost Brazilian conference in the high-performance computing area. He has over 25 years of experience in IT infrastructure and, since 2006, he has worked as a system analyst at Petrobras, the Brazilian state oil and gas company, working with high performance computing systems used to execute geoscience applications. In 2021, he joined the executive committee of the Regional Commission for High Performance Computing of the State of Rio de Janeiro and completed an MBA in Data Science at the Pontifícia Universidade Católica of Rio de Janeiro (PUC-RIO). He has published two books as well as papers in international journals and proceedings of national conferences. His research interests include high performance computing, parallel and distributed computing, cloud computing, and artificial intelligence.
Philippe Olivier Alexandre Navaux: Prof. Philippe Olivier Alexandre Navaux is a retired professor of the Informatics Institute at the Federal University of Rio Grande do Sul (UFRGS), Porto Alegre, Brazil, which he joined in 1971. He graduated in Electronic Engineering from UFRGS, Brazil, in 1970, and received a master’s degree in applied physics from UFRGS in 1973 and a Ph.D. in Computer Science from the Grenoble National Institute of Technology (INPG), Grenoble, France, in 1979. He has taught graduate and undergraduate courses on Computer Architecture and High Performance Computing. He leads the GPPD, the Parallel and Distributed Processing Group, with projects financed by the government agencies Finep, CNPq, and Capes, and international cooperation with groups from France, Germany, Spain, and the USA, with funding from the EU, CNPq, and CAPES. Besides cooperation projects with the academic sector, he has conducted several research projects with private companies: Petrobras, Microsoft, Intel, HP, DELL, Altus, and Itautec. He has advised more than 100 master's and Ph.D. students and has published nearly 400 papers in journals and conferences. He is a member of the SBC (Brazilian Computer Society), the SBPC (Brazilian Society for Scientific Progress), the ACM (Association for Computing Machinery), and the IEEE (Institute of Electrical and Electronics Engineers). He is a consultant to various national and international funding organizations, including DoE (USA), ANR (FR), FINEP, CNPq, CAPES, FAPESP, FAPERGS, FAPEMIG, and FACEPE. He was a member of the Superior Council of FAPERGS (a Brazilian agency for supporting research) and of the CTC, the Scientific and Technical Council of the LNCC/MCT. He was coordinator of the Computing Area Committee of Capes/MEC (Higher Education Personnel Training Coordination / Ministry of Education).