This SpringerBrief introduces FasTensor, a powerful parallel data programming model developed for big data applications. It also provides a user’s guide for installing and using FasTensor. FasTensor enables users to easily express many data analysis operations, which may come from neural networks, scientific computing, or queries from traditional database management systems (DBMS). FasTensor frees users from the tedious underlying data management tasks, such as data partitioning, communication, and parallel execution.
This SpringerBrief gives a high-level overview of the state of the art in parallel data programming models and the motivation for the design of FasTensor. It illustrates the FasTensor application programming interface (API) with an abundance of examples and two real use cases from cutting-edge scientific applications. FasTensor can achieve multiple orders of magnitude speedup over Spark and other peer systems in executing big data analysis operations. FasTensor makes programming for data analysis operations at large scale on supercomputers as productive and efficient as possible. The book serves as a complete reference for FasTensor, covering its theoretical foundations, C++ implementation, and usage in applications.
Scientists in domains such as the physical sciences and geosciences who analyze large amounts of data will want to purchase this SpringerBrief. Data engineers who design and develop data analysis software, and data scientists who use Spark or TensorFlow to perform data analyses such as training a deep neural network, will also find this SpringerBrief useful as a reference tool.
Table of Contents
1. Introduction.- 1.1 Lessons from Big Data Systems.- 1.2 Data Model.- 1.3 Programming Model High-Performance Data Analysis for Science.
2. FasTensor Programming Model.- 2.1 Introduction to Tensor Data Model.- 2.2 FasTensor Programming Model.- 2.2.1 Stencils.- 2.2.2 Chunks.- 2.2.3 Overlap.- 2.2.4 Operator: Transform.- 2.2.5 FasTensor Execution Engine.- 2.2.6 FasTensor Scientific Computing Use Cases.- 2.3 Summary.
3. Illustrated FasTensor User Interface.- 3.1 An Example.- 3.2 The Stencil Class.- 3.2.1 Constructors of the Stencil.- 3.2.2 Parenthesis operator () and ReadPoint.- 3.2.3 SetShape and GetShape.- 3.2.4 SetValue and GetValue.- 3.2.5 ReadNeighbors and WriteNeighbors.- 3.2.6 GetOffsetUpper and GetOffsetLower.- 3.2.7 GetChunkID.- 3.2.8 GetGlobalIndex and GetLocalIndex.- 3.2.9 Exercise of the Stencil class.- 3.3 The Array Class.- 3.3.1 Constructors of Array.- 3.3.2 SetChunkSize, SetChunkSizeByMem, SetChunkSizeByDim, and GetChunkSize.- 3.3.3 SetOverlapSize, SetOverlapSizeByDetection, GetOverlapSize, SetOverlapPadding, and SyncOverlap.- 3.3.4 Transform.- 3.3.5 SetStride and GetStride.- 3.3.6 AppendAttribute, InsertAttribute, GetAttribute, and EraseAttribute.- 3.3.7 SetEndpoint and GetEndpoint.- 3.3.8 ControlEndpoint.- 3.3.9 ReadArray and WriteArray.- 3.3.10 SetTag and GetTag.- 3.3.11 GetArraySize and SetArraySize.- 3.3.12 Backup and Restore.- 3.3.13 CreateVisFile.- 3.3.14 ReportCost.- 3.3.15 EP_DIR Endpoint.- 3.3.16 EP_HDF5 and Other Endpoints.- 3.4 Other Functions in FasTensor.- 3.4.1 FT_Init.- 3.4.2 FT_Finalize.- 3.4.3 Data types in FasTensor.
4. FasTensor in Real Scientific Applications.- 4.1 DAS: Distributed Acoustic Sensing.- 4.2 VPIC: Vector Particle-In-Cell.
Appendix.- A.1 Installation Guide of FasTensor.- A.2 How to Develop a New Endpoint Protocol.- Alphabetical Index.- Bibliography.- References.
About the Authors
Dr. Bin Dong is a Research Scientist at Lawrence Berkeley National Laboratory in Berkeley, California, USA. He holds a Ph.D. in computing science and technology. His research interests include big scientific data analysis, parallel computing, parallel I/O, and machine learning. He has co-authored more than 62 technical publications.
Dr. Kesheng Wu is a Senior Scientist at Lawrence Berkeley National Laboratory. He works extensively on data management, data analysis, and scientific computing. He is the developer of a number of widely used algorithms, including FastBit bitmap indexes for querying large scientific datasets, the Thick-Restart Lanczos (TRLan) algorithm for solving eigenvalue problems, and IDEALEM for statistical data reduction and feature extraction. He has co-authored more than 200 technical publications.
Dr. Suren Byna is a Computer Scientist in the Scientific Data Management (SDM) Group at Lawrence Berkeley National Laboratory in Berkeley, California, USA. His research interests are in scalable scientific data management. More specifically, he works on optimizing parallel I/O and on developing systems for managing scientific data. He leads the ExaIO project in the Exascale Computing Project (ECP), which contributes advanced I/O features to HDF5 and develops a new file system called UnifyFS. He also leads efforts to develop object-centric data management systems (Proactive Data Containers, PDC) and experimental and observational data (EOD) management strategies. He has co-authored more than 150 technical publications.