Apache Spark 2: Data Processing and Real-Time Analytics

Romeo Kienzler & Md. Rezaul Karim

Master complex big data processing, stream analytics, and machine learning with Apache Spark

Build efficient data flow and machine learning programs with this flexible, multi-functional open-source cluster-computing framework

Key Features

Master the art of real-time big data processing and machine learning

Explore a wide range of use-cases for analyzing large datasets

Discover ways to optimize your work by using many features of Spark 2.x and Scala

Book Description

Apache Spark is an in-memory, cluster-based data processing system that provides a wide range of functionalities such as big data processing, analytics, machine learning, and more. With this Learning Path, you can take your knowledge of Apache Spark to the next level by learning how to extend Spark’s functionality and build your own data flow and machine learning programs on this platform.

You will work with Apache Spark’s various modules: interactive querying with Spark SQL, working with DataFrames and Datasets, implementing streaming analytics with Spark Streaming, and applying machine learning and deep learning techniques on Spark using MLlib and various external tools.

By the end of this elaborately designed Learning Path, you will have all the knowledge you need to master Apache Spark, and build your own big data processing and analytics pipeline quickly and without any hassle.

This Learning Path includes content from the following Packt products:

Mastering Apache Spark 2.x by Romeo Kienzler

Scala and Spark for Big Data Analytics by Md. Rezaul Karim, Sridhar Alla

Apache Spark 2.x Machine Learning Cookbook by Siamak Amirghodsi, Meenakshi Rajendran, Broderick Hall, Shuen Mei

What you will learn

Get to grips with all the features of Apache Spark 2.x

Perform highly optimized real-time big data processing

Use ML and DL techniques with Spark MLlib and third-party tools

Analyze structured and unstructured data using Spark SQL and GraphX

Understand tuning, debugging, and monitoring of big data applications

Build scalable and fault-tolerant streaming applications

Develop scalable recommendation engines

Who this book is for

If you are an intermediate-level Spark developer looking to master the advanced capabilities and use-cases of Apache Spark 2.x, this Learning Path is ideal for you. Big data professionals who want to learn how to integrate and use the features of Apache Spark and build a strong big data pipeline will also find this Learning Path useful. To grasp the concepts explained in this Learning Path, you must know the fundamentals of Apache Spark and Scala.

Romeo Kienzler works as the chief data scientist on the IBM Watson IoT worldwide team, helping clients apply advanced machine learning at scale to their IoT sensor data. He holds a Master’s degree in computer science from the Swiss Federal Institute of Technology, Zurich, with a specialization in information systems, bioinformatics, and applied statistics.

Md. Rezaul Karim is a Research Scientist at Fraunhofer FIT, Germany. He is also a PhD candidate at RWTH Aachen University, Aachen, Germany. He has more than 8 years’ experience in research and development, with a solid understanding of algorithms and data structures in C, C++, Java, Scala, R, and Python.

Sridhar Alla is a big data expert who helps companies solve complex problems in distributed computing, large-scale data science, and analytics. He holds a bachelor’s degree in computer science from JNTU, India. He loves writing code in Python, Scala, and Java. He also has extensive hands-on knowledge of several Hadoop-based technologies, TensorFlow, NoSQL, IoT, and deep learning.

Siamak Amirghodsi (Sammy) is interested in building advanced technical teams, executive management, Spark, Hadoop, big data analytics, AI, deep learning nets, TensorFlow, cognitive models, swarm algorithms, real-time streaming systems, quantum computing, financial risk management, trading signal discovery, econometrics, long-term financial cycles, IoT, blockchain, probabilistic graphical models, cryptography, and NLP.

Meenakshi Rajendran is experienced in the end-to-end delivery of data analytics and data science products for leading financial institutions. Meenakshi holds a master’s degree in business administration and is a certified PMP with over 13 years of experience in global software delivery environments. Her areas of research and interest are Apache Spark, cloud, regulatory data governance, machine learning, Cassandra, and managing global data teams at scale.

Broderick Hall is a hands-on big data analytics expert who holds a master’s degree in computer science and has 20 years of experience designing and developing complex enterprise-wide software applications with real-time and regulatory requirements at a global scale. He is an early adopter of deep learning and is currently working on a large-scale cloud-based data platform augmented with deep learning nets.

Shuen Mei is a big data analytic platforms expert with 15+ years of experience in designing, building, and executing large-scale, enterprise-distributed financial systems with mission-critical low-latency requirements. He is certified in Apache Spark and the Cloudera Big Data platform, including Developer, Admin, and HBase. He is also a certified AWS solutions architect with an emphasis on petabyte-scale real-time data platforms.

€41.99

Language English ● Format EPUB ● Pages 616 ● ISBN 9781789959918 ● File size 24.9 MB ● Publisher Packt Publishing ● City San Antonio ● Country US ● Published 2018 ● Downloadable for 24 months ● Currency EUR ● ID 6813091 ● Copy protection Adobe DRM

Requires a DRM-compatible ebook reader