Apache Spark 2.x for Java Developers by Sourav Gulati, Sumit Kumar

By Sourav Gulati, Sumit Kumar

Key Features

  • Perform big data processing with Spark, without having to learn Scala!
  • Use the Spark Java API to implement efficient enterprise-grade applications for data processing and analytics
  • Go beyond mainstream data processing by adding querying capability, machine learning, and graph processing using Spark

Book Description

Apache Spark is the buzzword in the big data industry right now, especially with the increasing need for real-time streaming and data processing. While Spark is built on Scala, the Spark Java API exposes all the Spark features available in the Scala version to Java developers. This book will show you how to implement various functionalities of the Apache Spark framework in Java, without stepping out of your comfort zone.

The book starts with an introduction to the Apache Spark 2.x ecosystem, followed by an explanation of how to install and configure Spark, and refreshes the Java concepts that will be useful to you when consuming Apache Spark's APIs. You will explore RDDs and their associated common Action and Transformation Java APIs, set up a production-like clustered environment, and work with Spark SQL. Moving on, you will perform near-real-time processing with Spark Streaming, machine learning analytics with Spark MLlib, and graph processing with GraphX, all using various Java packages.

By the end of the book, you will have a solid foundation in implementing components of the Spark framework in Java to build fast, real-time applications.

What you will learn

  • Process data in different file formats such as XML, JSON, CSV, and plain and delimited text, using the Spark core library
  • Perform analytics on data from various data sources, such as Kafka and Flume, using the Spark Streaming library
  • Learn SQL schema creation and the analysis of structured data using various SQL functions, including windowing functions, in the Spark SQL library
  • Explore the Spark MLlib APIs while implementing machine learning techniques to solve real-world problems
  • Get to know Spark GraphX so that you understand the various graph-based analytics that can be performed with Spark

About the Author

Sourav Gulati has been associated with the software industry for more than 7 years. He started his career with Unix/Linux and Java and then moved towards the big data and NoSQL world. He has worked on various big data projects. He has also recently started a technical blog called Technical Learning. Apart from the IT world, he loves to read about mythology.

Sumit Kumar is a developer with industry insights in telecom and banking. At different junctures, he has worked as a Java and SQL developer, but it is shell scripting that he finds both challenging and satisfying at the same time. Currently, he delivers big data projects focused on batch/near-real-time analytics and distributed indexed querying systems. Besides IT, he takes a keen interest in human and ecological issues.

Table of Contents

  1. Introduction to Spark
  2. Java for Spark
  3. Let's Spark
  4. Understanding Spark Programming model
  5. Working with data & storage
  6. Spark on Cluster
  7. Spark Programming model - advanced concepts
  8. Working with Spark SQL
  9. Near real time processing with Spark Streaming
  10. Machine learning analytics with Spark MLlib
  11. Learning Spark GraphX

Similar data modeling & design books

Database Pro

Learn SQL Server 2012 professional database design fast. A practical relational database design book, taught through practical diagrams and examples, for developers, programmers, systems analysts, IT managers, and project managers who are new to relational database and client/server technologies. Also for database developers, database designers, and database administrators (DBAs) who know some database design and who wish to refresh and expand their RDBMS design technology horizons.

Data Modeling Theory and Practice

Data Modeling Theory and Practice is for practitioners and academics who have learned the conventions and rules of data modeling and are looking for a deeper understanding of the discipline. The coverage of theory includes a detailed review of the extensive literature on data modeling and logical database design, referencing nearly 500 publications, with a strong focus on their relevance to practice.

Programmieren in C (German Edition)

C is one of the most important and most widely used programming languages. The authors have years of experience with the language and convey the essentials to readers, namely programming methodology: What is programming? How are programming problems solved? Programming is learned step by step using the C language and reinforced with examples and exercises.

Conversations with the Future: 21 Visions for the 21st Century

For generations, humanity stared at the vastness of the oceans and wondered, “What if?” Today, having explored the curves of the Earth, we now stare at endless stars and wonder, “What if?” Our technology has brought us to the make-or-break moment in human history. We can either grow complacent and go extinct like the dinosaurs, or spread throughout the cosmos, as Carl Sagan dreamed of.
