buchspektrum Internet-Buchhandlung

New Releases 2019

As of: 2020-02-01
Herderstraße 10
10625 Berlin
Tel.: 030 315 714 16
Fax 030 315 714 14
info@buchspektrum.de

Raju Kumar Mishra

PySpark SQL Recipes


With HiveQL, Dataframe and Graphframes
1st ed. 2019. xxiv, 323 pp., 57 b/w illustrations, 235 mm
Publisher/Year: SPRINGER, BERLIN; APRESS 2019
ISBN: 1-484-24334-X (148424334X)
New ISBN: 978-1-484-24334-3 (9781484243343)



Carry out data analysis with PySpark SQL, graphframes, and graph data processing using a problem-solution approach. This book provides solutions to problems related to dataframes, data manipulation, summarization, and exploratory analysis. You will improve your skills in graph data analysis using graphframes and see how to optimize your PySpark SQL code.
PySpark SQL Recipes starts with recipes on creating dataframes from different types of data sources, data aggregation and summarization, and exploratory data analysis using PySpark SQL. You'll also discover how to solve problems in graph analysis using graphframes.
On completing this book, you'll have ready-made code for all your PySpark SQL tasks, including creating dataframes using data from different file formats as well as from SQL or NoSQL databases.
What You Will Learn

Understand PySpark SQL and its advanced features

Use SQL and HiveQL with PySpark SQL

Work with structured streaming

Optimize PySpark SQL

Master graphframes and graph processing

Who This Book Is For

Data scientists, Python programmers, and SQL programmers.
Chapter 1: Introduction to PySparkSQL
Chapter Goal: The reader will gain an understanding of PySpark, PySparkSQL, the Catalyst Optimizer, Project Tungsten, and Hive.

No. of pages: 20-30

Sub-Topics:

1. PySpark

2. PySparkSQL

3. Hive

4. Catalyst

5. Project Tungsten

Chapter 2: Some time with Installation
Chapter Goal: The learner will understand how to install Spark, Hive, PostgreSQL, MySQL, MongoDB, Cassandra, etc.

No. of pages: 30-40

Sub-Topics:

1. Installing Spark

2. Installing Hive

3. Installing MySQL

4. Installing MongoDB

Chapter 3: IO in PySparkSQL
Chapter Goal: This chapter provides recipes that enable the reader to create PySparkSQL DataFrames from different sources.

No. of pages: 40-50

Sub-Topics:

1. Creating a DataFrame from data

2. Reading a CSV file to create a DataFrame

3. Reading a JSON file to create a DataFrame

4. Saving DataFrames to different formats

Chapter 4: Operations on PySparkSQL DataFrames
Chapter Goal: The reader will learn about data filtering, data manipulation, descriptive data analysis, dealing with missing values, etc.

No. of pages: 40-50

1. Data filtering

2. Data manipulation

3. Row and column manipulation

Chapter 5: Data Merging and Data Aggregation using PySparkSQL
Chapter Goal: The reader will learn about data merging and aggregation using PySparkSQL.

1. Data Merging

2. Data Aggregation

Chapter 6: SQL, NoSQL and PySparkSQL
Chapter Goal: The reader will learn to run SQL and HiveQL queries on DataFrames.

No. of pages: 30-40

Sub-Topics:

1. Running SQL on DataFrames

2. Running HiveQL

Chapter 7: Structured Streaming
Chapter Goal: The reader will gain an understanding of structured streaming.

No. of pages: 30-40

1. Different types of modes

2. Data aggregation in structured streaming

3. Different types of sources

Chapter 8: Optimizing PySparkSQL
Chapter Goal: The reader will learn about optimizing PySparkSQL.

No. of pages: 20-30

Optimizing PySparkSQL

Chapter 9: GraphFrames
Chapter Goal: The reader will gain an understanding of graph data analysis with GraphFrames.

No. of pages: 30-40

1. GraphFrame Creation

2. PageRank

3. Breadth-First Search