eagletapllc

Spark the Future: PySpark for Data Enthusiasts – January 6th Class

About Course

What you’ll learn

  • Apache Spark Foundation and Spark Architecture
  • Data Engineering and Data Processing in Spark
  • Working with Data Sources and Sinks
  • Working with DataFrames and Spark SQL
  • Using PyCharm IDE for Spark Development and Debugging
  • Exclusive practice problems using Spark for interview preparation

Course Content

PySpark: Why Do We Need It?
PySpark: The Need for Scalable Data Processing in Big Data Environments

Setting Up PyCharm: A Step-by-Step Guide for Python Development
Let's set up the environment for our hands-on work!

Handling DataFrames with CSV Files
Reading, writing, and viewing DataFrames in PySpark

Handling Other File Formats: JSON and Parquet
Read, view, and understand the JSON and Parquet file formats

Handling DataFrame Structure: A Guide to withColumn(), withColumnRenamed(), and StructType
Create a schema in PySpark and explore DataFrame columns

Exploring the split(), array(), and explode() Functions of PySpark

Comparing PySpark Built-in Functions: orderBy() vs sort(), distinct() vs dropDuplicates(), filter() vs where(), union() vs unionAll()

Aggregate Functions in PySpark: groupBy() and agg()

Joins in PySpark: Inner vs Left, Which One to Choose?

Pivot Function in PySpark

UDFs in PySpark: Understanding and Implementation

coalesce() vs repartition(): A Data Engineering Concept

Window Functions in PySpark: rank() vs dense_rank() vs row_number(), with Example

PySpark: Data Engineering Interview Questions

Earn a certificate

Add this certificate to your resume to demonstrate your skills & increase your chances of getting noticed.

Student Ratings & Reviews

No Review Yet