Learning Spark: Lightning-Fast Data Analytics

As the most active open-source project in the big data community, Apache Spark™ has become the de facto standard for big data processing and analytics. Spark’s ease of use, versatility, and speed have changed the way teams solve data problems, and that has fostered an ecosystem of technologies around it, including Delta Lake for reliable data lakes, MLflow for the machine learning lifecycle, and Koalas for bringing the pandas API to Spark.

We’re proud to share the complete text of O’Reilly’s new Learning Spark, 2nd Edition with you. It includes the latest updates on new features from the Apache Spark 3.0 release, to help you:

Learn the Python, SQL, Scala, or Java high-level APIs: DataFrames and Datasets
Inspect, tune, and debug your Spark operations with Spark configurations and Spark UI
Perform analytics on batch and streaming data using Structured Streaming
Build reliable data pipelines with open source Delta Lake and Spark
Develop machine learning pipelines with MLlib and productionize models using MLflow
Use Koalas, the open source pandas API on Spark, for data transformation and feature engineering

Learn more about the latest developments in Spark and its surrounding ecosystem, including Delta Lake, MLflow, and Koalas, in this free ebook.
