Spark RDD – A Two Minute Guide for Beginners

Categories: Spark

What is a Spark RDD? Spark RDD is short for Apache Spark Resilient Distributed Dataset, a name often shortened to simply RDD. RDDs are a foundational component of the Apache Spark large-scale data processing framework. An RDD is an immutable, fault-tolerant, and possibly distributed collection of data elements. RDDs may be

Spark Scala with 3rd Party JARs Deploy to a Cluster

Categories: Spark

Overview: In this Apache Spark cluster deploy tutorial, we’ll cover how to deploy Spark driver programs to a Spark cluster when the driver program uses third-party JARs. In this case, we’re going to use code examples from previous Spark SQL and Spark Streaming tutorials. At the end of this tutorial, there is a screencast of
