IntelliJ Scala and Apache Spark – Well, Now You Know

Categories: Spark

IntelliJ Scala and Spark Setup Overview: In this post, we’re going to review one way to set up IntelliJ for Scala and Spark development. The IntelliJ Scala combination is the best free setup for Scala and Spark development. I have nothing against ScalaIDE (Eclipse for Scala) or editors such as Sublime. I switched from…


Apache Spark, Cassandra and Game of Thrones

Categories: Spark

Apache Spark with Cassandra is a powerful combination in data processing pipelines. In this post, we will build a Scala application with the Spark Cassandra combo and query battle data from Game of Thrones. We’re not going to make any show predictions! But we will show the most aggressive kings as well as…


Spark RDD – A Two Minute Guide for Beginners

Categories: Spark

What is Spark RDD? Spark RDD is short for Apache Spark Resilient Distributed Dataset, often shortened to simply RDD. RDDs are a foundational component of the Apache Spark large-scale data processing framework. A Spark RDD is an immutable, fault-tolerant, and possibly distributed collection of data elements. RDDs may be…


Apache Spark Advanced Cluster Deploy Troubleshooting

Categories: Spark

In this Apache Spark example tutorial, we’ll review a few options for when your Scala Spark code does not deploy as anticipated. For example, does your Spark driver program rely on a third-party JAR that is only compatible with Scala 2.11, while your Spark cluster is based on Scala 2.10? Maybe your code relies on a newer version…


Spark Scala with 3rd Party JARs Deploy to a Cluster

Categories: Spark

Overview: In this Apache Spark cluster deploy tutorial, we’ll cover how to deploy Spark driver programs to a Spark cluster when the driver program uses third-party JARs. In this case, we’re going to use code examples from previous Spark SQL and Spark Streaming tutorials. At the end of this tutorial, there is a screencast of…
