Spark Broadcast and Accumulator Examples in Scala

So far, we’ve learned about distributing processing tasks across a Spark cluster. But let’s go a bit deeper into a couple of approaches you may need when designing distributed tasks. I’d like to start with a question. What do we do when we need each Spark worker task to coordinate certain

IntelliJ Scala and Apache Spark – Well, Now You Know

In this post, we’re going to review one way to set up IntelliJ for Scala and Spark development. The IntelliJ and Scala combination is the best free setup for Scala and Spark development. I have nothing against ScalaIDE (Eclipse for Scala) or editors such as Sublime. I switched from

Spark Streaming Testing with Scala Example

How do you create and automate tests of Spark Streaming applications? In this post, we’ll show an example of one way in Scala. This post is heavy on code examples and has the added bonus of using a code coverage plugin. Are the tests in this tutorial’s examples unit tests? Or, are

Apache Spark, Cassandra and Game of Thrones

Apache Spark with Cassandra is a powerful combination in data processing pipelines. In this post, we will build a Scala application with the Spark Cassandra combo and query battle data from Game of Thrones. Now, we’re not going to make any show predictions! But we will show the most aggressive kings as well as

Apache Spark Machine Learning Example with Scala

In this Apache Spark Machine Learning example, we introduce Spark MLlib and review the Scala source code. This post and the accompanying screencast videos demonstrate a custom Spark MLlib driver application. Then, the Spark MLlib Scala source code will be examined. There will be many topics shown and explained, but first, let’s describe a

Apache Spark Advanced Cluster Deploy Troubleshooting

In this Apache Spark tutorial, we’ll review a few options for when your Scala Spark code does not deploy as anticipated. For example, does your Spark driver program rely on a third-party jar only compatible with Scala 2.11, but your Spark cluster is based on Scala 2.10? Maybe your code relies on a newer version

Spark Scala with 3rd Party JARs Deploy to a Cluster

In this Apache Spark cluster deploy tutorial, we’ll cover how to deploy Spark driver programs to a Spark cluster when the driver program uses third-party jars. In this case, we’re going to use code examples from previous Spark SQL and Spark Streaming tutorials. At the end of this tutorial, there is a screencast of

Spark Streaming Example – How to Stream from Slack

Let’s write a Spark Streaming example in Scala that streams from Slack. This post will first show how to write, configure, and execute the code. Then, the source code will be examined in detail. If you don’t have a Slack team, you can set one up for free. We’ll cover that too. Let’s start

How-To Apache Spark Streaming with Scala Part 1

Let’s start Apache Spark Streaming by building up our confidence with small steps. These small steps will create the forward momentum needed when learning new skills. The quickest way to gain confidence and momentum in learning new software development skills is executing code that runs without error. In this post, we’re going to set up and

Apache Spark with Amazon S3 Examples of Text Files Tutorial

This post shows several options for accessing files stored on Amazon S3 from Apache Spark. Examples of text file interaction on Amazon S3 will be shown from both Scala and Python, using the spark-shell for Scala and an IPython notebook for Python. To begin, you should know there are multiple ways to access S3
