Apache Spark Advanced Cluster Deploy Troubleshooting

/ Categories: Spark Comments: no comments

In this Apache Spark tutorial, we’ll review a few options for when your Scala Spark code does not deploy as anticipated. For example, does your Spark driver program rely on a third-party JAR that is only compatible with Scala 2.11 while your Spark cluster is based on Scala 2.10? Maybe your code relies on a newer version

read more
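A Scala version mismatch like the one described in this teaser can often be spotted by printing the Scala version the code is actually running on. The following is only a minimal sketch, not the method used in the linked post: the object name `ScalaVersionCheck`, the helper `binaryVersion`, and the expected value `"2.10"` are assumptions made for illustration, and the two-part binary version convention applies to Scala 2.x.

```scala
import scala.util.Properties

// Hypothetical helper object, named for this illustration only.
object ScalaVersionCheck {

  /** Reduces a full Scala 2.x version such as "2.11.8" to its binary version "2.11". */
  def binaryVersion(full: String): String =
    full.split('.').take(2).mkString(".")

  def main(args: Array[String]): Unit = {
    // Version of the Scala library found on the classpath at runtime.
    val running = binaryVersion(Properties.versionNumberString)

    // Assumption: the binary version the cluster (or a third-party JAR) was built for.
    val expected = "2.10"

    if (running == expected)
      println(s"Scala binary version $running matches the expected $expected")
    else
      println(s"Scala binary version mismatch: running $running, expected $expected")
  }
}
```

Running the same check inside the driver program on the cluster shows which Scala library the deployed code really sees, which may differ from the one used at build time.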

Spark Scala with 3rd Party JARs Deploy to a Cluster

/ Categories: Spark Comments: no comments

Overview: In this Apache Spark cluster deploy tutorial, we’ll cover how to deploy Spark driver programs to a Spark cluster when the driver program uses third-party JARs. In this case, we’re going to use code examples from the previous Spark SQL and Spark Streaming tutorials. At the end of this tutorial, there is a screencast of

read more

Spark Streaming Example – How to Stream from Slack

/ Categories: Spark Comments: 3 Comments

Let’s write a Spark Streaming example in Scala that streams from Slack. This post first shows how to write, configure, and execute the code. Then, the source code is examined in detail. If you don’t have a Slack team, you can set one up for free. We’ll cover that too. Let’s start

read more

How-To Apache Spark Streaming with Scala Part 1

/ Categories: Spark Comments: no comments

Let’s start Apache Spark Streaming by building up our confidence with small steps. These small steps will create the forward momentum needed when learning new skills. The quickest way to gain confidence and momentum in learning new software development skills is to execute code that runs without error. In this post, we’re going to set up and

read more

How To: Apache Spark Cluster on Amazon EC2 Tutorial

/ Categories: Spark Comments: 12 Comments

How do you set up and run an Apache Spark cluster on EC2? This post will walk you through each step to get an Apache Spark cluster up and running on EC2. The cluster consists of one master and one worker node. It includes each step I took, regardless of whether it failed or succeeded. While your experience may

read more

Spark SQL JSON Examples

/ Categories: Spark Comments: 2 Comments

This tutorial covers using Spark SQL with a JSON file as the input data source. Overview: We will show examples of JSON as an input source to Spark SQL’s SQLContext. This Spark SQL tutorial with JSON has two parts. Part 1 focuses on the “happy path” when using JSON with Spark SQL. Part 2 covers a “gotcha” or something

read more

Spark SQL CSV Examples

/ Categories: Spark Comments: 4 Comments

In this Spark tutorial, we will use Spark SQL with a CSV input data source. We will continue to use the baby names CSV source file used in the previous Spark tutorials. This tutorial presumes the reader is familiar with using SQL with relational databases and would like to know how to use with

read more

Apache Spark Cluster Part 1: Run Standalone

/ Categories: Spark Comments: 1 Comment

Running an Apache Spark cluster on your local machine is a natural, early step towards Apache Spark proficiency. Let’s start understanding Spark cluster options by running a cluster on a local machine. Running a local cluster is called “standalone” mode. This post will describe pitfalls to avoid and review how to run a Spark cluster locally, deploy to a

read more

Apache Spark: Examples of Actions

/ Categories: Spark Comments: 1 Comment

Spark Action Examples

Unlike transformations, which produce RDDs, action functions return a value to the Spark driver program. Actions may trigger the evaluation of a previously constructed, lazy RDD.

reduce, collect, count, first, take, takeSample, countByKey, saveAsTextFile

reduce(func): Aggregate the elements of a dataset through func

map API signature with stripped implicits: map[U](f: (T) ⇒ U): RDD[U]
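The difference between a lazy transformation such as map and an action such as reduce can be illustrated without a cluster. The sketch below is an analogy only: it uses a plain Scala collection view as a stand-in for an RDD, not the Spark API itself, and the names `LazyThenAction` and `demo` are made up for this example.

```scala
// Analogy only: a Scala collection view stands in for a lazy RDD.
object LazyThenAction {

  /** Returns (function calls before the action, function calls after it, reduced value). */
  def demo(numbers: Seq[Int]): (Int, Int, Int) = {
    var evaluations = 0

    // "Transformation": map on a view is recorded but not executed yet.
    val doubled = numbers.view.map { n => evaluations += 1; n * 2 }
    val before = evaluations

    // "Action": reduce forces the mapped elements to be computed
    // and returns a plain value rather than another lazy collection.
    val total = doubled.reduce(_ + _)

    (before, evaluations, total)
  }

  def main(args: Array[String]): Unit = {
    val (before, after, total) = demo(Seq(1, 2, 3, 4))
    println(s"calls before reduce: $before, calls after reduce: $after, total: $total")
  }
}
```

With the input `Seq(1, 2, 3, 4)`, the mapping function has not run at all before reduce is called, runs once per element when reduce executes, and the returned total is 20. On a real RDD the same pattern holds, except that the work is distributed across the cluster and the result is sent back to the driver.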

read more