How-To Apache Spark Streaming with Scala Part 1

Let’s start Apache Spark Streaming by building up our confidence with small steps.  These small steps will create the forward momentum needed when learning new skills.  The quickest way to gain confidence and momentum in learning new software development skills is executing code that performs without error.

In this post, we’re going to set up and run Apache Spark Streaming with Scala code.  Then we will be confident taking the next step to Part 2 of learning Apache Spark Streaming.

Before we begin, I assume you already have a high-level understanding of Apache Spark Streaming.  If not, here’s a quick two-minute read on Spark Streaming from the Learning Apache Spark Summary book.

Overview

Spark comes with some great examples and convenient scripts for running Streaming code.  Let’s make sure you can run these examples.  In case it helps, I made a screencast of me running through these steps; it is linked below.

Running the NetworkWordCount example out-of-the-box

  1. Open a shell (or a command prompt on Windows) and go to your Spark root directory.
  2. Start the Spark Master: sbin/start-master.sh  **
  3. Start a Worker, passing your own master URL (mine is shown here; yours is listed at the top of the Master web UI, http://localhost:8080 by default): sbin/start-slave.sh spark://todd-mcgraths-macbook-pro.local:7077
  4. Start netcat on port 9999: nc -lk 9999  (Windows users: try Ncat from https://nmap.org/ncat/ and let me know in the page comments if this works well on Windows)
  5. Run the network word count using the handy run-example script: bin/run-example streaming.NetworkWordCount localhost 9999

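** Windows users, please adjust accordingly.  The sbin launch scripts are shell scripts and do not run on Windows, so start the master and worker by hand instead: bin\spark-class org.apache.spark.deploy.master.Master, then bin\spark-class org.apache.spark.deploy.worker.Worker followed by your master URL.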

Here’s a screencast of me running these steps

Making and Running Our Own NetworkWordCount

Ok, that’s good.  We’ve succeeded in running the Scala Spark Streaming NetworkWordCount example, but what about running our own Spark Streaming program in Scala?  Let’s take another step towards that goal.  In this step, we’re going to set up our own Scala/SBT project, then compile, package and deploy a modified NetworkWordCount.  Again, I made a screencast of the following steps; it is linked below.

  1. Choose or create a new directory for a new Spark Streaming Scala project.
  2. Make the directories SBT expects by convention: src/main/scala
  3. Create a Scala object code file called NetworkWordCount.scala in the src/main/scala directory
  4. Copy-and-paste the NetworkWordCount.scala code from the Spark examples directory into the file created in the previous step
  5. Remove or comment out the package declaration and the StreamingExamples references
  6. Change the AppName to “MyNetworkWordCount”
  7. Create a build.sbt file (source code below)
  8. Run sbt compile as a smoke test, then sbt package to build the jar used in the next step
  9. Deploy: ~/Development/spark-1.5.1-bin-hadoop2.4/bin/spark-submit --class "NetworkWordCount" --master spark://todd-mcgraths-macbook-pro.local:7077 target/scala-2.11/streaming-example_2.11-1.0.jar localhost 9999
  10. Start netcat on port 9999: nc -lk 9999  and start typing
  11. Check things out in the Spark UI
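
After steps 4 through 6, your NetworkWordCount.scala should look roughly like the sketch below.  It is based on my recollection of the example bundled with Spark 1.5, so treat it as an approximation and prefer the copy in your own Spark examples directory if the two differ.  The package line and the StreamingExamples.setStreamingLogLevels() call are simply left out, and the app name is changed.

```scala
// src/main/scala/NetworkWordCount.scala
// Sketch of the modified example; the bundled original may differ in details.
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

object NetworkWordCount {
  def main(args: Array[String]): Unit = {
    if (args.length < 2) {
      System.err.println("Usage: NetworkWordCount <hostname> <port>")
      System.exit(1)
    }

    // Step 6: our own application name, visible in the Spark UI
    val sparkConf = new SparkConf().setAppName("MyNetworkWordCount")

    // Process the incoming data in one-second batches
    val ssc = new StreamingContext(sparkConf, Seconds(1))

    // Read lines of text from the socket netcat is listening on
    val lines = ssc.socketTextStream(
      args(0), args(1).toInt, StorageLevel.MEMORY_AND_DISK_SER)

    // Split each line into words and count each word within the batch
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words.map(word => (word, 1)).reduceByKey(_ + _)
    wordCounts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The hostname and port come from the two trailing arguments on the spark-submit line (localhost 9999 in step 9), and the master URL is deliberately not hard-coded because spark-submit supplies it.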

build.sbt source
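
In case the embedded source does not show up for you, here is a minimal build.sbt sketch that matches the jar name used in the deploy step (streaming-example_2.11-1.0.jar) and the Spark 1.5.1 install.  The exact Scala patch version and the "provided" scope are my assumptions here, not necessarily what was used in the screencast.

```scala
// build.sbt: a minimal sketch reconstructed from the jar name and Spark version in this post
name := "streaming-example"

version := "1.0"

// Any 2.11.x release produces the _2.11 suffix in the jar name; 2.11.7 is an assumption
scalaVersion := "2.11.7"

// "provided" is an assumption: spark-submit puts Spark on the classpath at runtime,
// so the dependency is only needed for compiling
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.5.1" % "provided"
```

With this in place, sbt package should write the jar under target/scala-2.11/.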

If you watched the video, notice the project name has been corrected to “streaming-example” and not “steaming-example” :)

Spark Streaming With Scala Part 1 Conclusion

At this point, I hope you were successful in running both Spark Streaming examples in Scala.  If so, you should be more confident when we continue to explore Spark Streaming in Part 2.   If you have any questions, feel free to add comments below.

 

 

Featured image credit https://flic.kr/p/bVJF32
