After you have a Spark cluster running, how do you deploy Python programs to it? If you find these videos on deploying Python programs to an Apache Spark cluster interesting, you will find the entire Apache Spark with Python Course valuable. Make sure to check it out.
In this post, we’ll deploy a couple of example Python programs. We’ll start with a simple example and then progress to more complicated examples which include utilizing spark-packages and Spark SQL.
Ok, now that we’ve deployed a few examples, let’s review a Python program which utilizes code we’ve already seen in the Spark with Python tutorials on this site. It’s a Python program which analyzes New York City Uber data using Spark SQL. The video shows the program in the Sublime Text editor, but you can use any editor you wish.
When deploying our driver program, we need to do things differently than we did while working in the pyspark shell. For example, we need to create our own SparkContext and SQLContext, and we need specific Python imports.
bin/spark-submit --master spark://todd-mcgraths-macbook-pro.local:7077 --packages com.databricks:spark-csv_2.10:1.3.0 uberstats.py Uber-Jan-Feb-FOIL.csv
Let’s return to the Spark UI now that we have an available worker in the cluster and have deployed some Python programs.
The Spark UI is the go-to tool for Spark cluster diagnostics, so we’ll review its key features.
Featured Image credit https://flic.kr/p/bpd8Ht