Which Spark performance monitoring tools are available to monitor the performance of your Spark cluster? Let’s find out. Before we address this question, I assume you already know that Spark includes monitoring through the Spark UI. Spark also supports monitoring and performance debugging through the History Server, and it integrates with the Java Metrics library. But are there other Spark performance monitoring tools available? In this short post, let’s list a few more options to consider.
Sparklint was developed at Groupon. It uses Spark metrics and a custom Spark event listener, and it is easily attached to any Spark job. It can also run standalone against historical event logs or be configured to use an existing Spark History Server. It presents good-looking charts through a web UI for analysis and provides a resource-focused view of the application runtime.
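To get a feel for what "running against historical event logs" means, here is a minimal sketch in Python. It is not Sparklint's code. It assumes the JSON-lines event log format that Spark writes when `spark.eventLog.enabled` is set, and it only reads a handful of fields from `SparkListenerTaskEnd` events. The inline sample log and the function name `task_time_per_stage` are made up for illustration.

```python
import json
from collections import defaultdict

# Tiny inline sample in the JSON-lines style of a Spark event log.
# Real logs carry many more events and fields; these values are invented.
SAMPLE_EVENT_LOG = """\
{"Event": "SparkListenerTaskEnd", "Stage ID": 0, "Task Info": {"Launch Time": 1000, "Finish Time": 1400}}
{"Event": "SparkListenerTaskEnd", "Stage ID": 0, "Task Info": {"Launch Time": 1100, "Finish Time": 1700}}
{"Event": "SparkListenerTaskEnd", "Stage ID": 1, "Task Info": {"Launch Time": 2000, "Finish Time": 2250}}
{"Event": "SparkListenerJobEnd", "Job ID": 0}
"""


def task_time_per_stage(lines):
    """Sum task wall-clock time in milliseconds per stage ID."""
    totals = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        # Only task-end events carry the launch/finish timestamps we need.
        if event.get("Event") != "SparkListenerTaskEnd":
            continue
        info = event["Task Info"]
        totals[event["Stage ID"]] += info["Finish Time"] - info["Launch Time"]
    return dict(totals)


if __name__ == "__main__":
    totals = task_time_per_stage(SAMPLE_EVENT_LOG.splitlines())
    for stage_id, millis in sorted(totals.items()):
        print(f"stage {stage_id}: {millis} ms of task time")
```

To try it on a real log, you would pass the lines of an uncompressed event log file instead of the sample string.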
Presentation: Spark Summit 2017 Presentation on Sparklint
From LinkedIn, Dr. Elephant is a performance monitoring tool for Hadoop and Spark. Dr. Elephant gathers metrics, runs analysis on these metrics, and presents the results in a simple way for easy consumption. The goal is to improve developer productivity and increase cluster efficiency by making it easier to tune jobs.
“It analyzes the Hadoop and Spark jobs using a set of pluggable, configurable, rule-based heuristics that provide insights on how a job performed, and then uses the results to make suggestions about how to tune the job to make it perform more efficiently.”
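The idea of pluggable, configurable, rule-based heuristics can be sketched in a few lines of Python. This is only an illustration of the pattern and not how Dr. Elephant is implemented. The `JobMetrics` fields, the two heuristics, and their thresholds are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class JobMetrics:
    # Hypothetical summary of one finished job; field names are illustrative.
    executor_memory_mb: int
    peak_memory_used_mb: int
    gc_time_ms: int
    run_time_ms: int


# A heuristic inspects the metrics and returns a suggestion, or None.
Heuristic = Callable[[JobMetrics], Optional[str]]


def memory_heuristic(min_utilization: float) -> Heuristic:
    """Flag jobs that use less than min_utilization of executor memory."""
    def check(m: JobMetrics) -> Optional[str]:
        if m.peak_memory_used_mb < min_utilization * m.executor_memory_mb:
            return "Executor memory looks over-provisioned; consider lowering it."
        return None
    return check


def gc_heuristic(max_gc_fraction: float) -> Heuristic:
    """Flag jobs that spend more than max_gc_fraction of run time in GC."""
    def check(m: JobMetrics) -> Optional[str]:
        if m.run_time_ms > 0 and m.gc_time_ms / m.run_time_ms > max_gc_fraction:
            return "High GC overhead; consider more memory or fewer cached objects."
        return None
    return check


def analyze(metrics: JobMetrics, heuristics: List[Heuristic]) -> List[str]:
    """Run every configured heuristic and collect the suggestions."""
    suggestions = []
    for heuristic in heuristics:
        suggestion = heuristic(metrics)
        if suggestion is not None:
            suggestions.append(suggestion)
    return suggestions


if __name__ == "__main__":
    rules = [memory_heuristic(0.5), gc_heuristic(0.1)]
    job = JobMetrics(executor_memory_mb=8192, peak_memory_used_mb=2048,
                     gc_time_ms=30000, run_time_ms=100000)
    for tip in analyze(job, rules):
        print(tip)
```

Each rule is a plain function with its threshold passed in, so adding, removing, or retuning a rule does not touch the analysis loop.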
Presentation: Spark Summit 2017 Presentation on Dr. Elephant
SparkOscope was born at IBM Research in Dublin and was developed to better understand Spark resource utilization. One of the reasons it was developed was to “address the inability to derive temporal associations between system-level metrics (e.g. CPU utilization) and job-level metrics (e.g. stage ID)”. For example, the authors were not able to trace the root cause of a peak in HDFS reads or CPU usage back to the Spark application code.
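A "temporal association" between system-level and job-level metrics boils down to lining up timestamps. The Python sketch below shows the idea with invented data: CPU samples are matched to the stage that was running when each sample was taken. It is not SparkOscope's code, the names are hypothetical, and it assumes stages do not overlap in time, which real Spark applications do not guarantee.

```python
from typing import Dict, List, Optional, Tuple

# (stage_id, start_ms, end_ms); invented values, assumed non-overlapping.
STAGES: List[Tuple[int, int, int]] = [(0, 0, 5000), (1, 5000, 12000)]

# (timestamp_ms, cpu_percent); invented system-level samples.
CPU_SAMPLES: List[Tuple[int, float]] = [
    (1000, 35.0), (4000, 40.0), (6000, 95.0), (9000, 90.0), (13000, 10.0),
]


def stage_at(timestamp_ms: int, stages: List[Tuple[int, int, int]]) -> Optional[int]:
    """Return the ID of the stage running at timestamp_ms, if any."""
    for stage_id, start_ms, end_ms in stages:
        if start_ms <= timestamp_ms < end_ms:
            return stage_id
    return None


def peak_cpu_by_stage(samples: List[Tuple[int, float]],
                      stages: List[Tuple[int, int, int]]) -> Dict[int, float]:
    """Attribute each CPU sample to a stage and keep the peak per stage."""
    peaks: Dict[int, float] = {}
    for timestamp_ms, cpu in samples:
        stage_id = stage_at(timestamp_ms, stages)
        if stage_id is None:
            continue  # sample taken while no stage was running
        peaks[stage_id] = max(cpu, peaks.get(stage_id, 0.0))
    return peaks


if __name__ == "__main__":
    for stage_id, peak in sorted(peak_cpu_by_stage(CPU_SAMPLES, STAGES).items()):
        print(f"stage {stage_id}: peak CPU {peak}%")
```

With this kind of join in place, a CPU spike can be traced to a stage ID and, from there, to the part of the application code that produced the stage.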
SparkOscope extends (augments) the Spark UI and History server.
SparkOscope dependencies include Hyperic Sigar library and HDFS.
Presentation: Spark Summit 2017 Presentation on SparkOscope
Don’t forget about the Spark History Server. I wrote up a tutorial on Spark History Server recently.
Spark’s support for the Metrics Java library, available at http://metrics.dropwizard.io/, is what facilitates many of the Spark performance monitoring options above. It also provides a way to integrate with external monitoring tools such as Ganglia and Graphite. There is a short tutorial on integrating Spark with Graphite presented on this site.
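As a rough illustration of the Graphite integration, the Python helper below renders the lines of a Spark `metrics.properties` file that enable the Graphite sink for all components. The `*.sink.graphite.*` keys and the `org.apache.spark.metrics.sink.GraphiteSink` class come from Spark's metrics configuration. The helper name, the default values, and the host `graphite.example.com` are assumptions for the example.

```python
from typing import Optional


def graphite_metrics_properties(host: str,
                                port: int = 2003,
                                period_seconds: int = 10,
                                prefix: Optional[str] = None) -> str:
    """Render metrics.properties content that enables Spark's Graphite sink."""
    lines = [
        # "*" applies the sink to every component (driver, executors, ...).
        "*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink",
        f"*.sink.graphite.host={host}",
        f"*.sink.graphite.port={port}",
        f"*.sink.graphite.period={period_seconds}",
        "*.sink.graphite.unit=seconds",
    ]
    if prefix:
        # Optional prefix prepended to every metric name sent to Graphite.
        lines.append(f"*.sink.graphite.prefix={prefix}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical Graphite host; replace with your own.
    print(graphite_metrics_properties("graphite.example.com", prefix="myapp"), end="")
```

The output would typically be saved as `metrics.properties` in Spark's `conf` directory, or in another location referenced through the `spark.metrics.conf` setting.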
Hopefully, this list of Spark performance monitoring tools gives you some options to explore. Let me know if I missed any other options or if you have any opinions on the options above. Thank you and good night.
Featured image https://flic.kr/p/e4rCVb