Spark

Description

Spark website: http://spark.apache.org/

Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming.

Environment Modules

Run module spider spark to find out what environment modules are available for this application.
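For example, a minimal session might look like this (the versions listed and the default chosen will vary with the system):

<pre>
module spider spark   # list the available Spark versions
module load spark     # load the default Spark environment module
</pre>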

System Variables

  • HPC_SPARK_DIR - installation directory
  • HPC_SPARK_BIN - executable directory
  • HPC_SPARK_SLURM - SLURM job script examples
  • SPARK_HOME - examples directory
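Once the module is loaded, these variables can be used to locate the provided materials, for instance the SLURM job script examples (a minimal sketch):

<pre>
module load spark
ls "$HPC_SPARK_SLURM"   # browse the provided SLURM job script examples
</pre>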

Running Spark in HiperGator

To run Spark jobs on HiperGator, a Spark cluster must first be created via SLURM. This section shows a simple example of how to create a Spark cluster on HiperGator and how to submit Spark jobs to it.

  • Spark cluster in HiperGator (a job script sketch follows this list)
  1. Set SLURM parameters for the Spark cluster
  2. Set Spark parameters for the Spark cluster
  3. Set the Spark master and workers
  4. Submit the SLURM job script to HiperGator
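A minimal sketch of such a job script, under stated assumptions: the module name, node/core/memory values, and port are illustrative, and SPARK_HOME is assumed to resolve to a full Spark installation containing the standard sbin/ and bin/ scripts.

<pre>
#!/bin/bash
# Step 1: SLURM parameters for the Spark cluster (values are illustrative).
#SBATCH --job-name=spark-cluster
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=32gb
#SBATCH --time=04:00:00

module load spark

# Step 2: Spark parameters - scratch directories and per-worker resources.
export SPARK_LOCAL_DIRS="$HOME/spark/tmp"
export SPARK_WORKER_DIR="$SPARK_LOCAL_DIRS"
export SPARK_WORKER_CORES="$SLURM_CPUS_PER_TASK"
mkdir -p "$SPARK_LOCAL_DIRS"

# Step 3: start the master on this node, then one worker per allocated node.
export SPARK_MASTER_HOST=$(hostname)
MASTER_URL="spark://$SPARK_MASTER_HOST:7077"
"$SPARK_HOME/sbin/start-master.sh"
srun --ntasks="$SLURM_NNODES" --ntasks-per-node=1 \
    "$SPARK_HOME/bin/spark-class" org.apache.spark.deploy.worker.Worker "$MASTER_URL" &

echo "Spark master running at $MASTER_URL"
wait   # keep the job, and therefore the cluster, alive until the time limit
</pre>

Step 4 is then submitting this script with sbatch; once the job starts, interactive shells and spark-submit jobs can be pointed at the reported master URL via --master.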

Spark supports interactive shells in Scala and Python.

  • Spark interactive shell in Scala (spark-shell)
  • Spark interactive shell in Python (pyspark); see the sketches after this list
  1. Pi estimation in pyspark
  2. Pi estimation from file with pyspark
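A minimal sketch of each, assuming a pyspark shell started against the running cluster (e.g. pyspark --master spark://<master-host>:7077, matching the job script above); sc is the SparkContext the shell provides:

<pre>
# Pi estimation: sample random points in the unit square and count
# how many fall inside the quarter circle.
import random

NUM_SAMPLES = 1000000

def inside(_):
    x, y = random.random(), random.random()
    return x * x + y * y < 1.0

count = sc.parallelize(range(NUM_SAMPLES)).filter(inside).count()
print("Pi is roughly %f" % (4.0 * count / NUM_SAMPLES))
</pre>

For the file-based variant, one plausible reading (the file name and the one "x y" pair of floats per line layout are assumptions):

<pre>
# Pi estimation from a file: points.txt (hypothetical) holds one
# "x y" pair of floats per line.
points = sc.textFile("points.txt")

def inside_line(line):
    x, y = map(float, line.split())
    return x * x + y * y < 1.0

total = points.count()
hits = points.filter(inside_line).count()
print("Pi is roughly %f" % (4.0 * hits / total))
</pre>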
  • Batch jobs with spark-submit (see the sketches below)
  1. Pi estimation
  2. Wordcount
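Minimal sketches of both as standalone scripts (the file names are illustrative; unlike the interactive shells, a submitted script must create its own SparkContext):

<pre>
# pi.py - Pi estimation, submitted as a batch job.
import random
import sys
from pyspark import SparkContext

if __name__ == "__main__":
    sc = SparkContext(appName="PiEstimation")
    n = int(sys.argv[1]) if len(sys.argv) > 1 else 1000000

    def inside(_):
        x, y = random.random(), random.random()
        return x * x + y * y < 1.0

    count = sc.parallelize(range(n)).filter(inside).count()
    print("Pi is roughly %f" % (4.0 * count / n))
    sc.stop()
</pre>

<pre>
# wordcount.py - count word occurrences in a text file given as argv[1].
import sys
from pyspark import SparkContext

if __name__ == "__main__":
    sc = SparkContext(appName="Wordcount")
    counts = (sc.textFile(sys.argv[1])
                .flatMap(lambda line: line.split())
                .map(lambda word: (word, 1))
                .reduceByKey(lambda a, b: a + b))
    for word, n in counts.collect():
        print(word, n)
    sc.stop()
</pre>

Both are then submitted to the running cluster, for example: spark-submit --master spark://<master-host>:7077 wordcount.py input.txt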