Difference between revisions of "Spark"
Revision as of 03:57, 17 May 2018
Description
Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming.
Environment Modules
Run module spider spark to find out what environment modules are available for this application.
System Variables
- HPC_SPARK_DIR - installation directory
- HPC_SPARK_BIN - executable directory
- HPC_SPARK_SLURM - SLURM job script examples
- SPARK_HOME - examples directory
Running Spark in HiperGator
To run Spark jobs in HiperGator, you must first create a Spark cluster in HiperGator via SLURM. This section gives a simple example of how to create a Spark cluster in HiperGator and how to submit Spark jobs to that cluster.
- Spark cluster in HiperGator
- Set SLURM parameters for Spark cluster
- Set Spark parameters for Spark cluster
- Set Spark Master and Workers
- Submit the SLURM job script to HiperGator
Spark interactive shells in Scala and Python.
- Spark interactive shell in Scala (spark-shell)
- Spark interactive shell in Python (pyspark)
- Pi estimation in pyspark
- Pi estimation from file with pyspark
- Spark-submit
- Pi estimation
- Wordcount
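
As a rough illustration of the Pi estimation item in the outline above, the following is a minimal sketch of a Monte Carlo estimate in Python. It is not the HiperGator example script itself. The function names (inside_unit_circle, estimate_pi_local, estimate_pi_spark), the application name "PiEstimation", and the default partition count are assumptions made for illustration. The Spark variant assumes pyspark is importable in the loaded environment module and that a Spark cluster (or local mode) is reachable. The plain-Python variant runs without Spark and computes the same estimate.

```python
import random


def inside_unit_circle(_):
    """Draw one random point in the unit square.

    Returns 1 if the point falls inside the quarter circle of radius 1, else 0.
    The argument is ignored; it only lets the function be used with map().
    """
    x, y = random.random(), random.random()
    return 1 if x * x + y * y <= 1.0 else 0


def estimate_pi_local(num_samples):
    """Plain-Python reference: Monte Carlo estimate of Pi without Spark."""
    hits = sum(inside_unit_circle(i) for i in range(num_samples))
    return 4.0 * hits / num_samples


def estimate_pi_spark(num_samples, partitions=2):
    """Same estimate, with the sampling distributed over Spark.

    Assumption: pyspark is installed and a Spark master is configured
    (for example through spark-submit or the pyspark shell).
    """
    # Imported lazily so the local version runs on machines without pyspark.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("PiEstimation").getOrCreate()
    try:
        hits = (
            spark.sparkContext
            .parallelize(range(num_samples), partitions)  # split the samples across partitions
            .map(inside_unit_circle)                      # one random draw per element
            .sum()                                        # count the draws inside the circle
        )
    finally:
        spark.stop()
    return 4.0 * hits / num_samples


if __name__ == "__main__":
    print("local estimate:", estimate_pi_local(100_000))
```

The fraction of points that land inside the quarter circle approximates Pi/4, so the estimate is four times that fraction. Inside the pyspark shell a SparkContext is normally already available as sc, so the parallelize/map/sum chain could be typed there directly instead of creating a new session.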