How to install Spark on RHEL 8

Date: 2020-01-14

Apache Spark is a distributed computing system. It consists of a master and one or more slaves; the master distributes work among the slaves, which lets us use many computers to work on one task. This makes it a powerful tool for tasks that need large computations to complete but can be split into smaller chunks that are pushed to the slaves to work on. Once our cluster is up and running, we can write programs to run on it in Python, Java, and Scala.

In this tutorial we will work on a single machine running Red Hat Enterprise Linux 8, and will install the Spark master and slave on the same machine. Keep in mind that the steps describing the slave setup can be applied to any number of computers, creating a real cluster that can process heavy workloads. We'll also add the necessary unit files for management, and run a simple example shipped with the distributed package against the cluster to ensure our system is operational.

In this tutorial you will learn:

How to install Spark master and slave

How to add systemd unit files

How to verify successful master-slave connection

How to run a simple example job on the cluster

Figure: Spark shell with pyspark.

Software Requirements and Conventions Used

Software Requirements and Linux Command Line Conventions

| Category | Requirements, Conventions or Software Version Used |
| System | Red Hat Enterprise Linux 8 |
| Software | Apache Spark 2.4.0 |
| Other | Privileged access to your Linux system as root or via the sudo command. |
| Conventions | # - requires given Linux commands to be executed with root privileges, either directly as a root user or by use of the sudo command |
| | $ - requires given Linux commands to be executed as a regular non-privileged user |

How to install Spark on Red Hat 8: step-by-step instructions

Apache Spark runs on the JVM (Java Virtual Machine), so a working Java 8 installation is required for the applications to run. Aside from that, multiple shells are shipped within the package; one of them is pyspark, a Python-based shell. To work with it, you'll also need Python 2 installed and set up.
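Before installing anything, it may help to check whether these prerequisites are already present. The following is a minimal sketch, not part of the original steps: the helper names is_java8 and java_version are made up for illustration, and the script assumes the Python 2 interpreter is available under the name python2, which may differ on your system.

```shell
#!/bin/sh
# Sketch: check the prerequisites named above (Java 8 and Python 2).
# Helper names are hypothetical; adjust binary names to your system.

# Return 0 if a version string looks like Java 8, e.g. "1.8.0_232".
is_java8() {
    case "$1" in
        1.8|1.8.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the version string reported by "java -version",
# or print nothing if no java binary is on the PATH.
# "java -version" writes to stderr, hence the 2>&1 redirect.
java_version() {
    command -v java >/dev/null 2>&1 || return 0
    java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

ver=$(java_version)
if [ -z "$ver" ]; then
    echo "java: not found"
elif is_java8 "$ver"; then
    echo "java: $ver (Java 8, OK)"
else
    echo "java: $ver (not Java 8)"
fi

# Assumption: the Python 2 interpreter is installed as "python2".
# Python 2 prints its version to stderr, hence the 2>&1 redirect.
if command -v python2 >/dev/null 2>&1; then
    echo "python2: $(python2 --version 2>&1)"
else
    echo "python2: not found"
fi
```

The script only reports what it finds and changes nothing on the system, so it can be run as a regular non-privileged user.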