Apache Hadoop is an open source framework used for distributed storage as well as distributed processing of big data on clusters of computers which runs on commodity hardwares. Hadoop stores data in Hadoop Distributed File System (HDFS) and the processing of these data is done using MapReduce. YARN provides API for requesting and allocating resource in the Hadoop cluster.
The Apache Hadoop framework is composed of the following modules:
- Hadoop Common
- Hadoop Distributed File System (HDFS)
- YARN
- MapReduce
This article explains how to install Hadoop Version 2 on Ubuntu 18.04. We will install HDFS (Namenode and Datanode), YARN, MapReduce on the single node cluster in Pseudo Distributed Mode which is distributed simulation on a single machine. Each Hadoop daemon such as hdfs, yarn, mapreduce etc. will run as a separate/individual java process.
In this tutorial you will learn:
- How to add users for Hadoop Environment
- How to install and configure the Oracle JDK
- How to configure passwordless SSH
- How to install Hadoop and configure necessary related xml files
- How to start the Hadoop Cluster
- How to access NameNode and ResourceManager Web UI