Hadoop is said to be highly fault tolerant. Hadoop achieves this feat through the process of replication. Data is replicated across multiple nodes in a Hadoop cluster. The data is associated with a replication factor, which indicates the number of copies of the data that are present across the various nodes in a Hadoop cluster. For example, if the replication factor is 3, the data will be present in three different nodes of the Hadoop cluster, where each node will contain one copy each. In this manner, if there is a failure in any one of the nodes, the data will not be lost, but can be recovered from one of the other nodes which contains copies or replicas of the data.
Hadoop 3.0, however, makes use of the method of Erasure Coding (EC). EC is implemented by means of Redundant Array of Inexpensive Disks (RAID) by striping, where logically sequential data is divided into smaller units and these smaller units are then stored as consecutive unts on different disks. Replication results in a storage overhead of 200% in case of a replication factor of 3 (which is the default replication factor). The use of EC in Hadoop improves storage efficiency when compared to replication, but still maintains the same level of fault tolerance.
Posted Date:- 2021-08-31 05:23:07