CHAPTER 3: Why Is HDFS Needed?

By now it should be clear why we need HDFS. This is the third chapter of the HDFS Tutorial Series.

To summarize, here again are a few points on exactly why we need HDFS. As we know, HDFS is a file storage and distribution system used to store files in a Hadoop environment.

Here is a detailed view of the uses of HDFS:

• It is suitable for distributed storage and processing.
• Hadoop provides a command interface to interact with HDFS.
• The built-in NameNode and DataNode servers help users easily check the status of the cluster.
• It provides streaming access to file system data.
• HDFS provides file permissions and authentication.
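
To make the command interface and file permission points concrete, here is a minimal Python sketch that assembles a few standard `hdfs` command lines. The helper name `hdfs_command`, the file name, and the paths are made up for illustration. The commands are only built and printed, not executed, so actually running them assumes a machine with a configured Hadoop client.

```python
import shlex

def hdfs_command(tool, *args):
    """Build an argument list for the `hdfs` CLI (hypothetical helper)."""
    return ["hdfs", tool, *args]

# The file name and paths below are illustrative assumptions.
commands = [
    hdfs_command("dfs", "-mkdir", "-p", "/user/demo/input"),             # create a directory
    hdfs_command("dfs", "-put", "sales.csv", "/user/demo/input"),        # copy a local file into HDFS
    hdfs_command("dfs", "-ls", "/user/demo/input"),                      # list the directory
    hdfs_command("dfs", "-chmod", "640", "/user/demo/input/sales.csv"),  # set file permissions
    hdfs_command("dfsadmin", "-report"),                                 # summarize cluster status
]

# Print each command as it would be typed in a shell.
for cmd in commands:
    print(shlex.join(cmd))
```

On a machine where Hadoop is installed, each list could be passed to `subprocess.run(cmd, check=True)` instead of being printed.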

Goals of HDFS

• Fault detection and recovery: Since HDFS runs on a large number of commodity hardware components, component failure is frequent. Therefore, HDFS should have mechanisms for quick and automatic fault detection and recovery.
• Huge datasets: HDFS should have hundreds of nodes per cluster to manage applications with huge datasets.
• Hardware at data: A requested task can be done efficiently when the computation takes place near the data. Especially where huge datasets are involved, this reduces network traffic and increases throughput.
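
The first and third goals can be illustrated with a toy model. The sketch below is not HDFS's actual implementation. It assumes a simple rule (a node is treated as dead if it has not sent a heartbeat recently), and all names, node IDs, and the timeout value are hypothetical. It shows lost replicas being copied to healthy nodes and a task being placed on a node that already holds the data.

```python
HEARTBEAT_TIMEOUT = 30  # seconds of silence before a node counts as dead (assumed value)
REPLICATION = 3         # target copies per block (3 is the usual HDFS default)

def live_nodes(last_heartbeat, now):
    """Return the nodes whose last heartbeat is recent enough."""
    return {n for n, t in last_heartbeat.items() if now - t <= HEARTBEAT_TIMEOUT}

def re_replicate(block_locations, alive):
    """Drop replicas on dead nodes and add live nodes until the target is met."""
    repaired = {}
    for block, nodes in block_locations.items():
        holders = [n for n in nodes if n in alive]
        if not holders:
            repaired[block] = []  # every replica is lost; nothing to copy from
            continue
        candidates = sorted(alive - set(holders))
        missing = max(0, REPLICATION - len(holders))
        repaired[block] = holders + candidates[:missing]
    return repaired

def pick_node_for_task(block, block_locations, alive):
    """Prefer a live node that already stores the block (computation near the data)."""
    local = [n for n in block_locations[block] if n in alive]
    return local[0] if local else sorted(alive)[0]

# Hypothetical cluster state: dn3 has been silent for 60 seconds.
last_heartbeat = {"dn1": 100, "dn2": 95, "dn3": 40, "dn4": 98}
blocks = {"blk_1": ["dn1", "dn2", "dn3"], "blk_2": ["dn3", "dn4", "dn1"]}

alive = live_nodes(last_heartbeat, now=100)
repaired = re_replicate(blocks, alive)
print(repaired)
print(pick_node_for_task("blk_2", repaired, alive))
```

Here `dn3` is detected as failed, each block regains its third copy on a healthy node, and the task for `blk_2` is scheduled on a node that already stores it, so no block data has to cross the network.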

A study was done on what exactly companies look for when buying hardware. Looking at the data, it seems the current Hadoop architecture fulfills those needs completely.

[Figure: study on hardware]