When you design a Hadoop project from scratch, there are several things you need to consider, for example the HDFS node storage and the number of datanodes to be used. In this tutorial, I will share the formula to calculate the number of datanodes.
Calculate the number of datanodes (N)
Here is the simple formula to calculate the number of datanodes:
N = H / D
H: It is the HDFS storage size, which you can find from this tutorial: formula to calculate HDFS node storage.
D: It is the disk space available per node. Besides disk space, you will also have to consider CPU, bandwidth, and RAM.
RAM is a very important configuration to consider. The cluster runs many processes and data transfers across multiple machines, so you should have sufficient RAM to handle all of them.
A RAM size of 64GB should be sufficient to handle your work, and more RAM will give you smoother and more efficient processing.
Network speed is equally important for smooth operations.
Let's calculate the number of datanodes based on some figures. Let's say you have 500TB of files to put in the Hadoop cluster and the disk size available is 2TB per node. Then the required number of datanodes would be:
N = 500/2 = 250. This is just sample data.
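The same calculation can be sketched in a few lines of Python. This is only a minimal illustration of N = H / D. The function name datanode_count and its parameter names are hypothetical, and rounding up to a whole node is an assumption added here for the case where H is not an exact multiple of D.

```python
import math


def datanode_count(hdfs_storage_tb: float, disk_per_node_tb: float) -> int:
    """Estimate the number of datanodes N from N = H / D.

    hdfs_storage_tb  -- H, the HDFS storage size in TB
    disk_per_node_tb -- D, the disk space available per node in TB

    Assumption (not part of the original formula): the result is rounded
    up, because a fraction of a node cannot be provisioned.
    """
    if hdfs_storage_tb < 0:
        raise ValueError("HDFS storage size must not be negative")
    if disk_per_node_tb <= 0:
        raise ValueError("disk space per node must be positive")
    return math.ceil(hdfs_storage_tb / disk_per_node_tb)


# The sample figures from this tutorial: H = 500TB, D = 2TB per node.
print(datanode_count(500, 2))  # prints 250
```

With the sample figures this gives the same 250 nodes as the manual calculation. Disk space is the only input here, so CPU, bandwidth, and RAM still have to be checked separately.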
This was all about how to calculate the number of datanodes easily.
Do try it, and share if you face any difficulty while finding the number of datanodes.