Hadoop: Working and Handling Failures
How Hadoop works and how it handles data node failure during file transfer.
Greetings, everyone! I hope you are all having a good day.
“The world is one Big Data Problem” ~ Andrew McAfee
Here in this blog, we will discuss how a Hadoop cluster is created, how we transfer data into the cluster and retrieve it, and finally what happens when a Data Node (the system holding the data) crashes while we are retrieving the data. So, let's get started!
For the Hadoop cluster creation part, please refer to my previous blog, where I explained how to create a Hadoop cluster consisting of a Name Node, Data Nodes, and a Client Node.
Our second goal is to understand how the Hadoop file system works. When the Client uploads data, it first contacts the Name Node (NN). The NN keeps track of which Data Nodes are available, tells the Client where to write, and the Client then uploads the data directly to those Data Nodes. In my other blog, I have explained this data flow in detail and busted some myths about data transfer, so please do check that blog for a better understanding of this one.
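As a quick way to see what the Name Node knows about the Data Nodes (this is not from the video, just a sketch you can run from any node with the Hadoop client configured), the dfsadmin report lists the live Data Nodes and their capacity:
# hadoop dfsadmin -report
On newer Hadoop versions the same report is available as `hdfs dfsadmin -report`.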
Reading a File and Crashing a Data Node
Now we have a Hadoop cluster with 1 Name Node, 4 Data Nodes, and 1 Client. As per the previous blog, we have already uploaded a file into the cluster, and we also found out that the file is written to the Data Nodes serially: the Client writes to one Data Node, which forwards the data to the next one in a pipeline, rather than the Client writing to all replicas in parallel.
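For reference, the upload from the previous blog would have looked roughly like this (the local file name is an assumption; the Name Node chooses the Data Nodes, and the first Data Node forwards the data down the pipeline):
# hadoop fs -put testhadoop.txt /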
Now we will try to read that file using the following command:
# hadoop fs -cat /<filename>
As we confirmed in the previous blog, the file has been replicated on 3 of the 4 Data Nodes in the cluster. We will read the file from the Client, and while it is being read we will crash the Data Node the Client is reading from and see what happens. Will the Client face an error, or will it get the data from another node where the data is replicated?
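One way to confirm which Data Nodes hold the replicas (not shown in the video, but a standard check) is to ask the Name Node with `fsck`; with a replication factor of 3, it should list three Data Node addresses for each block of the file:
# hadoop fsck /testhadoop.txt -files -blocks -locations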
For the experiment, we have made a video with two systems: System 1 runs two Data Nodes and the Name Node, and System 2 runs two Data Nodes and the Client.
Video Link:
System 1:
System 2:
In the video, when we read the file from the Client, we used the command:
# hadoop fs -cat /testhadoop.txt
We saw that the Client (on System 2) first contacts the Name Node, gets the Data Node information, and starts reading data from Data Node 2. Then we stop the DataNode service on DN2 with the command:
# hadoop-daemon.sh stop datanode
The Client then starts collecting data from Data Node 3. As proof, we can see this on both systems: the `tcpdump` capture on the Client filtered for DN3 (and the one on DN3 filtered for the Client) starts showing traffic.
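The captures would have looked roughly like this (the interface name and the `<DN3_IP>` placeholder are assumptions; `-l` line-buffers the output so `grep` shows packets as they arrive):
# tcpdump -l -n -i eth0 | grep <DN3_IP>
The same command on DN3, filtered for the Client's IP, shows the other side of the transfer.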
When we stop the DataNode service on Data Node 3, the Client starts to collect data from Data Node 1.
In conclusion, when the Client tried to get the data, it first went to DN2; when DN2 was stopped, it switched to DN3; and when DN3 was stopped, it switched to DN1. This experiment also proves that the data was stored on three of the four Data Nodes, since our replication factor was 3.
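You can also cross-check the replication factor from the Client itself: the second column of the `hadoop fs -ls` output is the replication count of the file, which should show 3 here:
# hadoop fs -ls /testhadoop.txt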
NOTE: Due to latency in the cloud systems, the output of the `tcpdump` command may appear slowly on the Name Node or Client, but the concept is the same.
CONCLUSION
In the end, we came to know that when the Client is reading data from a Data Node and that Data Node crashes, the Client connects to another Data Node (one holding a replica) and continues collecting the data. We even saw the Client contacting the Name Node when the Data Node crashed, to collect the information of another Data Node.
I hope I have cleared up all the doubts and presented my point clearly. If you have any doubts or confusion, feel free to contact me on LinkedIn.
Thank you for your time. Have a good day.