
Thursday, March 29, 2018

Clearing HDFS space on a CDH cluster

Problem Statement: On our dev CDH cluster we frequently ran into HDFS filling up, which caused HDFS and related services to halt. This would then interrupt our development work.

What we used to do: We would resort to dropping temporary tables, which freed up some space but gave us a week's breathing room at most.

A more permanent solution: To find a more permanent fix, I ran the "hdfs dfsadmin -report" command on one of the data nodes. It showed the DFS used and remaining capacity for every data node in the cluster. As expected, the nodes had run out of free space.
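
The full report is long, so it can help to keep only the per-node usage lines. Here is a minimal sketch, assuming the report labels its lines "Name:", "DFS Used%:" and "DFS Remaining%:" (the labels may differ between Hadoop versions); summarize_report is a hypothetical helper name, not part of the hdfs CLI.

```shell
# Hypothetical helper: reads "hdfs dfsadmin -report" output on stdin and
# keeps only the node name and percentage lines.
# Assumption: the report uses the labels "Name:", "DFS Used%:" and
# "DFS Remaining%:". The cluster-wide summary at the top of the report
# uses the same percentage labels, so those lines are kept as well.
summarize_report() {
  grep -E '^(Name|DFS Used%|DFS Remaining%):'
}

# Usage on a cluster node:
#   hdfs dfsadmin -report | summarize_report
```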

Then I ran "hdfs dfs -du -h /". This showed that several TB of space was occupied by the /tmp folder.
Drilling down further, I found that multiple folders (including the logs folder) were hogging space, but the information in them was not needed.
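
The drill-down can be sped up by sorting the output by size. The sketch below assumes "hdfs dfs -du" is run without -h, so the first column is a plain byte count that sort can order numerically; largest_dirs is a hypothetical helper name.

```shell
# Hypothetical helper: reads "hdfs dfs -du <path>" output on stdin
# (without -h, so column 1 is a size in bytes) and prints the N largest
# entries, biggest first. N defaults to 10.
largest_dirs() {
  sort -k1,1nr | head -n "${1:-10}"
}

# Usage on a cluster node, repeated one level deeper each time:
#   hdfs dfs -du /tmp | largest_dirs 5
```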

So a simple hdfs dfs -rm -r /tmp/<folder name>, followed by cleaning the HDFS trash folder, did the trick. However, the free space was not reflected on the CDH cluster immediately; it took a couple of hours before CDH started reporting HDFS as normal.
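
Since a recursive delete is hard to undo, a small wrapper that prints the commands before running them can be useful. This is only a sketch: cleanup_tmp_dir and the RUN variable are hypothetical names, and it assumes "hdfs dfs -expunge" is how the trash gets cleaned. Note that -expunge only removes trash checkpoints older than the configured retention interval, so recently deleted data may stay in trash for a while; "hdfs dfs -rm -r -skipTrash" bypasses the trash entirely.

```shell
# Hypothetical helper: deletes one folder under /tmp on HDFS and then
# expunges the trash. By default it only prints the commands; set RUN=1
# to actually execute them.
cleanup_tmp_dir() {
  target="$1"
  case "$target" in
    /tmp/?*) ;;  # simple guard: only accept paths under /tmp
    *) echo "refusing to delete outside /tmp: $target" >&2; return 1 ;;
  esac
  if [ "${RUN:-0}" = "1" ]; then
    hdfs dfs -rm -r "$target" && hdfs dfs -expunge
  else
    echo "hdfs dfs -rm -r $target"
    echo "hdfs dfs -expunge"
  fi
}
```

For example, "cleanup_tmp_dir /tmp/old_logs" (old_logs being a made-up folder name) prints the two commands, and "RUN=1 cleanup_tmp_dir /tmp/old_logs" runs them.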

This may be too simple a thing to write a post on, but sometimes simple things end up taking a lot of time too :)