Sunday, January 20, 2013

FAILED FSError: java.io.IOException: No Space left on device

 

org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for taskTracker/jobcache

When you run Hive queries and a query fails, go to the Hadoop JobTracker web UI (usually on port 50030) and check the failed or killed jobs. You may find these errors there.

The most likely reason is the following:

An imbalanced cluster: some nodes hold much more data than others, that is, data is distributed unequally throughout your cluster. Check the DFS usage of each node. While running a Hive query, Hadoop needs a lot of space in its temp directory to store intermediate files. If your query touches data across the whole cluster, the intermediate files can get huge, and on an unbalanced cluster the jobs will fail on the nodes that are over-utilized.
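One way to check the DFS usage per node is to parse the output of `hadoop dfsadmin -report`. The snippet below is only a rough sketch: `summarize_dfs_usage` is a helper name made up for this post, and it assumes the report prints a `Name:` line followed by a `DFS Used%:` line for each datanode, so check it against the output of your own Hadoop version.

```shell
# Hypothetical helper: reads a dfsadmin report on stdin and prints
# "<datanode> <DFS used percent>" for each datanode.
# Assumption: each datanode section has "Name: host:port" and "DFS Used%: NN%" lines.
summarize_dfs_usage() {
  awk '
    /^Name:/ { node = $2 }
    # The cluster-wide "DFS Used%" line comes before any Name line, so it is skipped.
    /^DFS Used%:/ && node != "" { gsub(/%/, "", $3); print node, $3; node = "" }
  '
}

# Only call Hadoop if the command is actually available on this machine.
if command -v hadoop >/dev/null 2>&1; then
  hadoop dfsadmin -report | summarize_dfs_usage | sort -k2 -n -r
fi
```

The nodes at the top of the sorted list are the fullest ones, and those are the nodes where jobs are most likely to run out of space.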

What is the solution?

Allocate more space if feasible

Balance your cluster using the balancer, which is found in the bin directory and which you may not have noticed, like me :). You can start the balancer as follows:

bin/start-balancer.sh

This calculates the usage statistics of your cluster and works out what data, and how much of it, needs to be transferred to balance your Hadoop cluster.

Depending on the size of your cluster and the amount of imbalance, this may take a long time, so plan the balancing accordingly. It is good practice to balance your cluster at regular intervals to avoid this kind of problem.
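The balancer also accepts a `-threshold` option, a percentage that defaults to 10: a node counts as balanced when its usage is within that many percentage points of the cluster's overall usage. The sketch below illustrates the idea with made-up numbers. `nodes_outside_threshold` is a hypothetical helper, and it uses the simple mean of the per-node percentages as a stand-in for the cluster-wide figure the real balancer computes.

```shell
# Hypothetical helper: reads "node used_percent" lines on stdin and prints the
# nodes whose usage differs from the mean by more than the threshold ($1),
# together with the size of the difference.
# Assumption: the plain mean of per-node percentages approximates cluster usage.
nodes_outside_threshold() {
  awk -v t="$1" '
    { node[NR] = $1; used[NR] = $2; sum += $2 }
    END {
      if (NR == 0) exit
      mean = sum / NR
      for (i = 1; i <= NR; i++) {
        diff = used[i] - mean
        if (diff < 0) diff = -diff
        if (diff > t) printf "%s %.1f\n", node[i], used[i] - mean
      }
    }
  '
}

# Made-up example: the mean is 50, so node-a and node-c are outside a threshold of 10.
printf 'node-a 85\nnode-b 50\nnode-c 15\n' | nodes_outside_threshold 10

# Run the real balancer with a tighter threshold, if the script is present here.
if [ -x bin/start-balancer.sh ]; then
  bin/start-balancer.sh -threshold 5
fi
```

A smaller threshold gives a more evenly balanced cluster but the balancer runs longer. It can be stopped at any time with bin/stop-balancer.sh.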

You can also check how much space is allocated to Hadoop's temp directories on the local disk. MapReduce needs space both on the local disk and on HDFS, because it generates huge intermediate temp files.

Do the above to keep your cluster balanced and healthy for proper processing. Happy Hadooping :)
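A quick way to check the local temp space is `df` on the directory that `mapred.local.dir` points to (by default it sits under `hadoop.tmp.dir`). This is just a sketch: `free_kb` is a made-up helper, and the `/tmp` path and the 10 GB floor are assumptions you should replace with your own values.

```shell
# Hypothetical helper: print the free space, in KB, on the filesystem
# that holds the given directory.
free_kb() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Assumption: /tmp is only a placeholder; use your own mapred.local.dir value here.
LOCAL_DIR="${LOCAL_DIR:-/tmp}"
# Assumption: warn when less than 10 GB is free; pick a floor that suits your jobs.
MIN_FREE_KB=$((10 * 1024 * 1024))

avail=$(free_kb "$LOCAL_DIR")
if [ "$avail" -lt "$MIN_FREE_KB" ]; then
  echo "WARNING: only ${avail} KB free under ${LOCAL_DIR}"
else
  echo "OK: ${avail} KB free under ${LOCAL_DIR}"
fi
```

Run it on each TaskTracker node, since the "Could not find any valid local directory" error comes from the local disk of the node where the task ran, not from HDFS.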
