Thursday, March 28, 2013

WebHDFS REST API

The HTTP REST API supports the complete FileSystem interface for HDFS.

Operations

WebHDFS exposes the usual file system operations (OPEN, GETFILESTATUS, LISTSTATUS, CREATE, MKDIRS, RENAME, APPEND, DELETE, and so on) as plain HTTP GET, PUT, POST and DELETE requests against the NameNode's web port.

For more, please visit the WebHDFS documentation.
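
As a quick sanity check, here is a small Java sketch that lists a directory over WebHDFS. The host/port localhost:50070 and the /user path are assumptions for a local single-node cluster, and dfs.webhdfs.enabled must be set to true in hdfs-site.xml:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

// List a directory through WebHDFS with plain HTTP.
// localhost:50070 is the default NameNode web port on a 1.x cluster.
public class WebHdfsListStatus {
  public static void main(String[] args) throws Exception {
    URL url = new URL("http://localhost:50070/webhdfs/v1/user?op=LISTSTATUS");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("GET");

    BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream()));
    String line;
    while ((line = in.readLine()) != null) {
      System.out.println(line);   // JSON FileStatuses response
    }
    in.close();
    conn.disconnect();
  }
}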



Jobtracker API error - Call to localhost/127.0.0.1:50030 failed on local exception: java.io.EOFException

Try the port number listed in your $HADOOP_HOME/conf/mapred-site.xml under the mapred.job.tracker property. Here's the relevant snippet from my mapred-site.xml:

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:9001</value>
</property>

If you look at the JobTracker.getAddress(Configuration) method, you can see that it falls back to this property when you don't explicitly specify the JobTracker host and port:

public static InetSocketAddress getAddress(Configuration conf) {
  String jobTrackerStr =
      conf.get("mapred.job.tracker", "localhost:8012");
  return NetUtils.createSocketAddr(jobTrackerStr);
}
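
A quick way to confirm the address is reachable is to build a JobClient against it explicitly. This is only a sketch, assuming a Hadoop 1.x cluster with the JobTracker on localhost:9001 (match it to whatever your mapred-site.xml says):

import java.net.InetSocketAddress;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapred.JobClient;

// Minimal connectivity check against the JobTracker (old mapred API).
// localhost:9001 is an assumption -- use your mapred.job.tracker value.
public class JobTrackerPing {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("mapred.job.tracker", "localhost:9001");

    JobClient client =
        new JobClient(new InetSocketAddress("localhost", 9001), conf);
    System.out.println("TaskTrackers: "
        + client.getClusterStatus().getTaskTrackers());
  }
}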


Thursday, March 14, 2013

Adding Scheduler to Hadoop Cluster

 

By default, Hadoop schedules jobs with a simple FIFO scheduler. In a multi-user environment, you will want a better scheduler so that tasks are scheduled consistently and fairly across users.

Hadoop ships with two other schedulers:

Fair Scheduler: defines pools; over time, each pool receives roughly the same share of resources.

Capacity Scheduler: defines queues, each with a guaranteed capacity. The Capacity Scheduler shares a queue's unused resources with the other queues.

To change the scheduler you need to take your cluster offline and make a few configuration changes. First make sure the correct scheduler jar files are present: in older Hadoop versions you may have to copy the jar into the lib directory yourself, but from Hadoop 1.x the scheduler jars already ship in the lib folder, so if you are on a newer release you are all set.

In outline, the steps are: stop the MapReduce daemons, point mapred-site.xml at the new scheduler class (sketched below), and start them again.
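
A minimal sketch of that configuration, assuming a Hadoop 1.x cluster and the Fair Scheduler (for the Capacity Scheduler, use org.apache.hadoop.mapred.CapacityTaskScheduler instead); the allocation file path is just a placeholder:

<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.FairScheduler</value>
</property>
<property>
  <name>mapred.fairscheduler.allocation.file</name>
  <value>/path/to/conf/fair-scheduler.xml</value>
</property>

Restart the JobTracker after the change so the new scheduler is picked up.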

Using C++ or C to interact with hadoop

Are you a C or C++ programmer who would rather not write Java code to interact with Hadoop/HDFS? You have an option: the libhdfs native library lets you write C or C++ programs that talk to Hadoop.

Current Hadoop distributions ship pre-compiled libhdfs libraries for 32-bit and 64-bit Linux. If your operating system is not covered by the pre-compiled libraries, you may have to download the Hadoop source distribution and compile libhdfs yourself.

For more information read following:

http://wiki.apache.org/hadoop/MountableHDFS

https://ccp.cloudera.com/display/CDHDOC/Mountable+HDFS

http://xmodulo.com/2012/06/how-to-mount-hdfs-using-fuse.html

Writing code against libhdfs in C or C++ looks roughly like the sketch below.
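
This is only a minimal sketch of a libhdfs program that writes a small file to HDFS. It assumes a local single-node cluster ("default" picks up fs.default.name from the configuration on the CLASSPATH) and the path /tmp/testfile.txt is just a placeholder; compile it against hdfs.h with -lhdfs and the JVM libraries.

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "hdfs.h"

int main(int argc, char **argv) {
    /* Connect to HDFS; "default" reads fs.default.name from the config on the CLASSPATH. */
    hdfsFS fs = hdfsConnect("default", 0);
    if (!fs) {
        fprintf(stderr, "Failed to connect to HDFS\n");
        exit(1);
    }

    /* Placeholder path: create (or overwrite) a small file on HDFS. */
    const char *writePath = "/tmp/testfile.txt";
    hdfsFile writeFile = hdfsOpenFile(fs, writePath, O_WRONLY | O_CREAT, 0, 0, 0);
    if (!writeFile) {
        fprintf(stderr, "Failed to open %s for writing\n", writePath);
        exit(1);
    }

    const char *buffer = "Hello HDFS from C\n";
    hdfsWrite(fs, writeFile, (void *)buffer, strlen(buffer));
    if (hdfsFlush(fs, writeFile)) {
        fprintf(stderr, "Failed to flush %s\n", writePath);
        exit(1);
    }

    hdfsCloseFile(fs, writeFile);
    hdfsDisconnect(fs);
    return 0;
}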

Finding out block location and block size of file on HDFS

 

Have you ever needed to find the block locations and block size of a file stored on HDFS? If so, here is the command you can use.

For that we use the fsck command that Hadoop provides.

Here goes the command:

bin/hadoop fsck /filepath/filenameonhdfs -files -blocks -locations

This command reports the blocks that make up the file and the DataNodes on which each block is stored.
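
If you would rather get the same information from code, here is a small sketch using the FileSystem API (the path is just a placeholder):

import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Print the block size and block locations of a file on HDFS.
public class BlockInfo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Placeholder path: point this at a real file on your cluster.
    Path file = new Path("/filepath/filenameonhdfs");
    FileStatus status = fs.getFileStatus(file);
    System.out.println("Block size: " + status.getBlockSize());

    BlockLocation[] blocks =
        fs.getFileBlockLocations(status, 0, status.getLen());
    for (BlockLocation block : blocks) {
      System.out.println("offset=" + block.getOffset()
          + " length=" + block.getLength()
          + " hosts=" + Arrays.toString(block.getHosts()));
    }
  }
}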

Just go and play with the command and you will understand more.
