Saturday, April 13, 2013

Write File to HDFS/Hadoop and Read File From HDFS/Hadoop Using Java

 

import java.io.File;
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

/**
 * @author Shashwat Shriparv
 * @email dwivedishashwat@gmail.com
 * @Web helpmetocode.blogspot.com
 */
public class WriteToHDFSReadFromHDFSWriteToLocal {

    public static void main(String[] args)
            throws IOException, InterruptedException, URISyntaxException {

        // Connect to the HDFS NameNode (adjust the URI to your cluster).
        FileSystem fs = new DistributedFileSystem();
        fs.initialize(new URI("hdfs://master1:9000/"), new Configuration());

        // Walk a local folder; copy each file to HDFS, then copy it back to local disk.
        final File folder = new File("C:\\Shared\\files");
        for (final File fileEntry : folder.listFiles()) {
            if (fileEntry.isDirectory()) {
                readAllFilesFromFolder(fileEntry);
            } else {
                // Upload the local file into the /Test directory on HDFS.
                fs.copyFromLocalFile(new Path(fileEntry.getAbsolutePath()), new Path("/Test/"));
                System.out.println(fileEntry.getName());
                // Download the same file from HDFS into another local directory.
                fs.copyToLocalFile(new Path("/Test/" + fileEntry.getName()), new Path("d:\\shashwat\\"));
            }
        }

        // fs.copyFromLocalFile(new Path("C:\\Shared\\HadoopLibs"), new Path("/Test/1.jpg"));

        System.out.println("Done");
    }

    // Recursively print the names of all files under the given folder.
    public static void readAllFilesFromFolder(final File folder) {
        for (final File fileEntry : folder.listFiles()) {
            if (fileEntry.isDirectory()) {
                readAllFilesFromFolder(fileEntry);
            } else {
                System.out.println(fileEntry.getName());
            }
        }
    }
}


Note: While writing to HDFS, create the target directory first and change its permissions to 777 to avoid security-related exceptions.
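As a minimal sketch of doing that programmatically (the class name is just illustrative and the NameNode URI is the same placeholder as above; the shell equivalent would be hadoop fs -mkdir /Test followed by hadoop fs -chmod 777 /Test):

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class PrepareHdfsDirectory {
    public static void main(String[] args) throws Exception {
        // Same NameNode URI as in the example above - adjust for your cluster.
        FileSystem fs = FileSystem.get(new URI("hdfs://master1:9000/"), new Configuration());

        // Create /Test if it does not exist and open its permissions to 777
        // so the copy does not fail with an AccessControlException.
        Path target = new Path("/Test");
        if (!fs.exists(target)) {
            fs.mkdirs(target);
        }
        fs.setPermission(target, new FsPermission((short) 0777));
        fs.close();
    }
}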

Thursday, March 28, 2013

WebHDFS REST API

The HTTP REST API supports the complete FileSystem interface for HDFS.

Operations

For more, please visit the WebHDFS documentation.
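As a rough sketch, assuming dfs.webhdfs.enabled is true and the NameNode web port is the default 50070 (the host, path, and user.name below are placeholders), a directory listing can be fetched over plain HTTP from Java:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class WebHdfsListStatus {
    public static void main(String[] args) throws Exception {
        // LISTSTATUS operation against the NameNode's WebHDFS endpoint.
        URL url = new URL("http://master1:50070/webhdfs/v1/Test?op=LISTSTATUS&user.name=hadoop");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");

        // The response is a JSON document describing each file in the directory.
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
        conn.disconnect();
    }
}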



Jobtracker API error - Call to localhost/127.0.0.1:50030 failed on local exception: java.io.EOFException

Try the port number listed in your $HADOOP_HOME/conf/mapred-site.xml under the mapred.job.tracker property. Here is the mapred-site.xml from my pseudo-distributed setup:

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:9001</value>
</property>

If you look at the JobTracker.getAddress(Configuration) method, you can see it uses this property if you don't explicitly specify the jobtracker host / port:

public static InetSocketAddress getAddress(Configuration conf) {
    String jobTrackerStr = conf.get("mapred.job.tracker", "localhost:8012");
    return NetUtils.createSocketAddr(jobTrackerStr);
}
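A minimal sketch of the same idea from client code (the class name is just illustrative, and the address is the pseudo-distributed value above): point the RPC client at the mapred.job.tracker address, not at the 50030 web UI port, since connecting the RPC client to the HTTP port is what produces the EOFException.

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class JobTrackerPing {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf();
        // Use the RPC address from mapred.job.tracker, NOT the 50030 web UI port.
        conf.set("mapred.job.tracker", "localhost:9001");

        // JobClient talks to the JobTracker over RPC using that address.
        JobClient client = new JobClient(conf);
        System.out.println("Active task trackers: "
                + client.getClusterStatus().getTaskTrackers());
        client.close();
    }
}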


Thursday, March 14, 2013

Adding Scheduler to Hadoop Cluster

 

As we know, when we execute tasks or jobs on Hadoop it uses FIFO scheduling by default, but in a multi-user Hadoop environment you will need a better scheduler for consistent and correct task scheduling.

Hadoop comes with other schedulers too those are:

Fair Scheduler: This defines pools; over time, each pool gets roughly the same amount of resources.

Capacity Scheduler: This defines queues, and each queue has a guaranteed capacity. The Capacity Scheduler shares the compute resources allocated to a queue with other queues when those resources are not in use.

To change the scheduler you need to take your cluster offline and make some configuration changes. First, make sure the correct scheduler jar files are present: in older Hadoop versions you had to drop the jar into the lib directory yourself if it was not there, but from Hadoop 1.x onwards these jars ship in the lib folder, so if you are using a newer Hadoop release this is already done for you.

Steps will be:
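Roughly, for the Fair Scheduler on a Hadoop 1.x cluster, the key change is pointing the JobTracker at the scheduler class in mapred-site.xml and then restarting the JobTracker; the allocation-file path below is only a placeholder:

<!-- mapred-site.xml: tell the JobTracker which task scheduler to load -->
<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.FairScheduler</value>
</property>

<!-- optional: location of the Fair Scheduler pool/allocation file (placeholder path) -->
<property>
  <name>mapred.fairscheduler.allocation.file</name>
  <value>/etc/hadoop/conf/fair-scheduler.xml</value>
</property>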

Using C++ or C to interact with hadoop

Are you a C or C++ programmer who is not willing to write Java code to interact with Hadoop/HDFS? You have an option: the libhdfs native library, which lets you write programs in C or C++ that interact with Hadoop.

Current Hadoop distributions contain the pre-compiled libhdfs libraries for 32-bit and 64-bit Linux operating systems. You may have to download the Hadoop standard distribution and compile the libhdfs library from the source code, if your operating system is not compatible with the pre-compiled libraries.

For more information read following:

http://wiki.apache.org/hadoop/MountableHDFS

https://ccp.cloudera.com/display/CDHDOC/Mountable+HDFS

http://xmodulo.com/2012/06/how-to-mount-hdfs-using-fuse.html

Writing code in C or C++ follows:

Finding out block location and block size of file on HDFS

 

Have you ever needed to find out the block locations and block size for a file lying on HDFS? If so, here is the command you can use to find that out.

For that we need the “fsck” command which Hadoop provides.

Here goes the command:

bin/hadoop fsck /filepath/filenameonhdfs -files -blocks -locations

This command will tell you what the blocks of that file are and on which datanodes each block is lying.

Just go and play with the command and you will understand more.
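The programmatic equivalent, as a minimal sketch (assuming the cluster's core-site.xml is on the classpath and the HDFS path is passed as the first argument; the class name is just illustrative), uses FileSystem.getFileBlockLocations:

import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowBlockLocations {
    public static void main(String[] args) throws Exception {
        // HDFS path of the file to inspect, passed on the command line.
        Path file = new Path(args[0]);

        FileSystem fs = FileSystem.get(new Configuration());
        FileStatus status = fs.getFileStatus(file);

        System.out.println("Block size: " + status.getBlockSize());

        // One BlockLocation per block, with the datanodes holding each replica.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.println("Offset " + block.getOffset()
                    + ", length " + block.getLength()
                    + ", hosts: " + Arrays.toString(block.getHosts()));
        }
        fs.close();
    }
}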

Friday, February 1, 2013

Phoenix: A SQL layer over HBase 'We put the SQL back in the NoSQL'

 

Phoenix is a SQL layer over HBase, delivered as a client-embedded JDBC driver, powering the HBase use cases at Salesforce.com. Phoenix targets low-latency queries (milliseconds), as opposed to batch operation via map/reduce. To see what's supported, go to our language reference guide, and read more on our wiki.

Sunday, January 20, 2013

FAILED FSError: java.io.IOException: No Space left on device

 

FAILED FSError: java.io.IOException: No Space left on device

org.apache.hadoop.util.DiskChecker$DiskErrorException : Could not find any valid local directory for taskTracker/jobcache

When you run Hive queries and your queries fail, just go to the Hadoop JobTracker (usually on port 50030 on your Hadoop nodes) and check the failed or killed jobs; you may find these errors.

The reason behind this is usually that the local disks used for mapred.local.dir (where the taskTracker/jobcache data is written) have filled up, so free up or add disk space on the affected nodes.
