Monday, February 24, 2014
Comparison between Big Data and RDBMS
             | RDBMS                     | Big Data
Data size    | Gigabytes                 | Petabytes
Access       | Interactive and batch     | Batch
Updates      | Read and write many times | Write once, read many times
Structure    | Static schema             | Dynamic schema
Integrity    | High                      | Low
Scaling      | Nonlinear                 | Linear
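
The "write once, read many times" row is the key behavioral difference. Here is a minimal sketch of that access pattern with the HDFS Java client (the path and file contents are only illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class WriteOnceReadMany {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path path = new Path("/tmp/events.log"); // illustrative path

            // Write once: the file is created and closed,
            // then never rewritten in place.
            try (FSDataOutputStream out = fs.create(path)) {
                out.writeUTF("event-1");
            }

            // Read many times: repeated whole-file scans,
            // typical of batch jobs.
            for (int i = 0; i < 3; i++) {
                try (FSDataInputStream in = fs.open(path)) {
                    System.out.println(in.readUTF());
                }
            }
        }
    }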
Friday, February 21, 2014
What happens during major compaction in HBase?
1. Deletes data that is masked by tombstones
2. Deletes data whose TTL has expired
3. Compacts several small HFiles into a single larger one
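
A major compaction can also be requested explicitly from a client. A minimal sketch, assuming the 2014-era HBaseAdmin API and an illustrative table name (newer releases expose the same call as Admin.majorCompact(TableName)):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class TriggerMajorCompaction {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HBaseAdmin admin = new HBaseAdmin(conf);
            // Asynchronously asks the region servers to run a major
            // compaction on every region of the table.
            admin.majorCompact("mytable"); // illustrative table name
            admin.close();
        }
    }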
Wednesday, February 19, 2014
List some limitations of Hadoop.
- Write-once model
  - There is a plan to support appending writes
- A namespace with an extremely large number of files exceeds the Namenode's capacity to maintain
- Cannot be mounted by an existing OS
  - Getting data in and out is tedious
  - A Virtual File System could solve this problem
- Java API
  - A Thrift API is available for use from other languages
- HDFS does not implement or support:
  - User quotas
  - Access permissions
  - Hard or soft links
  - Data-balancing schemes
- No periodic checkpoints
- The Namenode is a single point of failure
  - Automatic restart and failover to another machine are not yet supported
List the steps taken on Datanode failure.
- The Namenode marks Datanodes without a recent Heartbeat as dead
- It does not forward any new I/O requests to them
- It constantly tracks which blocks must be replicated via the BlockMap
- It initiates replication whenever necessary
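
The "recent Heartbeat" window is configurable. A sketch of how the dead-node timeout is derived, assuming Hadoop 2.x property names; with the defaults shown, a Datanode is declared dead after 2 × 300 s + 10 × 3 s = 630 s (10.5 minutes):

    import org.apache.hadoop.conf.Configuration;

    public class DeadNodeTimeout {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Defaults: recheck every 300 000 ms, heartbeat every 3 s.
            long recheckMs = conf.getLong(
                    "dfs.namenode.heartbeat.recheck-interval", 300_000L);
            long heartbeatSec = conf.getLong("dfs.heartbeat.interval", 3L);
            long deadTimeoutMs = 2 * recheckMs + 10 * heartbeatSec * 1000;
            System.out.println("Datanode declared dead after "
                    + deadTimeoutMs + " ms without a Heartbeat");
        }
    }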
Please describe the steps of checkpointing in Hadoop.
- It is performed by the Namenode
- Two versions of the FsImage exist:
  - One stored on disk
  - One in memory
- The Namenode applies all transactions in the EditLog to the in-memory FsImage
- It flushes the new FsImage to disk
- It truncates the EditLog
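
A toy Java model of those three steps (illustration only, not the real Namenode code):

    import java.util.ArrayList;
    import java.util.List;

    public class ToyCheckpoint {
        static List<String> fsImageOnDisk = new ArrayList<>();   // persisted FsImage
        static List<String> fsImageInMemory = new ArrayList<>(); // in-memory FsImage
        static List<String> editLog = new ArrayList<>();         // pending transactions

        static void checkpoint() {
            // Apply all EditLog transactions to the in-memory FsImage.
            fsImageInMemory.addAll(editLog);
            // Flush the updated FsImage to disk.
            fsImageOnDisk = new ArrayList<>(fsImageInMemory);
            // Truncate the EditLog now that its transactions are durable.
            editLog.clear();
        }

        public static void main(String[] args) {
            editLog.add("mkdir /user");
            editLog.add("create /user/a.txt");
            checkpoint();
            System.out.println("On disk: " + fsImageOnDisk
                    + ", EditLog: " + editLog);
        }
    }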
What are the steps of the Namenode startup process?
- The Namenode enters Safemode
  - Replication does not occur in Safemode
- Each Datanode sends a Heartbeat
- Each Datanode sends a Blockreport
  - It lists all HDFS data blocks held by that Datanode
- The Namenode creates a Blockmap from the Blockreports
- The Namenode exits Safemode
- It replicates any under-replicated blocks
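
A toy sketch of building the Blockmap from Blockreports and checking the Safemode exit condition (illustration only; the 0.999 threshold mirrors the default of dfs.namenode.safemode.threshold-pct):

    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    public class ToyStartup {
        public static void main(String[] args) {
            long expectedBlocks = 4;
            double threshold = 0.999; // mirrors dfs.namenode.safemode.threshold-pct

            // Blockreports: each Datanode lists the block ids it stores.
            Map<String, long[]> blockReports = new HashMap<>();
            blockReports.put("dn1", new long[]{1, 2, 3});
            blockReports.put("dn2", new long[]{2, 3, 4});

            // Build the Blockmap: block id -> Datanodes holding a replica.
            Map<Long, Set<String>> blockMap = new HashMap<>();
            for (Map.Entry<String, long[]> report : blockReports.entrySet()) {
                for (long blockId : report.getValue()) {
                    blockMap.computeIfAbsent(blockId, k -> new HashSet<>())
                            .add(report.getKey());
                }
            }

            // Exit Safemode once enough blocks have reported a replica.
            boolean inSafemode =
                    (double) blockMap.size() / expectedBlocks < threshold;
            System.out.println("Blockmap: " + blockMap
                    + ", in Safemode: " + inSafemode);
        }
    }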
What is one undetected problem that may occur when a MapReduce job is submitted to Hadoop?
Due to a bug in the code, a task can go into an infinite loop.
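
A hypothetical BuggyMapper, invented for illustration. The framework kills a task that reports no progress for mapreduce.task.timeout (600 000 ms by default), but writing an output record counts as progress, so a loop like this can spin forever undetected:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Hypothetical mapper, invented for illustration.
    public class BuggyMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            int i = 0;
            while (i < value.getLength()) {
                // Writing output counts as progress, so the framework's
                // mapreduce.task.timeout never fires for this task.
                context.write(new Text("chars"), new LongWritable(1));
                // Bug: i++ was forgotten, so the loop never terminates.
            }
        }
    }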
What are the functions of a scheduling algorithm?
- Reduce the total amount of computation necessary to complete a job
- Allow multiple users to share clusters in a predictable, policy-guided manner.
- Run jobs at periodic times of the day.
- Reduce job latencies in an environment with multiple jobs of different sizes.
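
For the multi-user sharing point, a sketch of selecting a policy-guided scheduler, assuming YARN (Hadoop 2.x) property and class names:

    import org.apache.hadoop.conf.Configuration;

    public class SchedulerConfig {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Use the Fair Scheduler so multiple users share the cluster
            // in a predictable, policy-guided manner.
            conf.set("yarn.resourcemanager.scheduler.class",
                     "org.apache.hadoop.yarn.server.resourcemanager"
                     + ".scheduler.fair.FairScheduler");
            System.out.println(conf.get("yarn.resourcemanager.scheduler.class"));
        }
    }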