Monday, February 24, 2014

Features Supported by Hadoop Release Series

Comparison between Big Data and RDBMS

  • Data size
  • Access: RDBMS is interactive and batch
  • Updates: RDBMS reads and writes many times; Big Data writes once, reads many times
  • Schema: RDBMS uses a static schema; Big Data uses a dynamic schema

Friday, February 21, 2014

What happens during a major compaction in HBase?

1. Deletes the data that is masked by a tombstone
2. Deletes the data whose TTL has expired
3. Compacts several small HFiles into a single larger one
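
The three steps above can be sketched as a toy in-memory model. This is not HBase code: the `Cell` tuple, the `major_compact` function, and the idea of an HFile as a plain list of cells are hypothetical names chosen for illustration. The sketch assumes a tombstone masks every cell with the same key and a timestamp at or below its own.

```python
from collections import namedtuple

# Hypothetical cell model: a tombstone is a cell with is_delete=True.
Cell = namedtuple("Cell", "key timestamp value is_delete")


def major_compact(hfiles, now, ttl):
    """Merge all input files into one list, dropping masked and expired cells."""
    cells = [cell for hfile in hfiles for cell in hfile]

    # Record the newest tombstone timestamp seen for each key.
    tombstones = {}
    for cell in cells:
        if cell.is_delete:
            previous = tombstones.get(cell.key, cell.timestamp)
            tombstones[cell.key] = max(previous, cell.timestamp)

    merged = []
    for cell in cells:
        if cell.is_delete:
            continue  # the tombstones themselves are not carried over
        if cell.timestamp <= tombstones.get(cell.key, float("-inf")):
            continue  # step 1: masked by a tombstone
        if now - cell.timestamp > ttl:
            continue  # step 2: TTL has expired
        merged.append(cell)

    # Step 3: one larger file, sorted by key and then newest first.
    merged.sort(key=lambda cell: (cell.key, -cell.timestamp))
    return merged
```

Calling `major_compact(hfiles, now=100, ttl=50)` on a few small lists returns a single sorted list that contains only the live, unexpired cells.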

Wednesday, February 19, 2014

List some limitations of Hadoop.

  • Write-once model
    • Plan to support appending-writes
  • A namespace with an extremely large number of files exceeds the Namenode’s capacity to maintain it
  • Cannot be mounted by an existing OS
    • Getting data in and out is tedious
    • Virtual File System can solve problem
  • Java API
    • A Thrift API is available for use from other languages
  • HDFS does not implement / support
    • User quotas
    • Access permissions
    • Hard or soft links
    • Data balancing schemes
  • No periodic checkpoints
  • Namenode is single point of failure
    • Automatic restart and failover to another machine not yet supported

List the steps the Namenode takes when a Datanode fails.

  • Namenode marks Datanodes without a recent heartbeat as dead
  • Does not forward any new I/O requests to dead Datanodes
  • Constantly tracks which blocks must be replicated with BlockMap
  • Initiates replication when necessary
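
The failure-handling steps above can be sketched as a small function. This is an illustrative model, not Namenode code: `handle_heartbeat_timeouts`, its parameters, and the dictionary-based block map are hypothetical, and the timeout is left as an argument rather than set to any real HDFS default.

```python
def handle_heartbeat_timeouts(last_heartbeat, block_map, now, timeout, replication):
    """Return (dead_nodes, blocks_to_replicate) for one monitoring pass.

    last_heartbeat: dict of datanode name -> time of its last heartbeat
    block_map:      dict of block id -> set of datanodes holding a replica
    """
    # Datanodes without a recent heartbeat are marked dead.
    dead = {node for node, seen in last_heartbeat.items() if now - seen > timeout}

    # Count live replicas per block and note how many copies are missing.
    to_replicate = {}
    for block, holders in block_map.items():
        live = holders - dead
        if len(live) < replication:
            to_replicate[block] = replication - len(live)
    return dead, to_replicate


def live_targets(last_heartbeat, dead):
    """Datanodes that may still receive new I/O requests."""
    return sorted(set(last_heartbeat) - dead)
```

A caller would run `handle_heartbeat_timeouts` periodically, route new requests only to `live_targets(...)`, and schedule one new replica for each missing copy reported in the second return value.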

Write the steps of checkpointing in Hadoop.

  • Performed by Namenode
  • Two versions of FsImage
    • One stored on disk
    • One in memory
  • Applies all transactions in EditLog to in-memory FsImage
  • Flushes FsImage to disk
  • Truncates EditLog
Note: Currently this only occurs on start-up
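
As a rough illustration of the checkpoint steps, the sketch below models the namespace as a set of paths and the EditLog as a list of `(operation, path)` pairs. The `checkpoint` function and the `"create"` / `"delete"` operation names are assumptions made for this example; the real FsImage and EditLog formats are more involved.

```python
def checkpoint(fsimage_in_memory, edit_log):
    """Apply the EditLog to the in-memory FsImage, flush it, and truncate the log.

    fsimage_in_memory: set of paths (toy stand-in for the namespace)
    edit_log:          list of (operation, path) tuples, oldest first
    Returns the new on-disk image as a sorted list of paths.
    """
    # Apply every logged transaction to the in-memory FsImage.
    for operation, path in edit_log:
        if operation == "create":
            fsimage_in_memory.add(path)
        elif operation == "delete":
            fsimage_in_memory.discard(path)

    # "Flush" the image: the sorted list stands in for the file written to disk.
    fsimage_on_disk = sorted(fsimage_in_memory)

    # Truncate the EditLog now that its transactions are part of the image.
    edit_log.clear()
    return fsimage_on_disk
```

After the call, the returned list and the in-memory set describe the same namespace, and the log is empty.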

What are the steps of the Namenode startup process?

  • Namenode enters Safemode
    • Replication does not occur in Safemode
  • Each Datanode sends Heartbeat 
  • Each Datanode sends Blockreport
    • Lists all HDFS data blocks
  • Namenode creates Blockmap from Blockreports
  • Namenode exits Safemode
  • Replicate any under-replicated blocks 
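
The startup sequence can be sketched with a small class. It is a simplified model under stated assumptions: the `Namenode` class and its method names are hypothetical, and leaving Safemode is an explicit call here rather than being driven by any threshold.

```python
class Namenode:
    """Toy model of the startup steps listed above."""

    def __init__(self, replication):
        self.replication = replication
        self.safemode = True   # the Namenode starts in Safemode
        self.block_map = {}    # block id -> set of datanodes holding it

    def receive_block_report(self, datanode, blocks):
        # Build the Blockmap from each Datanode's Blockreport.
        for block in blocks:
            self.block_map.setdefault(block, set()).add(datanode)

    def under_replicated(self):
        # Replication does not occur in Safemode, so nothing is reported yet.
        if self.safemode:
            return []
        return sorted(
            block
            for block, holders in self.block_map.items()
            if len(holders) < self.replication
        )

    def exit_safemode(self):
        # Leave Safemode and return the blocks that now need replication.
        self.safemode = False
        return self.under_replicated()
```

A short usage example: after two Blockreports, `exit_safemode()` returns the ids of the blocks that have fewer replicas than the target.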

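What makes HDFS unavailable?

Ans: Failure of the Namenode, which is a single point of failure. The failure of individual Datanodes is handled by re-replicating their blocks.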

What is one unexpected problem that may occur when submitting a MapReduce job to Hadoop?

Because of a bug in the code, the job can go into an infinite loop.

What are the functions of a scheduling algorithm?

  • Reduce the total amount of computation necessary to complete a job.
  • Allow multiple users to share clusters in a predictable, policy-guided manner.
  • Run jobs at periodic times of the day.
  • Reduce job latencies in an environment with multiple jobs of different sizes.
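
One way to share a cluster in a predictable, policy-guided manner is a max-min fair split of task slots. The sketch below is a generic illustration of that policy, not the algorithm of any particular Hadoop scheduler; `fair_share`, `total_slots`, and `demands` are hypothetical names.

```python
def fair_share(total_slots, demands):
    """Max-min fair allocation of slots among users.

    demands: dict of user -> number of slots requested
    Returns a dict of user -> number of slots granted.
    """
    allocation = {user: 0 for user in demands}
    remaining = total_slots
    active = {user for user, demand in demands.items() if demand > 0}

    while remaining > 0 and active:
        # Offer each unsatisfied user an equal share of what is left.
        share = max(remaining // len(active), 1)
        for user in sorted(active):
            if remaining == 0:
                break
            give = min(share, demands[user] - allocation[user], remaining)
            allocation[user] += give
            remaining -= give
        # Users whose demand is met drop out; the rest split the leftover.
        active = {user for user in active if allocation[user] < demands[user]}
    return allocation
```

With 10 slots and demands of 2, 8, and 8, the small job is fully served and the two large jobs split the remainder evenly, which keeps the small job's latency low.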

