Wednesday, May 3, 2017

Hadoop general interview questions

  1. Architecture components of Hadoop
  2. OS-level optimisations
  3. Prerequisites before installing Hadoop
  4. How to ingest data into the cluster
  5. What needs to be verified before copying data from one cluster to another
  6. Scenarios and use of the schedulers
  7. How to implement department-wise access control on HDFS and YARN
  8. Job flow in YARN
  9. How resource allocation happens in YARN
  10. What is the file read/write flow in HDFS
  11. How different nodes in a cluster communicate with each other
  12. How requests flow through ZooKeeper
  13. Describe the read/write pipeline in Hadoop
  14. How do you do deployments on many servers at once
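Question 7 above (department-wise access control) is commonly answered on the YARN side with per-department CapacityScheduler queues and queue ACLs. A sketch of `capacity-scheduler.xml`, assuming two hypothetical department queues named `sales` and `hr` (the queue names and capacity split are invented for illustration):

```xml
<!-- capacity-scheduler.xml: sketch of two department queues -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>sales,hr</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.sales.capacity</name>
  <value>60</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.hr.capacity</name>
  <value>40</value>
</property>
<property>
  <!-- ACL format is "users groups"; the leading space here means
       "no individual users, only members of the sales group" -->
  <name>yarn.scheduler.capacity.root.sales.acl_submit_applications</name>
  <value> sales</value>
</property>
```

On the HDFS side, the storage half of the same requirement is usually covered with per-department directories plus `hdfs dfs -chown` / `hdfs dfs -chmod`, or finer-grained `hdfs dfs -setfacl` ACLs.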

Friday, February 21, 2014

What happens in Major compaction in HBase?

1. Deletes data masked by a tombstone
2. Deletes data whose TTL has expired
3. Compacts several small HFiles into a single larger one
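The three rules above can be sketched in Python. This is a toy model: real HFiles are sorted, versioned KeyValue files, and the cell layout used here (value, timestamp, tombstone flag, TTL) is invented purely for illustration.

```python
import time

def major_compact(hfiles, now=None):
    """Toy model of an HBase major compaction: merge several small
    "HFiles" (dicts of row_key -> cell) into one, dropping cells
    masked by a tombstone and cells whose TTL has expired."""
    now = now if now is not None else time.time()

    # Merge: later HFiles are newer, so their cells mask earlier ones.
    merged = {}
    for hfile in hfiles:
        merged.update(hfile)

    compacted = {}
    for row_key, (value, ts, is_tombstone, ttl) in merged.items():
        if is_tombstone:
            continue  # rule 1: drop tombstones and the data they mask
        if ttl is not None and ts + ttl < now:
            continue  # rule 2: drop cells with an expired TTL
        compacted[row_key] = (value, ts, is_tombstone, ttl)
    return compacted  # rule 3: one single larger "HFile"
```

For example, compacting an old file holding rows `a` and `b` (where `b` has a 50-second TTL) with a newer file that tombstones `a` leaves only the surviving row `c`.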

Wednesday, February 19, 2014

List some limitations of Hadoop.

  • Write-once model
    • Plan to support appending-writes
  • A namespace with an extremely large number of files exceeds Namenode’s capacity to maintain
  • Cannot be mounted by an existing OS
    • Getting data in and out is tedious
    • A virtual file system can solve this problem
  • Java API
    • Thrift API is available to use other languages
  • HDFS does not implement / support
    • User quotas
    • Access permissions
    • Hard or soft links
    • Data balancing schemes
  • No periodic checkpoints
  • Namenode is single point of failure
    • Automatic restart and failover to another machine not yet supported

List the steps taken on Datanode failure.

  • Namenode marks Datanodes without a recent Heartbeat as dead
  • No new I/O requests are forwarded to dead Datanodes
  • Namenode constantly tracks which blocks must be re-replicated, using the Blockmap
  • Initiates re-replication when necessary
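The steps above can be sketched as a toy model in Python. The timeout value and the replication factor of 3 are illustrative stand-ins, not the exact HDFS defaults:

```python
REPLICATION_FACTOR = 3
HEARTBEAT_TIMEOUT = 600  # seconds; illustrative, not the HDFS default

def handle_datanode_failure(block_map, last_heartbeat, now):
    """Toy model of the Namenode's reaction to Datanode failure.
    block_map:      block_id -> set of Datanodes holding a replica
    last_heartbeat: datanode -> timestamp of its last Heartbeat
    Returns (dead Datanodes, blocks needing re-replication with the
    number of extra replicas each one needs)."""
    # Mark Datanodes without a recent Heartbeat as dead.
    dead = {dn for dn, ts in last_heartbeat.items()
            if now - ts > HEARTBEAT_TIMEOUT}

    # Track which blocks fell below the replication factor.
    under_replicated = {}
    for block, nodes in block_map.items():
        live = nodes - dead
        if len(live) < REPLICATION_FACTOR:
            under_replicated[block] = REPLICATION_FACTOR - len(live)
    return dead, under_replicated
```

With one stale Datanode, every block it held drops to two live replicas and is scheduled for one extra copy.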

Describe the steps of checkpointing in Hadoop.

  • Performed by Namenode
  • Two versions of FsImage
    • One stored on disk
    • One in memory
  • Applies all transactions in EditLog to in-memory FsImage
  • Flushes FsImage to disk
  • Truncates EditLog
Note: currently this only occurs on start-up.
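The checkpoint steps can be sketched as follows. The FsImage is modeled as a dict and the EditLog as a list of operations; both representations are invented for illustration:

```python
def checkpoint(fsimage_on_disk, edit_log):
    """Toy model of Namenode checkpointing: keep two versions of the
    FsImage (one on disk, one in memory), replay every EditLog
    transaction against the in-memory copy, flush it to disk, then
    truncate the EditLog."""
    in_memory = dict(fsimage_on_disk)       # second version, in memory

    # Apply all transactions in the EditLog to the in-memory FsImage.
    for op, path, meta in edit_log:
        if op == "create":
            in_memory[path] = meta
        elif op == "delete":
            in_memory.pop(path, None)

    fsimage_on_disk.clear()                 # flush FsImage to disk
    fsimage_on_disk.update(in_memory)
    edit_log.clear()                        # truncate the EditLog
    return fsimage_on_disk
```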

What are the steps of the Namenode startup process?

  • Namenode enters Safemode
    • Replication does not occur in Safemode
  • Each Datanode sends Heartbeat 
  • Each Datanode sends Blockreport
    • Lists all HDFS data blocks
  • Namenode creates Blockmap from Blockreports
  • Namenode exits Safemode
  • Replicate any under-replicated blocks 
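A sketch of the startup sequence above in Python. The 0.999 threshold mirrors the default of HDFS's `dfs.namenode.safemode.threshold-pct` setting; the rest of the model is heavily simplified:

```python
def namenode_startup(block_reports, expected_blocks,
                     replication=3, threshold=0.999):
    """Toy model of Namenode startup: start in Safemode, build the
    Blockmap from the Datanodes' Blockreports, exit Safemode once
    enough blocks have been reported, then list under-replicated
    blocks for later re-replication."""
    safemode = True                         # no replication in Safemode

    # Build the Blockmap from the Blockreports (one per Datanode).
    block_map = {}
    for datanode, blocks in block_reports.items():
        for block in blocks:
            block_map.setdefault(block, set()).add(datanode)

    # Exit Safemode once the reported fraction reaches the threshold.
    if len(block_map) >= threshold * expected_blocks:
        safemode = False

    # Blocks below the replication factor get re-replicated afterwards.
    under_replicated = [b for b, dns in block_map.items()
                        if len(dns) < replication]
    return safemode, block_map, under_replicated
```

With all expected blocks reported, Safemode is exited; a block reported by fewer than `replication` Datanodes lands on the re-replication list.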

What makes HDFS unavailable?

Ans: Failure of the Namenode, which is a single point of failure. Data can also become unavailable if all Datanodes holding a block's replicas fail.

What is one undetected problem that may occur with a MapReduce job submitted to Hadoop?

Due to a bug in the code, a task may go into an infinite loop.
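One guard Hadoop provides against such runaway tasks is the task timeout: a task that sends no progress or status updates for the configured interval is killed and retried. A sketch of the relevant `mapred-site.xml` setting (600000 ms is the usual default):

```xml
<!-- mapred-site.xml: kill tasks that report no progress for 10 minutes -->
<property>
  <name>mapreduce.task.timeout</name>
  <value>600000</value> <!-- milliseconds -->
</property>
```

Note the limitation: an infinite loop that still happens to report progress will not be caught by this timeout.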

