2. Delete the data which has expired TTL
3. Compact several small HFile into a single larger one
Compute Node: This is the computer or machine where your actual business logic will be executed.
Storage Node: This is the computer or machine where your file system reside to store the processing data.
In most of the cases compute node and storage node would be the same machine.
Map reduce is an algorithm or concept to process Huge amount of data in a faster way. As per its name you can divide it Map and Reduce.
Your business logic would be written in the Mapped Task and Reduced Task.
Typically both the input and the output of the job are stored in a file-system (Not database). The framework takes care of scheduling tasks, monitoring them and re-executes the failed tasks.
Hadoop is a open source framework which is written in java by apache software foundation. This framework is used to write software application which requires to process vast amount of data (It could handle multi TB of data). It works in-parallel on large clusters which could have 1000 of computers (Nodes) on the clusters. It also process data very reliably and fault-tolerant manner. See the below image how does it looks.