Friday 27 September 2013

Top 10 Interview Questions and Answers for Hadoop

This article presents the top 10 interview questions and answers for Hadoop, which will be very useful for job seekers when they attend an interview on Hadoop. The questions and answers are explained briefly and in a simple manner so that readers can understand them easily without any difficulty. After reading them, please share your feedback with us. All the best!





Introduction to Hadoop


Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. It is part of the Apache project sponsored by the Apache Software Foundation. Hadoop was inspired by Google's MapReduce, a software framework in which an application is broken down into numerous small parts. Any of these parts, called fragments or blocks, can be run on any node in the cluster.




1) What is Hadoop?

Ans: Hadoop is a framework which allows for distributed processing of large data sets across clusters of commodity hardware. Unlike terms such as 'OOPS', Hadoop does not have an expanded full form; it is simply a name.


2) Why do we require Hadoop?

Ans: Every day, an enormous amount of unstructured data is getting dumped into our machines. The major challenge is not storing these large data sets in our systems but retrieving and analyzing the data across the organization, where the data sits on different machines at different locations. In these circumstances a need for Hadoop arises. Hadoop has the capability to analyze data present on different machines at different locations very quickly and in a very cost-effective way.



3) What is Hadoop MapReduce?

Ans: Hadoop MapReduce (Hadoop Map/Reduce) is a software framework for distributed processing of large data sets on compute clusters of commodity hardware. It is a subproject of the Apache Hadoop project. The framework takes care of scheduling tasks, monitoring them and re-executing any failed tasks.
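
For illustration, here is a minimal word-count job written against the standard org.apache.hadoop.mapreduce Java API; it is a sketch based on the classic word-count example, and the class names (WordCount, TokenizerMapper, IntSumReducer) and the command-line input/output paths are only placeholders.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every word in an input line.
  public static class TokenizerMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum the counts emitted for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }

  // Driver: configures and submits the job; the framework handles scheduling,
  // monitoring and re-execution of failed tasks.
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Such a job is typically packaged into a jar and submitted with something like "hadoop jar wordcount.jar WordCount <input dir> <output dir>".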



4) Define Hortonworks?

Ans: Hortonworks is an enterprise software firm which specializes in open source Apache Hadoop development and support. Hortonworks was launched in 2011 by Yahoo and Benchmark Capital, and its flagship product is the Hortonworks Data Platform, which is powered by Apache Hadoop. The Hortonworks Data Platform is designed as an open source platform that facilitates integrating Apache Hadoop with an enterprise's existing data architectures.



5) Explain Hadoop Distributed File System?

Ans: The Hadoop Distributed File System (HDFS) is a subproject of the Apache Hadoop project. This Apache Software Foundation project is designed to provide a fault-tolerant file system that runs on commodity hardware. According to The Apache Software Foundation, the primary objective of HDFS is to store data reliably even in the presence of failures, including NameNode failures, DataNode failures and network partitions. The NameNode is a single point of failure for the HDFS cluster, and a DataNode stores the actual data blocks in the Hadoop file system.
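
As a rough illustration, the sketch below stores a small file in HDFS through the org.apache.hadoop.fs.FileSystem Java API. The destination path "/data/hello.txt" is only a placeholder, and the cluster address is assumed to come from the client's core-site.xml (fs.defaultFS).

import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();    // reads core-site.xml / hdfs-site.xml from the classpath
    FileSystem fs = FileSystem.get(conf);        // client handle to the file system named in fs.defaultFS
    Path dest = new Path("/data/hello.txt");     // placeholder HDFS path

    // The NameNode records the file's metadata; the bytes themselves are
    // written to (and replicated across) DataNodes.
    try (FSDataOutputStream out = fs.create(dest, true)) {   // true = overwrite if it exists
      out.write("Hello, HDFS".getBytes(StandardCharsets.UTF_8));
    }
    System.out.println("Wrote " + fs.getFileStatus(dest).getLen() + " bytes to " + dest);
  }
}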



6) Define unstructured data?

Ans: Data within an organization can be classified as either structured or unstructured. The term unstructured data refers to any data that has no predefined structure; images, videos, email, documents and free-form text are all considered unstructured data within a dataset.



7) Define Fault Tolerance?

Ans: Assume that you have a file stored in a system, and due to some technical problem that file gets destroyed. Then there is no way to get back the data present in that file. To avoid such situations, Hadoop has introduced the feature of fault tolerance in HDFS. In Hadoop, when we store a file, it automatically gets replicated to two other locations as well. So even if one or two of the systems fail, the file is still available on the third system.
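
As a small sketch of this, the FileSystem API can report and change the number of copies HDFS keeps per file; the path is a placeholder, and 3 is simply HDFS's customary default replication factor.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationExample {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path file = new Path("/data/example.txt");   // placeholder path

    // Report how many copies of each block HDFS currently keeps for this file.
    FileStatus status = fs.getFileStatus(file);
    System.out.println("Current replication factor: " + status.getReplication());

    // Ask HDFS to keep three copies of every block of this file, so the data
    // survives the loss of one or two machines.
    fs.setReplication(file, (short) 3);
  }
}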



8) What are the components of Hadoop?

Ans: The core components of Hadoop are HDFS and MapReduce. HDFS is basically used to store large data sets, and MapReduce is used to process such large data sets.



9) What are the key features of HDFS?

Ans: HDFS is highly fault tolerant, provides high throughput, is suited to applications with large data sets, offers streaming access to file system data, and can be built out of commodity hardware.



10) What is streaming access?

Ans: Since HDFS works on the principle of Write Once, Read Many, the feature of streaming access is particularly important in HDFS. HDFS focuses not so much on storing the data as on how to retrieve it at the fastest possible speed, especially while analyzing logs. In HDFS, reading the complete data set quickly is more important than the time taken to obtain a single record from the data.
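
A minimal sketch of that access pattern is shown below: the file is read sequentially from start to finish through an FSDataInputStream rather than by seeking to individual records. The HDFS path "/logs/access.log" is only a placeholder.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class StreamingReadExample {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path logFile = new Path("/logs/access.log");   // placeholder path

    // Stream the whole file in large sequential chunks -- the access pattern
    // HDFS is optimised for -- instead of seeking to individual records.
    byte[] buffer = new byte[64 * 1024];
    long totalBytes = 0;
    try (FSDataInputStream in = fs.open(logFile)) {
      int read;
      while ((read = in.read(buffer)) > 0) {
        totalBytes += read;
      }
    }
    System.out.println("Read " + totalBytes + " bytes sequentially");
  }
}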