Appendix 10 The Hadoop Ecosystem in a nutshell

http://www.sagarjain.com/the-hadoop-ecosystem-in-a-nutshell/

The picture below shows the Hadoop Ecosystem and its components. Lets briefly talks what these components means.

http://www.sagarjain.com/wp-content/uploads/2014/12/Hadoop-Echosystem.jpg

HDFS: Hadoop Distributed File System as name suggest is a distributed file system which can be installed on commodity servers. HDFS offers a way to store large data files on numerous machines and is designed to be fault tolerant due its data replication feature. Learn more on HDFS at Apache

YARN: Yet Another Resource Negotiator aka MapReduce V2. Its a framework for job scheduling and managing resources on cluster. Learn more on HDFS at Apache

Flume: Its a tool or service that is used for aggregating, collecting and moving large amount of log data in and out of Hadoop. More information can be found here.

Zookeeper: Its a framework that enables highly reliable distributed coordination of nodes in the cluster. Check this interesting video on Zookeeper.

Sqoop: “SQL-to-Hadoop” or Sqoop is a tool for efficient transfer between Hadoop and structured data sources i.e Relational Database or other Hadoop data stores, e.g. Hive or HBase. Explore more on sqoop here.

Oozie: Workflow scheduler system to manage Hadoop jobs. The jobs may include non MapReduce jobs. Check out more.

Pig: Initially developed at Yahoo!, Pig is a framework consisting of high level scripting language i.e Pig Latin along with a run time environment to which allows user to run MapReduce on hadoop cluster. Refer link for more on Pig.

Mahout: Mahout is a scalable machine learning and data mining library. Check this video on Mahout and Machine Learning.

R Connectors: R Connectors are used for generating statistics of the nodes in a cluster. More on Oracle R Connectors.

Hive: Hive is an open-source data warehouse system for querying and analyzing large datasets stored in Hadoop files. More on Hive at Apache.

HBase: Its a column-oriented non-rational database management system that runs on top of HDFS. A 3 min video on Hbase.

Amabri: This component of Hadoop ecosystem is used for provisioning, managing and monitoring the hadoop cluster. Check this link for more information.

results matching ""

    No results matching ""