• HADOOP practice for beginners with illustration

3. Eclipse:

Download the Eclipse IDE for Java EE Developers (Mars.2 release, 64-bit Linux GTK build) from:

http://eclipse.mirror.rafal.ca/technology/epp/downloads/release/mars/2/eclipse-jee-mars-2-linux-gtk-x86_64.tar.gz
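The download and unpack steps could be scripted roughly as follows. This is only a sketch: the `INSTALL_DIR` location (`$HOME/opt`), the `/tmp` staging path, the `install_eclipse` function name and the `--install` switch are assumptions chosen for illustration and do not come from this guide. The mirror above may also no longer be reachable.

```shell
#!/bin/sh
# Sketch: fetch and unpack the Eclipse Mars.2 JEE tarball linked above.
set -e

ECLIPSE_URL="http://eclipse.mirror.rafal.ca/technology/epp/downloads/release/mars/2/eclipse-jee-mars-2-linux-gtk-x86_64.tar.gz"
ARCHIVE="$(basename "$ECLIPSE_URL")"      # file name taken from the URL
INSTALL_DIR="${INSTALL_DIR:-$HOME/opt}"   # assumption: any writable directory works

install_eclipse() {
  mkdir -p "$INSTALL_DIR"
  # Download the archive to a temporary location
  wget -O "/tmp/$ARCHIVE" "$ECLIPSE_URL"
  # Eclipse tarballs normally unpack into a top-level "eclipse" directory
  tar -xzf "/tmp/$ARCHIVE" -C "$INSTALL_DIR"
  echo "Unpacked into $INSTALL_DIR"
}

# Nothing is downloaded unless the script is called with --install
if [ "${1:-}" = "--install" ]; then
  install_eclipse
fi
```

Saved as, say, `get_eclipse.sh`, it would be run with `sh get_eclipse.sh --install`. The IDE can then usually be launched from the `eclipse` directory inside `INSTALL_DIR`. It needs a Java runtime on the machine.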
