Before you install the new Hadoop version

  1. Make sure that any previous upgrade is finalized before proceeding with another upgrade. To find out whether the cluster still needs to be finalized, run:

     $ hadoop dfsadmin -upgradeProgress status

  2. Stop all client applications running on the MapReduce cluster.
  3. Stop the MapReduce cluster:

     $ stop-mapred.sh

     Then kill any orphaned task processes on the TaskTrackers.
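Killing orphaned processes means finding them first. A minimal sketch of a helper that lists the PIDs of matching JVMs, assuming a Linux-style ps; the exact Hadoop class names to grep for (e.g. something like TaskTracker or DataNode) depend on your version, so check with jps or ps first:

```shell
# Sketch of a helper for hunting down leftover Hadoop JVMs on a node.
# find_pids PATTERN prints the PIDs of all processes whose command line
# matches PATTERN. The grep -v grep filters out the grep process itself.
find_pids() {
  ps -eo pid=,args= | grep "$1" | grep -v grep | awk '{ print $1 }'
}

# Example usage on a TaskTracker (kill the reported PIDs afterwards):
# find_pids TaskTracker
```

The same helper works later for leftover DataNode processes when you shut down HDFS.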

  4. Stop all client applications running on the HDFS cluster.
  5. Perform some sanity checks on HDFS.
     • Perform a filesystem check:

       $ hadoop fsck / -files -blocks -locations > dfs-v-old-fsck-1.log

       Fix HDFS until there are no errors. The resulting file will contain a complete block map of the file system. Note: redirecting the fsck output to a file is recommended for large clusters, in order to avoid time-consuming output to STDOUT.
     • Save a complete listing of the HDFS namespace to a local file:

       $ hadoop dfs -lsr / > dfs-v-old-lsr-1.log

     • Create a list of the DataNodes participating in the cluster:

       $ hadoop dfsadmin -report > dfs-v-old-report-1.log

  6. Optionally, copy all data, or only the data that would otherwise be unrecoverable, from HDFS to a local file system or to a backup instance of HDFS.
  7. Optionally, stop and restart the HDFS cluster in order to create an up-to-date namespace checkpoint of the old version:

     $ stop-dfs.sh
     $ start-dfs.sh

  8. Optionally, repeat the sanity checks of step 5 and compare the results with the previous run to ensure that the state of the file system has remained unchanged.
  9. Create a backup copy of the dfs.name.dir directory on the NameNode (if you followed my Hadoop tutorials: /app/hadoop/tmp/dfs/name). Among other important files, the dfs.name.dir directory contains the checkpoint files edits and fsimage.
  10. Stop the HDFS cluster:

      $ stop-dfs.sh
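For the optional repeat-and-compare step, diff on the saved log files is usually enough; note that the fsck log contains timestamps, so the lsr and report files are the better candidates for an exact comparison. A minimal sketch, where the -2 file names for the second run are an assumption:

```shell
# compare_runs FILE1 FILE2: report whether two saved log files are identical.
compare_runs() {
  if diff -q "$1" "$2" > /dev/null; then
    echo "unchanged"
  else
    echo "CHANGED"
  fi
}

# Example usage, assuming the second run was saved with a -2 suffix:
# compare_runs dfs-v-old-lsr-1.log dfs-v-old-lsr-2.log
```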

Verify that HDFS has really stopped, and kill any orphaned DataNode processes on the DataNodes.
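With the cluster fully stopped, the backup of dfs.name.dir can be taken without the NameNode modifying edits or fsimage underneath it. A minimal sketch, assuming the tutorial path /app/hadoop/tmp/dfs/name; the helper name and the archive name are assumptions:

```shell
# backup_name_dir NAME_DIR ARCHIVE: pack the NameNode metadata directory
# (including the edits and fsimage checkpoint files) into a gzipped tarball.
backup_name_dir() {
  tar czf "$2" -C "$(dirname "$1")" "$(basename "$1")"
}

# Example usage on the NameNode:
# backup_name_dir /app/hadoop/tmp/dfs/name /root/dfs-name-backup.tar.gz
```

Keep the archive somewhere off the NameNode; it is what you would restore from if the upgrade has to be rolled back.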
