10.8 Configuring Flume (Cloudera Manager path)
10.8.1 Build or Download the custom Flume Source
A pre-built version of the custom Flume Source is available here.
The flume-sources directory contains a Maven project with a custom Flume source that connects to the Twitter Streaming API and ingests tweets as raw JSON into HDFS.
To build the flume-sources JAR, run the following from the root of the git repository:
$ cd flume-sources
$ mvn package
$ cd ..
This generates a file called flume-sources-1.0-SNAPSHOT.jar in the flume-sources/target directory.
10.8.2 Add the JAR to the Flume classpath
Copy flume-sources-1.0-SNAPSHOT.jar to both /usr/lib/flume-ng/plugins.d/twitter-streaming/lib/flume-sources-1.0-SNAPSHOT.jar and /var/lib/flume-ng/plugins.d/twitter-streaming/lib/flume-sources-1.0-SNAPSHOT.jar. The directories Flume actually searches are listed under Plugin Directories in Cloudera Manager (Flume > Configuration > Agent (Default)), so check that setting and make sure the JAR is in a location it names. If the target directories don't exist, create them with sudo mkdir -p.
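The copy step above can be sketched as a small shell function. This is only a minimal sketch and not part of the original project: the function name install_plugin_jar is hypothetical, and the plugins.d root passed to it should be whichever directory the Plugin Directories setting actually lists.

```shell
# Hypothetical helper (not part of the repository): copy a plugin JAR into
# <plugins_root>/twitter-streaming/lib/, creating the directory if needed.
install_plugin_jar() {
    jar=$1            # path to the built JAR
    plugins_root=$2   # a plugins.d directory, e.g. /usr/lib/flume-ng/plugins.d
    dest="$plugins_root/twitter-streaming/lib"
    mkdir -p "$dest" && cp "$jar" "$dest/"
}
```

For the system directories, the function needs to run in a root shell, once with /usr/lib/flume-ng/plugins.d and once with /var/lib/flume-ng/plugins.d as the second argument, passing flume-sources/target/flume-sources-1.0-SNAPSHOT.jar as the first.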
10.8.3 Configure the Flume agent in the Cloudera Manager Web UI
Go to the Flume Service page (by selecting Flume service from the Services menu or from the All Services page).
Pull down the Configuration tab, and select View and Edit.
Select Agent (Default) in the left-hand column.
Set the Agent Name property to TwitterAgent, the agent whose configuration is defined in flume.conf.
Copy the entire contents of the flume.conf file into the Configuration File field. If you want to edit the keywords or add your Twitter API-related settings, this is a good time to do it.
Click the Save Changes button.
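Because the Agent Name has to match the property prefix used in the configuration file, a quick check before pasting can catch a mismatch. The following is a minimal sketch, assuming flume.conf follows the standard Flume layout in which an agent declares its sources on a line beginning with the agent name followed by .sources; the function name has_agent is hypothetical.

```shell
# Hypothetical helper: succeed if the Flume configuration file declares
# sources for the given agent (a line beginning with "<agent>.sources").
has_agent() {
    conf=$1    # path to the Flume configuration file
    agent=$2   # agent name, assumed to contain only letters and digits
    grep -q "^[[:space:]]*${agent}\.sources" "$conf"
}
```

Run from the directory that holds flume.conf, a call such as has_agent flume.conf TwitterAgent returns success only when the file defines properties for that agent name.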