6.6 Loading Data Demo

6.6.1 Step 1: Preparing the Data
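The input is a plain comma-separated text file; the field layout (id, first name, last name, phone, city) matches the LOAD statement in Step 4. A couple of illustrative rows (the values here are made up for the demo):

001,John,Smith,9848022337,Boston
002,Mary,Jones,9848022338,Chicago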

6.6.2 Step 2: Put the Data File onto HDFS

hdfs dfs -put ./opt/hadoop/pig-0.15.0/test/student_data.txt hdfs://localhost:9000/Data

6.6.3 Step 3: Verify the Data
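One way to check the upload, assuming the same /Data target directory as in Step 2:

hdfs dfs -ls hdfs://localhost:9000/Data
hdfs dfs -cat hdfs://localhost:9000/Data/student_data.txt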

6.6.4 Step 4: Using the Grunt Shell in Pig

student = LOAD 'hdfs://localhost:9000/Data/student_data.txt' USING PigStorage(',') as (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);

Note: Pig has no append operation; you cannot append data to an existing relation or output directory (STORE requires the output directory not to exist yet).

6.6.5 Step 5: Output

STORE student INTO 'hdfs://localhost:9000/Output/' USING PigStorage(',');
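After the STORE finishes, the result can be read back from HDFS; Pig writes the output as part-* files inside the target directory:

hdfs dfs -ls hdfs://localhost:9000/Output/
hdfs dfs -cat hdfs://localhost:9000/Output/part-*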

6.7 Diagnostic Operators

6.7.1 Dump relation

With a relation created in advance, the DUMP operator executes the statements behind it and prints the contents of the relation to the console.
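For example, dumping the student relation loaded in 6.6:

grunt> DUMP student;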

6.8 Describe Operator

DESCRIBE relation

This prints the schema of the relation without running a MapReduce job.
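Applied to the student relation from 6.6, the output should look something like:

grunt> DESCRIBE student;
student: {id: int,firstname: chararray,lastname: chararray,phone: chararray,city: chararray}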

6.9 Explain Operators

EXPLAIN relation

This displays the logical, physical, and MapReduce execution plans that Pig would use to compute the relation, without running the job.
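For example:

grunt> EXPLAIN student;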

6.10 Illustrate Operator

ILLUSTRATE relation

This runs the statements behind the relation on a small sample of the data and shows the intermediate results step by step.
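For example:

grunt> ILLUSTRATE student;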

6.11 Group Operator

Test file: student_details.txt

Put this file onto HDFS:

hdfs dfs -put ./opt/hadoop/pig-0.15.0/test/student_details.txt hdfs://localhost:9000/Data

Verification: list the /Data directory to confirm the upload, as shown below. Now the student_details data is on HDFS.
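The listing uses the same /Data path as the put command above:

hdfs dfs -ls hdfs://localhost:9000/Data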

Create a relation for the file. In the Pig Grunt shell:

student_details = LOAD 'hdfs://localhost:9000/Data/student_details.txt' USING PigStorage(',') as (id:int, firstname:chararray, lastname:chararray, age:int, phone:chararray, city:chararray);

Then create a group relation; one possible grouping is sketched below.
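Keying on age is an assumption here, chosen because the co-group example in 6.12 also groups by age:

grunt> group_data = GROUP student_details BY age;
grunt> DUMP group_data;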

6.12 Co-group Operator

Another file, employee_details.txt, is put onto HDFS in the same way.
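Assuming the file sits in the same local test directory as the earlier files:

hdfs dfs -put ./opt/hadoop/pig-0.15.0/test/employee_details.txt hdfs://localhost:9000/Data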

Create a new relation for employee_details; a possible LOAD statement is sketched below.
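The schema (id, name, age, city) is an assumption, since the layout of employee_details.txt is not shown, but it must at least contain an age field for the COGROUP that follows:

grunt> employee_details = LOAD 'hdfs://localhost:9000/Data/employee_details.txt' USING PigStorage(',') as (id:int, name:chararray, age:int, city:chararray);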

Now create a co-group relation

grunt> cogroup_data = COGROUP student_details by age, employee_details by age;
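Dumping this relation is what kicks off the MapReduce job discussed below:

grunt> DUMP cogroup_data;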

The resulting MapReduce job can be monitored in the YARN ResourceManager web console:

http://localhost:8088/cluster

Click on the application's Tracking UI link:

Question:

Why did job 0018 start twice, and why is it still hanging after 20 minutes? This is just a simple co-group relation dump in the Pig Grunt shell.

And who is Dr. Who?

http://stackoverflow.com/questions/38154917/hadoop-hanging-more-than-an-hour-to-execute-a-dump-of-co-grouping-in-pig-grunt
