6.6 Loading Data Demo
6.6.1 Step 1: Preparing the Data
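The data file itself is not reproduced here; a minimal comma-separated sample matching the schema used in Step 4 (rows invented for illustration) could look like:
001,Rajiv,Reddy,9848022337,Hyderabad
002,Siddarth,Battacharya,9848022338,Kolkata
003,Rajesh,Khanna,9848022339,Delhi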
6.6.2 Step 2: Put the Data File to HDFS
hdfs dfs -put /opt/hadoop/pig-0.15.0/test/student_data.txt hdfs://localhost:9000/Data
6.6.3 Step 3: Verify the Data
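For example, using the paths from Step 2, list the directory and print the file back:
hdfs dfs -ls hdfs://localhost:9000/Data
hdfs dfs -cat hdfs://localhost:9000/Data/student_data.txt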
6.6.4 Step 4: Using Grunt in Pig
grunt> student = LOAD 'hdfs://localhost:9000/Data/student_data.txt' USING PigStorage(',') as (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);
Note: Pig has no append operation; there is no way to append data to an existing relation or to an already stored output.
6.6.5 Step 5: Output
grunt> STORE student INTO 'hdfs://localhost:9000/Output/' USING PigStorage(',');
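Pig writes the relation as one or more part files under the output directory (for a map-only job such as this one, typically part-m-00000). The result can be read back with, for example:
hdfs dfs -ls hdfs://localhost:9000/Output/
hdfs dfs -cat 'hdfs://localhost:9000/Output/part-*'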
6.7 Diagnostic Operators
6.7.1 Dump Relation
With a relation already created, the Dump operator runs the pending Pig Latin statements as a MapReduce job and prints the relation's contents to the screen.
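For example, dumping the student relation loaded in section 6.6 (the printed tuples depend on the contents of student_data.txt):
grunt> Dump student;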
6.8 Describe Operator
The Describe operator prints the schema of a relation:
Describe relation_name;
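Applied to the student relation from section 6.6, the output should look roughly like the following (a sketch, not captured from a live shell):
grunt> Describe student;
student: {id: int,firstname: chararray,lastname: chararray,phone: chararray,city: chararray}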
6.9 Explain Operator
The Explain operator prints the logical, physical, and MapReduce execution plans of a relation without actually running the job:
Explain relation_name;
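For example, against the student relation (the plans themselves are lengthy and omitted here):
grunt> Explain student;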
6.10 Illustrate Operator
The Illustrate operator runs the statements on a small sample of the input data and shows the step-by-step results:
Illustrate relation_name;
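For example:
grunt> Illustrate student;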
6.11 Group Operator
Test file: student_details.txt
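As with student_data.txt, the file is comma separated; an illustrative sample matching the schema used below (rows invented for this sketch):
001,Rajiv,Reddy,21,9848022337,Hyderabad
002,Siddarth,Battacharya,22,9848022338,Kolkata
003,Rajesh,Khanna,22,9848022339,Delhi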
Put this file on HDFS:
hdfs dfs -put /opt/hadoop/pig-0.15.0/test/student_details.txt hdfs://localhost:9000/Data
Verification:
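For example, listing the target directory (path as used above):
hdfs dfs -ls hdfs://localhost:9000/Data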
The student_details.txt data is now on HDFS.
Create the relation in the Pig Grunt shell:
grunt> student_details = LOAD 'hdfs://localhost:9000/Data/student_details.txt' USING PigStorage(',') as (id:int, firstname:chararray, lastname:chararray, age:int, phone:chararray, city:chararray);
Create the group relation, as sketched below.
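The original group statement is not shown; a minimal sketch that groups the students by age (the Dump output depends on the actual file contents) would be:
grunt> group_data = GROUP student_details by age;
grunt> Dump group_data;
Each tuple of group_data pairs an age value with a bag of the student_details tuples that share it.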
6.12 Co-group Operator
Another file, employee_details.txt, is put on HDFS in the same way.
Create a new relation for employee_details, as sketched below.
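The layout of employee_details.txt is not shown, so the schema below is an assumption; any comma-separated file with an age column works for the COGROUP that follows:
grunt> employee_details = LOAD 'hdfs://localhost:9000/Data/employee_details.txt' USING PigStorage(',') as (id:int, name:chararray, age:int, city:chararray);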
Now create a co-group relation:
grunt> cogroup_data = COGROUP student_details by age, employee_details by age;
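Dumping the relation triggers the MapReduce job; each result tuple holds an age key plus one bag of matching tuples per input relation:
grunt> Dump cogroup_data;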
The resulting MapReduce job can be monitored in the YARN ResourceManager web console (typically at http://localhost:8088).
Click on the job's Tracking UI link:
Questions:
Why did job 0018 start twice, and why is it still hanging after 20 minutes? This is just a simple co-group relation dump from the Pig Grunt shell.
And who is Dr. Who (the user name the console shows for the job)?