Finding
When Sqoop imports a MySQL table into Hive, it first stages the data in HDFS and leaves that staging directory behind. So if the same MySQL table has already been imported to HDFS, the Hive import fails with a "file already exists" error.
The staging directory lives under /user/the_current_user; in this case it is /user/root.
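Before retrying, you can confirm that the leftover staging directory is what is colliding (this assumes the source table is named employees, as in the commands below):
hdfs dfs -ls /user/root
hdfs dfs -ls /user/root/employees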
Solution
Remove or rename the directory that the first Sqoop import left in HDFS (a delete alternative is shown after the steps below):
1. hdfs dfs -mv /user/root/employees /user/root/employeesHDFS
2. re-run the sqoop import:
sqoop import --connect jdbc:mysql://localhost/employees --username root -P --table employees --hive-import --hive-table employeesImported2Hive
note: this time give the Hive table a name different from the source table's name in MySQL
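If the first HDFS copy is no longer needed, deleting it works just as well as renaming it:
hdfs dfs -rm -r /user/root/employees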
Change the table's name
alter table employeesimported2hive rename to emp;
get the table schema
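A quick way to check it from the Hive shell, assuming the table was renamed to emp as above:
describe emp;
show create table emp;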
create new partitioned table partitionedemp
CREATE TABLE partitionedemp (
  emp_no int,
  birth_date string,
  first_name string,
  last_name string
)
PARTITIONED BY (gender string, hire_date string);
set the partition mode to nonstrict
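Dynamic partitioning on gender and hire_date relies on these session settings (hive.exec.dynamic.partition may already default to true in newer Hive versions):
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;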
Insert data to the partitioned table
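A minimal sketch of the dynamic-partition insert, assuming the renamed source table emp and the column order above (the partition columns gender and hire_date must come last in the SELECT):
INSERT INTO TABLE partitionedemp PARTITION (gender, hire_date)
SELECT emp_no, birth_date, first_name, last_name, gender, hire_date FROM emp;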
troubleshooting