Hive query small tips
This article covers only optimisation at the DML layer, along with an explanation of the underlying Hadoop MapReduce mechanisms.
Suppose we have two tables, fact_order and dim_customer. fact_order holds the full history of customer order transactions, and dim_customer is the main dimension table for customer attributes.
The table structures are as below:
hive> desc fact_order;
OK
order_id            string      None
customer_id         string      None
product_name        string      None
transaction_id      string      None
order_status        string      None
transaction_status  string      None
create_time         timestamp   None
update_time         timestamp   None
hive> desc dim_customer;
OK
id                  string      None
name                string      None
gender              string…
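If you want to follow along on your own cluster, the two tables can be recreated with DDL along these lines. This is a minimal sketch, not the author's original DDL: the column names and types are taken from the `desc` output, no storage format or partitioning is specified (so Hive's defaults apply), and only the dim_customer columns visible in the output are declared. The join at the end is purely illustrative and assumes that fact_order.customer_id references dim_customer.id.

```sql
-- Sketch reconstructed from the `desc` output; storage format and
-- partitioning are left to Hive's defaults (an assumption).
CREATE TABLE IF NOT EXISTS fact_order (
  order_id           STRING,
  customer_id        STRING,
  product_name       STRING,
  transaction_id     STRING,
  order_status       STRING,
  transaction_status STRING,
  create_time        TIMESTAMP,
  update_time        TIMESTAMP
);

-- Only the first columns of dim_customer are visible in the output;
-- the remaining columns are deliberately not guessed here.
CREATE TABLE IF NOT EXISTS dim_customer (
  id     STRING,
  name   STRING,
  gender STRING
);

-- Illustrative query. Assumption: fact_order.customer_id references dim_customer.id.
-- Counts orders per customer; grouping by id keeps customers who share a name apart.
SELECT c.id, c.name, COUNT(*) AS order_cnt
FROM fact_order o
JOIN dim_customer c
  ON o.customer_id = c.id
GROUP BY c.id, c.name;
```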