Hive Query Optimisation

Chloooo
2 min read · Sep 6, 2020

Small tips for Hive queries


This article covers only optimisation at the DML layer, with an explanation of the underlying Hadoop MapReduce mechanisms.

Suppose we have two tables: fact_order and dim_customer. fact_order keeps the full history of customer order transactions, and dim_customer is the main dimension table for customer attributes.

The table structures are as follows:

hive> desc fact_order;
OK
order_id                    string                  None
customer_id                 string                  None
product_name                string                  None
transaction_id              string                  None
order_status                string                  None
transaction_status          string                  None
create_time                 timestamp               None
update_time                 timestamp               None

hive> desc dim_customer;
OK
id                    string                  None
name                  string                  None
gender                string…
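For readers who want to follow along, a table matching the fact_order description could be created roughly like this. It is a minimal sketch: the column names and types come from the desc output, while the use of a plain managed table with default storage settings is an assumption. dim_customer is left out because its column list is truncated in the listing.

```sql
-- Hypothetical DDL reconstructed from the `desc fact_order` output.
-- Storage format, partitioning and location are not given in the article,
-- so Hive's defaults are assumed here.
CREATE TABLE IF NOT EXISTS fact_order (
  order_id            STRING,
  customer_id         STRING,
  product_name        STRING,
  transaction_id      STRING,
  order_status        STRING,
  transaction_status  STRING,
  create_time         TIMESTAMP,
  update_time         TIMESTAMP
);
```

Running `desc fact_order;` afterwards is a quick way to check that the columns line up with the listing.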
