Business scenario and technical implementation of worldwide currency conversion with Redshift and Holistics
Hi there! When conducting a profit-related analysis project and creating a visualisation to showcase your results across regions, having a currency filter is often required. …
Hive query small tips
This article covers only optimisation at the DML layer, with an explanation of the Hadoop MapReduce mechanisms.
Suppose we have two tables: fact_order and dim_customer. fact_order keeps the full customer order transaction history; dim_customer is the main dimension table for customer attributes.
The table structures are as below:
The Elegant Way of Deletion In Database
Manipulating data is common for IT professionals, including database administrators, developers, data analysts and scientists, and working with data in a database is simple and straightforward. SQL is a specialized language for database operations and is very easy to understand and use…
Redshift practical knowledge to speed up analytics process
Time zones are commonly used in Business Intelligence analytics so that reports can convert between different time zones. In this article, I will share how Redshift handles time zone conversions.
Redshift provides a built-in timezone table covering time zones worldwide.
Demonstration on building a realtime data pipeline using Streamsets
StreamSets is a modern data streaming and integration platform built by StreamSets, Inc. It is used by many multinational companies such as Shell and Dell.
StreamSets Data Collector (SDC) is an open source data ingestion pipeline as one part of…
In this case study, I will use the example of a financial MNC to illustrate the current challenges and business impacts, and how building a seamless big data architecture can help solve these problems and deliver continued benefits.
Table of Contents:
2. Needs and Requirements
3. Data Sources
The right and wrong about data deletion in production databases
It has been a while since I first noticed this question and started to think about its severity. I recalled my past experiences of how I have dealt with data, and the reasons why, and I realized that…
Model Engineering — random forest parameter tuning
From the last post, we know that:
But that is also the reason Random Forest can cause overfitting.
In this post, we will use GridSearchCV with Cross Validation…
Sweet treats that will make your analytics easier
It provides a very convenient way to visualize the whole schema and the relationships of every entity. It also lets you download the schema as an image.
Model Engineering — select and evaluate linear / non-linear models
From the previous post, we know that: