Wednesday, July 7, 2010

ReE#1

Re-engineering #1

Many companies put their computers to providing automated facility for their employees without deliberating effort in the first place to gaining optimal benefits out of their investment.

A typical example is, providing an order entry screen, completely removing the paper order form step that the salesmen wrote in before the computer data entry clerks keyed the orders into the computer.

But, why not providing the computerized order entry facility directly to the customers, enabling them to place their orders in their convenient space and time, and save our efforts?

Monday, April 19, 2010

The Need of a Data Warehouse

The Need of a Data Warehouse
By: Djoni Darmawikarta

You have a need of data for analysis and reporting.

Whatever the specific reasons are, fundamentally, you, whether you are a technical developer or a business user, need a data warehouse because accessing the data directly from their operational systems (transactional systems) has not been satisfying.

Your dissatisfaction is either because you have difficulty to understand or access, or you get a slow response time, or both. Technically, the problems happen because the operational/transaction systems are not developed for your data access purpose.

A data warehouse is the middleware between the source (the operational/transactional systems) and you. A data warehouse resolves the problems by dedicating its design, implementation and delivery, to serve your analytical data and reporting requirement.

Your data warehouse collects data you need from various sources, integrate, structure, store, and make them available such that you can access them in easy manner, at the time you need, and with your expected response time.

Wednesday, March 31, 2010

Data Warehousing issue

Data Warehousing issue: Untimely reference data

By Djoni Darmawikarta

 

 

Coming ahead of time

You can have a large new reference data coming early, no transaction uses them at the time they land in your data warehouse as yet. If you load them into you data warehouse, they can slow down user queries unnecessarily. For example, your company has just acquired a large volume survey data that the marketing folks plan to use for email marketing. The data primarily contains list of names and their email. You don’t have any transaction attaches to them until you really send them email. You can defer its loading into the data warehouse.

 

You have the following options
1) Let them stay in the staging area, don’t load them into the data warehouse

2) Option 1 and load only the reference data at the time needed by incoming transaction

3) Load all of them whenever any of them, even just one, needed by any incoming transaction

4) Load all of them regardless of any need

 

Trade off between disk space and speed, option 4 is fastest in loading, takes most disk space, but can slow down user query.


Thursday, March 11, 2010

star schema rule 0 - a star

A star schema contains one fact table and one or more dimension tables.
The dimensions relate to the fact directly and on their surrogate keys.

star schema rule 4 ... no fact to fact

In a multi-stars schema, a fact cannot relate to any other fact

star schema rule 3 ... no dimension to dimension

In a star schema or multi-stars schema, a dimension cannot relate to any other dimension

star schema rule 2 ... dimension table

A dimension table must be in at least one star, which implies that a dimension table can be in more than one star

star schema rule 1 ... fact table

A fact table can only be in one star

Tuesday, February 16, 2010

Dimensional query - Rule no. 3 ... Multi-stars query must have common dimensions

For example, the following is a multi-stars query on sales star and stock star. The two stars have date_dim and product_dim dimension tables in common. The top query joins the queries of the individual star on the common dimensions.

SELECT ...
FROM
(SELECT ...
FROM sales_fact sf, date_dim dd, store_dim sd, product_dim pd
WHERE
sf.dd_sk = dd.dd_sk
AND sf.sd_sk = sd.sd_sk
AND sf.pd_sk = pd.pd_sk
AND dd.month_name = "January"
AND dd.year = 2010
AND sd.store_name = "DownTown Branch"
AND pd.ISBN = 0975212826) sales
,

(SELECT ...
FROM inventory_fact if, date_dim dd, stock_location_dim sd, product_dim pd
WHERE
if.dd_sk = dd.dd_sk
AND if.sd_sk = sd.sd_sk
AND if.pd_sk = pd.pd_sk) stock

WHERE
sales.dd_sk = stock.dd_sk
AND sales.pd_sk = stock.pd_sk

Dimensional query - rule no. 2...Constraints on the dimension

For example,


SELECT ...
FROM sales_fact sf, date_dim dd, store_dim sd, product_dim pd
WHERE
sf.dd_sk = dd.dd_sk
AND sf.sd_sk = sd.sd_sk
AND sf.pd_sk = pd.pd_sk
AND dd.month_name = "January" AND dd.year = 2010
AND sd.store_name = "DownTown Branch"
AND pd.ISBN = 0975212826

Dimensional query - rule no.1...Join on the surrogate keys

For every star that you want to query, join the fact table to its dimensional tables. So, for example:

SELECT ...
FROM sales_fact sf, date_dim dd, store_dim sd, product_dim pd
WHERE
sf.dd_sk = dd.dd_sk
AND sf.sd_sk = sd.sd_sk
AND sf.pd_sk = pd.pd_sk

Wednesday, February 10, 2010

Dimension table

Dimension table contains attributes that define or constraint the facts in the fact table that relates to the dimension. In the case of sales amount fact for example, one of its dimension can be Product. The Product dimension can have columns such as Product Code, Product Name, and Brand.

Fact tables likely have time-related dimension, such as Date dimension...one distinct characteristic of data warehouse is keeping history of data!

Fact table

Fact table contains one or more fact column. Fact is what the business measures; for example, sales amount, sales quantity, duration of calls, number of calls (in telephone business), grade (school test result), rank (in sports for example)

In addition fact table has all the surrogate keys of its related dimensions.

Monday, February 8, 2010

Purpose of dimensional model

Dimensional model is particularly suitable for developing data warehouse, and data mart as well--smaller, often departmental, data warehouse.

Sunday, February 7, 2010

Benefiting from this blog

To understand and get benefit from Dimensional Modeling Cookbook blogs you need to have working knowledge of relational database and SQL.