Wednesday, March 31, 2010

Data Warehousing issue

Data Warehousing issue: Untimely reference data

By Djoni Darmawikarta

 

 

Coming ahead of time

You can have a large new reference data coming early, no transaction uses them at the time they land in your data warehouse as yet. If you load them into you data warehouse, they can slow down user queries unnecessarily. For example, your company has just acquired a large volume survey data that the marketing folks plan to use for email marketing. The data primarily contains list of names and their email. You don’t have any transaction attaches to them until you really send them email. You can defer its loading into the data warehouse.

 

You have the following options
1) Let them stay in the staging area, don’t load them into the data warehouse

2) Option 1 and load only the reference data at the time needed by incoming transaction

3) Load all of them whenever any of them, even just one, needed by any incoming transaction

4) Load all of them regardless of any need

 

Trade off between disk space and speed, option 4 is fastest in loading, takes most disk space, but can slow down user query.


Thursday, March 11, 2010

star schema rule 0 - a star

A star schema contains one fact table and one or more dimension tables.
The dimensions relate to the fact directly and on their surrogate keys.

star schema rule 4 ... no fact to fact

In a multi-stars schema, a fact cannot relate to any other fact

star schema rule 3 ... no dimension to dimension

In a star schema or multi-stars schema, a dimension cannot relate to any other dimension

star schema rule 2 ... dimension table

A dimension table must be in at least one star, which implies that a dimension table can be in more than one star

star schema rule 1 ... fact table

A fact table can only be in one star