Wednesday, July 7, 2010
ReE#1
Many companies put their computers to providing automated facility for their employees without deliberating effort in the first place to gaining optimal benefits out of their investment.
A typical example is, providing an order entry screen, completely removing the paper order form step that the salesmen wrote in before the computer data entry clerks keyed the orders into the computer.
But, why not providing the computerized order entry facility directly to the customers, enabling them to place their orders in their convenient space and time, and save our efforts?
Monday, April 19, 2010
The Need of a Data Warehouse
You have a need of data for analysis and reporting.
Whatever the specific reasons are, fundamentally, you, whether you are a technical developer or a business user, need a data warehouse because accessing the data directly from their operational systems (transactional systems) has not been satisfying.
Your dissatisfaction is either because you have difficulty to understand or access, or you get a slow response time, or both. Technically, the problems happen because the operational/transaction systems are not developed for your data access purpose.
A data warehouse is the middleware between the source (the operational/transactional systems) and you. A data warehouse resolves the problems by dedicating its design, implementation and delivery, to serve your analytical data and reporting requirement.
Your data warehouse collects data you need from various sources, integrate, structure, store, and make them available such that you can access them in easy manner, at the time you need, and with your expected response time.
Wednesday, April 7, 2010
Wednesday, March 31, 2010
Data Warehousing issue
Data Warehousing issue: Untimely reference data
By Djoni Darmawikarta
Coming ahead of time
You can have a large new reference data coming early, no transaction uses them at the time they land in your data warehouse as yet. If you load them into you data warehouse, they can slow down user queries unnecessarily. For example, your company has just acquired a large volume survey data that the marketing folks plan to use for email marketing. The data primarily contains list of names and their email. You don’t have any transaction attaches to them until you really send them email. You can defer its loading into the data warehouse.
You have the following options
1) Let them stay in the staging area, don’t load them into the data warehouse
2) Option 1 and load only the reference data at the time needed by incoming transaction
3) Load all of them whenever any of them, even just one, needed by any incoming transaction
4) Load all of them regardless of any need
Trade off between disk space and speed, option 4 is fastest in loading, takes most disk space, but can slow down user query.
Thursday, March 11, 2010
star schema rule 0 - a star
The dimensions relate to the fact directly and on their surrogate keys.
star schema rule 4 ... no fact to fact
star schema rule 3 ... no dimension to dimension
star schema rule 2 ... dimension table
Tuesday, February 16, 2010
Dimensional query - Rule no. 3 ... Multi-stars query must have common dimensions
SELECT ...
FROM
(SELECT ...
FROM sales_fact sf, date_dim dd, store_dim sd, product_dim pd
WHERE
sf.dd_sk = dd.dd_sk
AND sf.sd_sk = sd.sd_sk
AND sf.pd_sk = pd.pd_sk
AND dd.month_name = "January"
AND dd.year = 2010
AND sd.store_name = "DownTown Branch"
AND pd.ISBN = 0975212826) sales
,
(SELECT ...
FROM inventory_fact if, date_dim dd, stock_location_dim sd, product_dim pd
WHERE
if.dd_sk = dd.dd_sk
AND if.sd_sk = sd.sd_sk
AND if.pd_sk = pd.pd_sk) stock
WHERE
sales.dd_sk = stock.dd_sk
AND sales.pd_sk = stock.pd_sk
Dimensional query - rule no. 2...Constraints on the dimension
SELECT ...
FROM sales_fact sf, date_dim dd, store_dim sd, product_dim pd
WHERE
sf.dd_sk = dd.dd_sk
AND sf.sd_sk = sd.sd_sk
AND sf.pd_sk = pd.pd_sk
AND dd.month_name = "January" AND dd.year = 2010
AND sd.store_name = "DownTown Branch"
AND pd.ISBN = 0975212826
Dimensional query - rule no.1...Join on the surrogate keys
SELECT ...
FROM sales_fact sf, date_dim dd, store_dim sd, product_dim pd
WHERE
sf.dd_sk = dd.dd_sk
AND sf.sd_sk = sd.sd_sk
AND sf.pd_sk = pd.pd_sk
Wednesday, February 10, 2010
Dimension table
Fact tables likely have time-related dimension, such as Date dimension...one distinct characteristic of data warehouse is keeping history of data!
Fact table
In addition fact table has all the surrogate keys of its related dimensions.