Data Warehousing issue: Untimely reference data
By Djoni Darmawikarta
Coming ahead of time
You can have a large new reference data coming early, no transaction uses them at the time they land in your data warehouse as yet. If you load them into you data warehouse, they can slow down user queries unnecessarily. For example, your company has just acquired a large volume survey data that the marketing folks plan to use for email marketing. The data primarily contains list of names and their email. You don’t have any transaction attaches to them until you really send them email. You can defer its loading into the data warehouse.
You have the following options
1) Let them stay in the staging area, don’t load them into the data warehouse
2) Option 1 and load only the reference data at the time needed by incoming transaction
3) Load all of them whenever any of them, even just one, needed by any incoming transaction
4) Load all of them regardless of any need
Trade off between disk space and speed, option 4 is fastest in loading, takes most disk space, but can slow down user query.