Google
Information Storage and Retrieval: June 2014

Pages

Sunday, June 15, 2014

DataWarehousing Terms

Dimensional Modeling:

A way to design datawarehouse. It employs Ralph Kimball's bottom up approach to design data marts using conformed Facts and Dimensions. It contrasts with ER Modeling in its purpose. The purpose of ER modeling is to store the data in such a way that redundancy is minimized and inserts and updates are quick. Dimensional modeling on the other hand stores the data in such a way that querying or retrieval is fast. 

Role Playing Dimension:

Its is a dimension that has multiple meanings and can be used for multiple purposes. A Date dimension is an example.

Junk Dimension:

A junk dimension is a convenient grouping of flags and indicators into a separate table. In this way, you don't need to store these fields into the fact table and hence you don't need to create separate dimensions for them.

Factless fact:

A table which captures an occurrence of an event but do not contain any actual measurements is known as factless fact table.  This table contains only the keys of dimensions. Typical example can be a fact table that captures the attendance of a student on a particular day.

Degenerate Dimension:

There are instances when an attribute is present in a fact table but it does not have any corresponding dimension. Such attributes are known as degenrate dimensions. For example, Order Numbers, Invoice Number, Transaction Dates, Sale Dates, Credit-Debit Indicator etc can all be classified as degenerate dimensions

Conformed Dimensions and Conformed Facts:

When there is need to analyze facts from two or more data marts, conformed dimensions are used. This is also known as "drilling-across". Time dimension is an example of conformed dimension.

Conforming the facts is the process where the names of measure columns in different fact tables are kept identical.  

Master Data Management:

MDM is the process of creating and managing data that an organization must have as a single copy. For example, data about customers or products can be used as a master copy across different units of the same organization. master data provides a single version of truth. MDM is similar to datawarehousing in terms of ETL process but there are following differences between the two:

a. MDM only applies to entities and not to transactional data while a datawarehouse includes both transactional and non-transactional data. MDM is about dimensional data. It does not concerns with data in Facts.

b. MDM is about reports which are concerned with data goverance, data quality and compliance. Datawarehousing is about analytical reports