Oracle Data Mart Builder Population Guide
Release 2.6

A73318-01

Library

Product

Contents

Index

Prev Next

1
Understanding Star Schemas

This chapter explains dimensional database modeling and the concepts of star schemas.

In this chapter, the following topics are discussed:

Two Data Models

In most database environments, users perform two basic types of tasks: modification (inserting, updating, and deleting records) and retrieval (queries). Modifying records is generally known as online transaction processing (OLTP). Data retrieval is referred to as online analytical processing (OLAP) or decision support, because the information is often used to make business decisions. This section describes these data models and their structural requirements.

When database records are modified, the most important requirements are update performance and data integrity. These needs are addressed by the entity relation model of organizing data. Entity relation schemas are highly normalized. This means that data redundancy is eliminated by separating the data into multiple tables. The process of normalization results in a complex schema with many tables and join paths.

When database records are retrieved, the most important requirements are query performance and schema simplicity. These needs are best addressed by the dimensional model. Another name for the dimensional model is the star schema. A diagram of a star schema resembles a star, with a fact table at the center. The following figure is a sample star schema.


A fact table usually contains numeric measurements, and is the only type of table with multiple joins to other tables. Surrounding the fact table are dimension tables, which are related to the fact table by a single join. Dimension tables contain data that describe the different characteristics, or dimensions, of a business. Data warehouses and data marts are usually based on a star schema.

In a star schema, subjects are either facts or dimensions. You define and organize subjects according to how they are measured and whether or not they change over time. Facts change regularly, and dimensions do not change, or change very slowly.

Separating facts and dimensions yields a subject-oriented design where data is stored according to logical relationships, not according to how the data was entered. This structure is easier for both users and applications to understand and navigate.

The following table summarizes the different characteristics for online transaction processing (OLTP) and online analytical processing (OLAP) databases:

Characteristic   Online Transaction Processing (OLTP)   Online Analytical Processing (OLAP)  

Typical application

Typical user  

Order entry

Billing  

Sales and marketing

Dept. managers  

Data model

Table structure

Number of tables

Number of join paths

Typical schema  

Entity relation

Highly normalized

Many tables

Many join paths

Complex schema  

Dimensional

Denormalized

Fewer tables

Fewer join paths

Simple schema  

Data update  

Data is time-variant, frequently updated (twinkling)  

Data is a snapshot, nonvolatile, seldom updated  

Data structure  

Activity-oriented  

Subject-oriented  

Number of users  

Many users  

Fewer users  

Optimization  

Optimized for inserts, updates, deletions  

Optimized for queries  

Star Schemas

A star schema is composed of one or more central fact tables, a set of dimension tables, and the joins that relate the dimension tables to the fact tables. This section describes these components and outlines some of the decisions you need to make before designing a decision-support schema.

Fact Tables

A fact table contains data columns for the numeric measurements of a business. It also includes a set of columns that form a concatenated, or composite key. Each column of the concatenated key is a foreign key drawn from a dimensional table primary key. Fact tables usually have few columns and many rows, which result in relatively long and narrowly shaped tables.

In the star schema diagram shown earlier in this chapter, the measurements in the fact table are daily totals of sales in dollars, sales in units, and cost in dollars of each product sold. The level of detail of a single record in a fact table is called the granularity of the fact table. In this diagram, the granularity is daily item totals. Each record in the fact table represents the total sales of a specific product in a retail store on one day. Each new combination of product, store, or day generates a different record in the fact table.

The most useful facts are numeric, continuously valued, and additive. A continuously valued fact is a numeric measurement that varies every time it is measured. A fact is additive if it makes sense to add the measurement across the dimensions. Most queries against a fact table access thousands or hundreds of thousands of records to construct a result set of relatively few rows. It is helpful if you can compress these records into the result set by adding them or performing other mathematical operations.

Fact tables in a data mart are populated with data extracted from an OLTP system or a data warehouse. A snapshot of the source data is regularly extracted and moved to the data mart, usually at the same time every day, every week, or every month.

Although the diagram in this chapter shows a single fact table, a star schema can have multiple fact tables. A more complex schema with multiple fact tables is useful when you need to keep separate sets of measurements that share a common set of dimension tables. All operations you can perform in an Oracle Data Mart are also supported for schemas with multiple fact tables.

Dimension Tables

Dimension tables store descriptions of the characteristics of a business. A dimension is usually descriptive information that qualifies a fact. For example, each record in a product dimension represents a specific product. In the star schema shown at the beginning of this chapter, the product, customer, promotion, and time dimensions describe the measurements in the fact table. Dimensions do not change, or change slowly over time.

The shape of dimension tables is typically wide and short because they contain few records and many columns. The columns of a dimension table are also called attributes of the dimension table. Each dimension table in a star schema database has a single-part primary key joined to the fact table.

An important design characteristic of a star schema database is that you can quickly browse a single dimension table. This is possible because dimension tables are flat and denormalized. You can browse a single dimension table to determine the constraints and row headers to use when you query the fact table.

Most star schemas include a time dimension. A time dimension table makes it possible to analyze historic data without using complex SQL calculations. For example, you can analyze your data by workdays as opposed to holidays, by weekdays as opposed to weekends, by fiscal periods, or by special events. If the granularity of the fact table is daily sales, each record in the time dimension table represents a day.

Star Schema Key Structure

The join constraints in a star schema define the relationships between a fact table and its dimension tables. In the star schema diagram at the beginning of the chapter, the product key is the primary key in the product dimension table. This means that each row in the product dimension table has a unique product key. The product key in the fact table is a foreign key drawn from the product dimension table.

Each row in a fact table must contain a primary key value from each dimension table. This rule is called referential integrity and is an important requirement in decision-support databases. The reference from the foreign key to the primary key is the mechanism for verifying key values between the two tables. Referential integrity must be maintained to ensure valid query results.

The primary key of a fact table is a combination of its foreign keys. This is called a concatenated key. The join cardinality of dimension tables to fact tables is one-to-many, because each record in a dimension table can describe many records in the fact table.

A star schema database uses very few joins, and each join expresses the relationship between the elements of the underlying business. For example, in the star schema diagram at the beginning of this chapter, the join between the product dimension table and fact table represents the relationship between the company's products and its sales.

Defining Fact and Dimension Tables

Designing a dimensional data mart for any organization requires an understanding of business rules and the content of data that is collected. Ultimately, you need to match the needs of the users with the realities of the accessible data.

The following points summarize the process of designing a schema for a decision-support database:

  1. Choose a business process to model in order to identify the fact tables.

    This is a major operational process in the organization that generates raw data. Users store and retrieve data in an existing system, such as a legacy system or company-wide data warehouse. In the diagram in this chapter, the business process is sales. Examples of other business processes are orders, invoices, shipments, inventory, and general ledger.

  2. Choose the granularity of each fact table.

    This is the most detailed level of data to include in the fact table for the business process. The finer the granularity of each dimension, the more precisely a query can get through the database. The granularity of a fact table is usually the individual transaction, the individual line item, a daily snapshot, or a monthly snapshot.

  3. Choose the dimensions for each fact table and their respective granularity.

    Examples of typical dimensions are time, product, customer, promotion, transaction type, and status. For each dimension, identify all the distinct attributes that describe the dimension. The most useful attributes are textual and discrete. A dimension can include numeric attributes, such as package size, if the attribute acts more like a description than a measurement.

  4. Choose the measured facts.

    These are the measurements that make up each fact table record. Typical measured facts are numeric additive quantities, such as sales in dollars or sales in units.

Star Schema Advantages

Star schemas are easy for end users and applications to understand and navigate. With a well-designed schema, users can quickly analyze large, multidimensional data sets. The main advantages of star schemas in a decision-support environment are:

Star Schemas in the Data Mart Environment

An Oracle Data Mart is a high-performance, decision-support system, designed to work most efficiently with a database structured in a star schema. However, you can choose to implement a different type of design, such as an OLTP design based on an entity relation database model.

If you use a structure other than a star schema for an Oracle data mart, most features function in the same manner as they do for a star schema. However, you sacrifice the query and load performance benefits of the dimensional model. This guide focuses on the star schema design as an example for data mart population.




Prev

Next
Oracle
Copyright © 1999 Oracle Corporation.

All Rights Reserved.

Library

Product

Contents

Index