In this section, we discuss the features in Oracle OLAP that you should be aware of when designing analytic workspaces. The content here expands on the concepts introduced in our blogs (where those concepts relate to design) and the general design principles discussed in the preceding section.
Determining Dimensions from user Requirements
As mentioned, user requirements must drive the design of Oracle OLAP cubes. This fact is often overlooked in Oracle OLAP design, as the data is sourced from relational tables or views. Often, these tables are part of a data warehouse with a well-defined structure. The structure of the source tables will be an important influence, but the ultimate structure of the OLAP cubes should be driven by user requirements, not the convenience of loading data from the data warehouse, because often the data warehouse design is not reflective of user requirements. Oracle OLAP cubes can be used solely for their cube-organized materialized views to accelerate performance of queries on data warehouses, but they offer much more.
Relating Oracle OLAP Data to a Star Dimensional Model
OLAP data is often represented in relational tables as a star dimensional model. A central fact table contains the data, and dimension tables can be joined to the fact table to supply additional information about the dimensions. This is called a star model or a star schema because the dimension tables are often shown radiating from the central fact table, like a star. With OLAP data, each of the dimensions ends up being described in its own dimension table.
The star dimensional model is central to Oracle OLAP design. Data that is loaded into Oracle OLAP cubes and dimensions often comes from a star dimensional model. In this model, there is a one-to-one correspondence between a dimension and a dimension table. Likewise, there is a one-to-one correspondence between a cube and a fact table, the Oracle tool used to create and manage analytic workspaces, called Analytic Workspace Manager (AWM), is specifically designed to load Oracle OLAP cubes from star and snowflake dimensional models or from a collection of tables.
Oracle OLAP takes this relationship between star dimensional models and Oracle OLAP cubes even further by automatically creating a dimension view for each dimension and a cube view for each cube. These views can be queried just as if they were relational tables. This is the primary mechanism for querying Oracle OLAP data.
Many tools designed for reporting relational data are optimized for reporting against star models. By exposing cube data in a star schema, Oracle has made data in Oracle OLAP cubes accessible to reporting tools that can use SQL to access data.
Snowflake schemas are the same as stars, except there is a dimension table for each level of a dimension. In each level's dimension table, a column joins to the next higher level table. Snowflake schemas radiate out from the central fact table, hence the name snowflake schemas.
Mapping Relational Data to Multidimensional Objects
In designing Oracle OLAP cubes, bear in mind that every object in your analytic workspace needs to be sourced from some relational table, view, or something that acts like a relational table, such as an external table or a gateway This enables access to many sources outside an Oracle database. You use AWM to define the mapping between relational columns and each multidimensional object, except calculated measures. Therefore, if you know you want to represent regions in your analytic workspace, you need some sort of column that gives a list of regions in some sort of table. Likewise, if you want to load cost data into your cube, and it cannot be calculated from other data already in the cube, you should have a column in a table or view that represents cost data.
Determining Dimensions of Cubes
Dimensions are the heart of an Oracle OLAP design; the dimensions you select affect performance and capabilities more than any other decisions. Often, the first clues as to what dimensions may exist in the data come from the star dimensional model. The list of dimension tables may indicate candidates for dimensions of a cube.
As stated previously, user requirements should dictate the design of dimensions and cubes, but if you intend to use cubes simply as a method for accelerating queries, having the dimensions in the star schema dictate the dimension and cube design may be warranted.
As mentioned earlier in the "General Design Principles" section, you should become aware of the dimension-related features available with the product you are using before determining and designing dimensions. In this section, we describe Oracle OLAP dimension types, hierarchies, and attributes, which are all important considerations for dimensions that you may want to reuse across multiple cubes.
Oracle OLAP offers two dimensions types: user and time. By default, Oracle OLAP dimensions are of type user. Most of your dimensions will be user dimensions.
Time dimensions are just like user dimensions, except they have special attributes (described in the "Attributes" section) and they support time-series calculations. If a dimension is continuous and contains time periods, and if users may want to create measures such as year to date or change from prior year, you should consider adding these special attributes to your dimension definition. Otherwise, you will not be able to create these special calculated measures.
The list of measures (both ones that are loaded and those that are calculated) is sometimes represented as an additional measure dimension by some OLAP reporting tools. In the Oracle OLAP model, this is not a true dimension type. Rather, it is a list of measures, with little of the extra metadata associated with Oracle OLAP dimensions. Of course, you are free to design an Oracle OLAP cube with a single measure called Data or some other generic name dimensioned by a Measure dimension. The Oracle OLAP engine would not know anything special about this Measure dimension, but it would allow you to create hierarchies of measures, with drill-down paths and so forth.
Dimensions can be a simple list of values. More often, however, dimensions have one or more hierarchies. Hierarchies give structure to a dimension, defining the aggregation from lower-level data to higher-level data and the drill path from higher-level data to lower-level data. You should usually include a top dimension value that represents all lowest-level dimension values aggregated together. Without this top dimension value, every time you view the data, it will be broken down by the dimension. A user may say, "I don't care about this dimension—sum it up for me please." This summation would need to be done by the end-user tool, eliminating one of the prime advantages of Oracle OLAP.
One notable exception to this rule is in time dimensions. Sometimes users never want to aggregate multiple years together. Still, be very careful with this lack of a requirement. Some in the user community may say they never want to aggregate multiple years together, but inevitably, someone will leave out the year selection and expect Oracle OLAP to add all of the years together. Without a top dimension value, Oracle OLAP cannot aggregate the data for all years. In addition, the top dimension value is often used in calculated measures, such as share measures (share of total for company).
Oracle OLAP hierarchies can be defined to be either level-based or value-based hierarchies. Often, the dimension tables that load the hierarchical information determine whether a given hierarchy is level-based or value-based.
Level-Based Hierarchies In level-based hierarchies, each dimension value has a level associated with it. These levels typically have meaningful names, are ordered from most aggregate to least aggregate, and define the drill path for the dimension. For example, the levels of a time dimension may be Year, Month, Day. A dimension value is a year, a month, or a day. A given day belongs to only one month; a given month belongs to only one year.
End users can choose to view reports for a single level or for multiple levels, but typically with level-based hierarchies, users have a sense of the level at which they want to view the data. Star and snowflake dimension tables are presented as level-based hierarchies. Each column typically corresponds to a level.
Value-Based Hierarchies In value-based hierarchies, each dimension value has a parent in that hierarchy. There is often no concept of a level in a value-based hierarchy. Users can drill from higher-level dimension values to the children of these higher-level values. With value-based hierarchies, a dimension table can list the values in a dimension, and a column defines the parent of each dimension value. Value- based hierarchies are often used in employee dimensions or in account and line item dimensions.
Additional Hierarchies You can define as many hierarchies—either level-based or value-based—in a dimension as you need. Different user communities may want to look at dimensions differently. Some may prefer different rollup structures or ways of aggregating the data. Additional hierarchies allow you to meet the needs of these multiple communities or multiple uses within a community. You can have as many as you need to model your business. For example, you may introduce new hierarchies each year or to model a planned reorganization. Users operate with only one hierarchy at a time.
Often, combining two different ways of looking at the data into a single dimension is a good way of avoiding the explosion that occurs with multiple user communities. If one group wants to break out the data by state and another wants to break out the data by customer type, but no one ever wants to break out the data by state and customer type, it may make sense for a single Customer dimension to have a Type hierarchy and a State hierarchy. In contrast, if you were to include Customer, Type, and State as three separate dimensions, your aggregation time could be much greater, because Oracle OLAP would try to aggregate the data for every Type-State combination that could exist.
Dimension attributes help users to find specific dimension values and provide information about those dimension values. They also supply multilingual descriptive values for dimension values. For example, suppose there is a list of 10,000 products for a user to sift through. How is that user to find the product of interest? Also, suppose you want to store the ending date of each time period so you can determine which time period precedes a given time period or sort time periods by ending date. Dimension attributes enable you to specify this information. Some attributes are automatically defined by the system (system-defined) and some are defined by the person creating the dimension (user-defined).
System-Defined Attributes Oracle OLAP automatically defines the LONG_ DESCRIPTION and SHORT_DESCRIPTION attributes for each dimension. System- defined attributes are typically used by front-end tools to identify a given dimension value. Users can often search for text strings within these names. These attributes are dimensioned by an automatically generated language dimension to allow for names in multiple languages. Oracle OLAP does not require these description attributes, but some applications may need them.
For dimensions you identify as time dimensions, Oracle OLAP also creates the special attributes END_DATE and TIMESPAN. END_DATE contains the ending date of each time period. TIMESPAN contains the number of days in the time period. Oracle OLAP uses this information to calculate prior time periods and year-ago time periods. By using these attributes, Oracle OLAP cans support rich time-series calculations for just about any sort of time calendar, including fiscal calendars, promotional calendars, 13 periods per year, overlapping time periods, and more. If you do not populate these attributes, you will not be able to create time-based calculated measures. Bear this in mind when creating time dimension tables, because you will want to map to columns that contain this important metadata.
User-Defined Attributes You can create as many additional attributes on your dimensions as you want. Attributes are added to the Oracle database data dictionary. They are revealed as a column in dimension and hierarchy views. End users can use these attributes to find dimension values of interest.
For example, suppose you think that a user may want to create a report that displays sales for all products of a certain color. You could create an attribute called COLOR that enables users to select all of the products whose COLOR attribute is RED or WHITE. Attributes can be set for individual levels of a dimension or for all levels of a hierarchy, depending on whether that attribute applies to multiple levels.
Attributes are single-valued. For example, a product can be RED or WHITE, or even REDWHITE, but cannot be multiple values at once. Attributes are associated with a single dimension and are not tracked over time.
If you need to keep track of an attribute that changes over time, or varies by another dimension, you can define a cube with that information. For example, you may want to define a text Color measure that is dimensioned by Product and Time that defines the color of a product in a given month. This measure will generally occupy much more space than an attribute, because it is two dimensional. In addition, it will not benefit from having an index.
Attributes can be used together to find dimension values that meet certain criteria. This is especially handy for creating reports that need to select dimension values that should change as the data changes. If users were to select all the products that were red and white by selecting them from a list, when new red products were introduced, they would not be automatically added to the report. If a COLOR attribute were used instead, new red products would automatically appear on a report since the rule "select all RED products" is used. This technique is especially powerful when multiple attributes are used together, such as "select all RED or WHITE COLOR products or all VALENTINE THEME products."
When creating attributes, be sure to check the Index checkbox if you will be using the attribute in a filter, such as in a WHERE clause. It is faster to find all white products if there is an index of colors than to search sequentially through all products to see if the COLOR attribute is set to WHITE.
Designing Oracle OLAP Cubes
We discussed cubes in a general way in our blogs. This section focuses on the design considerations for Oracle OLAP cubes.
One of the great advantages of Oracle OLAP is the ability to define cubes to meet specific business requirements. These cubes can have varying dimensionality, and all share dimensions and reside in the same analytic workspace. If some of the data you want to represent varies by only three dimensions, create a cube with only three dimensions. If other data requires five dimensions, create a different cube with the five dimensions. By storing and calculating data for only the dimensions required for a certain analysis, you can save storage space, calculation time, and retrieval time.
Recall that cubes contain measures. If a measure is loaded from a fact table, it is a stored measure. You can also create calculated measures that are derived from other measures.
Stored measures in Oracle OLAP are loaded directly from fact tables, views, or similar objects such as materialized views, external tables, or gateways. They are usually the columns in the fact table that are not keys. The precision and data type of these values determine the specific data type of your measures. Usually, these are the number data type, but Oracle OLAP can support cubes with decimal, shortdecimal (half the size of decimal), Boolean, date, and text measures. These stored measures are the source of all of the data in the cube. Examples are SALES, COSTS, and UNITS. All data is computed from the stored measures.
Stored measures can also be calculated from within a cube using OLAP DML code, such as a forecast. If a measure can be computed from one or more other measures, generally it is better to create a calculated measure and allow Oracle OLAP to calculate the value on the fly.
Calculated measures in Oracle OLAP are measures that can be calculated dynamically from stored measures and other calculated measures. Calculations include arithmetic calculations, such as ratios between two measures like DOLLARS_PER_UNIT and SALES_MINUS_RETURNS, as well as more sophisticated calculations. Often, calculations reference specific dimensions and hierarchies. A common calculation that includes dimensions is a share calculation such as DOLLAR_SHARE_OF_REGION, which represents the ratio of the dollars sold to a given customer divided by the dollars sold to the region to which that customer belongs. This calculation allows a user to determine how important this customer is to its region. Some calculations involve the Time dimension, such as DOLLARS_PERCENT_CHANGE_FROM_YEAR_AGO.
The key point here is that the calculations to compute these measures are performed at the time that the data is requested of the cube. Thus, calculated measures require no storage.
Dimension calculation models provide an alternative to aggregations and allocations for calculating data for dimension members. Use models when no single rule, such as summing or averaging children, applies to how each dimension value is calculated. With a model, each member can be calculated using a unique equation.
Figure 1 shows an account financial model, which calculates three accounts from other accounts from other dimension values. The results of running this model
Figure 1. A dimension calculation model
are displayed in an Oracle OLAP Worksheet window. Notice that the values for Revenue % of GP, Net Income, and Gross Profit are calculated appropriately, even though the formula for Revenue % of GP uses Gross Profit before Gross Profit is defined. Oracle OLAP automatically computes the dependencies and solves the model in the proper order.
Models can be run after loading data, or you can have this run on the fly by attaching the model to an aggregation map. You use the OLAP DML language to enter your models.
Sparse and Dense Dimensions
We covered the concept of sparse and dense cubes in this article. When you define an Oracle OLAP cube, you specify which dimensions are sparse—in other words, which dimensions you expect to have a significant number of dimension value combinations for which there is no data present.
Oracle OLAP reserves space for every combination of dimension values that is defined as dense in a cube. For example, if you define a cube with 10,000 products and 1,000 geography values, and both dimensions are dense, you should expect Oracle OLAP to reserve space for 10,000,000 cells for each combination of any other dimensions of the cube. If, in reality, a given geography carries, say, only 2 percent of the detail product line, you should define these dimensions as sparse.
A couple of caveats are worth noting here. With hierarchical data, the sparsity of a cube often varies by level. As data is aggregated up a hierarchy, the data becomes denser. For example, while an individual store may carry a small fraction of a product line (sparse at low levels), at the total country level, most product categories have been sold at some point (dense at high levels).
For years, the general rule of thumb has been to define any dimension with less than 15 percent density as sparse, but to try to define as many dense dimensions as possible. Now, however, most Oracle OLAP cubes that we see are extremely sparse, and we recommend defining all dimensions as sparse. This is especially true for daily data. Define a dimension as dense only when you know that most combinations of sparse dimension values will have data for most values of that dimension. For example, if you are loading monthly sales data and you know that if a given store sells a product during any month, it is likely to sell at least some of that product most months, it may make sense to define the dimension as being dense.
After mapping a cube, you can run the Cube Storage Advisor from the Storage tab in AWM when defining a cube. This advisor analyzes your data and recommends which dimensions should be defined as sparse and dense. This is a great place to start if you are not intimately familiar with your data. Ultimately, the best way to determine which dimensions should be sparse or dense is to experiment using representative samples of your data.
Composites and Compressed Composites
As noted in the previous section, often your data is sparse. Oracle OLAP creates a special object like a dimension called a composite, which contains dimension values for each combination of sparse dimensions that exist in the data. In Figure 2, only 5 of the 16 possible combinations of Dim1 and Dim2 have data. If we define Dim1 and Dim2 as sparse dimensions when creating a cube, Oracle OLAP reserves only space for five cells, not the full 16. The work to create a composite is managed internally by the Oracle OLAP engine.
Often, certain dimension values have only one child. In this case, the data for the parent is the same as the data for the child, since the data for the parent is aggregated from only one child. With multiple dimensions and sparse data, the situation where only one child exists for a given parent cell is quite common. Before compressed composites, Oracle OLAP would store this data for the child level and again for the parent level. In Figure 3, Dim1 has a hierarchy with A being aggregated from B and C, B aggregated from D and E, and C from F and G. Dim2 is being aggregated horizontally with Q aggregated from R and S, R from T and U, and S from V and W.
As you can see, none of the dimension values have only one child, but given the sparsity in the data, certain values are replicated as we aggregate. The outlined values in the grid have only one child. With compressed composites, Oracle OLAP compresses these extra cells and points to the lower-level data, instead of taking up extra space. As the number of dimensions or sparsity increases, this ability to compress data can make a huge impact. We have seen cubes that have decreased their storage requirements (and solve times) by ten times because of compressed composites.
There is little overhead and no loss in functionality in using compression on composites. In general, you should choose to use compression.
Figure 2. Composites list only the dimension combinations that contain a value.
Figure 3. Compression increases with multiple dimensions.
When listing dimensions in a cube, order is important. To optimize the performance of Oracle OLAP, list dense dimensions first, followed by sparse dimensions.
For dense dimensions, the first dimension listed is the one that varies the fastest, and the last dimension listed is the one that varies the slowest. For example, if Time is listed first and is a dense dimension, Time varies fastest when the data is stored on disk. This order is optimal for a trend graph of data that contains most of the time periods, because one disk access is likely to retrieve all of the months for whatever is being graphed, since all of the time periods are contiguously laid out on disk. However, this may not be optimal when loading data for the latest month, since the "latest month" for other dimension values are spread all over.
The order of sparse dimensions is less important than the order of dense dimensions, but still relevant. The rule of thumb is different for compressed and uncompressed cubes. If your cube is compressed (which in most cases, it should be), you should plan to list the sparse dimensions in ascending order based on the number of members in the dimensions. For uncompressed cubes, list sparse dimensions in descending order.
Data is typically loaded into OLAP cubes at the lowest level of a dimension, and then aggregated up hierarchies, but there are other possibilities as well. The default aggregation operator is Sum, but others are available.
In addition to the aggregation operators, two other aggregation techniques are available to you: preaggregation and loading data at multiple levels.
Precomputing Aggregates Oracle OLAP always presents your data as if it were fully solved; that is, if you ask for the total sales for the entire year at the top of each dimension, Oracle OLAP computes aggregates from the data it has. Of course, if you have loaded 1 billion numbers and have not precomputed any aggregates ahead of time, it may take a while to add up all the numbers required for a given cell.
You can specify how much of the cube should be precomputed after loading data in the Aggregation tab when defining a cube. Determining how much you precompute is all about balancing the input/output (I/O) required to store and retrieve precomputed values against the CPU time it takes to compute values, both at build time and at run time. In general, the more you precompute, the faster your queries run, but the longer your builds run.
Cost-based aggregation allows Oracle OLAP to determine which aggregates should be precomputed based on the specific structure of the dimensions involved. You can specify a number from 0 to 100 (representing a relative scale, not a percentage) that defines whether the cube should be precomputed, and if so, how much of the cube. For example, suppose that Los Angeles has only 35 customers, and New York has 25,000 customers. With the cost-based aggregation method, Oracle OLAP is likely to choose to preaggregate New York and aggregate Los Angeles when queried, as illustrated in Figure 4. The cost-based aggregation method is a fine-tuned aggregation strategy that yields better build and query performance and should be used with compressed cubes in Oracle OLAP 11 g.
Cost-based aggregation is available in Oracle OLAP tig and later.
Figure 4. Cost-based aggregation example
In Oracle OLAP 10g, you can specify which levels of a dimension should be precomputed. You should generally precompute those levels that you expect to be accessed frequently. The default in Oracle OLAP 10g is to precompute every other level using a practice commonly called skip-level aggregation. Because every other level is already precomputed, only a few values must be aggregated at query time. Skip-level aggregation often represents a good balance between load performance and run-time performance.
Loading at Multiple Levels Most Oracle OLAP implementations load data at a single lowest level and aggregate from that level, but other possibilities exist. For example, perhaps weeks can span multiple months, and you would like to view monthly and weekly data, but have no need for daily data in your cube. You could load data at the week level, and load data at the month level in two different hierarchies.
You may also want to load data at multiple levels because the data does not aggregate up the hierarchy—it has been computed outside Oracle OLAP, and there is no way to tell Oracle OLAP how to aggregate the data properly. In this case, you can load the data at multiple levels and tell Oracle OLAP not to calculate certain hierarchies.
Partitioning a cube allows you to break a cube into pieces for manageability, performance, and scalability reasons. Cube partitioning works just like table partitioning. Each partition of a cube can be processed independently and concurrently. Queries that can be satisfied by a single partition can be sped significantly, because Oracle can look at a much smaller set of rows, instead of an entire table. To help manage cubes, partitions are automatically added when required and can be easily dropped when no longer needed.
Partitioning becomes critical with large cubes and multiprocessor machines, because only one processor in Oracle OLAP can write to a given partition. If you want Oracle OLAP to use multiple processors in a write operation (such as loading or aggregating), you need to partition your cubes.
You set the partition on a dimension at a selected level within that dimension. For example, you can specify that you want Oracle OLAP to partition your Sales cube by the Year level of the Time dimension. This creates a separate partition for each year in your cube. If you load and aggregate eight years of data, Oracle OLAP can separate the task of loading and aggregating into eight different jobs: one for each processor. If you want to spread the work evenly among multiple processors, you should design your partitions to be of relatively equal sizes and ensure that in each load, you are processing multiple partitions. If you partition by month and are loading only the latest month, you will not be spreading out the load and aggregation work among multiple partitions. If your goal is to use multiple CPUs, consider partitioning by some other dimension, such as region of the country.
The Cube Partitioning Advisor can help you select the appropriate partitioning strategy by analyzing your fact tables. Run the Cube Partitioning Advisor from AWM after you map your cube, but before you load data.
Cube-Organized Materialized Views
As noted in this blog, Oracle OLAP cubes are often used as materialized views. A bit of background is in order here. Materialized views save query time, since the data is precalculated when the view was materialized. When the Oracle optimizer creates the execution plan for a query, it can rewrite the query (or a block within the query) to a materialized view to improve performance.
Cube-organized materialized views are available in Oracle OLAP 11g and later.
With Oracle OLAP, you can create cube-organized materialized views that improve performance on queries against fact tables. Some organizations implement Oracle OLAP for this single capability. Oracle OLAP can be deployed simply as a way to accelerate queries on a data warehouse. Applications do not need to be modified in any way; they simply query the fact tables, and queries are redirected to the cube- organized materialized views. Queries that once took hours now may take seconds!
The other major benefit of cube-organized materialized views is manageability. This type of view represents data at multiple levels of multiple dimensions. Traditional materialized views represent data at only a single level of each dimension, so a single cube-organized materialized view can replace many traditional materialized views. If you want to expose cubes as materialized views, bear in mind the following restrictions
- All d imensions of the cube must have at least one level and one hierarchy.
The Oracle Database requires not-null constraints on columns with dimension members. Ragged and skip-level hierarchies use nulls, so transform them by filling in these columns with the concatenation of the level name and the parent dimension value whenever there is a null value in the table.
All d imensions of the cube must use the same aggregation operator, which is Sum, Min, or Max.
- measures, all five must be mapped to the source tables.
- The detail tables must support dimension and rely constraints. If they have not been defined, then use the Relational Schema Advisor to generate a script that defines them on the detail tables.
- The cube must be compressed.
- The cube can be enriched with calculated measures, but it cannot support The cube must be fully defined and mapped. For example, if the cube has five more advanced analytics in a cube script.
AWM enforces these rules and tells you if a cube can be used as a materialized view.
You can refresh cubes used as cube-organized materialized views using the same mechanisms as relational materialized views. You will find that the DBMS_ MVIEW.REFRESH syntax for refreshing cube-organized materialized views is the same as refreshing relational materialized views. This makes cubes more transparent as a mechanism for storing materialized views. Cube-organized materialized views can also be updated incrementally. For example, if you modify data for only the latest month, Oracle OLAP is smart enough to modify only the current month's data and any values that are aggregated from the current month. Designing your update process to capitalize on this capability can speed your update time significantly.
Summary of the Oracle OLAP Design Process
You must take into account many factors when designing an Oracle OLAP analytic workspace. Of paramount importance is how users are planning to use the cubes. Also important is the structure of the source data, although this can often be changed by loading from views that transform tables while data is being loaded.
You also need to consider how users find dimension values—the need for run-time calculated measures, aggregations, and partitioning strategies.
The design of Oracle OLAP analytic workspaces offers a great deal of flexibility. To master these concepts, however, requires practicing the art of designing Oracle OLAP analytic workspaces.