Data Warehousing

Dimensional Storage & Modeling for Volatile Attribute Taxonomies

There are line of business scenarios where dimensional attribute taxonomies within data warehouses (DW) are volatile. That is, the attributes themselves, not their values, are subject to change. From a classical relational dimensional modeling approach, this means the schema of the entities representing the dimensions of a DW are not fixed. Attributes and their relationship with other attributes (hierarchies) are subject to being added or removed.

This situation, which is common within the apparel industry, makes maintaining traditional “horizontal” dimensional data models (as the primary data storage and management repository) tedious and problematic at best, especially from the perspective of the Extract-Load-Transform-Load (ELTL) processes which are critical to the function of the DW.

So what’s the solution? I’m a staunch advocate of an Entity-Attribute-Value (EAV) or “vertical” data model. This can is used to describe entities where the number of attributes that can be used to describe such is potentially vast, indeterminate and volatile, but the number that will actually apply to a given entity is relatively modest.

Many will object to an EAV approach , immediately  classifying such a data model as an anti-pattern which can lead to longer development times, poor use of database resources and more complex queries when compared to a horizontal dimensional modeling approach. In my opinion, this is uniformed, assumptive and myopic thinking. An EAV data model does not intrinsically mean an architecture is absent the use of a horizontal data model. Rather, a well rationalized use of EAV implements both vertical and horizontal data models, perhaps organized into separate “namespaces” or “schemas” with the DW.

EAV data models are, by convention, rich in “metadata”. It is this metadata which is crucial to overcoming most all objections to the EAV approach. With proper consideration given to the metadata requirements, there should exist sufficient information in the vertical schema to:

  • develop a standard dimensional attribute ELTL process that eliminates the need for traditional, tightly-coupled ELTL implementations which are dimension specific
  • use code / object generation techniques to dynamically “pivot” the data within each type of entity described into it’s horizontal equivalent
  • support the concept of “slowly-changing dimension schemas” and produce temporally unique versions of horizontal dimensional perspectives

Whether the resultset produced from a given pivot needs to be persisted (in the form of a table or indexed  view) or remains dynamic (in the form of a non-indexed view) is largely a function of the complexity of the pivoting query and it’s performance.  To be certain, implementing the code / object generator is no trivial task. However, if implemented properly, the flexibility to support both persisted and dynamic horizontal dimensional perspectives can be readily achieved. Furthermore, if a persisted tabular perspective is required, it can be created as a final stage of dimensional attributes ELTL associated with a given type of entity.

It is this pivoting methodology and support that turns EAV data modeling from an anti-pattern into a robust, flexible and schema-independent solution. Horizontal dimensional perspectives are at the core of high performance, logical and ‘business-friendly’ dimensional modeling championed by Ralph Kimball.

I believe this architecture presents a compelling alternative to deal with scenarios becoming more commonplace, especially as the need to integrate un- and semi-structured data (i.e. social media data feeds) into today’s business analytics environments.

If you’d like to further explore this topic, please share your comments. If there’s sufficient interest, I’ll do a deeper-dive into some EAV implementation specifics.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s