Tag Archives: Unique key

Data Warehouse – Surrogate keys and Foreign Keys in a Dimensional Data Model

Dictionary: (http://www.agiledata.org/essays/keys.html)

  • Key. A key is one or more data attributes that uniquely identify an entity.  In a physical database a key would be formed of one or more table columns whose value(s) uniquely identifies a row within a relational table.
  • Composite key.  A key that is composed of two or more attributes.
  • Natural key.  A key that is formed of attributes that already exist in the real world.  For example, U.S. citizens are issued a Social Security Number (SSN)  that is unique to them (this isn’t guaranteed to be true, but it’s pretty darn close in practice).  SSN could be used as a natural key, assuming privacy laws allow it, for a Person entity (assuming the scope of your organization is limited to the U.S.).
  • Surrogate key.  A key with no business meaning.
  • Candidate key.  An entity type in a logical data model will have zero or more candidate keys, also referred to simply as unique identifiers (note: some people don’t believe in identifying candidate keys in LDMs, so there’s no hard and fast rules).  For example, if we only interact with American citizens then SSN is one candidate key for the Person entity type and the combination of name and phone number (assuming the combination is unique) is potentially a second candidate key.  Both of these keys are called candidate keys because they are candidates to be chosen as the primary key, an alternate key  or perhaps not even a key at all within a physical data model.
  • Primary key.  The preferred key for an entity type.
  • Alternate key. Also known as a secondary key, is another unique identifier of a row within a table.
  • Foreign key. One or more attributes in an entity type that represents a key, either primary or secondary, in another entity type.

Surrogate/ Foreign Keys are used to link data between Fact and Dimension table and in some cases between dimension and dimension.

Surrogate Keys:

You will find every a few synonyms for a Surrogate key – meaningless keys, integer keys, nonnutural keys, artificial key, synthetic keys, link keys etc.
I propose creating a process to generate these keys during load. These keys are numeric in type with absolutely no meaning. Keep these keys small in order to optimise the retrieval of records between tables.
Create your own algorithm (i.e. use GUID ID from a system) that always unique creates the keys over platforms
What do I mean by over more than one platform? Examine the following:
Your company is now global with a data warehouse in the US and in Africa and the same data model now you are asked. – We need to see customer globally?
You need to combine the data to one Global Warehouse – this will not work if your Surrogate keys are the same thus ensure they are unique between systems. Using the auto number on a table will only work for one database, but not when merging more than database unless you specified a seed on each database from day one – but what will that seed be?
As Per Ralph Kimball : Surrogate keys “ One of the primary benefits of surrogate keys is that they buffer the data warehouse environment for operational changes” Ok so what is he saying – imagine you have used the Product code as key and the operation system re-uses product code 1 what do you now do with the rest of the old data?
So do not use a business bound soft coded values (Like product code or CIF number) as a Key this will become a major flaw in you design
Surrogate keys value

  • Enables ETL Updates to do slowly changing dimensions (Separate blog entry)
  • Binds table together in Dimensional Model
  • This key can also be the primary key (U-key) on the table

More reading: http://en.wikipedia.org/wiki/Surrogate_key

Foreign Key

this is a key stored in the fact and or the dimension that links to the foreign table for example you have the Customer Key as a foreign key in the Transaction Fact table in order to join to the fact table to the customer.
Primary Keys( AKA Unique key)

Key is generated on a table that is unique to the table only this can also be the surrogate key.
Normally this is a constrain on your database in order to ensure uniqueness.

More reading: http://en.wikipedia.org/wiki/Unique_key

{Views and opinions on this Blog does not reflect current/past employers view(s).}