Taking a new tack

Today I received a script to build a table and then that table was supposed to be the target of a bulk import task. When I reviewed the script, there were no keys or indexes specified.  When I pushed back against the people attempting to develop the table, inquiring about the data the table would contain and the context it would be used within, I found myself detailing for someone with no SQL Design background how to determine a primary key and possibly any required indexes.  I found myself stating that the table itself seemed fine, but, the data would tell us what the PK should be and whether or not it would be a composite key, or if we should be employing a surrogate key.   A composite key is when the true data itself in the table, has a unique combination of values which identify each row.  In traditional speak, we call that a domain.  Each domain being a row of data, and having a primary key or specific identifier.  When the primary key (PK) is a combination of values within the real data, it is a composite key.  This type of key isn’t very common any more in high transaction databases, because the non-related surrogate key is easier to implement and much faster to index. A surrogate key is simply a value added to uniquely identify a row of data.  It’s typically a number of some type and it will auto-increment, so it can be ignored on inserts.  More than that, it keeps the clustered index in a nice sequence which makes the indexing strategy on the table easier to handle.  Even more than that, the application can use that value as a very tiny identifier of the domain of data!  Which speeds up the application and eases other operations.  How do we explain all of this to a non-DB design person?   In trying to explain the options.  I decided I’d cover using a unique index too.  See, a unique index or constraint on the table used to enforce uniqueness across a row of data, or domain, would act similarly.  But, what you’d have then is an index on a heap.  While that is better than no index at all, the end result is large table scans.  And that’s not a strategy for quick searching and data retrieval.   Keep in mind, when you’re designing a new table that it’ll need a primary key to keep the data in order on the disk, and it’ll likely need indexes which will need to find other data within the searched domains via the primary key through a key lookup operation.  But all of that is a good thing, and means your data will likely come back fast and stay clean.   Clean data and fast searching to you!  Until next time,.. Data-Chef out.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: