Last time, we looked at why a heap is less than ideal for performance, and at the Composite Key as a Primary Key, the Clustered Index, and the combination of the two. In that scenario, a Composite Key used as the Primary Key and Clustered Index produces very random I/O on disk: as rows are inserted into the middle of the table, either the table has to reorder itself or it suffers massive fragmentation. The end result of that fragmentation is performance that degrades over time, or table rebuilds on a very regular basis. Either way, the long-term outcome is more maintenance, less-than-optimal performance, or both.
Now let’s look at the Surrogate Key. A surrogate key is a unique identifier for a single row that isn’t derived from the data within the row itself. Seems odd, right? Not so much. Think about all the times a random number identifies you to a company you’re doing business with. Credit card numbers are a good example of how that works.
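As a rough sketch of what that looks like in SQL Server (the table and column names here are hypothetical), a surrogate key is typically an IDENTITY column that just counts up with every insert:

```sql
-- Hypothetical example: OrderID carries no business meaning at all;
-- it simply increments with each insert, independent of the row's data.
CREATE TABLE dbo.Orders
(
    OrderID    int IDENTITY(1,1) NOT NULL,  -- the surrogate key
    CustomerID int               NOT NULL,
    OrderDate  datetime2         NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderID)
);
```

Nothing about OrderID tells you anything about the order, which is exactly the point.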
Let’s set up a comparison between the two types of keys. The visual I’ll ask you to use is a big line of dominoes. When you employ the Surrogate Key methodology using an integer or larger data type (I’ll get into the data type choice in another post, but for now think of it as a basic integer value), the line is built in order, and every time you add to the line, you add a larger number on the end. Each domino represents a row of data, and that data never really needs to be moved. The only time data needs to move is on a delete operation, and even then a Soft Delete can be deployed so the row never physically moves. When you consider how data fragments on disk, this makes writes faster, because every insert simply appends to the end of the existing line.
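A soft delete can be as simple as a flag column. This is an illustrative sketch (column and table names are my own, not a prescribed schema):

```sql
-- Hypothetical soft delete: mark the row instead of removing it,
-- so no pages need to be reorganized and no dominoes move.
ALTER TABLE dbo.Orders ADD IsDeleted bit NOT NULL DEFAULT 0;

-- "Deleting" a row becomes an in-place update:
UPDATE dbo.Orders
SET    IsDeleted = 1
WHERE  OrderID = 42;   -- the row stays exactly where it is on disk

-- Queries then filter the deleted rows out:
SELECT OrderID, CustomerID, OrderDate
FROM   dbo.Orders
WHERE  IsDeleted = 0;
```

The trade-off is that every query has to remember the filter, but the table itself never has to shuffle data around for a delete.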
If the Composite Key is employed in the same fashion, with the Clustered Index on the Primary Key, think about the consequences for the line of dominoes. Each time a new row is written, it needs to land between existing rows, forcing the rest of the data to move. That’s not fast. Deletes operate the same way: each time a row is removed, the remaining data needs to be reorganized. For a large, busy table, that can mean reorganizing the data daily, and the net effect is taking the system offline to rebuild the data for a new day.
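For contrast, here is what that composite-key pattern might look like (again, the names are hypothetical). New rows sort into the middle of the clustered index whenever their key values fall between existing ones:

```sql
-- Hypothetical composite key as the clustered primary key.
-- An insert for an existing CustomerID with a mid-range OrderDate
-- must land between existing rows, splitting pages over time.
CREATE TABLE dbo.OrderHistory
(
    CustomerID int       NOT NULL,
    OrderDate  datetime2 NOT NULL,
    Amount     money     NOT NULL,
    CONSTRAINT PK_OrderHistory
        PRIMARY KEY CLUSTERED (CustomerID, OrderDate)
);
```

Every insert for customer 7 dated last week has to squeeze in among customer 7’s existing dominoes, not at the end of the line.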
I don’t want to make the case that Composite Keys are all bad; that pattern exists for good reason. But for high-transaction tables in busy systems, the main tables should use a Surrogate Key model so that writes are fast and updates cause far less fragmentation. The overall maintenance of the table is much, much lower.
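One way to get the best of both, sketched here with hypothetical names: cluster on the surrogate key for fast, append-only writes, and keep the composite key around as a nonclustered unique constraint so the business rule is still enforced:

```sql
-- Hypothetical hybrid: sequential writes via the surrogate key,
-- business-rule uniqueness via a nonclustered constraint.
CREATE TABLE dbo.Payments
(
    PaymentID  int IDENTITY(1,1) NOT NULL,
    CustomerID int               NOT NULL,
    InvoiceNo  int               NOT NULL,
    CONSTRAINT PK_Payments PRIMARY KEY CLUSTERED (PaymentID),
    CONSTRAINT UQ_Payments_CustomerInvoice
        UNIQUE NONCLUSTERED (CustomerID, InvoiceNo)
);
```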
Next time I’ll get deeper into designing a system that uses both Composite Keys and Surrogate Keys, and where the Clustered Index should likely go based on several factors. I think that will bring some clarity to the concepts presented here and in the previous post.
Data Chef out.