At the last post I began to touch on what is the art of database design and why it is not a hard and fast science. Essentially, any artist has tools. If the artist builds from steel, his tools are a welder, grinder, hammers, anvils, etc. and the same could be said for artists in any type of art. The design principles of database design are actually identical in that there are tools: tables, indexes, constraints (keys), domains, ranges, and a host of other tools, each designed to serve a particular function. When employed correctly, and with an artful mind, these tools become very powerful and are able to create great symphonies of storage, retrieval and complex logic to ensure data integrity or perform complicated operations.
The basics of the toolset begin with the table. A table is a structure which holds data. The data is held in columns and rows. Each row represents a single entity of data and each column across that row describes that entity. The row is known in database terms as a domain, and the domain is described by a key, or singular value that ensures each row has a certain level of uniqueness. The table is where the rubber of the tool meets the road of art. That’s because the art is knowing what can be considered an entity of data. The entity, or the concept of the entity, isn’t unique to the database itself, but rather to the understanding of the data and the understanding of the rules governing the data.
A simple example could be a table designed to hold addresses. Each row has something to denote the house number, street name, city, state, postal code, country and in some countries the neighborhood or district and/or specific locality. When a table like Addresses is considered, a specific name or attention may want to be added to help denote the uniqueness of that entity of data. Also, if the rules governing the data suggest that each address itself is unique, then other problems may arise when the data is actually employed in the application. This makes for a good case to use a Surrogate Key, a value unique to the row that is not directly related to the data but rather it is associated at the time the data is created. That unique value will forever describe that entity of data by reference to the data without being specific to the data itself. This value becomes the primary key. It’s the way the application employs the data and the way the database is able to make direct reference to the data without having to search through a table containing millions of records to find one. Moreover, the CRUD rules surrounding addresses in the application would be unified, so it makes sense from a standards process to store all addresses in a like container, and use them throughout the application and database as required, but in a standard way.
This type of implementation shouldn’t come as a surprise. Essentially, the rules governing the data have helped to describe the storage and usage of the data. The addresses sample is a simple, but far reaching example of quality design by helping enforce the rules governing the data. The argument can be easily made that the application did not dictate the design of the database but rather the CRUD rules, which is congruent with good design.
The toolset continues to take shape, now that the table and its primary key are in place, the basics of storage and retrieval need to be considered a bit further. In this case, the primary key on the table may also be a very good candidate for the clustered index. The clustered index is the sequence in which the data is stored on disk. By using a surrogate key and likely an automatic numbering, or Identity, column the data is always stored at the end of the previous data, thus filling in sequence. Why is this a good thing? It allows data to be written at the end and not in the middle of existing data, forcing all the other data to be displaced. The Identity paradigm allows the index to fill sequentially, also providing for the fewest amount of incomplete pages. When considered, this process really does allow for the quickest filling of a table and the easiest retrieval of data. It requires the data to use a nondescript value to identify the row, and therein lies the potential for errors.
Again, the art becomes the skillful use of the tools to produce a beautiful structure capable of maintaining itself with the least intervention. By understanding the data, the tool can be used to craft just the right storage.