A Tutorial on Basic Normalization
Nearly everyone who has heard of relational databases has heard of normalization and knows that a principal objective of relational data modeling is to produce a “fully normalized” data model. The value of a fully normalized model is that it minimizes redundancy in a database, and thereby makes it less likely that inconsistencies can appear in that database. There are thus specific benefits to be gained from an understanding of the process of normalization and of how it applies in the IT environment.
Normalization removes redundancies from relational data models. Redundancy in a data model means that the same fact can be recorded multiple times in a database that conforms to that data model. If the different copies of that fact are not all entered correctly into the database, or if they are not all updated the same way, at the same time, then the database will be inconsistent with respect to that fact.
Normalization also eliminates the dependency of one fact on another. If one piece of information is removed from a database, this should not force the removal of a separate piece of information. When the last employee is transferred out of department ITSP, the name of that department, “Information Technology Strategic Planning,” should not be removed from the database. The name of a department is a piece of information which is separate from the list of employees who belong to that department and should not disappear when the list becomes temporarily empty.
A third benefit of normalization is that it assists in the discovery and clarification of business rules. In the process of normalizing a data model, we are forced to get answers to questions such as, “Can a salesperson be recorded in the database even if she currently is not assigned to any customers?” These answers are business rules, and requirements gathering done in advance of building a data model will never uncover all of these rules.
There are several good books that discuss normalization. For example, years ago, in the third edition of his classic work, An Introduction to Database Systems, Chris Date1 included an excellent chapter on normalization. And there are now many other books on the subject, each with their own particular merits.
This series of articles will review the basic normal forms. These are the ones that are based on the concept of functional dependencies. No background in relational data modeling is assumed, but the topic will be covered in depth. In the last article in the series, I will provide a management-level summary of all the material presented in the earlier articles.
With respect to data modeling theory, I will define the concepts of normalization and the normal forms themselves as precisely as possible. When the definitions involve technical concepts such as those of superkeys and functional dependencies, I will do more than give a concise mathematical definition of the concepts. Instead, I will make those concepts intuitive by explaining why and how they work.
With respect to data modeling practice, my approach will include multiple solutions to many of the normalization problems I will discuss and show how the different solutions support often subtle differences in business rules. Thus, these articles will not be merely a review of the basic normal forms. They will show those normal forms at work, and illustrate how a modeler uses the rules of normalization to think through real-world modeling problems.
0 comments:
Post a Comment