Database Normalization

April 22, 2024 - Drygast

Best Practice Database MS SQL MySQL SQLite

At its core, database normalization aims to eliminate data redundancy and ensure data integrity. Think of it as arranging your data into logical and manageable chunks, each serving a specific purpose. This organization not only makes it easier to store and retrieve data but also enhances the overall efficiency and performance of your database.

If you would rather read a professional wikipedia article, take a look at this one: https://en.wikipedia.org/wiki/Database_normalization. I will only discuss the first couple of levels as I usually does not go above them.

Database Table Basics

In the world of databases, tables are the building blocks that store and organize your data. Imagine a table as a grid, with rows and columns where each row represents a record and each column represents a specific attribute or field.

These attributes could be anything from a person's name, age, or address to a product's price, description, or quantity in stock. By organizing data into tables, you create a structured format that makes it easier to store, retrieve, and manipulate information.

Data Redundancy

Data redundancy is like having multiple copies of the same information scattered throughout your database. While redundancy might seem harmless at first, it can lead to several problems, including wasted storage space, data inconsistency, and increased maintenance overhead.

Imagine you have a database table for storing customer information, and within that table, you have fields for the customer's name, email address, and phone number. Now, suppose you have multiple orders from the same customer. Without normalization, you might end up duplicating the customer's information for each order they place. This redundancy not only consumes unnecessary storage space but also increases the risk of inconsistency - for example, if the customer updates their email address, you'd need to update it in every instance where their information is duplicated.

Furthermore, redundant data can lead to inefficiencies when performing queries and updates. For instance, if you want to change a customer's phone number, you'd need to update it in every record where it appears, potentially overlooking some instances and resulting in data inconsistencies.

In addition to the practical implications, data redundancy can also pose security risks. Storing sensitive information redundantly increases the likelihood of unauthorized access or data breaches, as there are more points of entry for malicious actors to exploit.

Normal Forms

In the world of database normalization, Normal Forms (NF) serve as guidelines for organizing data in a structured and efficient manner. The term "Normal Form" refers to a standard set of rules that define the level of normalization achieved in a database schema.

Unnormalized Form (UNF or 0NF)
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Elementary Key Normal Form (EKNF)

Unnormalized Form (UNF or 0NF)

The Unnormalized Form is the initial state of a database where data is stored without any structure or organization. In this form, data may contain repeating groups and is susceptible to data redundancy and inconsistency.

Name	Department	Skills
John Doe	Marketing	Marketing, Sales, Customer Service
Jane Smith	Human Resources	Human Resources, Training, Recruitment

In this example, the table represents employee information in Unnormalized Form. Each row contains data for an employee, including their name, department, and a comma-separated list of skills. Notice how the skills column violates the principles of normalization by storing multiple values in a single cell, leading to data redundancy and making it difficult to query or update individual skills.

First Normal Form (1NF)

The First Normal Form addresses the issues of the Unnormalized Form by ensuring that each table has a primary key and that each attribute contains atomic values. In other words, there are no repeating groups, and each cell in the table holds a single value.

To transition from Unnormalized Form (0NF) to First Normal Form (1NF), we need to ensure that each attribute contains atomic values and that there are no repeating groups. In our example, we'll address the issue of the Skills column containing multiple values by splitting it into individual rows, with each skill associated with a unique employee.

EmployeeId	Name	Department	Skill
1	John Doe	Marketing	Marketing
1	John Doe	Marketing	Sales
1	John Doe	Marketing	Customer Service
2	Jane Smith	Human Resources	Human Resources
2	Jane Smith	Human Resources	Training
2	Jane Smith	Human Resources	Recruitment

In this table, each row represents a unique combination of Employee ID, Name, Department, and Skill, ensuring that each attribute contains atomic values.

Second Normal Form (2NF)

To transition from First Normal Form (1NF) to Second Normal Form (2NF), we need to ensure that every non-prime attribute is fully functionally dependent on the primary key. In our example, we'll identify the primary key and separate attributes that are not fully dependent on it into their own tables.

Let's identify the primary key in our example. Since Employee ID alone can uniquely identify each employee, it will serve as the primary key.

Here's the breakdown of our tables in Second Normal Form (2NF):

Employees Table:

EmployeeId (Primary Key)
Name
Department

Skills Table:

SkillId (Primary Key)
SkillName

Employee_Skill Table:

EmployeeId (Foreign Key referencing Employees Table)
SkillId (Foreign Key referencing Skills Table)

Employees
EmployeeId (PK)	Name	Department
1	John Doe	Marketing
2	Jane Smith	Human Resources

Skills
SkillId (PK)	SkillName
1	Marketing
2	Sales
3	Customer Service
4	Human Resources
5	Training
6	Recruitment

Employees_Skills
EmployeeId (FK)	SkillId (FK)
1	1
1	2
1	3
2	4
2	5
2	6

In this representation, each table is now in Second Normal Form (2NF), and the data is organized more efficiently with minimal redundancy.

Third Normal Form (3NF)

To satisfy Third Normal Form (3NF), a table must first be in Second Normal Form (2NF), and then all attributes that are not directly dependent on the primary key must be removed and placed into separate tables.

To achieve Third Normal Form (3NF), we'll remove the Department attribute from the Employees Table and create a new Departments Table with the Department Name and ID.

Employees
EmployeeId (PK)	Name	DepartmentId (FK)
1	John Doe	1
2	Jane Smith	2

Skills
SkillId (PK)	SkillName
1	Marketing
2	Sales
3	Customer Service
4	Human Resources
5	Training
6	Recruitment

Departments
DepartmentId (PK)	DepartmentName
1	Marketing
2	Human Resources

Employees_Skills
EmployeeId (FK)	SkillId (FK)
1	1
1	2
1	3
2	4
2	5
2	6

Now, each table in our database example satisfies Third Normal Form (3NF), as all non-key attributes are fully dependent on the primary key, and there are no transitive dependencies.

Elementary Key Normal Form (EKNF)

To determine if the tables conform to Elementary Key Normal Form (EKNF), we need to ensure that every non-prime attribute is directly dependent on the primary key and that the primary key is a single attribute.

In our example, let's focus on the Employees Table and the Employees_Skills Table:

Employees Table: The primary key is EmployeeId. All non-prime attributes (Name, DepartmentId) are directly dependent on the primary key. Additionally, the primary key is a single attribute. Therefore, the Employees Table conforms to EKNF.

Employees_Skills Table: The primary key is a composite key {EmployeeId, SkillId}. While all non-prime attributes (EmployeeId, SkillId) are directly dependent on the composite key, the primary key is not a single attribute. Therefore, the Employees_Skills Table does not conform to EKNF.

Employees_Skills
EmployeeSkillId (PK)	EmployeeId (FK)	SkillId (FK)
1	1	1
2	1	2
3	1	3
4	2	4
5	2	5
6	2	6

In this modified table:

EmployeeSkillId is introduced as the primary key, ensuring that each row has a unique identifier.
EmployeeId and SkillId are still present as foreign keys, maintaining the relationships with the Employees and Skills tables.
All non-prime attributes are directly dependent on the primary key, aligning the table with Elementary Key Normal Form (EKNF).

Summary

In this article, we delved into the world of database normalization, exploring its importance in organizing data efficiently and maintaining data integrity. From the initial Unnormalized Form (UNF) to higher forms like Third Normal Form (3NF), we learned how normalization eliminates redundancy and dependency, ensuring data consistency and optimizing database performance.

While normalization principles can lead us to forms like Elementary Key Normal Form (EKNF) and higher (BCNF, 4NF, ETNF, 5NF, DKNF, 6NF), I personally often stop at Third Normal Form (3NF), considering the database normalized by then. This practice balances complexity and practicality, creating robust, scalable databases that meet modern application needs while ensuring data reliability and efficiency.