Imagine you're starting a new job at a growing e-commerce company, and your first task is to help design a database to track customers, orders, and products. You sit down with the development team, and they immediately start throwing around terms like "VARCHAR(255)" and "foreign key constraints." Meanwhile, you're wondering: what's the difference between storing a customer's age as a number versus text? Why does it matter if a phone number is stored as "123-456-7890" or 1234567890? And what happens if you get it wrong?
These aren't just technical details — they're fundamental decisions that affect everything from how fast your queries run to whether your application breaks when someone enters unexpected data. Getting data types and schema design right from the start saves countless hours of debugging and prevents embarrassing data corruption down the road.
What you'll learn:
You should have basic familiarity with what databases and tables are conceptually. If you can write a simple SELECT statement, you're ready for this lesson. We'll be using examples that work in most SQL databases (PostgreSQL, MySQL, SQL Server), with notes about any important differences.
Think of data types as containers with specific rules. Just like you wouldn't store soup in a paper bag, you shouldn't store dates in a text field or prices in a field designed for whole numbers. Each data type tells the database exactly how to store, validate, and work with your data.
Let's start with the most common data types you'll encounter:
Text data comes in several flavors, each optimized for different use cases:
-- Creating a customers table to illustrate text types
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
first_name VARCHAR(50), -- Variable-length, up to 50 chars
last_name VARCHAR(50),
email VARCHAR(255), -- Longer for email addresses
phone_number CHAR(10), -- Fixed-length, exactly 10 chars
bio TEXT -- Unlimited length for longer content
);
VARCHAR is your workhorse for most text data. It only uses as much storage as needed (up to your specified limit), making it perfect for names, emails, and addresses. The number in parentheses is the maximum length — choose it thoughtfully. VARCHAR(50) for a name field gives room for longer names without wasting space.
CHAR is for fixed-length data where every value will be exactly the same length. Phone numbers (if you strip formatting), postal codes, and country codes are good candidates. CHAR(10) always uses 10 characters of storage, padding with spaces if necessary.
TEXT is for longer, variable-length content where you don't want to specify a maximum length. Product descriptions, user comments, and blog posts fit here.
Tip: Start with VARCHAR for most text fields. You can always change to TEXT later if you need unlimited length, but going the other direction is harder.
Numbers seem simple, but choosing the wrong numeric type can cause serious problems:
CREATE TABLE products (
product_id INT PRIMARY KEY,
name VARCHAR(100),
price DECIMAL(10,2), -- Up to 10 digits, 2 after decimal
quantity_in_stock INT, -- Whole numbers only
weight_ounces FLOAT, -- Approximate decimal numbers
is_featured BOOLEAN -- True/false values
);
INT (integer) stores whole numbers from about -2 billion to +2 billion. Perfect for IDs, quantities, and counts. Most databases offer variants like BIGINT for larger numbers and SMALLINT to save space.
DECIMAL(precision, scale) is crucial for money and other exact decimal values. DECIMAL(10,2) means up to 10 total digits with exactly 2 after the decimal point — perfect for prices up to $99,999,999.99. This type prevents rounding errors that can literally cost money.
FLOAT and DOUBLE are for approximate decimal numbers where small rounding errors are acceptable. Scientific measurements, coordinates, and statistical calculations often use these types. Never use FLOAT for money!
BOOLEAN stores true/false values efficiently. Some databases call this BIT or use TINYINT(1).
Warning: Always use DECIMAL for money, never FLOAT. Floating-point arithmetic can introduce tiny errors that compound over time, potentially causing accounting discrepancies.
Date and time data types are where many beginners make costly mistakes:
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT,
order_date DATE, -- Just the date: 2024-03-15
created_at TIMESTAMP, -- Date and time: 2024-03-15 14:30:22
shipped_at DATETIME, -- Similar to TIMESTAMP
estimated_delivery TIME -- Just time: 14:30:22
);
DATE stores just the calendar date without time information. Use this for birthdays, deadlines, and other date-only data.
TIMESTAMP includes both date and time, often with timezone information. This is perfect for recording when events happen in your application.
DATETIME is similar to TIMESTAMP but typically doesn't include timezone info (behavior varies by database).
TIME stores just the time portion — hours, minutes, and seconds without a date.
The key insight: always store dates and times in proper date types, never as text. This enables date arithmetic, proper sorting, and timezone handling.
Now that you understand data types, let's talk about organizing them into tables. Good schema design follows several key principles that prevent problems as your database grows.
Consider this poorly designed orders table:
-- DON'T DO THIS - Poor design
CREATE TABLE bad_orders (
order_id INT,
customer_name VARCHAR(100),
customer_email VARCHAR(255),
customer_phone VARCHAR(20),
product_name VARCHAR(100),
product_price DECIMAL(10,2),
order_date DATE
);
What's wrong here? If the same customer places multiple orders, you're storing their name, email, and phone number repeatedly. If they change their email, you'd have to update it in multiple places. This violates the principle of normalization.
Here's the normalized version:
-- BETTER - Normalized design
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
name VARCHAR(100),
email VARCHAR(255),
phone VARCHAR(20)
);
CREATE TABLE products (
product_id INT PRIMARY KEY,
name VARCHAR(100),
price DECIMAL(10,2)
);
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT,
product_id INT,
order_date DATE,
quantity INT,
FOREIGN KEY (customer_id) REFERENCES customers(customer_id),
FOREIGN KEY (product_id) REFERENCES products(product_id)
);
Now customer information lives in one place, and orders just reference it. This eliminates redundancy and ensures consistency.
Every table should have a primary key — a column (or combination of columns) that uniquely identifies each row:
CREATE TABLE employees (
employee_id INT AUTO_INCREMENT PRIMARY KEY,
email VARCHAR(255) UNIQUE,
first_name VARCHAR(50),
last_name VARCHAR(50),
hire_date DATE
);
The AUTO_INCREMENT (or SERIAL in PostgreSQL) automatically generates unique numbers. The UNIQUE constraint on email ensures no two employees can have the same email address.
Tip: Use surrogate keys (like auto-incrementing integers) for primary keys rather than natural keys (like email addresses). Email addresses can change, but your internal ID should never change.
Foreign keys enforce referential integrity — they ensure that references between tables remain valid:
CREATE TABLE order_items (
item_id INT AUTO_INCREMENT PRIMARY KEY,
order_id INT NOT NULL,
product_id INT NOT NULL,
quantity INT NOT NULL,
unit_price DECIMAL(10,2) NOT NULL,
FOREIGN KEY (order_id) REFERENCES orders(order_id),
FOREIGN KEY (product_id) REFERENCES products(product_id)
);
The foreign key constraints prevent you from creating order items for non-existent orders or products. They also prevent you from deleting a customer who has orders (unless you configure cascade deletion).
Constraints enforce business rules at the database level, providing a last line of defense against invalid data:
CREATE TABLE products (
product_id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(100) NOT NULL,
category VARCHAR(50) NOT NULL,
price DECIMAL(10,2) NOT NULL CHECK (price > 0),
stock_quantity INT NOT NULL DEFAULT 0 CHECK (stock_quantity >= 0),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
Here's what each constraint does:
NOT NULL prevents empty values in critical fieldsCHECK (price > 0) ensures prices are always positiveCHECK (stock_quantity >= 0) prevents negative inventoryDEFAULT values provide sensible fallbacksLet's design a schema for a realistic e-commerce scenario. We'll think through the business requirements and translate them into database structure.
Our e-commerce platform needs to handle:
Let's start with customers and addresses:
CREATE TABLE customers (
customer_id INT AUTO_INCREMENT PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL,
password_hash VARCHAR(255) NOT NULL,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
phone VARCHAR(20),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);
CREATE TABLE addresses (
address_id INT AUTO_INCREMENT PRIMARY KEY,
customer_id INT NOT NULL,
address_type ENUM('billing', 'shipping') NOT NULL,
street_address VARCHAR(255) NOT NULL,
city VARCHAR(100) NOT NULL,
state_province VARCHAR(100),
postal_code VARCHAR(20),
country VARCHAR(100) NOT NULL,
is_default BOOLEAN DEFAULT FALSE,
FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);
Notice how we separated addresses into their own table. A customer might have multiple shipping addresses, so this design accommodates that flexibility.
Now let's add products with categories:
CREATE TABLE categories (
category_id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(100) UNIQUE NOT NULL,
description TEXT,
parent_category_id INT,
FOREIGN KEY (parent_category_id) REFERENCES categories(category_id)
);
CREATE TABLE products (
product_id INT AUTO_INCREMENT PRIMARY KEY,
category_id INT NOT NULL,
name VARCHAR(200) NOT NULL,
description TEXT,
price DECIMAL(10,2) NOT NULL CHECK (price >= 0),
stock_quantity INT NOT NULL DEFAULT 0 CHECK (stock_quantity >= 0),
sku VARCHAR(50) UNIQUE,
weight_grams INT,
is_active BOOLEAN DEFAULT TRUE,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (category_id) REFERENCES categories(category_id)
);
The self-referencing foreign key in categories allows for hierarchical categories (Electronics → Computers → Laptops).
Finally, let's handle orders:
CREATE TABLE orders (
order_id INT AUTO_INCREMENT PRIMARY KEY,
customer_id INT NOT NULL,
billing_address_id INT NOT NULL,
shipping_address_id INT NOT NULL,
order_status ENUM('pending', 'processing', 'shipped', 'delivered', 'cancelled') DEFAULT 'pending',
subtotal DECIMAL(10,2) NOT NULL,
tax_amount DECIMAL(10,2) NOT NULL DEFAULT 0,
shipping_cost DECIMAL(10,2) NOT NULL DEFAULT 0,
total_amount DECIMAL(10,2) NOT NULL,
order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
shipped_date TIMESTAMP NULL,
FOREIGN KEY (customer_id) REFERENCES customers(customer_id),
FOREIGN KEY (billing_address_id) REFERENCES addresses(address_id),
FOREIGN KEY (shipping_address_id) REFERENCES addresses(address_id)
);
CREATE TABLE order_items (
order_item_id INT AUTO_INCREMENT PRIMARY KEY,
order_id INT NOT NULL,
product_id INT NOT NULL,
quantity INT NOT NULL CHECK (quantity > 0),
unit_price DECIMAL(10,2) NOT NULL,
total_price DECIMAL(10,2) NOT NULL,
FOREIGN KEY (order_id) REFERENCES orders(order_id),
FOREIGN KEY (product_id) REFERENCES products(product_id)
);
Key Insight: We store the unit price in order_items even though it's also in the products table. This preserves historical pricing — if you change a product's price, old orders should still show what the customer actually paid.
Let's put these concepts to work by designing a simple library management system. You'll create tables for books, authors, library members, and book loans.
Start by thinking through the requirements:
Try designing the schema yourself first, then compare with this solution:
CREATE TABLE authors (
author_id INT AUTO_INCREMENT PRIMARY KEY,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
birth_date DATE,
biography TEXT
);
CREATE TABLE genres (
genre_id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(50) UNIQUE NOT NULL
);
CREATE TABLE books (
book_id INT AUTO_INCREMENT PRIMARY KEY,
isbn VARCHAR(20) UNIQUE NOT NULL,
title VARCHAR(200) NOT NULL,
publication_date DATE,
genre_id INT,
copies_total INT NOT NULL DEFAULT 1 CHECK (copies_total > 0),
copies_available INT NOT NULL DEFAULT 1 CHECK (copies_available >= 0),
FOREIGN KEY (genre_id) REFERENCES genres(genre_id),
CHECK (copies_available <= copies_total)
);
CREATE TABLE book_authors (
book_id INT,
author_id INT,
author_order INT DEFAULT 1,
PRIMARY KEY (book_id, author_id),
FOREIGN KEY (book_id) REFERENCES books(book_id),
FOREIGN KEY (author_id) REFERENCES authors(author_id)
);
CREATE TABLE members (
member_id INT AUTO_INCREMENT PRIMARY KEY,
library_card_number VARCHAR(20) UNIQUE NOT NULL,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
email VARCHAR(255) UNIQUE NOT NULL,
phone VARCHAR(20),
address TEXT,
membership_date DATE DEFAULT (CURRENT_DATE),
membership_expires DATE,
is_active BOOLEAN DEFAULT TRUE
);
CREATE TABLE loans (
loan_id INT AUTO_INCREMENT PRIMARY KEY,
book_id INT NOT NULL,
member_id INT NOT NULL,
loan_date DATE DEFAULT (CURRENT_DATE),
due_date DATE NOT NULL,
return_date DATE,
fine_amount DECIMAL(5,2) DEFAULT 0,
FOREIGN KEY (book_id) REFERENCES books(book_id),
FOREIGN KEY (member_id) REFERENCES members(member_id)
);
Key design decisions here:
Wrong approach:
CREATE TABLE products (
id TEXT,
name TEXT,
price TEXT,
created_date TEXT
);
This wastes storage and prevents proper validation. The database can't help you catch errors like negative prices or invalid dates.
Better approach: Use appropriate data types from the start.
Tables without primary keys cause serious problems with replication, updates, and data integrity. Always define a primary key, even if it's just an auto-incrementing ID.
Wrong approach:
CREATE TABLE orders (
order_id INT PRIMARY KEY,
product_ids VARCHAR(255) -- "1,5,12,8"
);
This violates first normal form and makes querying nearly impossible.
Better approach: Use a separate table for order items as shown earlier.
Using VARCHAR(50) for names in one table and VARCHAR(100) in another creates confusion. Establish conventions and stick to them.
When you encounter problems:
Pro Tip: Use database tools like phpMyAdmin, pgAdmin, or SQL Server Management Studio to visualize your schema and relationships. The visual representation often reveals design issues that aren't obvious in SQL.
You've learned the fundamentals of SQL data types and schema design — knowledge that forms the foundation of every database application. Here's what you can now do:
The key principles to remember:
Your next steps should include:
Remember, schema design is both an art and a science. The "perfect" design depends on your specific requirements, but following these principles will keep you out of trouble and set you up for success as your applications grow and evolve.
Learning Path: SQL Fundamentals