
You're staring at a spreadsheet with 50,000 rows of e-commerce transaction data. Sales wants to know which products are trending, marketing needs conversion rates by campaign, and finance wants monthly revenue summaries. Everyone's asking different questions, but they're all hitting the same messy data dump. Sound familiar?
This is where analytical data modeling saves the day. Unlike operational databases that prioritize storing data efficiently, analytical models are designed for asking questions fast. Today, we'll explore two fundamental approaches: dimensional modeling (the time-tested warehouse approach) and One Big Table (the modern cloud-first method). By the end, you'll know when to use each approach and how to implement both.
What you'll learn: how dimensional modeling and One Big Table designs work, how to build each in SQL, the trade-offs between them, and the common mistakes to avoid.
You should be comfortable with basic SQL queries (SELECT, JOIN, GROUP BY) and understand fundamental database concepts like tables and relationships. We'll work with realistic e-commerce data throughout the examples.
Let's start with a typical problem. Here's what raw e-commerce data often looks like:
-- Raw orders table - how data usually arrives
CREATE TABLE raw_orders (
order_id INT,
customer_email VARCHAR(100),
customer_name VARCHAR(100),
customer_city VARCHAR(50),
customer_state VARCHAR(20),
product_id INT,
product_name VARCHAR(200),
product_category VARCHAR(50),
product_price DECIMAL(10,2),
order_date DATE,
quantity INT,
discount_percent DECIMAL(5,2),
shipping_cost DECIMAL(8,2)
);
This structure works fine for recording transactions, but it creates problems for analytics:

- Customer attributes (name, city, state) are repeated on every row, so a single typo creates two "different" customers
- Product details are duplicated the same way, with no richer hierarchy to roll up by
- The date is a raw value, so every query must re-derive things like month, quarter, or weekend and holiday flags
- Common business metrics get recalculated from scratch in every report
Let's solve these problems with two different modeling approaches.
Dimensional modeling separates your data into two types of tables:

- Fact tables hold the measurable events you want to analyze (orders, payments, clicks), each carrying numeric measures like quantity and revenue
- Dimension tables hold the descriptive context you filter and group by: customers, products, dates
Think of it like organizing a library. Facts are the books (the things you want to analyze), and dimensions are the catalog system (how you find and group the books).
Let's transform our messy e-commerce data into a clean dimensional model. We'll create a central fact table surrounded by dimension tables—this pattern is called a "star schema" because of its visual shape.
-- Dimension: Customers
CREATE TABLE dim_customers (
customer_key INT PRIMARY KEY,
customer_email VARCHAR(100),
customer_name VARCHAR(100),
customer_city VARCHAR(50),
customer_state VARCHAR(20),
customer_segment VARCHAR(20) -- added business logic
);
-- Dimension: Products
CREATE TABLE dim_products (
product_key INT PRIMARY KEY,
product_id INT,
product_name VARCHAR(200),
product_category VARCHAR(50),
product_subcategory VARCHAR(50), -- more granular grouping
product_brand VARCHAR(100)
);
-- Dimension: Time (crucial for analytics!)
CREATE TABLE dim_date (
date_key INT PRIMARY KEY, -- format: 20240315
date_actual DATE,
year_num INT,
quarter_num INT,
month_num INT,
month_name VARCHAR(20),
day_of_week INT,
day_name VARCHAR(20),
is_weekend BOOLEAN,
is_holiday BOOLEAN
);
-- Fact: Order Line Items
CREATE TABLE fact_order_lines (
order_line_key INT PRIMARY KEY,
order_id INT,
customer_key INT,
product_key INT,
date_key INT,
-- Measures (the numbers you want to analyze)
quantity INT,
unit_price DECIMAL(10,2),
discount_amount DECIMAL(8,2),
line_total DECIMAL(10,2),
-- Foreign keys to dimensions
FOREIGN KEY (customer_key) REFERENCES dim_customers(customer_key),
FOREIGN KEY (product_key) REFERENCES dim_products(product_key),
FOREIGN KEY (date_key) REFERENCES dim_date(date_key)
);
Notice how we've made several improvements:

- Surrogate keys: the fact table stores a compact customer_key instead of repeating email addresses, which joins faster and tolerates attribute changes
- Added business logic: customer_segment captures a classification the raw orders never had
- Product hierarchy: product_subcategory and product_brand enable more granular grouping
- A date dimension: attributes like is_weekend and is_holiday are computed once instead of in every query

Let's populate our dimensional model with realistic data:
-- Load customers (removing duplicates and adding segments)
INSERT INTO dim_customers VALUES
(1, 'sarah.chen@email.com', 'Sarah Chen', 'Seattle', 'WA', 'Premium'),
(2, 'mike.jones@email.com', 'Mike Jones', 'Austin', 'TX', 'Standard'),
(3, 'lisa.park@email.com', 'Lisa Park', 'Denver', 'CO', 'Premium');
-- Load products with hierarchy
INSERT INTO dim_products VALUES
(1, 101, 'MacBook Pro 14"', 'Electronics', 'Laptops', 'Apple'),
(2, 102, 'Wireless Mouse', 'Electronics', 'Accessories', 'Logitech'),
(3, 103, 'Standing Desk', 'Furniture', 'Office', 'IKEA');
-- Load some dates
INSERT INTO dim_date VALUES
(20240315, '2024-03-15', 2024, 1, 3, 'March', 5, 'Friday', FALSE, FALSE),
(20240316, '2024-03-16', 2024, 1, 3, 'March', 6, 'Saturday', TRUE, FALSE);
-- Load fact data
INSERT INTO fact_order_lines VALUES
(1, 1001, 1, 1, 20240315, 1, 1999.00, 100.00, 1899.00),
(2, 1001, 1, 2, 20240315, 2, 79.99, 0.00, 159.98),
(3, 1002, 2, 3, 20240316, 1, 299.99, 30.00, 269.99);
Now see how clean your analytical queries become:
-- Monthly sales by product category
SELECT
d.month_name,
p.product_category,
SUM(f.line_total) as total_sales,
COUNT(*) as order_lines,
AVG(f.line_total) as avg_line_value
FROM fact_order_lines f
JOIN dim_date d ON f.date_key = d.date_key
JOIN dim_products p ON f.product_key = p.product_key
WHERE d.year_num = 2024
GROUP BY d.month_name, p.product_category
ORDER BY total_sales DESC;
-- Customer performance by segment and state
SELECT
c.customer_state,
c.customer_segment,
COUNT(DISTINCT f.order_id) as total_orders,
SUM(f.line_total) as total_revenue,
AVG(f.line_total) as avg_order_value
FROM fact_order_lines f
JOIN dim_customers c ON f.customer_key = c.customer_key
GROUP BY c.customer_state, c.customer_segment
ORDER BY total_revenue DESC;
Pro tip: Notice how we can slice and dice by any dimension without complex subqueries. That's the power of dimensional modeling—it makes complex business questions feel simple.
While dimensional modeling is battle-tested, modern cloud platforms have changed the game. With columnar storage and massive parallel processing, a different approach has emerged: the One Big Table (OBT).
Instead of splitting data across multiple tables, OBT denormalizes everything into a single, wide table. This trades storage efficiency for query simplicity and performance.
Let's redesign our e-commerce data as an OBT:
-- One Big Table approach
CREATE TABLE orders_obt (
-- Order identifiers
order_line_id INT PRIMARY KEY,
order_id INT,
order_date DATE,
-- Customer dimensions (denormalized)
customer_id INT,
customer_email VARCHAR(100),
customer_name VARCHAR(100),
customer_city VARCHAR(50),
customer_state VARCHAR(20),
customer_segment VARCHAR(20),
customer_lifetime_orders INT, -- pre-calculated
customer_lifetime_value DECIMAL(12,2), -- pre-calculated
-- Product dimensions (denormalized)
product_id INT,
product_name VARCHAR(200),
product_category VARCHAR(50),
product_subcategory VARCHAR(50),
product_brand VARCHAR(100),
product_launch_date DATE,
-- Time dimensions (denormalized)
year_num INT,
quarter_num INT,
month_num INT,
month_name VARCHAR(20),
day_of_week INT,
day_name VARCHAR(20),
is_weekend BOOLEAN,
is_holiday BOOLEAN,
-- Fact measures
quantity INT,
unit_price DECIMAL(10,2),
discount_amount DECIMAL(8,2),
line_total DECIMAL(10,2),
shipping_cost DECIMAL(8,2),
-- Pre-calculated measures for common queries
order_total DECIMAL(12,2),
customer_order_number INT, -- 1st, 2nd, 3rd order for this customer
days_since_last_order INT
);
INSERT INTO orders_obt VALUES
(1, 1001, '2024-03-15',
1, 'sarah.chen@email.com', 'Sarah Chen', 'Seattle', 'WA', 'Premium', 5, 8500.00,
101, 'MacBook Pro 14"', 'Electronics', 'Laptops', 'Apple', '2023-10-01',
2024, 1, 3, 'March', 5, 'Friday', FALSE, FALSE,
1, 1999.00, 100.00, 1899.00, 25.00, 2083.98, 3, 45),
(2, 1001, '2024-03-15',
1, 'sarah.chen@email.com', 'Sarah Chen', 'Seattle', 'WA', 'Premium', 5, 8500.00,
102, 'Wireless Mouse', 'Electronics', 'Accessories', 'Logitech', '2023-05-15',
2024, 1, 3, 'March', 5, 'Friday', FALSE, FALSE,
2, 79.99, 0.00, 159.98, 0.00, 2083.98, 3, 45);
The beauty of OBT is query simplicity:
-- Same monthly sales by category - much simpler!
SELECT
month_name,
product_category,
SUM(line_total) as total_sales,
COUNT(*) as order_lines,
AVG(line_total) as avg_line_value
FROM orders_obt
WHERE year_num = 2024
GROUP BY month_name, product_category
ORDER BY total_sales DESC;
-- Customer cohort analysis - leveraging pre-calculated fields
SELECT
customer_segment,
customer_order_number,
COUNT(DISTINCT customer_id) as customers,
AVG(line_total) as avg_order_value,
AVG(days_since_last_order) as avg_days_between_orders
FROM orders_obt
WHERE customer_order_number <= 5 -- first 5 orders
GROUP BY customer_segment, customer_order_number
ORDER BY customer_segment, customer_order_number;
Key insight: OBT queries are often faster to write and run, especially for exploratory analytics. You eliminate joins and can pre-calculate common business metrics.
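Those pre-calculated fields are typically computed during loading with window functions. A sketch (simplified orders table; requires SQLite 3.25+ for window-function support) deriving customer_order_number and days_since_last_order:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders (order_id INT, customer_id INT, order_date TEXT);
INSERT INTO orders VALUES
    (1001, 1, '2024-01-30'),
    (1002, 1, '2024-03-15'),
    (1003, 2, '2024-03-16');
""")

rows = con.execute("""
    SELECT order_id, customer_id,
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date)
               AS customer_order_number,
           CAST(julianday(order_date)
                - julianday(LAG(order_date) OVER
                    (PARTITION BY customer_id ORDER BY order_date))
                AS INT) AS days_since_last_order
    FROM orders
    ORDER BY order_id
""").fetchall()

for r in rows:
    print(r)
```

Order 1002 comes out as customer 1's second order, 45 days after the first; each customer's first order has a NULL gap.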
Many organizations use both:

- A dimensional model as the governed core of the warehouse, where data integrity and consistent definitions matter most
- One Big Tables built on top of it as purpose-built marts for dashboards and exploratory analysis
Let's build both models for a subscription business scenario. You're analyzing a SaaS company with these requirements:

- Track monthly recurring revenue (MRR) and annual recurring revenue (ARR) by plan tier
- Measure churn by signup source, industry, and company size
- Monitor product engagement (active users, feature usage) and support load per account
Create tables for a subscription analytics warehouse:
-- Dimension: Customers with SaaS-specific attributes
CREATE TABLE dim_customers_saas (
customer_key INT PRIMARY KEY,
customer_id VARCHAR(50),
company_name VARCHAR(200),
industry VARCHAR(100),
company_size VARCHAR(20), -- Small, Medium, Large, Enterprise
signup_source VARCHAR(50), -- Organic, Paid Search, Referral, etc.
signup_date DATE,
customer_status VARCHAR(20) -- Active, Churned, Paused
);
-- Dimension: Subscription Plans
CREATE TABLE dim_plans (
plan_key INT PRIMARY KEY,
plan_id VARCHAR(50),
plan_name VARCHAR(100),
plan_tier VARCHAR(20), -- Starter, Pro, Enterprise
monthly_price DECIMAL(8,2),
annual_price DECIMAL(8,2),
max_users INT,
features_included TEXT
);
-- Fact: Monthly Subscription Metrics
CREATE TABLE fact_subscriptions (
subscription_key INT PRIMARY KEY,
customer_key INT,
plan_key INT,
date_key INT,
-- Subscription measures
mrr DECIMAL(10,2),
arr DECIMAL(12,2),
active_users INT,
feature_usage_score DECIMAL(5,2),
support_tickets INT,
is_churned BOOLEAN,
FOREIGN KEY (customer_key) REFERENCES dim_customers_saas(customer_key),
FOREIGN KEY (plan_key) REFERENCES dim_plans(plan_key),
FOREIGN KEY (date_key) REFERENCES dim_date(date_key)
);
CREATE TABLE subscriptions_obt (
-- Identifiers
subscription_record_id INT PRIMARY KEY,
customer_id VARCHAR(50),
subscription_id VARCHAR(50),
record_date DATE,
-- Customer attributes (denormalized)
company_name VARCHAR(200),
industry VARCHAR(100),
company_size VARCHAR(20),
signup_source VARCHAR(50),
signup_date DATE,
customer_status VARCHAR(20),
days_as_customer INT, -- calculated field
-- Plan attributes (denormalized)
plan_id VARCHAR(50),
plan_name VARCHAR(100),
plan_tier VARCHAR(20),
monthly_price DECIMAL(8,2),
annual_price DECIMAL(8,2),
-- Time attributes (denormalized)
year_num INT,
month_num INT,
month_name VARCHAR(20),
quarter_name VARCHAR(10),
-- Metrics
mrr DECIMAL(10,2),
arr DECIMAL(12,2),
active_users INT,
feature_usage_score DECIMAL(5,2),
support_tickets INT,
is_churned BOOLEAN,
-- Pre-calculated business metrics
months_subscribed INT,
total_lifetime_value DECIMAL(12,2),
churn_risk_score DECIMAL(3,2) -- ML model output
);
Write the same business question both ways:
-- Dimensional approach: Churn rate by signup source and plan tier
SELECT
c.signup_source,
p.plan_tier,
COUNT(*) as total_customers,
SUM(CASE WHEN f.is_churned THEN 1 ELSE 0 END) as churned_customers,
ROUND(100.0 * SUM(CASE WHEN f.is_churned THEN 1 ELSE 0 END) / COUNT(*), 2) as churn_rate_pct
FROM fact_subscriptions f
JOIN dim_customers_saas c ON f.customer_key = c.customer_key
JOIN dim_plans p ON f.plan_key = p.plan_key
JOIN dim_date d ON f.date_key = d.date_key
WHERE d.year_num = 2024 AND d.month_num <= 6
GROUP BY c.signup_source, p.plan_tier
ORDER BY churn_rate_pct DESC;
-- OBT approach: Same query, simpler syntax
SELECT
signup_source,
plan_tier,
COUNT(*) as total_customers,
SUM(CASE WHEN is_churned THEN 1 ELSE 0 END) as churned_customers,
ROUND(100.0 * SUM(CASE WHEN is_churned THEN 1 ELSE 0 END) / COUNT(*), 2) as churn_rate_pct
FROM subscriptions_obt
WHERE year_num = 2024 AND month_num <= 6
GROUP BY signup_source, plan_tier
ORDER BY churn_rate_pct DESC;
Notice how the OBT version eliminates three joins while providing the same insights.
Mistake: Creating too many dimension tables
-- Don't do this - over-normalization
CREATE TABLE dim_customer_emails (email_key INT, email VARCHAR(100));
CREATE TABLE dim_customer_names (name_key INT, first_name VARCHAR(50), last_name VARCHAR(50));
Better: Keep related attributes together
-- Do this - logical grouping
CREATE TABLE dim_customers (
customer_key INT,
email VARCHAR(100),
first_name VARCHAR(50),
last_name VARCHAR(50)
);
Mistake: Forgetting surrogate keys. Using natural keys like email addresses or product codes as primary keys causes problems when those values change or contain special characters.
Mistake: Ignoring slowly changing dimensions. Customer addresses change, product prices change. Plan for this:
-- Add effective dating for changing attributes
CREATE TABLE dim_products (
product_key INT PRIMARY KEY,
product_id VARCHAR(50),
product_name VARCHAR(200),
current_price DECIMAL(10,2),
effective_date DATE,
expiration_date DATE,
is_current BOOLEAN
);
Mistake: Including everything in one table. Not every piece of data belongs in your analytics table. Focus on what analysts actually query.
Mistake: Forgetting about data types
-- Bad - everything as text
CREATE TABLE bad_obt (
revenue VARCHAR(50), -- Should be DECIMAL
order_date VARCHAR(50), -- Should be DATE
is_weekend VARCHAR(10) -- Should be BOOLEAN
);
Mistake: Not pre-calculating common metrics. If analysts always calculate customer lifetime value, do it once during data loading:
-- Add during ETL process
UPDATE orders_obt SET customer_lifetime_value = (
SELECT SUM(line_total)
FROM orders_obt o2
WHERE o2.customer_id = orders_obt.customer_id
AND o2.order_date <= orders_obt.order_date
);
Slow dimensional queries: Check that your join keys are indexed
-- Add indexes on foreign keys
CREATE INDEX idx_fact_customer ON fact_order_lines(customer_key);
CREATE INDEX idx_fact_product ON fact_order_lines(product_key);
CREATE INDEX idx_fact_date ON fact_order_lines(date_key);
Slow OBT queries: Partition large tables by date
-- Most cloud platforms support date partitioning; syntax varies (BigQuery-style shown)
CREATE TABLE orders_obt (
...columns...,
order_date DATE
) PARTITION BY order_date;
You now understand two fundamental approaches to analytical data modeling:
Dimensional modeling organizes data into facts and dimensions, providing:

- A single, consistent home for each attribute, so updates happen in one place
- The flexibility to slice any measure by any dimension with straightforward joins
- A structure that scales well for governed, long-lived warehouses
One Big Table denormalizes everything into a single table, offering:

- Join-free queries that are faster to write and often faster to run on columnar engines
- Room for pre-calculated business metrics right next to the raw measures
- A low barrier to entry for analysts doing exploratory work

The trade-off is duplicated storage and more complex updates when an attribute changes.
Neither approach is always right. The best choice depends on your specific needs: data size, team structure, performance requirements, and platform capabilities.
The foundation you've built here will serve you well as data volumes grow and requirements evolve. Whether you choose stars or tables, you're now equipped to make that decision strategically rather than accidentally.
Learning Path: Modern Data Stack