SQL — Complete Practical & Conceptual Guide (2025)

Introduction — Think Declaratively

SQL is a declarative language: you describe the result you want, not how to compute it. This frees the DBMS to pick the best execution plan. Good SQL design relies on expressing intent clearly and understanding the cost model (IO, CPU, memory).

In practice: write clear, composable queries, profile with EXPLAIN, and iterate based on actual performance metrics.

DDL — Schema Design, Types & Migrations

Schema is a contract. Think about constraints, types, nullability, default values, and indexing at design time.

Column Types & Practical Choices

Integer types: use smallest type big enough for range (SMALLINT, INT, BIGINT).
Numeric/Decimal: use DECIMAL/NUMERIC for money to avoid floating point issues.
Text: VARCHAR with a reasonable limit or TEXT for unconstrained content; avoid huge indexed texts.
Dates/Times: TIMESTAMP WITH TIME ZONE when you care about absolute instants.
UUIDs: use for global uniqueness but be mindful of index fragmentation (use UUIDv1 or ordered UUIDs if necessary).

Constraints & Integrity

Primary keys, foreign keys, unique constraints, and check constraints encode business rules and prevent data corruption. Prefer DB-level constraints over app checks for correctness.

Migrations

Use a migration tool (Flyway, Liquibase, Rails ActiveRecord migrations) and design migrations to be backward-compatible where possible:

Add columns with default NULL, backfill in background, then add NOT NULL constraint.
Avoid dropping columns in a single step if replicas or old app versions may still read them.

CREATE TABLE users (
  id BIGSERIAL PRIMARY KEY,
  username VARCHAR(50) NOT NULL UNIQUE,
  email VARCHAR(255) NOT NULL UNIQUE,
  profile JSONB,
  created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

ALTER TABLE users ADD COLUMN last_login TIMESTAMP;
-- Safe migration pattern: add nullable column -> backfill -> add NOT NULL constraint

DML — SELECT Patterns & Bulk Operations

Write SELECTs that are readable and optimized. Use LIMIT for paging, avoid large offsets, and prefer keyset pagination for deep pages.

Keyset Pagination (a.k.a. cursor pagination)

-- Avoid OFFSET for large pages; use WHERE + ORDER for keyset
SELECT id, created_at FROM events
WHERE (created_at, id) < ('2025-10-01 00:00:00', 1000)
ORDER BY created_at DESC, id DESC LIMIT 50;

Bulk Inserts & Upserts

-- Postgres upsert
INSERT INTO products (id, sku, name, price) VALUES (1,'SKU1','Prod1',9.99)
ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name, price = EXCLUDED.price;

-- Batch insert\ INSERT INTO table (cols) VALUES (),(),(); -- use COPY for huge loads

Safe Deletes

Never execute unbounded deletes in production—use batched deletes or move to archive table then delete.

-- Batch delete
DELETE FROM logs WHERE created_at < NOW() - INTERVAL '90 days' LIMIT 10000;
-- Repeat until rows deleted

Joins — Correctness & Performance Patterns

Joins are powerful; understand when to use each type and how they impact execution.

Join Types

INNER JOIN — return rows with matching keys on both sides.
LEFT JOIN — keep left rows, attach right if present.
RIGHT/FULL JOIN — rarer; use when combining sparse datasets.
CROSS JOIN — Cartesian product; use cautiously.

Avoiding N+1

N+1 arises when apps run one query per parent row to fetch children. Fix by joining or preloading child data in a single query.

-- Fetch users + latest order in one query (use lateral join)
SELECT u.id, u.username, o.id AS last_order_id, o.total
FROM users u
LEFT JOIN LATERAL (
  SELECT id, total FROM orders WHERE customer_id = u.id ORDER BY created_at DESC LIMIT 1
) o ON true;

CTEs, Recursive Queries & Window Functions

CTEs (WITH)

CTEs improve clarity and composability. Be cautious: some engines materialize CTEs (affecting performance) while others inline them.

WITH recent_sales AS (
  SELECT product_id, SUM(amount) AS revenue FROM sales WHERE sale_date > NOW() - INTERVAL '7 days' GROUP BY product_id
)
SELECT p.id, p.name, r.revenue FROM products p LEFT JOIN recent_sales r ON p.id = r.product_id;

Recursive Queries (hierarchies)

WITH RECURSIVE managers AS (
  SELECT id, manager_id, name FROM employees WHERE id = 1
  UNION ALL
  SELECT e.id, e.manager_id, e.name FROM employees e JOIN managers m ON e.manager_id = m.id
)
SELECT * FROM managers;

Window Functions

Window functions let you compute aggregates without collapsing rows. Useful for rankings, running totals, and lag/lead.

SELECT id, customer_id, total,
  RANK() OVER (PARTITION BY customer_id ORDER BY total DESC) AS rank,
  SUM(total) OVER (PARTITION BY customer_id ORDER BY created_at ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_sum
FROM orders;

JSON/JSONB & Semi-Structured Data

Modern DBs support JSON types—store flexible schemas while indexing critical paths with JSONB indexes (Postgres) or document store indexes (MongoDB).

-- Postgres JSONB query
SELECT id, profile->>'name' AS name FROM users WHERE profile->>'country' = 'IN';
-- GIN index for JSONB
CREATE INDEX idx_users_profile ON users USING gin (profile jsonb_path_ops);

When to use JSON: evolving schemas or storing sparse attributes. When NOT to use JSON: when you need relational constraints and frequent joins on those fields.

Indexes — Strategy, Maintenance & Heuristics

Indexes speed reads but slow writes. Decide indexes based on query patterns, selectivity, and maintenance window.

Types of Indexes

B-Tree: default for ordered queries and ranges.
Hash: exact-match lookups.
GIN/GiST: array, JSONB, full-text, geospatial indexes.
Partial indexes: index subset of rows for high selectivity (e.g., WHERE deleted_at IS NULL).

Indexing Guidelines

Index columns used in WHERE and JOIN predicates.
Composite indexes order matters—put most selective columns first or match query order.
Covering indexes include all columns needed by query — avoids lookups to table rows.

-- Composite covering index example
CREATE INDEX idx_orders_customer_date_total ON orders (customer_id, order_date DESC) INCLUDE (total);

-- Partial index
CREATE INDEX idx_active_users ON users (last_login) WHERE active = true;

Execution Plans, EXPLAIN & Interpreting Results

Use EXPLAIN / EXPLAIN ANALYZE to compare estimated vs actual rows, IO, and time. Key things to watch:

Sequential scan vs index scan
Join method (nested loop/hash/merge)
Estimated rows vs actual rows — indicates poor statistics
Buffers read/written (IO cost)

EXPLAIN (ANALYZE, BUFFERS)
SELECT p.id, COUNT(o.*) FROM products p JOIN orders o ON p.id = o.product_id GROUP BY p.id;

If actual rows are much larger than estimated, consider ANALYZE/update statistics or rewrite query to give optimizer better hints.

Partitioning, Clustering & Table Design for Scale

Partitioning splits large tables into smaller pieces (by range, list, or hash). Improves manageability and query performance for partition-pruning queries.

Partitioning Strategies

Range partitions (by date) for time-series data.
Hash partitions for even distribution across shards/partitions.
List partitions for categorical splits.

-- Postgres range partition example
CREATE TABLE events (
  id BIGSERIAL PRIMARY KEY,
  event_time TIMESTAMP NOT NULL,
  payload JSONB
) PARTITION BY RANGE (event_time);

CREATE TABLE events_2025_01 PARTITION OF events FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');

Clustering (physically ordering a table by an index) can reduce IO for range queries, but must be re-run periodically to maintain order after writes.

Transactions, Locks & Isolation Levels

Understand locking semantics and pick an isolation level that balances correctness and performance.

Common Lock Types

Row-level locks (SELECT ... FOR UPDATE)
Table-level locks
Advisory locks (application-level cooperation)

Avoiding Lock Contention

Keep transactions short.
Avoid full table scans inside transactions.
Use lower isolation levels where safe (READ COMMITTED) and apply application-level checks.

BEGIN;
SELECT balance FROM accounts WHERE id = 1 FOR UPDATE; -- locks row
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;

Stored Procedures, Triggers & User-Defined Functions

Stored logic can be efficient but reduces portability. Keep procedures small and testable; avoid heavy business logic in DB unless justified.

-- Postgres example
CREATE OR REPLACE FUNCTION add_order(customer_id INT, items JSONB) RETURNS VOID AS $$
BEGIN
  -- simplified example: insert order and items with validation
  INSERT INTO orders (customer_id, total, created_at) VALUES (customer_id, 0, NOW()) RETURNING id INTO new_order_id;
  -- iterate JSON items, insert OrderItems, update total
END;
$$ LANGUAGE plpgsql;

Triggers are useful for audits and enforcing invariants but can hide side effects — document them well.

Performance Tuning & Anti-Patterns

Avoid SELECT * in production; fetch only needed columns.
Avoid functions on indexed columns in WHERE (prevents index use).
Be wary of OR conditions that prevent index usage—rewrite using UNION ALL when possible.
Use LIMIT early for exploratory queries to avoid large scans.

Index Maintenance

Rebuild fragmented indexes, monitor bloat (vacuum/reindex). For Postgres, run VACUUM/ANALYZE periodically to keep stats fresh.

Materialized Views

Materialized views precompute expensive results; refresh them on schedule or incrementally if supported.

Security — Roles, Grants & SQL Injection

Principle of least privilege and parameterized queries are non-negotiable.

-- Example: parameterized query (pseudo)
PREPARE stmt AS SELECT * FROM users WHERE email = $1;
EXECUTE stmt('a@example.com');

-- Role example
CREATE ROLE readonly NOINHERIT;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO readonly;

Backups, Restore & Point-in-Time Recovery (PITR)

Automate backups and test restores. For WAL-based systems, ship WAL segments to remote storage to enable PITR.

-- Example Postgres pg_dump (logical)
pg_dump -Fc -h host -U user -d dbname -f backup.dump

-- Restore
pg_restore -d dbname backup.dump

-- Base backup + WAL for PITR (example workflow)
pg_basebackup -D /var/lib/postgres -F tar -z -P --wal-method=stream
-- then ship WAL files to S3 and restore to desired point in time

Distributed SQL & Cloud Considerations

Cloud-managed DBs add convenience but hide internals. Understand SLA, replication strategy, backup policies, and network topology.

Cloud trade-offs

Managed services handle backups and HA, but can cause noisy-neighbor issues and black-box behavior.
Network latency to cloud region is real — collocate apps and DBs if latency-sensitive.
Distributed SQL (Cockroach, Spanner) gives strong consistency but may increase write latency due to consensus across regions.

Worked Examples & Cheats

Top-N per group (classic)

SELECT * FROM (
  SELECT p.id, p.customer_id, p.total,
    ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY total DESC) as rn
  FROM orders p
) x WHERE rn <= 3;

Detecting duplicates

SELECT email, COUNT(*) FROM users GROUP BY email HAVING COUNT(*) > 1;

-- Delete duplicates keeping min(id)
DELETE FROM users u USING (
  SELECT MIN(id) as keep_id, email FROM users GROUP BY email HAVING COUNT(*) > 1
) d WHERE u.email = d.email AND u.id <> d.keep_id;

Full-text search (Postgres)

-- Create tsvector column and GIN index
ALTER TABLE articles ADD COLUMN tsv tsvector;
UPDATE articles SET tsv = to_tsvector('english', coalesce(title,'') || ' ' || coalesce(body,''));
CREATE INDEX idx_articles_tsv ON articles USING gin (tsv);

-- Query
SELECT id, title FROM articles WHERE tsv @@ plainto_tsquery('english', 'distributed databases');

Summary & Next Steps

Mastering SQL is iterative: write clear queries, profile with EXPLAIN, and adjust schema or indexes based on real workload. Apply patterns here in your apps, and pair them with the DBMS guide for storage and distributed systems concepts.

Next: integrate these SQL patterns into your testing and deployment pipelines; automate migrations and monitor production queries.