Database Interview Questions 2026: SQL, Normalization, ACID and Sharding

⏱️4 min read · 680 words

Database interview questions in 2026 cover SQL fundamentals, transactions, indexing, normalization, NoSQL vs SQL, and cloud database services. This guide covers the most commonly asked database questions for backend developer and data engineer roles.

Core Database Questions

1. What is normalization? Explain 1NF, 2NF, 3NF

Normalization: organize tables to reduce redundancy and improve integrity

1NF (First Normal Form):
  - Each column contains atomic (indivisible) values
  - No repeating groups or arrays in columns
  BAD:  users(id, name, phones="555-1234,555-5678")
  GOOD: users(id, name) + phones(id, user_id, phone)

2NF (Second Normal Form):
  - Must be in 1NF
  - All non-key columns depend on the ENTIRE primary key (no partial dependency)
  BAD:  orders(order_id, product_id, product_name, qty)
        product_name depends only on product_id, not full key
  GOOD: orders(order_id, product_id, qty)
        products(product_id, product_name)

3NF (Third Normal Form):
  - Must be in 2NF
  - No transitive dependencies (non-key columns depend only on key)
  BAD:  employees(emp_id, dept_id, dept_name)
        dept_name depends on dept_id, not emp_id
  GOOD: employees(emp_id, dept_id)
        departments(dept_id, dept_name)

2. What are ACID properties?

Atomicity: Transaction is all-or-nothing — transfer $100: debit AND credit both succeed or both fail
Consistency: DB goes from one valid state to another — balance can never be negative
Isolation: Concurrent transactions don’t interfere — two users booking same seat don’t both succeed
Durability: Committed data survives crashes — written to disk, not just memory

3. Explain isolation levels and their trade-offs

-- Isolation levels (weakest to strongest):

-- READ UNCOMMITTED: can read uncommitted data (dirty reads)
-- Rarely used, maximum performance

-- READ COMMITTED (default in PostgreSQL):
-- Only reads committed data. Prevents dirty reads.
-- Non-repeatable reads possible (same query returns different results)

-- REPEATABLE READ (default in MySQL):
-- Same rows return same data within transaction.
-- Phantom reads possible (new rows can appear)

-- SERIALIZABLE: full isolation, no phantoms
-- Slowest, all transactions run as if sequential
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- In PostgreSQL
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-- ...
COMMIT;

-- Common choice: READ COMMITTED for most apps
-- REPEATABLE READ for financial reporting
-- SERIALIZABLE for critical financial transactions

4. What is the difference between clustered and non-clustered indexes?

Clustered index: Data rows stored in index order. One per table. InnoDB uses PRIMARY KEY as clustered.
Non-clustered: Separate structure pointing to data rows. Multiple per table.

-- Clustered: physical order of data = index order
-- PRIMARY KEY automatically creates clustered index in most DBs
CREATE TABLE users (
  id BIGSERIAL PRIMARY KEY,  -- clustered index
  email VARCHAR(255),
  name VARCHAR(100)
);

-- Non-clustered: lookup structure → pointer to data
CREATE INDEX idx_email ON users(email);  -- non-clustered

-- Covering index: includes all needed columns
-- Allows index-only scan (never touches data pages)
CREATE INDEX idx_user_lookup ON users(email) INCLUDE (id, name);
-- SELECT id, name FROM users WHERE email = 'alice@example.com'
-- Can be answered entirely from index — very fast!

5. When would you choose NoSQL over SQL?

Choose SQL when	Choose NoSQL when
Complex relationships (JOINs)	Simple access patterns (key-value)
ACID transactions needed	Massive scale (sharding required)
Complex queries, aggregations	Flexible schema
Financial, inventory data	High write throughput
Team knows SQL	JSON documents, nested data

6. Explain database sharding

Sharding: horizontally partition data across multiple database instances.

Sharding strategies:

Range-based: shard by value range
  Shard 1: user_id 1-1M
  Shard 2: user_id 1M-2M
  Pros: range queries efficient
  Cons: hot spots (newest users all on last shard)

Hash-based: shard_num = hash(user_id) % num_shards
  Even distribution
  Cons: range queries hit all shards, resharding is hard

Directory-based: lookup table maps key → shard
  Flexible, easy to rebalance
  Cons: lookup overhead, single point of failure

Cross-shard challenges:
  - JOINs must be done in application layer
  - Distributed transactions are complex
  - Global uniqueness (use UUIDs instead of auto-increment)

Database interviews test breadth (SQL, normalization, ACID) and depth (query optimization, indexing strategy). Know how to use EXPLAIN ANALYZE, understand when each index type helps, and explain sharding trade-offs clearly. Production questions often cover connection pooling, replication lag, and handling N+1 queries.