This guide is designed for Data Engineer, Data Analyst, Database Developer, SQL Developer, BI Developer, ETL Developer, Cloud Data Engineer, Solution Architect, and Technical Architect interviews.

SECTION 1: SQL FUNDAMENTALS

1. What is SQL?

Answer:

SQL (Structured Query Language) is a standard language used to:

Store data
Retrieve data
Update data
Delete data
Manage databases

Example:

SELECT * FROM Employees;

2. What are the different types of SQL statements?

Answer:

DDL (Data Definition Language)

CREATE
ALTER
DROP
TRUNCATE

DML (Data Manipulation Language)

INSERT
UPDATE
DELETE
MERGE

DQL (Data Query Language)

SELECT

DCL (Data Control Language)

GRANT
REVOKE

TCL (Transaction Control Language)

COMMIT
ROLLBACK
SAVEPOINT

3. What is a Database?

Answer:

A structured collection of related data.

Example:

Employee Database
Banking Database
Hospital Database

4. What is a Table?

Answer:

A table stores data in rows and columns.

Example:

EmpID	Name
101	John
102	David

5. What is a Row?

Answer:

A row represents one record.

Example:

101 | John | IT

6. What is a Column?

Answer:

A column represents one attribute.

Example:

EmployeeName
Salary
Department

7. What is a Primary Key?

Answer:

A column that uniquely identifies each row.

Properties:

Unique
Not Null
One per table

Example:

CREATE TABLE Employee(
EmpID INT PRIMARY KEY
);

8. What is a Foreign Key?

Answer:

A column that creates relationship between tables.

CREATE TABLE Orders(
CustomerID INT,
FOREIGN KEY(CustomerID)
REFERENCES Customers(CustomerID)
);

9. Difference between Primary Key and Foreign Key?

Primary Key	Foreign Key
Unique	Can repeat
Not Null	Can be Null
Identifies row	Creates relationship

10. What is a Unique Key?

Answer:

Prevents duplicate values.

CREATE TABLE Users(
Email VARCHAR(100) UNIQUE
);

SECTION 2: NULLS & CONSTRAINTS

11. What is NULL?

Answer:

Represents missing or unknown value.

SELECT * FROM Employee
WHERE ManagerID IS NULL;

12. Difference between NULL and Zero?

NULL	Zero
Unknown	Numeric value
No data	Actual value

13. What is NOT NULL Constraint?

Answer:

Ensures column cannot be empty.

Name VARCHAR(50) NOT NULL

14. What is CHECK Constraint?

Answer:

Validates data.

Age INT CHECK (Age >=18)

15. What is DEFAULT Constraint?

Answer:

Assigns default value.

Status VARCHAR(20)
DEFAULT 'Active'

16. What is Referential Integrity?

Answer:

Ensures foreign key values exist in parent table.

17. What happens when inserting NULL into NOT NULL column?

Answer:

Database throws error.

18. What is Composite Key?

Answer:

Multiple columns forming primary key.

PRIMARY KEY(OrderID, ProductID)

19. What is Candidate Key?

Answer:

Columns capable of becoming Primary Key.

20. What is Alternate Key?

Answer:

Candidate Key not chosen as Primary Key.

SECTION 3: JOINS

21. What is JOIN?

Answer:

Combines rows from multiple tables.

22. Types of Joins?

INNER JOIN
LEFT JOIN
RIGHT JOIN
FULL JOIN
SELF JOIN
CROSS JOIN

23. INNER JOIN

SELECT *
FROM Orders O
INNER JOIN Customers C
ON O.CustomerID=C.CustomerID;

Returns matching rows.

24. LEFT JOIN

Returns all rows from left table.

SELECT *
FROM A
LEFT JOIN B
ON A.ID=B.ID;

25. RIGHT JOIN

Returns all rows from right table.

26. FULL OUTER JOIN

Returns all rows from both tables.

27. CROSS JOIN

Returns Cartesian product.

SELECT *
FROM A CROSS JOIN B;

28. SELF JOIN

Join table to itself.

SELECT E.Name,M.Name
FROM Employee E
JOIN Employee M
ON E.ManagerID=M.EmpID;

29. Difference between INNER and OUTER JOIN?

INNER = matching rows only

OUTER = matching + non-matching rows

30. What is Cartesian Product?

Answer:

Every row joined with every row.

SECTION 4: AGGREGATE FUNCTIONS

31. What is COUNT()?

SELECT COUNT(*) FROM Employee;

32. COUNT(*) vs COUNT(Column)

COUNT(*) counts all rows

COUNT(Column) ignores NULLs

33. SUM()

SELECT SUM(Salary)
FROM Employee;

34. AVG()

SELECT AVG(Salary)
FROM Employee;

35. MAX()

SELECT MAX(Salary)
FROM Employee;

36. MIN()

SELECT MIN(Salary)
FROM Employee;

37. GROUP BY

SELECT Department,
COUNT(*)
FROM Employee
GROUP BY Department;

38. HAVING

Filters grouped data.

SELECT Department,
COUNT(*)
FROM Employee
GROUP BY Department
HAVING COUNT(*) > 5;

39. WHERE vs HAVING?

WHERE	HAVING
Before Grouping	After Grouping
Individual Rows	Groups

40. Can Aggregate Functions Ignore NULL?

Yes.

SECTION 5: SUBQUERIES

41. What is a Subquery?

Query inside another query.

SELECT *
FROM Employee
WHERE Salary >
(
SELECT AVG(Salary)
FROM Employee
);

42. Types of Subqueries?

Single Row
Multiple Row
Correlated
Nested

43. What is Correlated Subquery?

Runs once for every outer row.

SELECT *
FROM Employee E
WHERE Salary >
(
SELECT AVG(Salary)
FROM Employee
WHERE Department=E.Department
);

44. EXISTS vs IN?

EXISTS:

Stops after first match
Better for large datasets

IN:

Compares list

45. NOT EXISTS vs NOT IN?

NOT EXISTS handles NULL better.

SECTION 6: INDEXES

46. What is an Index?

Database object improving query performance.

47. Why use Index?

Faster retrieval.

48. Types of Indexes?

Clustered
Non-clustered
Composite
Unique
Bitmap (Oracle)

49. Clustered Index?

Stores data physically sorted.

One per table.

50. Non-Clustered Index?

Separate structure pointing to data.

Multiple allowed.

51. Clustered vs Non-Clustered?

Clustered	Non-Clustered
Physical order	Logical structure
One	Many

52. Disadvantages of Index?

Storage
Slower inserts
Slower updates

53. What is Composite Index?

Multiple columns indexed.

CREATE INDEX idx_emp
ON Employee(DeptID,Salary);

54. What is Covering Index?

Contains all required columns.

55. What is Index Scan?

Reads many index pages.

56. What is Index Seek?

Direct lookup.

More efficient.

57. What is Full Table Scan?

Entire table read.

58. How to identify missing indexes?

Using execution plans.

59. What causes index fragmentation?

Frequent inserts/updates/deletes.

60. How to fix fragmentation?

Rebuild or reorganize index.

SECTION 7: NORMALIZATION

61. What is Normalization?

Reducing redundancy.

62. Why Normalize?

Better consistency
Less duplication

63. First Normal Form (1NF)

No repeating groups.

64. Second Normal Form (2NF)

1NF + no partial dependency.

65. Third Normal Form (3NF)

2NF + no transitive dependency.

66. BCNF?

Stronger version of 3NF.

67. What is Denormalization?

Adding redundancy for performance.

68. Benefits of Denormalization?

Faster reporting.

69. Drawbacks of Denormalization?

Data duplication.

70. OLTP vs OLAP?

OLTP	OLAP
Transactions	Analytics
Normalized	Denormalized

SECTION 8: TRANSACTIONS

71. What is Transaction?

Single logical unit of work.

72. ACID Properties?

Atomicity

All or nothing

Consistency

Valid state

Isolation

Transactions independent

Durability

Permanent after commit

73. COMMIT?

Saves transaction.

74. ROLLBACK?

Undo transaction.

75. SAVEPOINT?

Partial rollback point.

76. What is Deadlock?

Two transactions waiting on each other.

77. How to prevent deadlocks?

Consistent object access order
Short transactions

78. What is Locking?

Protecting data consistency.

79. Types of Locks?

Shared
Exclusive
Update

80. What is Blocking?

One transaction waits for another.

SECTION 9: ADVANCED SQL

81. What is CTE?

Common Table Expression.

WITH SalesCTE AS
(
SELECT *
FROM Sales
)
SELECT *
FROM SalesCTE;

82. Recursive CTE?

Used for hierarchies.

83. What is Window Function?

Performs calculations across rows.

84. ROW_NUMBER()

SELECT ROW_NUMBER()
OVER(ORDER BY Salary DESC)

85. RANK()

Handles ties.

86. DENSE_RANK()

No rank gaps.

87. LEAD()

Next row value.

88. LAG()

Previous row value.

89. NTILE()

Creates buckets.

90. Running Total?

SUM(Salary)
OVER(ORDER BY EmpID)

SECTION 10: SCENARIO-BASED QUESTIONS

91. Find 2nd Highest Salary

SELECT MAX(Salary)
FROM Employee
WHERE Salary <
(
SELECT MAX(Salary)
FROM Employee
);

92. Find Nth Highest Salary

SELECT Salary
FROM
(
SELECT Salary,
DENSE_RANK()
OVER(ORDER BY Salary DESC) RN
FROM Employee
) X
WHERE RN=5;

93. Find Duplicate Records

SELECT Email,
COUNT(*)
FROM Users
GROUP BY Email
HAVING COUNT(*)>1;

94. Delete Duplicate Records

Using ROW_NUMBER().

95. Find Employees Without Manager

SELECT *
FROM Employee
WHERE ManagerID IS NULL;

96. Find Department Highest Salary

Using ROW_NUMBER partition.

97. Find Top 3 Salaries per Department

Using DENSE_RANK().

98. Find Missing IDs

Using sequence comparison.

99. Difference Between DELETE, TRUNCATE, DROP?

DELETE	TRUNCATE	DROP
DML	DDL	DDL
Can rollback	Usually no	No
Removes rows	Removes all rows	Removes table

100. MERGE Statement?

UPSERT operation.

MERGE Target T
USING Source S
ON T.ID=S.ID
WHEN MATCHED THEN
UPDATE SET Name=S.Name
WHEN NOT MATCHED THEN
INSERT(ID,Name)
VALUES(S.ID,S.Name);

101–175 QUICK FIRE INTERVIEW QUESTIONS

What is View?
Simple View vs Materialized View?
What is Stored Procedure?
Function vs Procedure?
Trigger Types?
BEFORE Trigger?
AFTER Trigger?
INSTEAD OF Trigger?
What is Cursor?
Cursor drawbacks?
Implicit Cursor?
Explicit Cursor?
Dynamic SQL?
Static SQL?
Temporary Tables?
Global Temp Table?
Local Temp Table?
Table Variables?
What is Schema?
What is Data Warehouse?
Fact Table?
Dimension Table?
Star Schema?
Snowflake Schema?
Surrogate Key?
Natural Key?
Slowly Changing Dimension?
SCD Type 1?
SCD Type 2?
SCD Type 3?
What is ETL?
ETL vs ELT?
What is Partitioning?
Horizontal Partitioning?
Vertical Partitioning?
Sharding?
Query Optimizer?
Execution Plan?
Cardinality?
Statistics?
Parameter Sniffing?
Query Hints?
Explain Analyze?
Materialized View?
Index Rebuild?
Index Reorganize?
Heap Table?
Clustered Key Lookup?
Bookmark Lookup?
Covering Index?
Snapshot Isolation?
Read Committed?
Read Uncommitted?
Repeatable Read?
Serializable?
Phantom Reads?
Dirty Reads?
Non-repeatable Reads?
Database Replication?
Log Shipping?
AlwaysOn?
CDC (Change Data Capture)?
Change Tracking?
Audit Table?
Database Migration?
SQL Injection?
Prevention of SQL Injection?
Prepared Statements?
Encryption at Rest?
Encryption in Transit?
Transparent Data Encryption?
Row-Level Security?
Dynamic Data Masking?
Database Backup Types?
Full vs Incremental Backup?

PART 1: BASIC SQL (Warm-up & Screening) — 40 Questions

1. What is SQL?

Answer: Structured Query Language – used to communicate with relational databases (RDBMS).

2. Difference between SQL and MySQL?

SQL	MySQL
Language	RDBMS software
Standardized	Specific implementation

3. What are the 5 sublanguages of SQL?

DDL (Data Definition) – CREATE, ALTER, DROP, TRUNCATE
DML (Data Manipulation) – SELECT, INSERT, UPDATE, DELETE
DCL (Data Control) – GRANT, REVOKE
TCL (Transaction Control) – COMMIT, ROLLBACK, SAVEPOINT
DQL (Data Query) – SELECT

4. Difference between `DELETE` and `TRUNCATE`

DELETE	TRUNCATE
DML	DDL
WHERE clause allowed	No WHERE
slower (row-by-row)	faster (deallocates pages)
can rollback	can rollback in some DBs (but not all)
triggers fire	triggers don’t fire

5. Difference between `DROP` and `TRUNCATE`

DROP removes table structure; TRUNCATE removes data only.

6. What is a Primary Key?

Unique, not null, only one per table, can be composite.

7. What is a Foreign Key?

Links two tables; enforces referential integrity.

8. What is a Composite Key?

Primary key using two or more columns.

9. Difference between `UNIQUE` and `PRIMARY KEY`

UNIQUE	PRIMARY KEY
One NULL allowed	No NULL
Multiple per table	One per table

10. What is a `NULL` value?

Not zero, not blank – unknown or missing.

11. Difference between `WHERE` and `HAVING`

WHERE	HAVING
Before GROUP BY	After GROUP BY
Cannot use aggregate	Can use aggregate

12. What are aggregate functions?

COUNT, SUM, AVG, MIN, MAX

13. What is `GROUP BY`?

Groups rows with same values; used with aggregates.

14. What is `ORDER BY`?

Sorts result – ASC (default) or DESC.

15. Difference between `INNER JOIN` and `OUTER JOIN`

Inner – matching rows only
Outer – matching + non-matching (LEFT/RIGHT/FULL)

16. What is `LEFT JOIN`?

All rows from left table + matched from right; NULL if no match.

17. What is `SELF JOIN`?

Joining a table with itself using aliases.

18. What is a `CROSS JOIN`?

Cartesian product (every row × every row).

19. What is a `UNION` vs `UNION ALL`?

UNION removes duplicates; UNION ALL includes all.

20. What is a `VIEW`?

Virtual table (stored query). Not physically stored.

21. Can you update a view?

Yes, if it’s based on a single table and doesn’t use aggregates.

22. What is an index?

Improves SELECT speed; slows INSERT/UPDATE/DELETE.

23. Clustered vs Non-clustered index

Clustered – determines physical order (only one).
Non-clustered – separate structure with pointers.

24. What is a `JOIN`?

Combines columns from two+ tables based on related column.

25. What is `DISTINCT`?

Removes duplicate rows in result.

26. What are wildcards in `LIKE`?

% – any number of chars
_ – exactly one char

27. What is `IN` operator?

Shorthand for multiple OR conditions.

28. What is `BETWEEN`?

Inclusive range (numeric, text, dates).

29. What is `CASE` statement?

Conditional logic in SQL (like if-else).

Example:

sql

SELECT name,
  CASE WHEN salary > 100000 THEN 'High' ELSE 'Low' END
FROM employees;

30. What is `COALESCE`?

Returns first non-NULL value.

31. What is `NULLIF`?

Returns NULL if two expressions equal, else first expression.

32. Difference between `CHAR` and `VARCHAR`

CHAR(n)	VARCHAR(n)
Fixed length	Variable length
Faster for fixed-size	Saves space

33. What is a `TEMPORARY TABLE`?

Exists only for session; auto-dropped.

34. What is a `STORED PROCEDURE`?

Precompiled SQL code; can have input/output parameters.

35. What is a `FUNCTION` vs `PROCEDURE`?

Function	Procedure
Returns one value	May return none or multiple
Cannot have output params	Can have
Called in SELECT	Called with EXEC/CALL

36. What is `AUTO_INCREMENT` / `IDENTITY`?

Automatically generates sequential numeric values.

37. What is a transaction?

Set of SQL statements executed as a unit (ACID).

38. What are ACID properties?

Atomicity – all or nothing
Consistency – valid state before/after
Isolation – concurrent transactions don’t interfere
Durability – committed changes persist

39. What are SQL constraints?

NOT NULL, UNIQUE, PRIMARY KEY, FOREIGN KEY, CHECK, DEFAULT.

40. Difference between `HAVING` and `WHERE` (detailed)

WHERE filters rows before aggregation; HAVING filters after.

PART 2: INTERMEDIATE SQL (Data Rounds) — 50 Questions

41. Write a query to find duplicate rows in a table.

sql

SELECT col1, col2, COUNT(*)
FROM table
GROUP BY col1, col2
HAVING COUNT(*) > 1;

42. Delete duplicate rows keeping one occurrence.

sql

DELETE FROM table
WHERE id NOT IN (
  SELECT MIN(id)
  FROM table
  GROUP BY col1, col2
);

43. Find second highest salary.

sql

SELECT MAX(salary) FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);

Or using OFFSET:

sql

SELECT DISTINCT salary
FROM employees
ORDER BY salary DESC
LIMIT 1 OFFSET 1;

44. Find nth highest salary.

sql

SELECT DISTINCT salary
FROM employees e1
WHERE N-1 = (
  SELECT COUNT(DISTINCT salary)
  FROM employees e2
  WHERE e2.salary > e1.salary
);

45. Find employees earning more than their manager.

sql

SELECT e.name
FROM employees e
JOIN employees m ON e.manager_id = m.id
WHERE e.salary > m.salary;

46. Running total using window function.

sql

SELECT date, amount,
  SUM(amount) OVER (ORDER BY date) AS running_total
FROM sales;

47. Moving average (3-day).

sql

SELECT date, amount,
  AVG(amount) OVER (ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
FROM sales;

48. Rank employees by salary per department.

sql

SELECT name, dept_id, salary,
  RANK() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS rnk
FROM employees;

Difference: RANK (gaps), DENSE_RANK (no gaps), ROW_NUMBER (unique).

49. Lead/Lag – previous & next salary.

sql

SELECT name, salary,
  LAG(salary, 1) OVER (ORDER BY hire_date) AS prev_salary,
  LEAD(salary, 1) OVER (ORDER BY hire_date) AS next_salary
FROM employees;

50. Find employees hired in last 30 days.

sql

SELECT * FROM employees
WHERE hire_date >= CURRENT_DATE - INTERVAL '30 days';

51. Pivot table (row to column).

sql

SELECT product_id,
  SUM(CASE WHEN month='Jan' THEN sales END) AS Jan_sales,
  SUM(CASE WHEN month='Feb' THEN sales END) AS Feb_sales
FROM sales_data
GROUP BY product_id;

52. Unpivot using `UNION ALL`.

sql

SELECT product_id, 'Jan' AS month, Jan_sales AS sales
FROM pivot_table
UNION ALL
SELECT product_id, 'Feb', Feb_sales FROM pivot_table;

53. Recursive CTE to get hierarchy (org chart).

sql

WITH RECURSIVE org AS (
  SELECT id, name, manager_id, 1 AS level
  FROM employees WHERE manager_id IS NULL
  UNION ALL
  SELECT e.id, e.name, e.manager_id, o.level+1
  FROM employees e
  JOIN org o ON e.manager_id = o.id
)
SELECT * FROM org;

54. Find missing IDs in sequence.

sql

SELECT t1.id + 1 AS missing_start
FROM table t1
LEFT JOIN table t2 ON t1.id + 1 = t2.id
WHERE t2.id IS NULL;

55. String concatenation per group.

MySQL: GROUP_CONCAT(name)
PostgreSQL: STRING_AGG(name, ',')
SQL Server: STRING_AGG(name, ',')

56. Update using join.

sql

UPDATE orders
SET status = 'Shipped'
FROM shipments
WHERE orders.id = shipments.order_id;

57. Delete using join (MySQL/PostgreSQL).

sql

DELETE o
FROM orders o
JOIN old_orders oo ON o.id = oo.id;

58. Merge / Upsert (INSERT OR UPDATE).

MySQL: INSERT ... ON DUPLICATE KEY UPDATE
PostgreSQL: INSERT ... ON CONFLICT DO UPDATE

59. Find first and last order per customer.

sql

SELECT DISTINCT customer_id,
  FIRST_VALUE(order_date) OVER (PARTITION BY customer_id ORDER BY order_date) AS first,
  LAST_VALUE(order_date) OVER (PARTITION BY customer_id ORDER BY order_date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last
FROM orders;

60. Difference between `ROW_NUMBER()`, `RANK()`, `DENSE_RANK()`.

Example:

Salary	ROW_NUMBER	RANK	DENSE_RANK
100	1	1	1
100	2	1	1
90	3	3	2

61. What is a CTE (Common Table Expression)?

Temporary named result set; improves readability.

62. CTE vs Subquery.

CTE	Subquery
Reusable in same query	Not reusable
Recursive possible	No recursion
Readable for complex queries	Can be messy

63. What is a Window Function?

Performs calculation across rows related to current row without collapsing them.

64. Difference between `WHERE` and `QUALIFY` (Snowflake etc.)

QUALIFY filters after window functions; WHERE cannot.

65. Find median salary.

sql

SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) FROM employees;

Or manual:

sql

SELECT AVG(salary) FROM (
  SELECT salary, ROW_NUMBER() OVER (ORDER BY salary) AS rn,
         COUNT(*) OVER () AS total
  FROM employees
) t WHERE rn IN (FLOOR((total+1)/2), CEIL((total+1)/2));

66. Mode (most frequent salary).

sql

SELECT salary FROM employees
GROUP BY salary
ORDER BY COUNT(*) DESC
LIMIT 1;

67. What is a covering index?

Index that includes all columns needed for query (no table access).

68. Composite index order importance?

Leftmost prefix rule: index on (A,B,C) supports A, A+B, A+B+C, but not B or C alone.

69. Write query for “employees who have same salary”.

sql

SELECT e1.name, e2.name, e1.salary
FROM employees e1
JOIN employees e2 ON e1.salary = e2.salary AND e1.id < e2.id;

70. What is `EXISTS` vs `IN`?

EXISTS stops on first match (faster for large subqueries).
IN evaluates all.

71. What is `ANY` / `ALL`?

sql

SELECT name FROM employees WHERE salary > ANY (SELECT salary FROM managers);

72. What is `INTERSECT`?

Returns common rows from two queries (removes duplicates).

73. What is `EXCEPT` / `MINUS`?

Rows in first query not in second.

74. Case when multiple conditions.

sql

CASE WHEN score >= 90 THEN 'A'
     WHEN score >= 80 THEN 'B'
     ELSE 'C'
END

75. What is `ISNULL()` vs `NULLIF()`.

ISNULL(col,0) – replace NULL with 0.
NULLIF(a,b) – returns NULL if a=b.

76. Generate date series.

PostgreSQL: generate_series('2024-01-01', '2024-12-31', '1 day'::interval)

77. Split comma-separated string into rows.

sql

SELECT unnest(string_to_array('a,b,c', ','));  -- PostgreSQL

78. Convert rows to JSON.

PostgreSQL: json_agg(), row_to_json()

79. Explain query plan.

EXPLAIN – shows execution strategy; EXPLAIN ANALYZE – runs + shows actual.

80. What is a bitmap index?

Used in data warehouses; efficient for low-cardinality columns.

81. What is table partitioning?

Splits large table into smaller physical pieces (range, list, hash).

82. What is sharding?

Horizontal partitioning across different servers.

83. What is materialized view?

Physical copy of query result (refreshed manually/scheduled).

84. What is a database deadlock?

Two transactions each waiting for the other to release a lock.

85. What is isolation level?

Read Uncommitted, Read Committed, Repeatable Read, Serializable.

86. What is a dirty read?

Reading uncommitted changes from another transaction.

87. What is a non-repeatable read?

Same query returns different results due to updates.

88. What is a phantom read?

New rows appear matching WHERE condition.

89. What is `WITH (NOLOCK)`? (SQL Server)

Allows dirty reads (less blocking).

90. What is an execution plan operator (e.g., nested loop, hash join)?

Nested loop – small tables, index present.
Hash join – large, unsorted tables.
Merge join – sorted inputs.

PART 3: ADVANCED SQL (Hard/Tricky) — 45 Questions

91. Find gaps and islands (consecutive days).

Islands example:

sql

WITH grp AS (
  SELECT order_date,
         order_date - ROW_NUMBER() OVER (ORDER BY order_date)::int AS grp
  FROM orders
)
SELECT MIN(order_date), MAX(order_date), COUNT(*)
FROM grp
GROUP BY grp;

92. Sessionization (user sessions from clicks).

sql

WITH session_start AS (
  SELECT user_id, click_time,
    CASE WHEN click_time - LAG(click_time) OVER (PARTITION BY user_id ORDER BY click_time) > interval '30 minutes'
         THEN 1 ELSE 0 END AS new_session
  FROM clicks
)
SELECT user_id, session_id, MIN(click_time), MAX(click_time)
FROM (
  SELECT *, SUM(new_session) OVER (PARTITION BY user_id ORDER BY click_time) AS session_id
  FROM session_start
) s
GROUP BY user_id, session_id;

93. Rolling retention (cohort analysis).

sql

WITH cohorts AS (
  SELECT user_id, MIN(order_date) AS first_order
  FROM orders GROUP BY user_id
)
SELECT EXTRACT(MONTH FROM c.first_order) AS cohort,
       EXTRACT(MONTH FROM o.order_date) - EXTRACT(MONTH FROM c.first_order) AS month_offset,
       COUNT(DISTINCT o.user_id) / MAX(COUNT(DISTINCT o.user_id)) OVER (PARTITION BY c.first_order) AS retention
FROM orders o
JOIN cohorts c ON o.user_id = c.user_id
GROUP BY 1,2;

94. Self-join to find overlapping date ranges.

sql

SELECT a.emp_id, a.start_date, a.end_date, b.start_date, b.end_date
FROM assignments a
JOIN assignments b ON a.emp_id = b.emp_id
WHERE a.id < b.id
  AND a.start_date <= b.end_date
  AND a.end_date >= b.start_date;

95. Recursive CTE to generate Fibonacci.

sql

WITH RECURSIVE fib(a,b,n) AS (
  SELECT 0::bigint,1::bigint,1
  UNION ALL
  SELECT b, a+b, n+1 FROM fib WHERE n < 20
)
SELECT a FROM fib;

96. What is a Window Frame?

ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
RANGE BETWEEN – based on value, not row position.

97. Rank with ties skipping next rank?

Use RANK().

98. Cumulative distribution `CUME_DIST()`.

Relative rank (0 to 1).

99. Percent rank `PERCENT_RANK()`.

(RANK - 1) / (total rows - 1)

100. Conditional aggregation with filter clause (PostgreSQL).

sql

SELECT dept,
       COUNT(*) FILTER (WHERE salary > 50000) AS high_earners
FROM employees
GROUP BY dept;

101. String manipulation – extract domain from email.

sql

SELECT SUBSTRING(email FROM '@(.*)$') FROM users;

102. Split name into first/last.

sql

SELECT SPLIT_PART(full_name, ' ', 1) AS first,
       SPLIT_PART(full_name, ' ', 2) AS last
FROM names;

103. Parsing JSON in SQL.

PostgreSQL: data->>'key', jsonb_array_elements()

104. What is a partial index?

sql

CREATE INDEX active_users ON users(id) WHERE status = 'active';

105. Function vs stored procedure performance.

Functions are cached; procedures can manage transactions.

106. Dynamic SQL in stored procedure.

sql

EXECUTE IMMEDIATE 'SELECT * FROM ' || table_name;

107. Temporary table vs CTE performance.

Temp table can have indexes; CTE is in-memory (not always).

108. Database normalization up to 3NF.

1NF: atomic values
2NF: no partial dependency on composite key
3NF: no transitive dependency

109. Denormalization use cases.

Reporting, data warehouses, frequent joins.

110. What is a `FULL OUTER JOIN` example?

sql

SELECT * FROM customers c
FULL OUTER JOIN orders o ON c.id = o.cust_id;
-- includes customers without orders + orders without customers

111. What is a `NATURAL JOIN`?

Auto-joins on same column names (dangerous, avoid).

112. What is `USING` clause in join?

sql

SELECT * FROM employees JOIN departments USING (dept_id);

113. What are window function framing options?

ROWS, RANGE, GROUPS (SQL:2016).

114. What is `IGNORE NULLS` in LEAD/LAG?

Some DBs support: LAG(salary IGNORE NULLS)

115. What is `SELECT FOR UPDATE`?

Locks selected rows for update.

116. What is a `SAVEPOINT`?

sql

BEGIN;
SAVEPOINT sp1;
DELETE FROM orders;
ROLLBACK TO sp1;
COMMIT;

117. What is `READ COMMITTED SNAPSHOT`? (SQL Server)

Row-versioning to avoid blocking.

118. What is `pg_stat_activity`?

PostgreSQL view for current queries.

119. What is an execution time vs parse time?

Parse time builds plan; execution time runs it.

120. What is query optimization heuristic?

Rule-based vs cost-based optimizer.

121. What is a statistics histogram?

Used by optimizer to estimate row counts.

122. What is a cross-database query?

sql

SELECT * FROM db1.schema.table AS t1
JOIN db2.schema.table AS t2 ...

123. Export query results to CSV (PostgreSQL).

sql

\copy (SELECT * FROM users) TO 'users.csv' CSV HEADER;

124. What is `pg_dump` / `mysqldump`?

Backup utilities.

125. What is CDC (Change Data Capture)?

Capturing INSERT/UPDATE/DELETE for replication/analytics.

126. What is Slowly Changing Dimension (SCD) Type 2?

Track history with start/end date and current flag.

127. Write SCD Type 2 merge.

sql

UPDATE dim SET end_date = CURRENT_DATE, is_current = FALSE
FROM stage WHERE dim.id = stage.id AND dim.is_current = TRUE AND dim.data <> stage.data;

INSERT INTO dim (id, data, start_date, end_date, is_current)
SELECT id, data, CURRENT_DATE, '9999-12-31', TRUE FROM stage
ON CONFLICT ...;

128. What is a surrogate key vs natural key?

Surrogate: artificial (auto-increment)
Natural: business column (SSN)

129. What is an UUID vs serial?

UUID – globally unique, larger storage, slower.

130. What is a covering index with INCLUDE?

sql

CREATE INDEX idx ON table (col1) INCLUDE (col2, col3);

131. What is index skip scan?

Used if leading column has low cardinality but not in WHERE.

132. Write a query for “customers who bought all products”.

sql

SELECT customer_id
FROM orders
GROUP BY customer_id
HAVING COUNT(DISTINCT product_id) = (SELECT COUNT(*) FROM products);

133. Write a query for “top 3 products per category”.

sql

WITH ranked AS (
  SELECT product, category, sales,
         ROW_NUMBER() OVER (PARTITION BY category ORDER BY sales DESC) AS rn
  FROM products
)
SELECT * FROM ranked WHERE rn <= 3;

134. Write query for “cumulative sum reset on negative”.

Requires recursion or custom aggregation.

135. Write query to find “students who have same marks in all subjects”.

sql

SELECT student_id
FROM scores
GROUP BY student_id
HAVING MIN(marks) = MAX(marks);

136. Write query to find “employees with no manager”.

sql

SELECT * FROM employees WHERE manager_id IS NULL;

137. Difference between `CROSS APPLY` and `JOIN` (SQL Server).

CROSS APPLY – like join with a correlated subquery; can call table-valued function.

138. What is `OUTER APPLY`?

Like LEFT JOIN for table-valued functions.

139. What is `PIVOT` in SQL Server?

sql

SELECT * FROM sales
PIVOT (SUM(amount) FOR month IN ([Jan],[Feb])) AS p;

140. What is `UNPIVOT`?

sql

SELECT product, month, sales
FROM pivot_table
UNPIVOT (sales FOR month IN (Jan, Feb)) AS u;

141. What are database hints?

Force specific join type or index usage (avoid unless expert).

142. What is `OPTION (RECOMPILE)`?

Forces recompilation (parameter sniffing fix).

143. What is an `OUTPUT` clause? (SQL Server)

Returns affected rows.

144. What is a `MERGE` statement?

Upsert in single statement (INSERT/UPDATE/DELETE).

145. What are sequences?

Generate numbers independent of table (vs identity).

146. What is `NEXTVAL` and `CURRVAL`?

Sequence functions.

147. What is `SYSDATE` vs `CURRENT_DATE`?

SYSDATE – includes time (Oracle)
CURRENT_DATE – ANSI SQL standard

148. What is an `INTERVAL` data type?

Stores duration (e.g., INTERVAL '1 day')

149. What is a common pitfall of `OR` in WHERE?

Prevents index usage; use UNION or IN instead.

150. Write query to “swap gender values”.

sql

UPDATE users SET gender = CASE WHEN gender='M' THEN 'F' ELSE 'M' END;

151. Write query to “remove spaces from column”.

sql

UPDATE table SET col = REPLACE(col, ' ', '');

152. What is `REGEXP_SUBSTR`?

Extract pattern (e.g., phone number).

153. What is `TRANSLATE` vs `REPLACE`?

TRANSLATE – char-by-char mapping; REPLACE – substring.

154. What is `SOUNDEX`?

Phonetic matching.

155. Difference between `IN` and `EXISTS` with example.

sql

-- IN (evaluates all)
SELECT name FROM cust WHERE id IN (SELECT cust_id FROM orders)

-- EXISTS (short-circuits)
SELECT name FROM cust WHERE EXISTS (SELECT 1 FROM orders WHERE cust_id = cust.id)

156. What is `GREATEST` / `LEAST`?

Returns max/min of list (not aggregate).

157. What is `WITH CHECK OPTION` in views?

Prevents updates that would exclude row from view.

158. What is an updatable view?

Single table, no aggregates, no DISTINCT, no GROUP BY.

159. What is a `GENERATED ALWAYS` column?

Computed column (virtual or stored).

160. What is `ON DELETE CASCADE`?

Deletes child rows when parent deleted.

161. What is `ON DELETE SET NULL`?

Sets foreign key to NULL.

162. What is deferrable constraint?

Checked at commit, not immediately.

163. What is a `CHECK` constraint example?

sql

ALTER TABLE employees ADD CONSTRAINT check_salary CHECK (salary > 0);

164. What is a `DEFAULT` constraint?

sql

created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP

165. What is an exclusion constraint? (PostgreSQL)

Prevents overlapping ranges.

166. What is a `PARTITION BY` for `ROW_NUMBER()`?

Groups before numbering.

167. Difference between `LAG` and `LEAD`.

LAG – previous row; LEAD – next row.

168. What is a `RANGE` window frame?

Based on value range, not row count.

169. What is `FILTER` clause for aggregates? (PostgreSQL)

Conditional aggregation.

170. What is `WITH ORDINALITY`?

Adds row number to set-returning functions.

171. What is a lateral join? (PostgreSQL)

Allows subquery to reference previous from clause.

172. What is `jsonb` vs `json` (PostgreSQL)?

jsonb – binary, faster indexing, no duplicate keys.

173. What is an `ARRAY` type in SQL?

PostgreSQL: INT[], ARRAY_AGG.

174. What is `unnest()`?

Expands array to rows.

175. What is a polymorphic table function?

Returns different schema based on parameters (advanced).

PART 4: SCENARIO-BASED HIGHEST PRIORITY — 10 Questions

176. Design a schema for Instagram.

Answer: Users, Posts, Followers, Likes, Comments, Stories, Messages.

177. Write query to find followers of followers (recommendations).

sql

SELECT DISTINCT f2.follower_id
FROM follows f1
JOIN follows f2 ON f1.follower_id = f2.followee_id
WHERE f1.followee_id = ?
  AND f2.follower_id NOT IN (
    SELECT follower_id FROM follows WHERE followee_id = ?
  );

178. Optimize a slow query.

Steps: EXPLAIN ANALYZE, missing index, wrong join order, statistics outdated, query rewrite.

179. Find “active users” who posted at least 1 post each week for 4 weeks.

sql

WITH weekly_posts AS (
  SELECT user_id, DATE_TRUNC('week', post_date) AS week,
         COUNT(*) AS posts
  FROM posts
  WHERE post_date >= CURRENT_DATE - INTERVAL '4 weeks'
  GROUP BY user_id, week
  HAVING COUNT(*) >= 1
)
SELECT user_id FROM weekly_posts
GROUP BY user_id
HAVING COUNT(DISTINCT week) = 4;

180. Handle billion-row table with no index.

Use partitioning, summary tables, map-reduce, or columnar storage (Parquet, Redshift).

Final Tips for SQL Interview

Write readable SQL – use CTEs, indentation, aliases.
Know window functions thoroughly – 90% of advanced questions.
Practice on real platforms – LeetCode (Database section), HackerRank, StrataScratch.
Know your DBMS quirks – MySQL, PostgreSQL, SQL Server differ.
Ask clarifying questions – joins, NULLs, duplicates handling.