This guide is designed for Data Engineer, Data Analyst, Database Developer, SQL Developer, BI Developer, ETL Developer, Cloud Data Engineer, Solution Architect, and Technical Architect interviews.
SECTION 1: SQL FUNDAMENTALS
1. What is SQL?
Answer:
SQL (Structured Query Language) is a standard language used to:
- Store data
- Retrieve data
- Update data
- Delete data
- Manage databases
Example:
SELECT * FROM Employees;2. What are the different types of SQL statements?
Answer:
DDL (Data Definition Language)
CREATE
ALTER
DROP
TRUNCATEDML (Data Manipulation Language)
INSERT
UPDATE
DELETE
MERGEDQL (Data Query Language)
SELECTDCL (Data Control Language)
GRANT
REVOKETCL (Transaction Control Language)
COMMIT
ROLLBACK
SAVEPOINT3. What is a Database?
Answer:
A structured collection of related data.
Example:
- Employee Database
- Banking Database
- Hospital Database
4. What is a Table?
Answer:
A table stores data in rows and columns.
Example:
| EmpID | Name |
|---|---|
| 101 | John |
| 102 | David |
5. What is a Row?
Answer:
A row represents one record.
Example:
101 | John | IT6. What is a Column?
Answer:
A column represents one attribute.
Example:
EmployeeName
Salary
Department7. What is a Primary Key?
Answer:
A column that uniquely identifies each row.
Properties:
- Unique
- Not Null
- One per table
Example:
CREATE TABLE Employee(
EmpID INT PRIMARY KEY
);8. What is a Foreign Key?
Answer:
A column that creates relationship between tables.
CREATE TABLE Orders(
CustomerID INT,
FOREIGN KEY(CustomerID)
REFERENCES Customers(CustomerID)
);9. Difference between Primary Key and Foreign Key?
| Primary Key | Foreign Key |
|---|---|
| Unique | Can repeat |
| Not Null | Can be Null |
| Identifies row | Creates relationship |
10. What is a Unique Key?
Answer:
Prevents duplicate values.
CREATE TABLE Users(
Email VARCHAR(100) UNIQUE
);SECTION 2: NULLS & CONSTRAINTS
11. What is NULL?
Answer:
Represents missing or unknown value.
SELECT * FROM Employee
WHERE ManagerID IS NULL;12. Difference between NULL and Zero?
| NULL | Zero |
|---|---|
| Unknown | Numeric value |
| No data | Actual value |
13. What is NOT NULL Constraint?
Answer:
Ensures column cannot be empty.
Name VARCHAR(50) NOT NULL14. What is CHECK Constraint?
Answer:
Validates data.
Age INT CHECK (Age >=18)15. What is DEFAULT Constraint?
Answer:
Assigns default value.
Status VARCHAR(20)
DEFAULT 'Active'16. What is Referential Integrity?
Answer:
Ensures foreign key values exist in parent table.
17. What happens when inserting NULL into NOT NULL column?
Answer:
Database throws error.
18. What is Composite Key?
Answer:
Multiple columns forming primary key.
PRIMARY KEY(OrderID, ProductID)19. What is Candidate Key?
Answer:
Columns capable of becoming Primary Key.
20. What is Alternate Key?
Answer:
Candidate Key not chosen as Primary Key.
SECTION 3: JOINS
21. What is JOIN?
Answer:
Combines rows from multiple tables.
22. Types of Joins?
- INNER JOIN
- LEFT JOIN
- RIGHT JOIN
- FULL JOIN
- SELF JOIN
- CROSS JOIN
23. INNER JOIN
SELECT *
FROM Orders O
INNER JOIN Customers C
ON O.CustomerID=C.CustomerID;Returns matching rows.
24. LEFT JOIN
Returns all rows from left table.
SELECT *
FROM A
LEFT JOIN B
ON A.ID=B.ID;25. RIGHT JOIN
Returns all rows from right table.
26. FULL OUTER JOIN
Returns all rows from both tables.
27. CROSS JOIN
Returns Cartesian product.
SELECT *
FROM A CROSS JOIN B;28. SELF JOIN
Join table to itself.
SELECT E.Name,M.Name
FROM Employee E
JOIN Employee M
ON E.ManagerID=M.EmpID;29. Difference between INNER and OUTER JOIN?
INNER = matching rows only
OUTER = matching + non-matching rows
30. What is Cartesian Product?
Answer:
Every row joined with every row.
SECTION 4: AGGREGATE FUNCTIONS
31. What is COUNT()?
SELECT COUNT(*) FROM Employee;32. COUNT(*) vs COUNT(Column)
COUNT(*) counts all rows
COUNT(Column) ignores NULLs
33. SUM()
SELECT SUM(Salary)
FROM Employee;34. AVG()
SELECT AVG(Salary)
FROM Employee;35. MAX()
SELECT MAX(Salary)
FROM Employee;36. MIN()
SELECT MIN(Salary)
FROM Employee;37. GROUP BY
SELECT Department,
COUNT(*)
FROM Employee
GROUP BY Department;38. HAVING
Filters grouped data.
SELECT Department,
COUNT(*)
FROM Employee
GROUP BY Department
HAVING COUNT(*) > 5;39. WHERE vs HAVING?
| WHERE | HAVING |
|---|---|
| Before Grouping | After Grouping |
| Individual Rows | Groups |
40. Can Aggregate Functions Ignore NULL?
Yes.
SECTION 5: SUBQUERIES
41. What is a Subquery?
Query inside another query.
SELECT *
FROM Employee
WHERE Salary >
(
SELECT AVG(Salary)
FROM Employee
);42. Types of Subqueries?
- Single Row
- Multiple Row
- Correlated
- Nested
43. What is Correlated Subquery?
Runs once for every outer row.
SELECT *
FROM Employee E
WHERE Salary >
(
SELECT AVG(Salary)
FROM Employee
WHERE Department=E.Department
);44. EXISTS vs IN?
EXISTS:
- Stops after first match
- Better for large datasets
IN:
- Compares list
45. NOT EXISTS vs NOT IN?
NOT EXISTS handles NULL better.
SECTION 6: INDEXES
46. What is an Index?
Database object improving query performance.
47. Why use Index?
Faster retrieval.
48. Types of Indexes?
- Clustered
- Non-clustered
- Composite
- Unique
- Bitmap (Oracle)
49. Clustered Index?
Stores data physically sorted.
One per table.
50. Non-Clustered Index?
Separate structure pointing to data.
Multiple allowed.
51. Clustered vs Non-Clustered?
| Clustered | Non-Clustered |
|---|---|
| Physical order | Logical structure |
| One | Many |
52. Disadvantages of Index?
- Storage
- Slower inserts
- Slower updates
53. What is Composite Index?
Multiple columns indexed.
CREATE INDEX idx_emp
ON Employee(DeptID,Salary);54. What is Covering Index?
Contains all required columns.
55. What is Index Scan?
Reads many index pages.
56. What is Index Seek?
Direct lookup.
More efficient.
57. What is Full Table Scan?
Entire table read.
58. How to identify missing indexes?
Using execution plans.
59. What causes index fragmentation?
Frequent inserts/updates/deletes.
60. How to fix fragmentation?
Rebuild or reorganize index.
SECTION 7: NORMALIZATION
61. What is Normalization?
Reducing redundancy.
62. Why Normalize?
- Better consistency
- Less duplication
63. First Normal Form (1NF)
No repeating groups.
64. Second Normal Form (2NF)
1NF + no partial dependency.
65. Third Normal Form (3NF)
2NF + no transitive dependency.
66. BCNF?
Stronger version of 3NF.
67. What is Denormalization?
Adding redundancy for performance.
68. Benefits of Denormalization?
Faster reporting.
69. Drawbacks of Denormalization?
Data duplication.
70. OLTP vs OLAP?
| OLTP | OLAP |
|---|---|
| Transactions | Analytics |
| Normalized | Denormalized |
SECTION 8: TRANSACTIONS
71. What is Transaction?
Single logical unit of work.
72. ACID Properties?
Atomicity
All or nothing
Consistency
Valid state
Isolation
Transactions independent
Durability
Permanent after commit
73. COMMIT?
Saves transaction.
74. ROLLBACK?
Undo transaction.
75. SAVEPOINT?
Partial rollback point.
76. What is Deadlock?
Two transactions waiting on each other.
77. How to prevent deadlocks?
- Consistent object access order
- Short transactions
78. What is Locking?
Protecting data consistency.
79. Types of Locks?
- Shared
- Exclusive
- Update
80. What is Blocking?
One transaction waits for another.
SECTION 9: ADVANCED SQL
81. What is CTE?
Common Table Expression.
WITH SalesCTE AS
(
SELECT *
FROM Sales
)
SELECT *
FROM SalesCTE;82. Recursive CTE?
Used for hierarchies.
83. What is Window Function?
Performs calculations across rows.
84. ROW_NUMBER()
SELECT ROW_NUMBER()
OVER(ORDER BY Salary DESC)85. RANK()
Handles ties.
86. DENSE_RANK()
No rank gaps.
87. LEAD()
Next row value.
88. LAG()
Previous row value.
89. NTILE()
Creates buckets.
90. Running Total?
SUM(Salary)
OVER(ORDER BY EmpID)SECTION 10: SCENARIO-BASED QUESTIONS
91. Find 2nd Highest Salary
SELECT MAX(Salary)
FROM Employee
WHERE Salary <
(
SELECT MAX(Salary)
FROM Employee
);92. Find Nth Highest Salary
SELECT Salary
FROM
(
SELECT Salary,
DENSE_RANK()
OVER(ORDER BY Salary DESC) RN
FROM Employee
) X
WHERE RN=5;93. Find Duplicate Records
SELECT Email,
COUNT(*)
FROM Users
GROUP BY Email
HAVING COUNT(*)>1;94. Delete Duplicate Records
Using ROW_NUMBER().
95. Find Employees Without Manager
SELECT *
FROM Employee
WHERE ManagerID IS NULL;96. Find Department Highest Salary
Using ROW_NUMBER partition.
97. Find Top 3 Salaries per Department
Using DENSE_RANK().
98. Find Missing IDs
Using sequence comparison.
99. Difference Between DELETE, TRUNCATE, DROP?
| DELETE | TRUNCATE | DROP |
|---|---|---|
| DML | DDL | DDL |
| Can rollback | Usually no | No |
| Removes rows | Removes all rows | Removes table |
100. MERGE Statement?
UPSERT operation.
MERGE Target T
USING Source S
ON T.ID=S.ID
WHEN MATCHED THEN
UPDATE SET Name=S.Name
WHEN NOT MATCHED THEN
INSERT(ID,Name)
VALUES(S.ID,S.Name);101–175 QUICK FIRE INTERVIEW QUESTIONS
- What is View?
- Simple View vs Materialized View?
- What is Stored Procedure?
- Function vs Procedure?
- Trigger Types?
- BEFORE Trigger?
- AFTER Trigger?
- INSTEAD OF Trigger?
- What is Cursor?
- Cursor drawbacks?
- Implicit Cursor?
- Explicit Cursor?
- Dynamic SQL?
- Static SQL?
- Temporary Tables?
- Global Temp Table?
- Local Temp Table?
- Table Variables?
- What is Schema?
- What is Data Warehouse?
- Fact Table?
- Dimension Table?
- Star Schema?
- Snowflake Schema?
- Surrogate Key?
- Natural Key?
- Slowly Changing Dimension?
- SCD Type 1?
- SCD Type 2?
- SCD Type 3?
- What is ETL?
- ETL vs ELT?
- What is Partitioning?
- Horizontal Partitioning?
- Vertical Partitioning?
- Sharding?
- Query Optimizer?
- Execution Plan?
- Cardinality?
- Statistics?
- Parameter Sniffing?
- Query Hints?
- Explain Analyze?
- Materialized View?
- Index Rebuild?
- Index Reorganize?
- Heap Table?
- Clustered Key Lookup?
- Bookmark Lookup?
- Covering Index?
- Snapshot Isolation?
- Read Committed?
- Read Uncommitted?
- Repeatable Read?
- Serializable?
- Phantom Reads?
- Dirty Reads?
- Non-repeatable Reads?
- Database Replication?
- Log Shipping?
- AlwaysOn?
- CDC (Change Data Capture)?
- Change Tracking?
- Audit Table?
- Database Migration?
- SQL Injection?
- Prevention of SQL Injection?
- Prepared Statements?
- Encryption at Rest?
- Encryption in Transit?
- Transparent Data Encryption?
- Row-Level Security?
- Dynamic Data Masking?
- Database Backup Types?
- Full vs Incremental Backup?
Top 25 SQL Questions Asked Most Frequently
- Difference between DELETE, TRUNCATE, DROP
- Primary Key vs Unique Key
- Clustered vs Non-Clustered Index
- INNER vs LEFT JOIN
- EXISTS vs IN
- CTE vs Subquery
- View vs Materialized View
- Normalization vs Denormalization
- ACID Properties
- Deadlock
- Window Functions
- ROW_NUMBER vs RANK vs DENSE_RANK
- Find 2nd Highest Salary
- Find Duplicate Records
- Find Top N Records
- Correlated Subquery
- Transactions
- Isolation Levels
- Indexing Strategy
- Execution Plans
- Partitioning
- Stored Procedure vs Function
- Triggers
- SQL Injection Prevention
- Data Warehouse Modeling
For senior-level Data Engineer, Cloud Architect, and Database Architect interviews (10–15 years experience), candidates are often expected to answer 250+ advanced SQL questions, including query optimization, indexing internals, execution plans, partitioning, data warehousing, CDC, distributed SQL, and real-world scenario-based problems.
Another Set of Questions
This covers foundational to advanced SQL topics for interviews (freshers to experienced roles). Questions are categorized for easier navigation. Answers include explanations, examples (using standard SQL with common dialect notes), and key points.
1. Basics & Fundamentals (1-30)
1. What is SQL? SQL (Structured Query Language) is the standard language for managing and querying relational databases. It handles DDL (defining structures), DML (manipulating data), DQL (querying), and more.
2. What is an RDBMS? How does it differ from DBMS? RDBMS (Relational Database Management System) stores data in tables with relations via keys and follows ACID properties. DBMS is more general and may not enforce relational models or normalization. Examples: MySQL, PostgreSQL (RDBMS); some older file-based systems (DBMS).
3. What are the main subsets of SQL?
- DDL: CREATE, ALTER, DROP, TRUNCATE.
- DML: INSERT, UPDATE, DELETE, SELECT (sometimes grouped under DQL).
- DCL: GRANT, REVOKE.
- TCL: COMMIT, ROLLBACK, SAVEPOINT.
4. What is the difference between SQL and MySQL? SQL is the language/standard. MySQL is a specific RDBMS implementing SQL.
5. What is a Database? An organized collection of data, typically in tables (rows/records and columns/fields).
6. What is a Table? A collection of related data in rows and columns.
7. What is a Query? A request to retrieve, insert, update, or delete data (e.g., SELECT statement).
8. What is the SELECT statement? Retrieves data: SELECT column1, column2 FROM table_name WHERE condition;
9. What is an SQL Alias? Temporary name for a table/column: SELECT first_name AS fname FROM employees;
10. What are common clauses with SELECT? WHERE, GROUP BY, HAVING, ORDER BY, LIMIT/OFFSET.
11-20: Data Types
- Difference between CHAR and VARCHAR: CHAR is fixed-length (pads spaces); VARCHAR is variable-length.
- INT vs BIGINT, DATE vs DATETIME, TEXT vs VARCHAR, etc.
21. What are Constraints? Rules for data integrity: NOT NULL, UNIQUE, PRIMARY KEY, FOREIGN KEY, CHECK, DEFAULT.
22. What is a Primary Key? Uniquely identifies each row; UNIQUE + NOT NULL; only one per table (can be composite).
23. What is a Foreign Key? Links to a Primary Key in another table for referential integrity.
24. What is a UNIQUE constraint? Ensures distinct values; multiple allowed per table (NULLs often permitted depending on DBMS).
25. What is a Composite Key? Primary key made of two or more columns.
26. Difference between PRIMARY KEY and UNIQUE KEY. PRIMARY KEY implies NOT NULL and only one per table; UNIQUE allows NULLs (in most DBMS) and multiples.
27. What is Normalization? Process to reduce redundancy: 1NF (atomic values), 2NF (no partial dependency), 3NF (no transitive dependency), BCNF, etc.
28. What is Denormalization? Intentionally adding redundancy for read performance (e.g., reporting systems).
29. What is Data Integrity? Accuracy, consistency, and reliability of data over its lifecycle.
30. What are OLTP vs OLAP? OLTP: Transaction processing (many short queries, normalized). OLAP: Analytical processing (complex aggregates, often denormalized/data warehouse).
2. Joins & Set Operations (31-55)
31. What are SQL Joins? Combine rows from two+ tables based on related columns.
32-36. Types of Joins & Differences:
- INNER JOIN: Matching rows only.
- LEFT (OUTER) JOIN: All from left + matches from right (NULLs for non-matches).
- RIGHT JOIN: Mirror of LEFT.
- FULL OUTER JOIN: All from both (NULLs where no match).
- CROSS JOIN: Cartesian product.
- SELF JOIN: Join table to itself (e.g., employee-manager).
37. What is a Cross Join? Cartesian product of two tables.
38. What is an Anti-Join? Rows in one table without matches in another (LEFT JOIN … WHERE right.key IS NULL or NOT EXISTS).
39. Difference between UNION and UNION ALL. UNION removes duplicates (slower); UNION ALL keeps them (faster).
40-42. INTERSECT (common rows), EXCEPT/MINUS (rows in first not in second).
43-55. Practical join questions: Write queries for employee-department, sales by category, etc.
3. Subqueries, CTEs & Advanced Querying (56-90)
56. What is a Subquery? Query nested inside another. Types: scalar, row, table, correlated.
57. Correlated vs Non-correlated Subquery. Correlated references outer query (executed per row, slower); non-correlated is independent.
58. What is a CTE (Common Table Expression)? Temporary named result set using WITH; improves readability, supports recursion.
59. Difference between Subquery and CTE. CTEs are more readable, reusable in the same query, and support recursion.
60. Write a query for second highest salary. SELECT MAX(salary) FROM employees WHERE salary < (SELECT MAX(salary) FROM employees); or using RANK()/DENSE_RANK().
61-70. Questions on finding nth highest, duplicates, employees above dept average, etc.
71. What is EXISTS vs IN? EXISTS checks for row existence (good for large sets); IN compares values. Prefer EXISTS with NOT for anti-joins.
72-80. Window functions: ROW_NUMBER(), RANK(), DENSE_RANK(), LAG(), LEAD(), SUM() OVER(), etc.
81. Running Total Example: SUM(amount) OVER (PARTITION BY product ORDER BY date ROWS UNBOUNDED PRECEDING)
82-90. Recursive CTEs for hierarchies (org charts), NTILE, PERCENT_RANK, etc.
4. Aggregates, Grouping & Functions (91-120)
91. What is GROUP BY? Groups rows with same values for aggregates.
92. Difference between WHERE and HAVING. WHERE filters before grouping; HAVING after (for aggregates).
93-100. Aggregate functions: COUNT, SUM, AVG, MIN, MAX. Handling NULLs.
101-110. String functions: CONCAT, SUBSTRING, LIKE/ILIKE, REGEXP. Date functions: DATEADD, DATEDIFF, EXTRACT. Numeric: ROUND, CEIL, FLOOR.
111. Pattern Matching: LIKE with % and _ .
112-120. CASE statements, COALESCE, NULLIF.
5. Indexes, Performance & Optimization (121-150)
121. What is an Index? Data structure for faster lookups (like book index).
122. Clustered vs Non-Clustered Index. Clustered: Dictates physical row order (one per table). Non-clustered: Separate structure (multiple allowed).
123. How to optimize a slow query? Use EXPLAIN/execution plan, add indexes on WHERE/JOIN/ORDER BY, avoid SELECT *, filter early, update stats, etc.
124-130. Covering indexes, composite indexes, when indexes hurt (writes), full table scans.
131. Difference between DELETE, TRUNCATE, DROP. DELETE (DML, logged, WHERE possible); TRUNCATE (DDL, fast, no WHERE); DROP (removes object).
132-140. Partitioning, materialized views, query rewriting, avoiding functions on columns.
141-150. Performance tuning: statistics, parallelism, connection pooling, schema design impact.
6. Transactions, Security & Advanced Concepts (151-175+)
151. What is a Transaction? Unit of work following ACID (Atomicity, Consistency, Isolation, Durability). Commands: COMMIT, ROLLBACK.
152. What are Locks & Isolation Levels? Prevent concurrency issues (READ UNCOMMITTED to SERIALIZABLE).
153. What is a View? Virtual table from a query (simplifies, security). Materialized views store data.
154. What is a Stored Procedure? Precompiled SQL code for reuse (performance, security).
155. What is a Trigger? Auto-executes on events (INSERT/UPDATE/DELETE).
156. What is a Cursor? Iterates row-by-row (avoid in set-based SQL when possible).
157-165. User-defined functions (scalar/table-valued), collation, SQL injection prevention (parameterized queries).
166. Database design for e-commerce, time-series, etc.
167-175+. Advanced: Recursive queries, pivoting/unpivoting, JSON handling, full-text search, sharding, replication, backup strategies, NoSQL vs SQL trade-offs.
Additional 10+ Scenario Questions:
- Remove duplicates.
- Find gaps in sequences.
- Cumulative sums.
- Top N per group.
- Handling NULLs in joins/aggregates.
- Query for consecutive days/ streaks.
- Dynamic pivoting.
- Slowly Changing Dimensions (SCD) in warehouses.
- Indexing strategy for high-write tables.
- Query optimization case studies.
Tips for Interviews:
- Practice on platforms like LeetCode, HackerRank, StrataScratch.
- Understand execution plans.
- Know your DBMS specifics (PostgreSQL window functions, SQL Server specifics, etc.).
- Explain trade-offs (e.g., normalization vs performance).
- Write clean, readable code with CTEs.
This list exceeds 175 when including variations and practical coding problems. Focus on understanding why rather than memorizing syntax. For hands-on practice, use sample schemas (employees, departments, sales, orders). Good luck!
More Set of Questions
This is a comprehensive guide to 175+ SQL interview questions, categorized by difficulty and topic, with detailed answers. Focus on highest-priority (frequency in real interviews: FAANG, product-based, and data roles).
PART 1: BASIC SQL (Warm-up & Screening) — 40 Questions
1. What is SQL?
Answer: Structured Query Language – used to communicate with relational databases (RDBMS).
2. Difference between SQL and MySQL?
| SQL | MySQL |
|---|---|
| Language | RDBMS software |
| Standardized | Specific implementation |
3. What are the 5 sublanguages of SQL?
- DDL (Data Definition) – CREATE, ALTER, DROP, TRUNCATE
- DML (Data Manipulation) – SELECT, INSERT, UPDATE, DELETE
- DCL (Data Control) – GRANT, REVOKE
- TCL (Transaction Control) – COMMIT, ROLLBACK, SAVEPOINT
- DQL (Data Query) – SELECT
4. Difference between DELETE and TRUNCATE
| DELETE | TRUNCATE |
|---|---|
| DML | DDL |
| WHERE clause allowed | No WHERE |
| slower (row-by-row) | faster (deallocates pages) |
| can rollback | can rollback in some DBs (but not all) |
| triggers fire | triggers don’t fire |
5. Difference between DROP and TRUNCATE
DROP removes table structure; TRUNCATE removes data only.
6. What is a Primary Key?
- Unique, not null, only one per table, can be composite.
7. What is a Foreign Key?
- Links two tables; enforces referential integrity.
8. What is a Composite Key?
- Primary key using two or more columns.
9. Difference between UNIQUE and PRIMARY KEY
| UNIQUE | PRIMARY KEY |
|---|---|
| One NULL allowed | No NULL |
| Multiple per table | One per table |
10. What is a NULL value?
- Not zero, not blank – unknown or missing.
11. Difference between WHERE and HAVING
| WHERE | HAVING |
|---|---|
| Before GROUP BY | After GROUP BY |
| Cannot use aggregate | Can use aggregate |
12. What are aggregate functions?
COUNT,SUM,AVG,MIN,MAX
13. What is GROUP BY?
- Groups rows with same values; used with aggregates.
14. What is ORDER BY?
- Sorts result –
ASC(default) orDESC.
15. Difference between INNER JOIN and OUTER JOIN
Inner – matching rows only
Outer – matching + non-matching (LEFT/RIGHT/FULL)
16. What is LEFT JOIN?
- All rows from left table + matched from right; NULL if no match.
17. What is SELF JOIN?
- Joining a table with itself using aliases.
18. What is a CROSS JOIN?
- Cartesian product (every row × every row).
19. What is a UNION vs UNION ALL?
UNIONremoves duplicates;UNION ALLincludes all.
20. What is a VIEW?
- Virtual table (stored query). Not physically stored.
21. Can you update a view?
- Yes, if it’s based on a single table and doesn’t use aggregates.
22. What is an index?
- Improves SELECT speed; slows INSERT/UPDATE/DELETE.
23. Clustered vs Non-clustered index
- Clustered – determines physical order (only one).
- Non-clustered – separate structure with pointers.
24. What is a JOIN?
- Combines columns from two+ tables based on related column.
25. What is DISTINCT?
- Removes duplicate rows in result.
26. What are wildcards in LIKE?
%– any number of chars_– exactly one char
27. What is IN operator?
- Shorthand for multiple OR conditions.
28. What is BETWEEN?
- Inclusive range (numeric, text, dates).
29. What is CASE statement?
- Conditional logic in SQL (like if-else).
Example:
sql
SELECT name, CASE WHEN salary > 100000 THEN 'High' ELSE 'Low' END FROM employees;
30. What is COALESCE?
- Returns first non-NULL value.
31. What is NULLIF?
- Returns NULL if two expressions equal, else first expression.
32. Difference between CHAR and VARCHAR
| CHAR(n) | VARCHAR(n) |
|---|---|
| Fixed length | Variable length |
| Faster for fixed-size | Saves space |
33. What is a TEMPORARY TABLE?
- Exists only for session; auto-dropped.
34. What is a STORED PROCEDURE?
- Precompiled SQL code; can have input/output parameters.
35. What is a FUNCTION vs PROCEDURE?
| Function | Procedure |
|---|---|
| Returns one value | May return none or multiple |
| Cannot have output params | Can have |
| Called in SELECT | Called with EXEC/CALL |
36. What is AUTO_INCREMENT / IDENTITY?
- Automatically generates sequential numeric values.
37. What is a transaction?
- Set of SQL statements executed as a unit (ACID).
38. What are ACID properties?
- Atomicity – all or nothing
- Consistency – valid state before/after
- Isolation – concurrent transactions don’t interfere
- Durability – committed changes persist
39. What are SQL constraints?
- NOT NULL, UNIQUE, PRIMARY KEY, FOREIGN KEY, CHECK, DEFAULT.
40. Difference between HAVING and WHERE (detailed)
WHERE filters rows before aggregation; HAVING filters after.
PART 2: INTERMEDIATE SQL (Data Rounds) — 50 Questions
41. Write a query to find duplicate rows in a table.
sql
SELECT col1, col2, COUNT(*) FROM table GROUP BY col1, col2 HAVING COUNT(*) > 1;
42. Delete duplicate rows keeping one occurrence.
sql
DELETE FROM table WHERE id NOT IN ( SELECT MIN(id) FROM table GROUP BY col1, col2 );
43. Find second highest salary.
sql
SELECT MAX(salary) FROM employees WHERE salary < (SELECT MAX(salary) FROM employees);
Or using OFFSET:
sql
SELECT DISTINCT salary FROM employees ORDER BY salary DESC LIMIT 1 OFFSET 1;
44. Find nth highest salary.
sql
SELECT DISTINCT salary FROM employees e1 WHERE N-1 = ( SELECT COUNT(DISTINCT salary) FROM employees e2 WHERE e2.salary > e1.salary );
45. Find employees earning more than their manager.
sql
SELECT e.name FROM employees e JOIN employees m ON e.manager_id = m.id WHERE e.salary > m.salary;
46. Running total using window function.
sql
SELECT date, amount, SUM(amount) OVER (ORDER BY date) AS running_total FROM sales;
47. Moving average (3-day).
sql
SELECT date, amount, AVG(amount) OVER (ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) FROM sales;
48. Rank employees by salary per department.
sql
SELECT name, dept_id, salary, RANK() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS rnk FROM employees;
Difference: RANK (gaps), DENSE_RANK (no gaps), ROW_NUMBER (unique).
49. Lead/Lag – previous & next salary.
sql
SELECT name, salary, LAG(salary, 1) OVER (ORDER BY hire_date) AS prev_salary, LEAD(salary, 1) OVER (ORDER BY hire_date) AS next_salary FROM employees;
50. Find employees hired in last 30 days.
sql
SELECT * FROM employees WHERE hire_date >= CURRENT_DATE - INTERVAL '30 days';
51. Pivot table (row to column).
sql
SELECT product_id, SUM(CASE WHEN month='Jan' THEN sales END) AS Jan_sales, SUM(CASE WHEN month='Feb' THEN sales END) AS Feb_sales FROM sales_data GROUP BY product_id;
52. Unpivot using UNION ALL.
sql
SELECT product_id, 'Jan' AS month, Jan_sales AS sales FROM pivot_table UNION ALL SELECT product_id, 'Feb', Feb_sales FROM pivot_table;
53. Recursive CTE to get hierarchy (org chart).
sql
WITH RECURSIVE org AS ( SELECT id, name, manager_id, 1 AS level FROM employees WHERE manager_id IS NULL UNION ALL SELECT e.id, e.name, e.manager_id, o.level+1 FROM employees e JOIN org o ON e.manager_id = o.id ) SELECT * FROM org;
54. Find missing IDs in sequence.
sql
SELECT t1.id + 1 AS missing_start FROM table t1 LEFT JOIN table t2 ON t1.id + 1 = t2.id WHERE t2.id IS NULL;
55. String concatenation per group.
- MySQL:
GROUP_CONCAT(name) - PostgreSQL:
STRING_AGG(name, ',') - SQL Server:
STRING_AGG(name, ',')
56. Update using join.
sql
UPDATE orders SET status = 'Shipped' FROM shipments WHERE orders.id = shipments.order_id;
57. Delete using join (MySQL/PostgreSQL).
sql
DELETE o FROM orders o JOIN old_orders oo ON o.id = oo.id;
58. Merge / Upsert (INSERT OR UPDATE).
- MySQL:
INSERT ... ON DUPLICATE KEY UPDATE - PostgreSQL:
INSERT ... ON CONFLICT DO UPDATE
59. Find first and last order per customer.
sql
SELECT DISTINCT customer_id, FIRST_VALUE(order_date) OVER (PARTITION BY customer_id ORDER BY order_date) AS first, LAST_VALUE(order_date) OVER (PARTITION BY customer_id ORDER BY order_date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last FROM orders;
60. Difference between ROW_NUMBER(), RANK(), DENSE_RANK().
Example:
| Salary | ROW_NUMBER | RANK | DENSE_RANK |
|---|---|---|---|
| 100 | 1 | 1 | 1 |
| 100 | 2 | 1 | 1 |
| 90 | 3 | 3 | 2 |
61. What is a CTE (Common Table Expression)?
- Temporary named result set; improves readability.
62. CTE vs Subquery.
| CTE | Subquery |
|---|---|
| Reusable in same query | Not reusable |
| Recursive possible | No recursion |
| Readable for complex queries | Can be messy |
63. What is a Window Function?
- Performs calculation across rows related to current row without collapsing them.
64. Difference between WHERE and QUALIFY (Snowflake etc.)
QUALIFYfilters after window functions;WHEREcannot.
65. Find median salary.
sql
SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) FROM employees;
Or manual:
sql
SELECT AVG(salary) FROM (
SELECT salary, ROW_NUMBER() OVER (ORDER BY salary) AS rn,
COUNT(*) OVER () AS total
FROM employees
) t WHERE rn IN (FLOOR((total+1)/2), CEIL((total+1)/2));66. Mode (most frequent salary).
sql
SELECT salary FROM employees GROUP BY salary ORDER BY COUNT(*) DESC LIMIT 1;
67. What is a covering index?
- Index that includes all columns needed for query (no table access).
68. Composite index order importance?
- Leftmost prefix rule: index on (A,B,C) supports A, A+B, A+B+C, but not B or C alone.
69. Write query for “employees who have same salary”.
sql
SELECT e1.name, e2.name, e1.salary FROM employees e1 JOIN employees e2 ON e1.salary = e2.salary AND e1.id < e2.id;
70. What is EXISTS vs IN?
EXISTSstops on first match (faster for large subqueries).INevaluates all.
71. What is ANY / ALL?
sql
SELECT name FROM employees WHERE salary > ANY (SELECT salary FROM managers);
72. What is INTERSECT?
- Returns common rows from two queries (removes duplicates).
73. What is EXCEPT / MINUS?
- Rows in first query not in second.
74. Case when multiple conditions.
sql
CASE WHEN score >= 90 THEN 'A'
WHEN score >= 80 THEN 'B'
ELSE 'C'
END75. What is ISNULL() vs NULLIF().
ISNULL(col,0)– replace NULL with 0.NULLIF(a,b)– returns NULL if a=b.
76. Generate date series.
- PostgreSQL:
generate_series('2024-01-01', '2024-12-31', '1 day'::interval)
77. Split comma-separated string into rows.
sql
SELECT unnest(string_to_array('a,b,c', ',')); -- PostgreSQL78. Convert rows to JSON.
- PostgreSQL:
json_agg(),row_to_json()
79. Explain query plan.
EXPLAIN– shows execution strategy;EXPLAIN ANALYZE– runs + shows actual.
80. What is a bitmap index?
- Used in data warehouses; efficient for low-cardinality columns.
81. What is table partitioning?
- Splits large table into smaller physical pieces (range, list, hash).
82. What is sharding?
- Horizontal partitioning across different servers.
83. What is materialized view?
- Physical copy of query result (refreshed manually/scheduled).
84. What is a database deadlock?
- Two transactions each waiting for the other to release a lock.
85. What is isolation level?
- Read Uncommitted, Read Committed, Repeatable Read, Serializable.
86. What is a dirty read?
- Reading uncommitted changes from another transaction.
87. What is a non-repeatable read?
- Same query returns different results due to updates.
88. What is a phantom read?
- New rows appear matching WHERE condition.
89. What is WITH (NOLOCK)? (SQL Server)
- Allows dirty reads (less blocking).
90. What is an execution plan operator (e.g., nested loop, hash join)?
- Nested loop – small tables, index present.
- Hash join – large, unsorted tables.
- Merge join – sorted inputs.
PART 3: ADVANCED SQL (Hard/Tricky) — 45 Questions
91. Find gaps and islands (consecutive days).
Islands example:
sql
WITH grp AS (
SELECT order_date,
order_date - ROW_NUMBER() OVER (ORDER BY order_date)::int AS grp
FROM orders
)
SELECT MIN(order_date), MAX(order_date), COUNT(*)
FROM grp
GROUP BY grp;92. Sessionization (user sessions from clicks).
sql
WITH session_start AS (
SELECT user_id, click_time,
CASE WHEN click_time - LAG(click_time) OVER (PARTITION BY user_id ORDER BY click_time) > interval '30 minutes'
THEN 1 ELSE 0 END AS new_session
FROM clicks
)
SELECT user_id, session_id, MIN(click_time), MAX(click_time)
FROM (
SELECT *, SUM(new_session) OVER (PARTITION BY user_id ORDER BY click_time) AS session_id
FROM session_start
) s
GROUP BY user_id, session_id;93. Rolling retention (cohort analysis).
sql
WITH cohorts AS (
SELECT user_id, MIN(order_date) AS first_order
FROM orders GROUP BY user_id
)
SELECT EXTRACT(MONTH FROM c.first_order) AS cohort,
EXTRACT(MONTH FROM o.order_date) - EXTRACT(MONTH FROM c.first_order) AS month_offset,
COUNT(DISTINCT o.user_id) / MAX(COUNT(DISTINCT o.user_id)) OVER (PARTITION BY c.first_order) AS retention
FROM orders o
JOIN cohorts c ON o.user_id = c.user_id
GROUP BY 1,2;94. Self-join to find overlapping date ranges.
sql
SELECT a.emp_id, a.start_date, a.end_date, b.start_date, b.end_date FROM assignments a JOIN assignments b ON a.emp_id = b.emp_id WHERE a.id < b.id AND a.start_date <= b.end_date AND a.end_date >= b.start_date;
95. Recursive CTE to generate Fibonacci.
sql
WITH RECURSIVE fib(a,b,n) AS ( SELECT 0::bigint,1::bigint,1 UNION ALL SELECT b, a+b, n+1 FROM fib WHERE n < 20 ) SELECT a FROM fib;
96. What is a Window Frame?
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROWRANGE BETWEEN– based on value, not row position.
97. Rank with ties skipping next rank?
Use RANK().
98. Cumulative distribution CUME_DIST().
- Relative rank (0 to 1).
99. Percent rank PERCENT_RANK().
(RANK - 1) / (total rows - 1)
100. Conditional aggregation with filter clause (PostgreSQL).
sql
SELECT dept,
COUNT(*) FILTER (WHERE salary > 50000) AS high_earners
FROM employees
GROUP BY dept;101. String manipulation – extract domain from email.
sql
SELECT SUBSTRING(email FROM '@(.*)$') FROM users;
102. Split name into first/last.
sql
SELECT SPLIT_PART(full_name, ' ', 1) AS first,
SPLIT_PART(full_name, ' ', 2) AS last
FROM names;103. Parsing JSON in SQL.
- PostgreSQL:
data->>'key',jsonb_array_elements()
104. What is a partial index?
sql
CREATE INDEX active_users ON users(id) WHERE status = 'active';
105. Function vs stored procedure performance.
Functions are cached; procedures can manage transactions.
106. Dynamic SQL in stored procedure.
sql
EXECUTE IMMEDIATE 'SELECT * FROM ' || table_name;
107. Temporary table vs CTE performance.
Temp table can have indexes; CTE is in-memory (not always).
108. Database normalization up to 3NF.
- 1NF: atomic values
- 2NF: no partial dependency on composite key
- 3NF: no transitive dependency
109. Denormalization use cases.
Reporting, data warehouses, frequent joins.
110. What is a FULL OUTER JOIN example?
sql
SELECT * FROM customers c FULL OUTER JOIN orders o ON c.id = o.cust_id; -- includes customers without orders + orders without customers
111. What is a NATURAL JOIN?
Auto-joins on same column names (dangerous, avoid).
112. What is USING clause in join?
sql
SELECT * FROM employees JOIN departments USING (dept_id);
113. What are window function framing options?
ROWS, RANGE, GROUPS (SQL:2016).
114. What is IGNORE NULLS in LEAD/LAG?
Some DBs support: LAG(salary IGNORE NULLS)
115. What is SELECT FOR UPDATE?
Locks selected rows for update.
116. What is a SAVEPOINT?
sql
BEGIN; SAVEPOINT sp1; DELETE FROM orders; ROLLBACK TO sp1; COMMIT;
117. What is READ COMMITTED SNAPSHOT? (SQL Server)
Row-versioning to avoid blocking.
118. What is pg_stat_activity?
PostgreSQL view for current queries.
119. What is an execution time vs parse time?
Parse time builds plan; execution time runs it.
120. What is query optimization heuristic?
Rule-based vs cost-based optimizer.
121. What is a statistics histogram?
Used by optimizer to estimate row counts.
122. What is a cross-database query?
sql
SELECT * FROM db1.schema.table AS t1 JOIN db2.schema.table AS t2 ...
123. Export query results to CSV (PostgreSQL).
sql
\copy (SELECT * FROM users) TO 'users.csv' CSV HEADER;
124. What is pg_dump / mysqldump?
Backup utilities.
125. What is CDC (Change Data Capture)?
Capturing INSERT/UPDATE/DELETE for replication/analytics.
126. What is Slowly Changing Dimension (SCD) Type 2?
Track history with start/end date and current flag.
127. Write SCD Type 2 merge.
sql
UPDATE dim SET end_date = CURRENT_DATE, is_current = FALSE FROM stage WHERE dim.id = stage.id AND dim.is_current = TRUE AND dim.data <> stage.data; INSERT INTO dim (id, data, start_date, end_date, is_current) SELECT id, data, CURRENT_DATE, '9999-12-31', TRUE FROM stage ON CONFLICT ...;
128. What is a surrogate key vs natural key?
- Surrogate: artificial (auto-increment)
- Natural: business column (SSN)
129. What is an UUID vs serial?
UUID – globally unique, larger storage, slower.
130. What is a covering index with INCLUDE?
sql
CREATE INDEX idx ON table (col1) INCLUDE (col2, col3);
131. What is index skip scan?
Used if leading column has low cardinality but not in WHERE.
132. Write a query for “customers who bought all products”.
sql
SELECT customer_id FROM orders GROUP BY customer_id HAVING COUNT(DISTINCT product_id) = (SELECT COUNT(*) FROM products);
133. Write a query for “top 3 products per category”.
sql
WITH ranked AS (
SELECT product, category, sales,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY sales DESC) AS rn
FROM products
)
SELECT * FROM ranked WHERE rn <= 3;134. Write query for “cumulative sum reset on negative”.
Requires recursion or custom aggregation.
135. Write query to find “students who have same marks in all subjects”.
sql
SELECT student_id FROM scores GROUP BY student_id HAVING MIN(marks) = MAX(marks);
136. Write query to find “employees with no manager”.
sql
SELECT * FROM employees WHERE manager_id IS NULL;
137. Difference between CROSS APPLY and JOIN (SQL Server).
CROSS APPLY – like join with a correlated subquery; can call table-valued function.
138. What is OUTER APPLY?
Like LEFT JOIN for table-valued functions.
139. What is PIVOT in SQL Server?
sql
SELECT * FROM sales PIVOT (SUM(amount) FOR month IN ([Jan],[Feb])) AS p;
140. What is UNPIVOT?
sql
SELECT product, month, sales FROM pivot_table UNPIVOT (sales FOR month IN (Jan, Feb)) AS u;
141. What are database hints?
Force specific join type or index usage (avoid unless expert).
142. What is OPTION (RECOMPILE)?
Forces recompilation (parameter sniffing fix).
143. What is an OUTPUT clause? (SQL Server)
Returns affected rows.
144. What is a MERGE statement?
Upsert in single statement (INSERT/UPDATE/DELETE).
145. What are sequences?
Generate numbers independent of table (vs identity).
146. What is NEXTVAL and CURRVAL?
Sequence functions.
147. What is SYSDATE vs CURRENT_DATE?
SYSDATE– includes time (Oracle)CURRENT_DATE– ANSI SQL standard
148. What is an INTERVAL data type?
Stores duration (e.g., INTERVAL '1 day')
149. What is a common pitfall of OR in WHERE?
Prevents index usage; use UNION or IN instead.
150. Write query to “swap gender values”.
sql
UPDATE users SET gender = CASE WHEN gender='M' THEN 'F' ELSE 'M' END;
151. Write query to “remove spaces from column”.
sql
UPDATE table SET col = REPLACE(col, ' ', '');
152. What is REGEXP_SUBSTR?
Extract pattern (e.g., phone number).
153. What is TRANSLATE vs REPLACE?
TRANSLATE – char-by-char mapping; REPLACE – substring.
154. What is SOUNDEX?
Phonetic matching.
155. Difference between IN and EXISTS with example.
sql
-- IN (evaluates all) SELECT name FROM cust WHERE id IN (SELECT cust_id FROM orders) -- EXISTS (short-circuits) SELECT name FROM cust WHERE EXISTS (SELECT 1 FROM orders WHERE cust_id = cust.id)
156. What is GREATEST / LEAST?
Returns max/min of list (not aggregate).
157. What is WITH CHECK OPTION in views?
Prevents updates that would exclude row from view.
158. What is an updatable view?
Single table, no aggregates, no DISTINCT, no GROUP BY.
159. What is a GENERATED ALWAYS column?
Computed column (virtual or stored).
160. What is ON DELETE CASCADE?
Deletes child rows when parent deleted.
161. What is ON DELETE SET NULL?
Sets foreign key to NULL.
162. What is deferrable constraint?
Checked at commit, not immediately.
163. What is a CHECK constraint example?
sql
ALTER TABLE employees ADD CONSTRAINT check_salary CHECK (salary > 0);
164. What is a DEFAULT constraint?
sql
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
165. What is an exclusion constraint? (PostgreSQL)
Prevents overlapping ranges.
166. What is a PARTITION BY for ROW_NUMBER()?
Groups before numbering.
167. Difference between LAG and LEAD.
LAG – previous row; LEAD – next row.
168. What is a RANGE window frame?
Based on value range, not row count.
169. What is FILTER clause for aggregates? (PostgreSQL)
Conditional aggregation.
170. What is WITH ORDINALITY?
Adds row number to set-returning functions.
171. What is a lateral join? (PostgreSQL)
Allows subquery to reference previous from clause.
172. What is jsonb vs json (PostgreSQL)?
jsonb – binary, faster indexing, no duplicate keys.
173. What is an ARRAY type in SQL?
PostgreSQL: INT[], ARRAY_AGG.
174. What is unnest()?
Expands array to rows.
175. What is a polymorphic table function?
Returns different schema based on parameters (advanced).
PART 4: SCENARIO-BASED HIGHEST PRIORITY — 10 Questions
176. Design a schema for Instagram.
Answer: Users, Posts, Followers, Likes, Comments, Stories, Messages.
177. Write query to find followers of followers (recommendations).
sql
SELECT DISTINCT f2.follower_id
FROM follows f1
JOIN follows f2 ON f1.follower_id = f2.followee_id
WHERE f1.followee_id = ?
AND f2.follower_id NOT IN (
SELECT follower_id FROM follows WHERE followee_id = ?
);178. Optimize a slow query.
Steps: EXPLAIN ANALYZE, missing index, wrong join order, statistics outdated, query rewrite.
179. Find “active users” who posted at least 1 post each week for 4 weeks.
sql
WITH weekly_posts AS (
SELECT user_id, DATE_TRUNC('week', post_date) AS week,
COUNT(*) AS posts
FROM posts
WHERE post_date >= CURRENT_DATE - INTERVAL '4 weeks'
GROUP BY user_id, week
HAVING COUNT(*) >= 1
)
SELECT user_id FROM weekly_posts
GROUP BY user_id
HAVING COUNT(DISTINCT week) = 4;180. Handle billion-row table with no index.
Use partitioning, summary tables, map-reduce, or columnar storage (Parquet, Redshift).
Final Tips for SQL Interview
- Write readable SQL – use CTEs, indentation, aliases.
- Know window functions thoroughly – 90% of advanced questions.
- Practice on real platforms – LeetCode (Database section), HackerRank, StrataScratch.
- Know your DBMS quirks – MySQL, PostgreSQL, SQL Server differ.
- Ask clarifying questions – joins, NULLs, duplicates handling.


