Microsoft Fabric is a unified, end-to-end SaaS analytics platform from Microsoft that integrates data integration, engineering, warehousing, science, real-time analytics, and business intelligence (Power BI) on a shared foundation called OneLake. It eliminates data silos, reduces tool fragmentation, and enables seamless collaboration across roles.
Core Concepts and Architecture
Q: What is Microsoft Fabric? A: Microsoft Fabric is a SaaS analytics platform unifying data workloads (ingestion, transformation, analytics, ML, real-time, and reporting) in one environment. It uses OneLake as a single logical data lake (built on ADLS Gen2) for zero-copy data sharing across workloads, with shared governance, security, and compute. Key workloads include Data Factory, Data Engineering (Lakehouse), Data Warehouse, Data Science, Real-Time Intelligence, Power BI, and Databases.
Q: What is OneLake? How does it differ from Azure Data Lake Gen2 (ADLS Gen2)? A: OneLake is Fabric’s tenant-wide unified data lake, acting as “OneDrive for data.” It provides a single namespace, automatic provisioning, and seamless integration across all Fabric items without manual setup. Data is stored in open formats like Delta Parquet. Unlike ADLS Gen2, OneLake is SaaS-managed (no need to handle storage accounts, RBAC at the storage level, or infrastructure), supports shortcuts for zero-copy external data access, and enforces unified governance/metadata.
Q: Explain Lakehouse vs. Warehouse in Fabric. A:
- Lakehouse: Combines data lake flexibility (files, unstructured/semi-structured data, Spark processing) with warehouse reliability (ACID via Delta Lake, SQL endpoint). Ideal for data engineering/science with notebooks, Spark jobs, and open formats.
- Warehouse: SQL-first, high-performance for structured analytics (T-SQL). Separates compute/storage, optimized for BI/warehouse workloads. Both sit on OneLake and support Direct Lake.
Q: What is Direct Lake mode? A: A Power BI semantic model storage mode (Fabric-only) that queries Delta tables in OneLake directly with import-like performance (in-memory columnar) but near-real-time data without full imports or frequent refreshes. It transcodes Parquet on-the-fly and uses V-Order for optimization. Great for large datasets; supports Direct Lake on OneLake (multi-item) or SQL endpoints.
Q: What are Shortcuts in OneLake? A: Virtual references to data in other Fabric items, ADLS Gen2, S3, etc., without copying. They enable zero-copy analytics, reduce costs/latency, and support governance policies from the source.
Q: Explain the Medallion Architecture in Fabric. A: A layered data organization:
- Bronze (Raw): Ingested data as-is.
- Silver (Cleansed/Enriched): Validated, deduplicated, conformed.
- Gold (Business-ready): Aggregated, modeled for consumption (e.g., semantic models). Implemented via Lakehouse with Delta tables, pipelines/dataflows for ETL, and Spark/SQL for transformations.
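The three layers can be sketched in plain Python to show what each step is responsible for. This is only an illustration of the logic, not Fabric or Spark code; the sample rows and the function names `to_silver` and `to_gold` are assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical raw rows as they might land in a Bronze table (illustrative data).
bronze = [
    {"order_id": 1, "region": "EU", "amount": "100.0"},
    {"order_id": 1, "region": "EU", "amount": "100.0"},         # duplicate
    {"order_id": 2, "region": "US", "amount": "250.5"},
    {"order_id": 3, "region": "US", "amount": "not-a-number"},  # invalid
]

def to_silver(rows):
    """Validate types and drop duplicate order_ids (first occurrence wins)."""
    seen, silver = set(), []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # reject rows that fail validation
        if row["order_id"] in seen:
            continue  # skip duplicates
        seen.add(row["order_id"])
        silver.append({"order_id": row["order_id"],
                       "region": row["region"],
                       "amount": amount})
    return silver

def to_gold(rows):
    """Aggregate revenue per region for reporting."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)

silver = to_silver(bronze)
gold = to_gold(silver)
```

In a real Lakehouse the same steps would typically be Spark or SQL transformations writing Delta tables at each layer.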
Workloads and Components
Q: What are the main workloads in Microsoft Fabric? A: Data Factory (ingestion/orchestration), Data Engineering (Spark/Lakehouse), Data Warehouse (SQL), Data Science (ML/notebooks), Real-Time Intelligence (KQL/eventstreams), Power BI (visualization), Databases (SQL/Cosmos mirroring). All share OneLake.
Q: What are Pipelines and Dataflows (Gen2) in Fabric? A:
- Pipelines (Data Factory): Orchestration workflows (like Azure Data Factory) for scheduling, copying, and transforming data. Support low-code and activities like notebooks.
- Dataflows Gen2: Low-code Power Query-based ETL for reusable transformations, landing data in Lakehouse/Warehouse as Delta tables. Faster and more scalable than Gen1.
Q: What notebooks and languages are supported in Fabric? A: Notebooks are interactive environments for code-first development in the Data Engineering and Data Science workloads. They support PySpark (Python), Spark SQL, Scala, and SparkR, and integrate with the Lakehouse for data exploration and ML.
Q: Explain Real-Time Intelligence in Fabric. A: Handles streaming/event data with low latency using Eventstreams, KQL (Kusto Query Language) for querying logs/time-series, and integration with Power BI for live dashboards. Includes Real-Time Hub for cataloging streams.
Security, Governance, and Administration
Q: How does security and governance work in Fabric? A: Built-in Microsoft Purview for lineage, sensitivity labels, auditing. RBAC (roles at workspace/item level), encryption, tenant settings, domains for logical organization, and cross-tenant sharing with policy enforcement. Unified metadata layer.
Q: What is the pricing model? A: Capacity-based. You buy an F SKU either pay-as-you-go or as a reserved capacity, and all workloads share its compute, with consumption metered in CU-seconds. A trial capacity is also available.
Q: Explain Workspaces, Capacities, Domains, and Tenants. A:
- Tenant: Org-level Fabric environment.
- Capacity: Compute resource (F SKUs) assigned to workspaces.
- Workspace: Collaboration boundary (like folders) containing items.
- Domains: Logical grouping of workspaces for governance (e.g., by department).
Performance, Optimization, and Advanced Topics
Q: How do you optimize performance in Fabric (Spark jobs, queries, reports)? A: Use partitioning, OPTIMIZE/VACUUM on Delta tables, caching, proper file sizes, V-Order, Direct Lake, aggregate tables, and capacity scaling. Monitor with Fabric tools. For Spark: tune executors, avoid skew.
Q: How does Fabric handle schema drift/evolution and incremental loads? A: Delta Lake supports schema evolution/merges. Use watermarks, CDC, or pipeline conditions for incrementals. Dataflows/pipelines handle drift with dynamic mapping.
Q: How does Fabric support CI/CD and collaboration? A: Git integration for version control (notebooks, pipelines, etc.), deployment pipelines for promoting across environments, and workspace sharing.
Q: What is the difference between Fabric and Azure Synapse? A: Fabric is a unified SaaS platform with OneLake and shared everything; Synapse is more PaaS with separate services requiring integration. Fabric simplifies management and reduces duplication.
Scenario-Based and Practical Questions
Q: How would you design a real-time sales dashboard in Fabric? A: Ingest via Eventstreams/Pipelines → Process in Lakehouse/Eventhouse with KQL → Model in semantic layer → Visualize in Power BI with Direct Lake for near-real-time updates. Use shortcuts if needed.
Q: How do you migrate from Power BI or Synapse to Fabric? A: Use shortcuts/mirroring for data, convert datasets to Lakehouse/Warehouse, leverage Direct Lake, and use deployment pipelines. Start with OneLake as the foundation.
Q: Troubleshoot a failed pipeline or slow Spark job. A: Check activity logs, dependencies, capacity, data skew, and errors. For Spark: review Spark UI for stages, optimize code (broadcast joins, etc.). Use Lakehouse maintenance activities (OPTIMIZE/VACUUM).
Other Common Questions
- AI/ML in Fabric: Notebooks, integration with Azure ML/Foundry, Copilot for code/query assistance, predictive modeling with data in OneLake.
- File formats: Primarily Delta (Parquet), plus CSV, JSON, Avro.
- Certifications: DP-600 (Fabric Analytics Engineer) is key.
- Limitations: Cloud-only (with gateway for on-prem), licensing for full features, learning curve for new concepts.
Preparation Tips: Focus on hands-on practice in a Fabric trial (lakehouses, pipelines, Direct Lake reports, Spark optimization). Understand end-to-end scenarios, governance, and how workloads interoperate. Review Microsoft Learn paths and DP-600 topics.
This covers the highest-priority areas for interviews (fundamentals to advanced/scenario). Tailor depth based on the role (e.g., more engineering for Data Engineer positions). For the absolute latest, check official Microsoft Fabric documentation. Good luck!
Microsoft Fabric has become one of the most important skills for Data Engineers, Analytics Engineers, BI Developers, Data Architects, and AI/Data Science professionals. Many companies are migrating from traditional Azure Data Factory, Synapse Analytics, Databricks, and Power BI architectures into Microsoft Fabric because it provides a unified SaaS analytics platform.
1. What is Microsoft Fabric?
Answer
Microsoft Fabric is an end-to-end unified analytics platform that combines:
- Data Engineering
- Data Factory
- Data Science
- Data Warehouse
- Real-Time Analytics
- Power BI
- Data Activator
into a single SaaS platform.
Fabric is built on OneLake and eliminates the need to manage multiple services separately.
Interview Answer
“Microsoft Fabric is Microsoft’s unified analytics platform that integrates data ingestion, storage, transformation, analytics, AI, and visualization into a single SaaS solution. It uses OneLake as a centralized data lake and supports Data Engineering, Data Factory, Data Science, Data Warehousing, Real-Time Analytics, and Power BI.”
2. Why was Microsoft Fabric introduced?
Answer
Problems with traditional architecture:
- Separate services
- Data duplication
- Complex integrations
- Multiple security models
- High operational overhead
Fabric solves these by:
- Single storage layer
- Unified governance
- Shared security model
- Reduced ETL movement
- Integrated AI capabilities
3. What are the core components of Microsoft Fabric?
Answer
- Data Factory
- Data Engineering
- Data Warehouse
- Data Science
- Real-Time Analytics
- Power BI
- Data Activator
- OneLake
4. What is OneLake?
Answer
OneLake is the central storage layer of Microsoft Fabric.
Similar to:
- AWS S3
- Azure Data Lake Storage Gen2
Features:
- Single copy of data
- Organization-wide lake
- Open format storage
- Delta Lake support
Interview Answer
“OneLake acts as the unified data lake for Fabric. Every workload stores data inside OneLake, eliminating data silos and reducing duplication.”
5. What is the significance of OneLake?
Answer
Benefits:
- Single source of truth
- Data sharing
- Governance
- Security
- Faster analytics
6. Explain the Medallion Architecture in Fabric
Answer
Layers:
Bronze
Raw Data
Example:
Customer.csv
Sales.json
Silver
Cleaned Data
Example:
Deduplicated customers
Validated transactions
Gold
Business-ready Data
Example:
Sales KPI tables
Revenue Aggregations
Architecture:
Source
↓
Bronze
↓
Silver
↓
Gold
↓
Power BI
7. What is a Lakehouse?
Answer
A Lakehouse combines:
- Data Lake flexibility
- Data Warehouse performance
Features:
- Structured data
- Semi-structured data
- Unstructured data
Stored as Delta Tables.
8. Difference Between Data Lake and Lakehouse
| Data Lake | Lakehouse |
|---|---|
| Raw Storage | Managed Tables |
| Limited SQL | Full SQL |
| No ACID | ACID Support |
| Schema-on-read | Schema Enforcement |
9. What is a Warehouse in Fabric?
Answer
Fabric Warehouse provides:
- SQL analytics
- T-SQL support
- Star schema support
- BI reporting
Use cases:
- Enterprise reporting
- KPI dashboards
- Data marts
10. Lakehouse vs Warehouse
| Lakehouse | Warehouse |
|---|---|
| Spark Engine | SQL Engine |
| Data Engineering | BI Analytics |
| Flexible Schema | Structured Schema |
| Data Science | Reporting |
11. What is Delta Lake?
Answer
Delta Lake is an open-source storage layer providing:
- ACID transactions
- Schema enforcement
- Time travel
- Data versioning
12. What are Delta Tables?
Answer
Delta tables are Parquet data files plus a transaction log (stored in the _delta_log folder).
Benefits:
- Faster reads
- Rollbacks
- Incremental processing
13. Explain Time Travel
Answer
Allows querying historical data versions.
Example:
SELECT *
FROM sales
VERSION AS OF 5;
Useful for:
- Audits
- Recovery
- Data validation
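Time travel works because the transaction log records which data files were added and removed at each version. The sketch below replays a simplified log to find the files that make up a table at a given version. The log structure and file names are hypothetical and far simpler than the real Delta log format.

```python
# Hypothetical commit log: each version lists data files added and removed.
commit_log = [
    {"version": 0, "add": ["part-000.parquet"], "remove": []},
    {"version": 1, "add": ["part-001.parquet"], "remove": []},
    {"version": 2, "add": ["part-002.parquet"], "remove": ["part-000.parquet"]},
]

def files_as_of(log, version):
    """Return the set of data files that make up the table at `version`."""
    active = set()
    for commit in sorted(log, key=lambda c: c["version"]):
        if commit["version"] > version:
            break  # ignore commits newer than the requested version
        active.update(commit["add"])
        active.difference_update(commit["remove"])
    return active
```

Calling `files_as_of(commit_log, 1)` returns the two files present before version 2 rewrote the first one, which is the idea behind `VERSION AS OF`.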
14. What is Data Factory in Fabric?
Answer
Data Factory provides:
- Pipelines
- Dataflows Gen2
- Data ingestion
- ETL/ELT
Similar to Azure Data Factory.
15. Pipeline vs Dataflow
| Pipeline | Dataflow |
|---|---|
| Orchestration | Transformation |
| Workflow Control | Power Query |
| Scheduling | Data Cleaning |
16. What are Dataflows Gen2?
Answer
Low-code ETL tool.
Features:
- Power Query
- Incremental refresh
- Data transformation
17. What is Data Engineering in Fabric?
Answer
Uses Spark notebooks for:
- ETL
- Big Data processing
- Data transformation
Languages:
- Python
- Scala
- SQL
- Spark SQL
18. What is a Notebook?
Answer
Interactive development environment.
Supports:
- PySpark
- Spark SQL
- Python
Example:
df = spark.read.csv("/Files/sales.csv")
df.show()
19. What is Spark?
Answer
Apache Spark is a distributed processing engine.
Benefits:
- In-memory computation
- Parallel processing
- Big data support
20. What is Spark Pool?
Answer
Collection of Spark compute resources.
Fabric manages Spark automatically.
No infrastructure management required.
21. Explain Fabric Capacity
Answer
Fabric uses Capacity Units (CU).
Examples:
- F2
- F4
- F8
- F64
More CU = More compute power.
22. What is Capacity Throttling?
Answer
Occurs when workload exceeds available capacity.
Symptoms:
- Slow reports
- Delayed jobs
- Query failures
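A rough way to reason about throttling is to compare consumed CU-seconds with what the capacity can supply in the same window. The sketch below assumes the number in an F SKU name is its Capacity Unit count (F64 = 64 CUs). The function name is hypothetical, and the calculation deliberately ignores the smoothing Fabric applies before it throttles.

```python
def capacity_utilization(sku, cu_seconds_used, window_seconds):
    """Fraction of available CU-seconds consumed in the window.

    Assumes the number in an F SKU name is its Capacity Unit count
    (for example F64 -> 64 CUs).
    """
    capacity_units = int(sku.upper().lstrip("F"))
    available = capacity_units * window_seconds
    return cu_seconds_used / available
```

For example, an F2 capacity that consumed 60 CU-seconds in a 60-second window is at 50% utilization; values at or above 1.0 indicate demand beyond what the SKU supplies.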
23. How do you optimize Fabric Capacity?
Answer
- Use incremental refresh
- Optimize SQL
- Schedule heavy jobs
- Use partitioning
- Monitor Capacity Metrics
24. What is Direct Lake?
Answer
Direct Lake allows Power BI to read OneLake data directly.
Benefits:
- No import
- No refresh
- Near real-time
25. Direct Lake vs Import Mode
| Direct Lake | Import |
|---|---|
| Real-time | Refresh Required |
| Faster Updates | Cached |
| No Data Duplication | Duplicate Storage |
26. Direct Lake vs DirectQuery
| Direct Lake | DirectQuery |
|---|---|
| Reads OneLake | Reads Source |
| Faster | Slower |
| Fabric Native | External Source |
27. Explain Data Activator
Answer
No-code event monitoring tool.
Example:
If:
Sales < $1000
Then:
Send Alert
28. What is Real-Time Analytics?
Answer
Analyzes streaming data.
Examples:
- IoT
- Clickstream
- Telemetry
29. What is Eventstream?
Answer
Fabric service for streaming ingestion.
Sources:
- Kafka
- Event Hub
- IoT
30. What is KQL?
Answer
Kusto Query Language.
Used in:
- Real-Time Analytics
- Log Analytics
Example:
StormEvents
| where State =~ "Texas"
31. Explain Fabric Security
Answer
Security layers:
- Entra ID Authentication
- RBAC
- Workspace Permissions
- OneLake Security
- Sensitivity Labels
32. What is Row-Level Security (RLS)?
Answer
Restricts data access by user.
Example:
Manager sees:
All Regions
Sales Rep sees:
Assigned Region Only
33. What is Object-Level Security (OLS)?
Answer
Restricts access to:
- Tables
- Columns
34. What is Purview Integration?
Answer
Fabric integrates with Microsoft Purview for:
- Data Catalog
- Lineage
- Governance
- Compliance
35. Explain Data Lineage
Answer
Tracks data movement.
Example:
SQL DB
↓
Pipeline
↓
Lakehouse
↓
Power BI
36. What are Shortcuts in OneLake?
Answer
Virtual references to external data.
Supported:
- ADLS
- Amazon S3
No data copy required.
37. Why are Shortcuts important?
Answer
Benefits:
- Zero-copy architecture
- Reduced storage costs
- Faster access
38. What is Mirroring?
Answer
Near real-time replication into Fabric.
Sources:
- Azure SQL
- SQL Server
- Databases
39. Explain Fabric Mirroring Use Case
Answer
Example:
ERP Database → Fabric
Changes automatically synchronized.
40. What is Semantic Model?
Answer
Business layer used by Power BI.
Contains:
- Measures
- Relationships
- KPIs
Advanced Scenario Questions
41. How would you design a Fabric architecture for a retail company?
Answer
Architecture:
POS Systems
ERP
CRM
↓
Data Factory
↓
Bronze Lakehouse
↓
Silver Transformations
↓
Gold Layer
↓
Direct Lake
↓
Power BI
42. How would you process 10 TB daily in Fabric?
Answer
- Use Spark notebooks
- Delta Tables
- Partitioning
- Incremental loads
- Auto Optimize
43. How would you reduce Power BI refresh time?
Answer
- Direct Lake
- Aggregations
- Incremental refresh
- Star schema
44. How would you migrate from Synapse to Fabric?
Answer
- Move data to OneLake
- Recreate pipelines
- Convert Spark notebooks
- Rebuild semantic models
- Enable Direct Lake
45. How would you handle Slowly Changing Dimensions (SCD Type 2)?
Answer
Maintain:
Customer_ID
Start_Date
End_Date
Current_Flag
Track historical changes.
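The Type 2 pattern can be sketched in plain Python using the columns listed above: when a tracked attribute changes, the current row is closed and a new current row is appended. The `City` attribute, the sample data, and the function name `apply_scd2` are assumptions for illustration. In Fabric this would more likely be a Delta `MERGE` in a notebook.

```python
from datetime import date

def apply_scd2(dimension, incoming, today):
    """Close the current row for changed customers and append a new version."""
    current = {r["Customer_ID"]: r for r in dimension if r["Current_Flag"]}
    for new in incoming:
        old = current.get(new["Customer_ID"])
        if old is not None and old["City"] == new["City"]:
            continue  # tracked attribute unchanged, nothing to do
        if old is not None:
            # Expire the previous version instead of overwriting it.
            old["End_Date"] = today
            old["Current_Flag"] = False
        dimension.append({
            "Customer_ID": new["Customer_ID"],
            "City": new["City"],
            "Start_Date": today,
            "End_Date": None,
            "Current_Flag": True,
        })
    return dimension

dim = [{"Customer_ID": 1, "City": "Austin", "Start_Date": date(2024, 1, 1),
        "End_Date": None, "Current_Flag": True}]
dim = apply_scd2(
    dim,
    [{"Customer_ID": 1, "City": "Dallas"}, {"Customer_ID": 2, "City": "Boston"}],
    date(2025, 6, 1),
)
```

After the call, customer 1 has one expired row and one current row, and customer 2 has a single current row, so history is preserved.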
Senior Architect Questions
46. Why choose Fabric over Databricks?
Answer
Choose Fabric when:
- Heavy Power BI usage
- Unified platform needed
- Business analytics focus
Choose Databricks when:
- Advanced ML
- Custom Spark workloads
- Multi-cloud strategy
47. Why choose Fabric over Snowflake?
Answer
Fabric:
- Integrated Power BI
- OneLake
- Lower integration effort
Snowflake:
- Multi-cloud
- Strong SQL warehouse
- Independent ecosystem
48. Explain End-to-End Fabric Data Flow
Source Systems
↓
Data Factory
↓
OneLake
↓
Lakehouse
↓
Spark Transformation
↓
Gold Layer
↓
Warehouse
↓
Semantic Model
↓
Power BI
Most Frequently Asked Microsoft Fabric Interview Questions
What is OneLake?
Unified storage layer.
What is Direct Lake?
Power BI reads OneLake directly.
What is a Lakehouse?
Combination of Data Lake + Warehouse.
What is Mirroring?
Near real-time database replication.
What is a Shortcut?
Zero-copy data access.
What is Eventstream?
Streaming ingestion service.
What is Delta Lake?
ACID-compliant storage layer.
What is Medallion Architecture?
Bronze → Silver → Gold.
What is Data Activator?
Event-driven alerting service.
What is Fabric Capacity?
Compute units used to run workloads.
Critical Real-World Questions Asked by US Companies (2025–2026)
- Design a Fabric Lakehouse architecture for 100 million daily transactions.
- Explain Direct Lake internals.
- Fabric vs Databricks vs Snowflake.
- Implement SCD Type 2 using Fabric.
- Optimize a slow Direct Lake report.
- Design Medallion architecture in Fabric.
- Explain OneLake shortcuts.
- Fabric security model.
- Fabric capacity planning strategy.
- Fabric migration roadmap from Synapse/ADF.
- CDC implementation using Mirroring.
- Delta Lake optimization techniques.
- Spark optimization in Fabric.
- Fabric governance using Purview.
- Real-time streaming architecture using Eventstream + KQL.
These topics cover approximately 90–95% of Microsoft Fabric interview questions typically asked for Data Engineer, Analytics Engineer, BI Engineer, Cloud Data Engineer, and Data Architect roles in the U.S. market.
Preparing for a Microsoft Fabric interview requires understanding both architectural concepts and practical implementation details. This guide covers fundamental, data engineering, security, and scenario-based questions with comprehensive answers.
Table of Contents
- Fundamental & Architecture Questions
- Lakehouse & Data Warehousing Questions
- Data Engineering & Pipelines
- Security & Governance Questions
- Performance Optimization Questions
- Real-World Scenario Questions
- Quick Reference Answer Guide
Fundamental & Architecture Questions
Q1: What is Microsoft Fabric, and how does it unify data analytics workflows?
Answer:
Microsoft Fabric is a unified, end-to-end analytics platform that integrates multiple data workloads into a single software-as-a-service (SaaS) solution. It combines capabilities from Power BI, Azure Synapse, and Azure Data Factory, creating a cohesive environment where data professionals can work without juggling multiple services.
Key unification aspects:
- Single data lake (OneLake): One central storage system instead of separate silos
- Shared semantic model: Consistent business definitions across all tools
- Integrated experiences: Data engineering, data warehousing, real-time analytics, and Power BI all within one interface
- SaaS simplicity: No infrastructure management—automatic scaling, patching, and updates
Interview Tip: Emphasize how Fabric eliminates the “spaghetti architecture” problem where companies had disconnected tools for each analytics task.
Q2: What is OneLake, and how does it compare to traditional data lakes?
Answer:
OneLake is the single, unified data lake that serves as the storage foundation for Microsoft Fabric. Unlike traditional data lakes, OneLake is not another storage account you provision; it’s built into Fabric and automatically available.
Comparison table:
| Aspect | Traditional Data Lake (ADLS Gen2) | OneLake in Fabric |
|---|---|---|
| Provisioning | Manual storage account creation | Automatic with Fabric workspace |
| Multiple tenants | Separate accounts per workload | Single logical lake across all workloads |
| Structure | Folder-based organization | Workspace + item-based organization |
| Shortcuts | Not available | Native shortcuts to external data |
| Open formats | Supports Parquet/Delta | Built on Delta Parquet format |
Key differentiator: OneLake provides “shortcuts” (similar to symbolic links) that let you reference data from other lakes without copying it.
Q3: Explain the concept of Lakehouse in Microsoft Fabric.
Answer:
A Lakehouse combines the best features of data lakes (low-cost storage, schema flexibility) and data warehouses (performance, ACID transactions, governance). In Microsoft Fabric, the Lakehouse is a native item that stores data in Delta Parquet format.
Core components:
- Files section: Raw files (structured/semi-structured)
- Tables section: Managed Delta tables with schema enforcement
- SQL endpoint: Automatic availability for T-SQL queries
- Default semantic model: Auto-generated for Power BI reporting
Why it matters: With a Fabric Lakehouse you no longer have to choose between a data lake and a warehouse; you get both without data duplication.
Q4: What are the core components of Microsoft Fabric?
Answer:
Fabric consists of seven core workloads (called “experiences”):
| Component | Primary Use |
|---|---|
| Data Factory | Data ingestion and orchestration (like ADF) |
| Synapse Data Engineering | Spark-based data transformation using notebooks |
| Synapse Data Warehousing | Traditional data warehouse with T-SQL |
| Synapse Data Science | ML model development and tracking |
| Real-Time Analytics | Streaming data with KQL (Kusto Query Language) |
| Power BI | Reporting and semantic modeling |
| Data Activator | Automated actions based on data patterns |
All components share the same OneLake storage and security model.
Lakehouse & Data Warehousing Questions
Q5: What’s the difference between Lakehouse and Warehouse items in Fabric, and when would you choose one over the other?
Answer:
This is a critical distinction interviewers test frequently.
Lakehouse:
- Storage format: Delta Parquet (open format)
- Query engine: Spark SQL + SQL endpoint
- Best for: Exploratory analytics, streaming data, ML workloads
- Schema: Schema-on-read with optional enforcement
- Cost: Lower storage cost, moderate query performance
Warehouse:
- Storage format: Delta Parquet in OneLake (the same open format as the Lakehouse), managed and written through the T-SQL engine
- Query engine: T-SQL only
- Best for: Enterprise reporting, governed BI, complex joins
- Schema: Schema-on-write (strict enforcement)
- Cost: Higher storage cost, excellent query performance
Decision guide:
- Choose Lakehouse when: Data is unstructured, multiple formats, data science workloads, cost is primary concern
- Choose Warehouse when: Strong governance required, complex T-SQL, existing Synapse migration, predictable schemas
Interview Tip: Explain hybrid approaches—build Lakehouse for raw ingestion, then create shortcuts or views from Warehouse.
Q6: How do you implement an end-to-end Lakehouse architecture for real-time analytics?
Answer:
A complete Lakehouse architecture for real-time analytics follows this flow:
text
Ingestion → Streaming Processing → Storage → Serving → Visualization
1. Event Hub/IoT Hub → 2. Stream Analytics/Spark Streaming → 3. Lakehouse Delta Tables → 4. SQL Endpoint → 5. Power BI Direct Lake
Step-by-step implementation:
- Ingestion layer: Configure Event Hubs or Kafka to receive streaming data
- Processing: Use Fabric notebooks with Structured Streaming or Real-Time Hub
- Storage: Write to Lakehouse Delta tables with automatic partitioning
- Optimization: Implement Z-order clustering on frequently filtered columns
- Serving: Use the automatic SQL endpoint for reporting
- Visualization: Connect Power BI using Direct Lake mode for sub-second latency
Key to real-time: Use Direct Lake mode (not DirectQuery or Import) to query Delta files directly without moving data.
Q7: How do you set up a Lakehouse, and what scenarios make it a good fit?
Answer:
- Navigate to Fabric workspace → Create new item → Lakehouse
- Name the Lakehouse and confirm creation
- Two views appear: Explorer (files/tables) and SQL endpoint
- Load data via Data Factory pipeline, notebook, or upload
- Tables auto-discover from Delta files in managed folder
Best-fit scenarios:
- Data lake modernization: Existing ADLS Gen2 with Delta format
- Medallion architecture: Bronze (raw), Silver (cleaned), Gold (aggregated) layers
- Data science projects: Need Spark and Python support
- Multi-format data: Mix of JSON, CSV, Parquet, images
- Cost-sensitive analytics: Large data volumes with moderate performance needs
Data Engineering & Pipelines
Q8: Explain how Fabric Data Pipelines differ from Azure Data Factory pipelines—and when to use each.
Answer:
This question tests your understanding of Fabric’s relationship to existing Azure services.
| Aspect | Fabric Data Pipelines | Azure Data Factory (ADF) |
|---|---|---|
| Location | Within Fabric workspace | Standalone Azure service |
| Activities | Copy, Notebook, KQL, Dataflow, Spark | 100+ activities including custom |
| Integration | Native with OneLake, shortcuts | Requires explicit connectors |
| Orchestration | Workspace-level triggers | More complex trigger options |
| Code reusability | Limited to workspace | Data Factory Studio reuse |
| Pricing model | Fabric Capacity Units | ADF v2 pricing |
When to use each:
- Use Fabric Pipeline: For workloads entirely within Fabric ecosystem, OneLake-to-OneLake copies, native Fabric activities
- Use ADF: For complex hybrid scenarios, existing ADF investments, custom .NET activities, SSIS lift-and-shift
Interview Insight: Microsoft wants Fabric to eventually replace ADF for most data integration, but they maintain both for now.
Q9: How do you optimize Spark job performance inside Fabric Notebooks for large-scale datasets?
Answer:
Spark optimization is a high-priority topic in Fabric interviews.
Key strategies:
1. Partitioning optimization:
python
# Read with optimal partitions
df = spark.read.parquet("path").repartition(200)  # Based on cluster cores
# Write with partitioning column
df.write.partitionBy("year", "month").format("delta").save("path")
2. Use Delta optimizations:
sql
OPTIMIZE table_name; -- Compacts small files
VACUUM table_name RETAIN 168 HOURS; -- Clean up old versions
3. Caching strategies:
python
df.cache()  # For reused DataFrames
df.count()  # Forces cache materialization
4. Shuffle tuning:
python
spark.conf.set("spark.sql.shuffle.partitions", "200")
spark.conf.set("spark.sql.adaptive.enabled", "true")  # AQE in Spark 3.x
5. Notebook-specific:
- Use %run for modular code instead of large monolithic notebooks
- Detach and reattach sessions when memory issues occur
- Monitor Spark UI for skew detection
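Skew shows up in the Spark UI as a few tasks processing far more rows than the rest. The same check can be sketched in plain Python: flag any partition whose row count is well above the median. The threshold factor of 5, the sample counts, and the function name are assumptions for illustration, not values recommended by Fabric.

```python
from statistics import median

def skewed_partitions(partition_rows, factor=5.0):
    """Return partition ids whose row count exceeds `factor` x the median."""
    typical = median(partition_rows.values())
    return sorted(p for p, n in partition_rows.items() if n > factor * typical)

# Hypothetical row counts per shuffle partition, as read from task metrics.
sizes = {0: 1_000, 1: 1_200, 2: 950, 3: 48_000}
```

Here `skewed_partitions(sizes)` flags partition 3, which would be a candidate for salting the join key or for letting adaptive query execution split the partition.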
Q10: How would you implement complete data lineage, monitoring, and alerting within Fabric?
Answer:
Fabric provides multi-layered observability:
Data Lineage:
- Impact analysis view: Shows upstream/downstream dependencies
- Workspace lineage: Visual graph of item relationships (pipelines, dataflows, datasets)
- Column-level lineage: When using Purview integration
Monitoring:
- Monitor Hub: Central view of all runs (pipeline, notebook, dataflow)
- Spark Application History: Detailed job stages, tasks, and executors
- Capacity Metrics app: Track Consumption (CUs) and throttling events
Alerting Setup:
python
# Example: Custom alert from notebook
# failed_count and threshold are assumed to be defined earlier in the notebook
from notebookutils import mssparkutils
if failed_count > threshold:
    mssparkutils.notebook.run("SendAlertNotebook")
# Or use Pipeline Web activity to call Logic App
Best practice: Set up alerts on:
- Pipeline failures (email to Data Engineer DL)
- Capacity exceeding 80% for more than 10 minutes
- Long-running queries (> 5 minutes)
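A rule such as "capacity above 80% for more than 10 minutes" needs to look at consecutive samples rather than a single reading. A minimal sketch, assuming one utilization sample per minute; the function name and the input shape are hypothetical and not part of any Fabric API.

```python
def sustained_breach(samples, threshold=0.80, min_minutes=10):
    """True if utilization stays above `threshold` for more than `min_minutes`.

    `samples` is a list of per-minute utilization fractions, oldest first.
    """
    run = 0
    for value in samples:
        # Extend the current streak, or reset it when a sample is below threshold.
        run = run + 1 if value > threshold else 0
        if run > min_minutes:
            return True
    return False
```

Requiring a sustained streak avoids alerting on short bursts, which a capacity is expected to absorb.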
Q11: What’s your strategy for handling schema drift, incremental loading, and CDC in Fabric?
Answer:
Schema drift:
python
# In Dataflow Gen2: Use "Allow schema drift" option
# In Notebooks: Use dynamic schema evolution
df.write.mode("append").option("mergeSchema", "true").saveAsTable("target")
Incremental Loading patterns:
- Watermark technique:
sql
SELECT * FROM source WHERE LastModifiedDate > (SELECT MAX(LastModifiedDate) FROM target)
- Delta Lake Change Data Feed:
sql
-- Enable on table
ALTER TABLE mytable SET TBLPROPERTIES (delta.enableChangeDataFeed = true)
-- Read changes between commit versions 123 and 456
SELECT * FROM table_changes('mytable', 123, 456)
CDC Implementation:
| Source Type | Fabric Method |
|---|---|
| SQL Server | Azure Data Factory with CDC connector |
| Event Hubs | Real-Time Hub → Lakehouse |
| API sources | Incremental pipeline with last-run timestamp |
| Files (ADLS) | File modified date with LastFileModified function |
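The watermark technique described above can be sketched in plain Python: find the newest `LastModifiedDate` already in the target, then take only source rows newer than that. The sample rows and the function name `incremental_rows` are assumptions for illustration. In a pipeline the same comparison would normally run as the SQL query shown earlier.

```python
from datetime import datetime

def incremental_rows(source, target):
    """Return source rows modified after the newest row already in target."""
    # An empty target falls back to the minimum datetime, i.e. a full load.
    watermark = max((r["LastModifiedDate"] for r in target), default=datetime.min)
    return [r for r in source if r["LastModifiedDate"] > watermark]

target = [{"id": 1, "LastModifiedDate": datetime(2025, 1, 1)}]
source = [
    {"id": 1, "LastModifiedDate": datetime(2025, 1, 1)},
    {"id": 2, "LastModifiedDate": datetime(2025, 1, 2)},
]
new_rows = incremental_rows(source, target)
```

Note that a strict `>` comparison can miss rows that share the watermark timestamp but arrive later, so some designs store the watermark separately and reload a small overlap window.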
Security & Governance Questions
Q12: How does Microsoft Fabric handle data encryption?
Answer:
Fabric implements a multi-layer encryption approach:
At Rest (OneLake storage):
- Default: Azure Storage Service Encryption (SSE) with Microsoft-managed keys
- Enhanced: Customer-managed keys (CMK) via Azure Key Vault for compliance (HIPAA, PCI DSS)
- Double encryption: Optional for highest sensitivity workloads
In Transit:
- TLS 1.2+ for all service-to-service communication
- HTTPS required for all API access
Practical example: A healthcare provider storing patient records enables CMK encryption to meet HIPAA requirements while maintaining analytics processing.
Q13: What are the key security roles in Fabric, and how would you implement them in an enterprise scenario?
Answer:
Pre-defined roles (Workspace level):
| Role | Permissions |
|---|---|
| Admin | Full control: manage users, settings, delete workspace |
| Member | Create/Edit items, share, but cannot manage permissions |
| Contributor | Create/Edit items, cannot share or manage access |
| Viewer | Read-only access to all workspace items |
Enterprise implementation:
yaml
# Global retail company example:
Workspace_Europe_Sales:
  - Germany_Team: Contributor  # local sales data only
  - France_Team: Contributor
  - Global_Finance: Viewer  # cross-region reports
  - IT_Admin_Group: Admin
Workspace_APAC_SupplyChain:
  - Singapore_Ops: Admin
  - India_Logistics: Member
  - Global_Exec: Viewer
Best practice: Apply least privilege principle. Use Azure AD groups; never assign permissions directly to users.
Q14: How do you implement Row-Level Security (RLS) for a sales organization using Fabric?
Answer:
RLS restricts row access based on user attributes. Implementation path:
Step 1: Define DAX filters in the semantic model
text
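DAX Role: "SalesRep_Filter"
[Region] = USERNAME()
DAX Role: "Manager_Filter"
[EmployeeID] = USERPRINCIPALNAME()
Note: Both functions return the signed-in user's identity, so the compared column must hold matching sign-in names. A Region column only works here if it is resolved through a user-to-region mapping table.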
Step 2: Create roles in Fabric
- Navigate to semantic model → Row-Level Security
- Define role name and DAX expression
Step 3: Test using “View as” role
- Verify each user sees only their rows
Step 4: Assign users (Azure AD groups recommended)
text
Role: SalesRep_Filter → Members: SalesRep_Europe_Group, SalesRep_APAC_Group
Role: Manager_Filter → Members: Sales_Management_Group
Real-world scenario: A global sales company ensures European sales reps cannot see APAC transaction data while maintaining a single dataset for management reporting.
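The effect of an RLS role can be illustrated in plain Python as a per-user row filter. This is only a model of the behaviour, not how the semantic model evaluates DAX; the `USER_REGIONS` mapping, the sample users, and the function name are hypothetical.

```python
# Hypothetical mapping from user principal name to permitted regions.
USER_REGIONS = {
    "rep.eu@contoso.com": {"Europe"},
    "manager@contoso.com": {"Europe", "APAC"},
}

def visible_rows(rows, user):
    """Return only the rows whose Region the user is allowed to see."""
    allowed = USER_REGIONS.get(user, set())  # unknown users see nothing
    return [r for r in rows if r["Region"] in allowed]

sales = [
    {"Region": "Europe", "Amount": 100},
    {"Region": "APAC", "Amount": 200},
]
```

With this mapping the European rep sees only the Europe row while the manager sees both, which mirrors the single-dataset scenario described above.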
Q15: How do sensitivity labels help secure financial reports?
Answer:
Microsoft Purview sensitivity labels integrate natively with Fabric.
How it works:
- Classification: Label automatically applied based on content patterns (SSN, credit card, financial terms)
- Protection: Encryption applied to the content
- Enforcement: Prevents sharing outside authorized departments
- Audit: Tracks all access attempts
Banking example:
- M&A analysis reports automatically get “Highly Confidential” label
- Label triggers encryption and blocks external sharing
- Manager approval required before sharing outside M&A department
- All access attempts logged for FINRA compliance
Implementation:
powershell
# Labels applied via Purview compliance portal
# Auto-labeling policies based on:
# - Sensitive info types (account numbers, SWIFT codes)
# - Pattern matching (confidentiality headers)
# - User classification (investment banking vs retail banking)
Q16: Why would a pharmaceutical company implement Private Link for their Fabric environment?
Answer:
Private Link creates a secure, private network connection between Fabric and an Azure Virtual Network, so traffic does not cross the public internet.
Clinical trial scenario:
- Patient data must never traverse public internet
- Private Link ensures all data movement stays within Azure backbone network
- Combines with Network Security Groups for strict IP whitelisting
- Complies with FDA data protection requirements
Benefits for regulated industries:
| Concern | Private Link Solution |
|---|---|
| Data interception | No internet exposure |
| Hybrid connectivity | Secure on-prem gateway connection |
| IP restriction | Whitelist only research facility IPs |
| Compliance | No public endpoint logging required |
Setup: Configure private endpoints for OneLake, Power BI, and Data Factory within the organization’s VNet.
Q17: How should a retail chain handle payment processing secrets in their Fabric pipelines?
Answer:
Use Azure Key Vault as the secrets management solution.
DO NOT hardcode:
python
# ❌ Wrong - Never do this
connection_string = "Server=paymentdb;User=sa;Password=SuperSecret123!"
Correct approach:
python
# ✅ Right - Reference Key Vault
from notebookutils import mssparkutils
secret = mssparkutils.credentials.getSecret("https://keyvault.vault.azure.net/", "payment-api-key")
Best practices for retail:
- Store payment gateway credentials, API keys, connection strings
- Implement automated key rotation (every 30-90 days)
- Enable detailed audit logging for all secret access attempts
- Use managed identities—no service principal secrets stored anywhere
- Comply with PCI DSS by logging every secret access
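One way to keep credentials out of notebook code is to pass the secret lookup in as a function, so the same code runs against Key Vault in Fabric and against a stub in tests. The sketch below assumes hypothetical secret names (`payment-db-user`, `payment-db-password`) and a hypothetical `build_connection_string` helper. The stub stands in for `mssparkutils.credentials.getSecret`.

```python
from typing import Callable


def build_connection_string(get_secret: Callable[[str, str], str],
                            vault_url: str, server: str, database: str) -> str:
    """Assemble a connection string from credentials fetched at run time."""
    user = get_secret(vault_url, "payment-db-user")
    password = get_secret(vault_url, "payment-db-password")
    return f"Server={server};Database={database};User={user};Password={password}"


def fake_get_secret(vault_url: str, name: str) -> str:
    """Stand-in for the Key Vault lookup so the sketch runs outside Fabric."""
    return {"payment-db-user": "svc_payments", "payment-db-password": "from-vault"}[name]


conn = build_connection_string(
    fake_get_secret, "https://keyvault.vault.azure.net/", "paymentdb", "payments"
)
# Deliberately not printed: the assembled string contains the password
```

In a Fabric notebook the first argument would be `mssparkutils.credentials.getSecret`, and the resulting string should never be logged or displayed.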
Performance Optimization Questions
Q18: Your Power BI report connected via Direct Lake is showing slow load times. What performance tuning strategies would you apply?
Answer:
This is a common scenario-based question.
Diagnostic checklist:
- Check if report is actually using Direct Lake (not falling back to DirectQuery)
- Identify bottlenecks using Performance Analyzer in Power BI Desktop
Optimization strategies:
| Issue | Solution |
|---|---|
| Small file problem | Run OPTIMIZE on Delta tables to compact files |
| Large column counts | Remove unused columns from the semantic model |
| Complex DAX measures | Pre-aggregate at Lakehouse level via Spark |
| Cross-table joins | Ensure join columns are partition-aligned |
| High cardinality columns | Implement aggregations for high-level reports |
Step-by-step tuning:
sql
-- 1. Compact Delta files
OPTIMIZE sales_table WHERE date >= '2024-01-01'
-- 2. Z-order the data on frequently filtered columns
OPTIMIZE sales_table ZORDER BY (transaction_date, region)
-- 3. Apply VACUUM to remove old versions
VACUUM sales_table RETAIN 168 HOURS
-- 4. Verify file sizes (target 100MB-1GB per file): compare sizeInBytes with numFiles
DESCRIBE DETAIL sales_table
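Direct Lake specific: After major Lakehouse changes (for example, OPTIMIZE or a bulk load), refresh the semantic model so that it points at the latest Delta table version and reloads the affected columns into memory.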
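Step 4 of the tuning list, verifying file sizes, can be sketched as a small check over a list of data file sizes. This is an illustration under assumptions: the sizes are supplied by hand here (in practice they would come from table metadata or a file listing), and the "more than half the files are small" trigger is an arbitrary threshold, not a Fabric rule.

```python
MB = 1024 * 1024
TARGET_MIN = 100 * MB   # lower bound of the 100MB-1GB target range
TARGET_MAX = 1024 * MB  # upper bound of the target range


def summarize_file_sizes(sizes_in_bytes):
    """Count files outside the target range and flag tables that look fragmented."""
    small = [s for s in sizes_in_bytes if s < TARGET_MIN]
    large = [s for s in sizes_in_bytes if s > TARGET_MAX]
    return {
        "files": len(sizes_in_bytes),
        "small": len(small),
        "large": len(large),
        # Assumed heuristic: compact when most files are below the target
        "needs_optimize": len(small) > len(sizes_in_bytes) / 2,
    }


sample = [4 * MB, 8 * MB, 12 * MB, 256 * MB]
report = summarize_file_sizes(sample)
print(report)
```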
Q19: How do you configure and manage Fabric capacities to avoid throttling?
Answer:
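Fabric capacity is measured in Capacity Units (CUs). Every operation consumes CUs, and Fabric smooths that consumption over time instead of charging it all at the moment the operation runs.
Key concepts:
- Timepoint: usage is evaluated in 30-second intervals
- Smoothing: interactive operations are spread over at least 5 minutes; background operations (such as scheduled refreshes and pipeline runs) are spread over 24 hours
- Throttling occurs when: smoothed consumption exceeds provisioned capacity by more than 10 minutes of future capacity
- Throttling behavior: interactive operations are delayed first, then rejected once the overage passes 60 minutes; background operations are rejected only after 24 hours of overage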
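The effect of smoothing can be seen in a toy model. The sketch below spreads each operation's CU-seconds evenly over a configurable number of 30-second timepoints and then lists the timepoints where the smoothed total exceeds the capacity budget. It is a simplification: the window length is a parameter rather than Fabric's actual per-operation rules, and the function names and sample numbers are assumptions.

```python
def smooth_usage(operations, window_timepoints, total_timepoints):
    """Spread each operation's CU-seconds evenly over the timepoints after its start.

    operations: list of (start_timepoint, cu_seconds) tuples.
    Returns the smoothed CU-seconds charged to each timepoint.
    """
    usage = [0.0] * total_timepoints
    for start, cu_seconds in operations:
        share = cu_seconds / window_timepoints
        for t in range(start, min(start + window_timepoints, total_timepoints)):
            usage[t] += share
    return usage


def overloaded_timepoints(usage, capacity_cu, timepoint_seconds=30):
    """Return the timepoints whose smoothed usage exceeds the capacity budget."""
    budget = capacity_cu * timepoint_seconds
    return [t for t, used in enumerate(usage) if used > budget]


# Two operations of 2400 CU-seconds each, starting one timepoint apart
ops = [(0, 2400.0), (1, 2400.0)]
usage = smooth_usage(ops, window_timepoints=4, total_timepoints=6)
print(usage)
print(overloaded_timepoints(usage, capacity_cu=32))
```

The same two operations that overload a 32 CU capacity in this model fit comfortably within 64 CUs, which is the kind of comparison that helps when sizing a capacity.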
Management strategies:
1. Monitor capacity usage:
- Use Fabric Capacity Metrics app (download from AppSource)
- Track operations by user, item type, time of day
2. Optimize heavy operations:
python
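from functools import reduce
from pyspark.sql import DataFrame

# files and target_path are placeholders for your own input list and Delta location
# Instead of 10 small writes:
for file in files:
    spark.read.parquet(file).write.format("delta").mode("append").save(target_path)  # 10 operations
# Do one batch write:
all_df = reduce(DataFrame.unionByName, [spark.read.parquet(f) for f in files])
all_df.write.format("delta").mode("append").save(target_path)  # 1 operation
3. Time-shift workloads: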
- Schedule heavy ETL during off-hours
- Stagger pipeline start times
4. Use bursting:
Fabric allows short-term spikes above provisioned capacity—design for average, not peak
5. Purchase backup capacity:
Pay-as-you-go for overflow during peak seasons
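Staggering start times, from strategy 3 above, takes only a few lines to plan. The sketch below computes evenly spaced start times for a list of pipelines. The pipeline names, the 01:00 start, and the 20-minute gap are hypothetical, and the actual schedule would still be set in each pipeline's schedule settings.

```python
from datetime import datetime, timedelta


def stagger_start_times(pipelines, first_start, gap_minutes):
    """Assign each pipeline a start time offset by a fixed gap from the previous one."""
    return {
        name: first_start + timedelta(minutes=i * gap_minutes)
        for i, name in enumerate(pipelines)
    }


schedule = stagger_start_times(
    ["ingest_sales", "ingest_inventory", "build_gold"],
    first_start=datetime(2024, 1, 1, 1, 0),
    gap_minutes=20,
)
for name, start in schedule.items():
    print(name, start.strftime("%H:%M"))
```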
Real-World Scenario Questions
Q20: How do you integrate external Delta Lake tables from ADLS Gen2 into Fabric’s OneLake while preserving lineage?
Answer:
Use OneLake shortcuts, which are references to external data that do not copy it.
Implementation:
text
ADLS Gen2 location: abfss://container@storage.dfs.core.windows.net/external/tables/sales/
In Fabric Lakehouse: Create shortcut to above path
Result: Data appears in Lakehouse files/tables section
Lineage preservation:
- External tables maintain original metadata (created/modified dates)
- Purview integration traces source system through shortcut
- Pipeline lineage shows data flow from external source to Fabric items
Steps:
python
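# Create shortcut via notebook, using the Fabric REST API (Create Shortcut)
# workspace_id, lakehouse_id and connection_id are placeholders for your own GUIDs
import requests
from notebookutils import mssparkutils
token = mssparkutils.credentials.getToken("pbi")  # assumed audience key for the Fabric API
requests.post(
    f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}/items/{lakehouse_id}/shortcuts",
    headers={"Authorization": f"Bearer {token}"},
    json={"path": "Tables", "name": "sales", "target": {"adlsGen2": {
        "location": "https://storage.dfs.core.windows.net",
        "subpath": "/external/sales", "connectionId": connection_id}}},
).raise_for_status()
Benefits over copy: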
- No storage duplication costs
- Instant availability (zero copy time)
- Always current (no sync lag)
- Access control enforced at source level
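When shortcuts are created programmatically, the `abfss://` URI has to be split into the pieces a shortcut definition needs. The helper below is a sketch under assumptions: the `adlsGen2` target shape (`location`, `subpath`, `connectionId`) reflects the Fabric REST API's Create Shortcut request as best understood here and should be checked against the current documentation. `shortcut_target_from_abfss` and the connection placeholder are hypothetical.

```python
from urllib.parse import urlparse


def shortcut_target_from_abfss(uri: str, connection_id: str) -> dict:
    """Split an abfss URI into the account endpoint and the container-relative path."""
    parsed = urlparse(uri)
    if parsed.scheme != "abfss" or "@" not in parsed.netloc:
        raise ValueError(f"not an abfss URI: {uri}")
    container, account_host = parsed.netloc.split("@", 1)
    subpath = "/" + container + parsed.path.rstrip("/")
    return {
        "adlsGen2": {
            "location": f"https://{account_host}",
            "subpath": subpath,
            "connectionId": connection_id,  # placeholder for a Fabric connection GUID
        }
    }


target = shortcut_target_from_abfss(
    "abfss://external@storage.dfs.core.windows.net/sales", "<connection-guid>"
)
print(target["adlsGen2"]["location"], target["adlsGen2"]["subpath"])
```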
Q21: How would you design a disaster recovery and data backup strategy in Fabric?
Answer:
Fabric presents unique challenges because OneLake is a single, unified storage system.
Backup strategies:
| Component | DR Approach |
|---|---|
| OneLake data | Zone-redundant storage (ZRS) across availability zones |
| Cross-region | Manual or automated replication to paired region |
| Semantic models | Export .pbip files to source control |
| Pipelines/Notebooks | Git integration (Azure DevOps or GitHub) |
| Workspace metadata | Use Fabric REST APIs to export workspace JSON |
Practical implementation:
Level 1 (Within region):
- Enable ZRS on storage (Microsoft managed)
- Recovery Point Objective (RPO): < 15 minutes
Level 2 (Cross-region):
python
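# Scheduled notebook to replicate critical tables
def replicate_to_dr_region():
    # Read the critical table from the primary OneLake location
    df = spark.read.format("delta").load("abfss://primary@onelake.dfs.fabric.microsoft.com/critical")
    # Overwrite the DR copy so repeated scheduled runs do not fail on an existing path
    df.write.format("delta").mode("overwrite").save("abfss://dr@drstorage.dfs.core.windows.net/backup/")
Level 3 (Metadata backup):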
- Connect Fabric workspace to Git repository
- Schedule weekly export of pipeline definitions
- Store notebook source code in version control
Recovery procedure:
- Restore workspace from Git (metadata)
- Reattach to surviving OneLake data (automatic)
- If region-level failure, redirect to DR storage using shortcuts
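A DR plan is only as good as its monitoring. The sketch below checks whether each table's most recent replication falls within the recovery point objective. The table names, timestamps, and `rpo_breaches` helper are hypothetical, and in practice the timestamps would come from the replication job's own log or the Delta table history.

```python
from datetime import datetime, timedelta, timezone


def rpo_breaches(last_replicated, rpo, now):
    """Return the tables whose newest DR copy is older than the allowed RPO."""
    return sorted(t for t, ts in last_replicated.items() if now - ts > rpo)


now = datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc)
last_replicated = {
    "critical.orders": now - timedelta(minutes=10),
    "critical.payments": now - timedelta(minutes=45),
}
breaches = rpo_breaches(last_replicated, rpo=timedelta(minutes=15), now=now)
print(breaches)
```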
Q22: Your organization wants to integrate machine learning predictions from Azure ML into a Fabric Lakehouse. How will you design that end-to-end integration pipeline?
Answer:
This tests your ability to combine Microsoft’s ML ecosystem with Fabric.
Architecture:
text
Azure ML Model Training → Model Registry → Batch Scoring Pipeline → Lakehouse Storage → Power BI Reporting
Step-by-step implementation:
1. Register and deploy model in Azure ML:
python
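# In Azure ML workspace (azureml-core SDK v1; ws, inference_config and deployment_config are defined elsewhere)
from azureml.core.model import Model
model = Model.register(workspace=ws, model_path="model_path", model_name="sales_forecast_model")
service = Model.deploy(ws, "forecast-endpoint", [model], inference_config, deployment_config)
service.wait_for_deployment(show_output=True)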
2. Create Fabric notebook for batch inference:
python
# Fabric notebook - Batch scoring
import requests
from notebookutils import mssparkutils
# Get model endpoint from Key Vault
endpoint = mssparkutils.credentials.getSecret("keyvault", "ml-endpoint")
token = mssparkutils.credentials.getToken("https://ml.azure.com")
# Read new data from Lakehouse
new_orders = spark.read.table("bronze.orders").filter("processed = false")
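# Score in batches (5000 records per call); rows become plain dicts so they serialize to JSON
rows = [row.asDict() for row in new_orders.collect()]
results = []
for start in range(0, len(rows), 5000):
    batch = rows[start:start + 5000]
    response = requests.post(endpoint, json={"data": batch}, headers={"Authorization": f"Bearer {token}"})
    response.raise_for_status()
    results.extend(response.json())  # assumes the endpoint returns a list of prediction records
# Write predictions to Lakehouse
scored_df = spark.createDataFrame(results)
scored_df.write.mode("append").saveAsTable("silver.sales_predictions")
3. Orchestrate with Data Pipeline: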
text
Pipeline Schedule (daily at 6 AM):
Notebook: "Batch Scoring" →
Copy Activity: predictions to gold layer →
Semantic Model Refresh
4. Consumption:
- Power BI connects to predictions table via Direct Lake
- Optionally retrain model monthly using Fabric Data Science
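The batching step in the scoring notebook can be isolated and tested without Spark or a live endpoint. The sketch below is a minimal version under assumptions: `batched`, `score_in_batches`, and the stub scoring function are hypothetical names, and the stub stands in for the HTTP call to the model endpoint.

```python
from itertools import islice


def batched(iterable, size):
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def score_in_batches(records, size, score_fn):
    """Call score_fn once per batch and collect all predictions in order."""
    predictions = []
    for chunk in batched(records, size):
        predictions.extend(score_fn(chunk))
    return predictions


batch_sizes = []


def fake_score_fn(chunk):
    """Stub for the endpoint call: records the batch size and doubles the quantity."""
    batch_sizes.append(len(chunk))
    return [{"order_id": r["order_id"], "forecast": r["qty"] * 2} for r in chunk]


records = [{"order_id": i, "qty": i} for i in range(1, 8)]
predictions = score_in_batches(records, size=3, score_fn=fake_score_fn)
print(batch_sizes, len(predictions))
```

Keeping the endpoint call behind a function argument also makes it straightforward to add retries or rate limiting later without touching the batching logic.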
Quick Reference Answer Guide
Top 5 Interviewers’ Focus Areas:
- OneLake & Shortcuts – Understand zero-copy data sharing
- Direct Lake mode – Know when and why to use it
- Capacity management – Monitor and optimize CU consumption
- Security (RLS + Purview) – Implement compliance patterns
- Differences from Synapse/ADF – Know when Fabric is better
Critical Differences to Remember:
| Concept | Traditional Azure | Microsoft Fabric |
|---|---|---|
| Storage | Separate ADLS accounts | OneLake (unified) |
| Data warehouse | Dedicated SQL pool | Warehouse item |
| ETL | ADF pipelines | Data Factory pipelines (Fabric-native) |
| Lake queries | Spark/SQL endpoints | Lakehouse + SQL endpoint |
| Semantic model | Power BI dataset | Same but native in Fabric |
Final Preparation Tips
From real interview experiences:
✅ DO:
- Practice explaining Fabric as “SaaS for all data workloads”
- Prepare real project examples that use multiple Fabric components
- Understand capacity units (CUs) and smoothing window concept
- Know when to choose Lakehouse vs Warehouse with specific trade-offs
❌ DON’T:
- Treat Fabric like “just another Synapse module” (this is a common failure point)
- Claim Direct Lake mode is always better than Import mode
- Ignore security and governance (it’s a top interview focus)
- Forget about Purview integration for enterprise scenarios
Remember: Microsoft doesn’t test syntax; it tests how you think about data as a unified, governed, and scalable ecosystem. Focus on architectural understanding and trade-off analysis in your answers.


