Cloud Engineer Interview Questions & Answers (Highest Priority)

For a Cloud Engineer role (AWS/Azure/GCP), interviewers generally evaluate:

  1. Cloud Fundamentals
  2. Networking
  3. Compute & Virtualization
  4. Storage
  5. Security & IAM
  6. Containers & Kubernetes
  7. Infrastructure as Code (IaC)
  8. CI/CD & DevOps
  9. Monitoring & Troubleshooting
  10. High Availability & Disaster Recovery
  11. Cost Optimization
  12. Cloud Migration
  13. Automation & Scripting
  14. System Design & Scenario-Based Questions

Cloud Engineer roles focus on designing, implementing, managing, and optimizing cloud infrastructure across providers like AWS, Azure, and GCP. Interviews typically cover fundamentals, platform-specific knowledge (often one primary provider), IaC (Terraform, CloudFormation, ARM/Bicep), networking, security, DevOps practices, cost optimization, monitoring, and scenario-based troubleshooting.

Expect a mix of:

  • Conceptual questions
  • Technical/deep-dive
  • Scenario-based (design, troubleshooting, migration)
  • Behavioral (past projects, teamwork, incidents)

Preparation priorities in 2026: Terraform fluency, Kubernetes (EKS/GKE/AKS), IAM/least privilege, multi-cloud/hybrid, observability, security/compliance, and cost management.

Below is a structured list of high-priority questions with detailed answers (explanations, best practices, and examples).

1. Cloud Fundamentals

Q1. What is Cloud Computing?

Answer:

Cloud computing is the delivery of computing services such as servers, storage, databases, networking, analytics, and AI over the internet on a pay-as-you-go model.

Benefits:

  • Scalability
  • High Availability
  • Cost Optimization
  • Global Reach
  • Security
  • Faster Deployment

Example:

Instead of buying physical servers, organizations use AWS EC2 instances and pay only for usage.


Q2. What are the Cloud Service Models?

Answer:

IaaS (Infrastructure as a Service)

Provides infrastructure.

Examples:

  • AWS EC2
  • Azure VM
  • Google Compute Engine

User manages:

  • OS
  • Applications
  • Security

Provider manages:

  • Hardware
  • Network

PaaS (Platform as a Service)

Examples:

  • AWS Elastic Beanstalk
  • Azure App Service

User manages:

  • Application

Provider manages:

  • Infrastructure
  • Runtime

SaaS (Software as a Service)

Examples:

  • Salesforce
  • Gmail
  • Microsoft 365

Everything managed by provider.


Q3. What are Public, Private, and Hybrid Clouds?

Public Cloud

Infrastructure shared among customers.

Examples:

  • AWS
  • Azure
  • GCP

Private Cloud

Dedicated infrastructure.

Examples:

  • VMware
  • OpenStack

Hybrid Cloud

Combination of both.

Example:
On-premises database + AWS applications.


2. Networking

Q4. What is a VPC?

Answer:

Virtual Private Cloud (VPC) is a logically isolated network in AWS.

Components:

  • CIDR Block
  • Subnets
  • Route Tables
  • Internet Gateway
  • NAT Gateway
  • Security Groups
  • NACLs

Q5. Difference between Security Group and NACL?

Feature     | Security Group | NACL
Level       | Instance       | Subnet
Stateful    | Yes            | No
Allow Rules | Yes            | Yes
Deny Rules  | No             | Yes

Example:

Security Group:
Allow TCP 443

NACL:
Allow 443
Deny specific IP range
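
The difference in rule evaluation can be sketched in plain Python. This is a simplified model under stated assumptions, not how AWS implements it: the rule dictionaries and the function names security_group_allows and nacl_allows are hypothetical, only inbound traffic by source IP and port is modelled, and statefulness (return traffic) is ignored.

```python
import ipaddress

def security_group_allows(rules, source_ip, port):
    """Security groups hold allow rules only; anything unmatched is denied."""
    ip = ipaddress.ip_address(source_ip)
    return any(
        ip in ipaddress.ip_network(rule["cidr"]) and rule["port"] == port
        for rule in rules
    )

def nacl_allows(rules, source_ip, port):
    """NACL rules are checked in ascending rule-number order; first match wins."""
    ip = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r["number"]):
        if ip in ipaddress.ip_network(rule["cidr"]) and rule["port"] == port:
            return rule["action"] == "allow"
    return False  # the implicit final rule denies everything else

# Illustrative rules using documentation-only IP ranges.
sg_rules = [{"cidr": "0.0.0.0/0", "port": 443}]
nacl_rules = [
    {"number": 100, "cidr": "203.0.113.0/24", "port": 443, "action": "deny"},
    {"number": 200, "cidr": "0.0.0.0/0", "port": 443, "action": "allow"},
]

print(security_group_allows(sg_rules, "203.0.113.10", 443))
print(nacl_allows(nacl_rules, "203.0.113.10", 443))
```

The security group alone would admit the 203.0.113.0/24 range; the lower-numbered NACL deny rule is what blocks it at the subnet boundary.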


Q6. What is NAT Gateway?

Answer:

Allows resources in a private subnet to make outbound connections to the internet while blocking connections initiated from the internet.

Example:
Private EC2 downloading software updates.


Q7. What is CIDR?

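Answer:

CIDR (Classless Inter-Domain Routing) is a notation for describing an IP address range as a base address plus a prefix length. The prefix length (the number after the slash) is how many leading bits are fixed for the network; the remaining bits identify hosts.

Example: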

10.0.0.0/16

  • 65,536 IP addresses
  • Network Mask = 255.255.0.0
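
The numbers above can be checked with Python's standard ipaddress module. This is a small sketch; the helper names describe_cidr and split_into_subnets are made up for illustration.

```python
import ipaddress

def describe_cidr(cidr):
    """Return the size, netmask, and address range of a CIDR block."""
    network = ipaddress.ip_network(cidr)
    return {
        "addresses": network.num_addresses,
        "netmask": str(network.netmask),
        "first": str(network.network_address),
        "last": str(network.broadcast_address),
    }

def split_into_subnets(cidr, new_prefix):
    """Split a CIDR block into equal subnets with the given prefix length."""
    network = ipaddress.ip_network(cidr)
    return [str(subnet) for subnet in network.subnets(new_prefix=new_prefix)]

info = describe_cidr("10.0.0.0/16")
subnets = split_into_subnets("10.0.0.0/16", 24)
print(info["addresses"], info["netmask"], len(subnets), subnets[0])
```

A /16 splits into 256 subnets of size /24. In AWS, five addresses in every subnet are reserved, so the usable count is slightly lower than the raw total.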

Q8. Explain DNS.

Answer:

DNS converts domain names into IP addresses.

Example:

google.com → 142.250.x.x

AWS Service:
Route 53


3. Compute Services

Q9. What is EC2?

Answer:

Elastic Compute Cloud provides virtual servers.

Features:

  • Auto Scaling
  • Load Balancer Integration
  • Multiple Instance Types

Q10. Types of EC2 Instances?

General Purpose

t3, t4g, m5

Compute Optimized

c5, c6i, c6g

Memory Optimized

r5, x1

Storage Optimized

i3, d2

GPU

p4, g5


Q11. What is Auto Scaling?

Answer:

Automatically increases or decreases EC2 instances based on demand.

Benefits:

  • High Availability
  • Cost Optimization

Q12. What is Elastic Load Balancer?

Answer:

Distributes traffic across servers.

Types:

  • ALB
  • NLB
  • GWLB

4. Storage

Q13. Difference Between EBS, EFS, and S3?

Service | Type
EBS     | Block Storage
EFS     | File Storage
S3      | Object Storage

Q14. What is S3?

Answer:

Object storage service.

Features:

  • 99.999999999% (11 nines) durability
  • Lifecycle Policies
  • Versioning
  • Replication

Q15. What is Versioning in S3?

Answer:

Stores multiple versions of an object.

Benefits:

  • Recovery
  • Protection against accidental deletion

Q16. What is Glacier?

Answer:

Low-cost archival storage.

Used for:

  • Compliance
  • Backups

5. IAM & Security

Q17. What is IAM?

Answer:

Identity and Access Management controls access to AWS resources.

Components:

  • Users
  • Groups
  • Roles
  • Policies

Q18. Difference Between IAM Role and IAM User?

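IAM User

Long-term credentials (a password and/or access keys) tied to one person or application.

IAM Role

Temporary credentials, issued by AWS STS when the role is assumed.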

Used by:

  • EC2
  • Lambda
  • EKS

Best Practice:
Use Roles whenever possible.


Q19. What is Least Privilege?

Answer:

Grant only required permissions.

Example:

Allow:
s3:GetObject

Not:
s3:*
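
As a sketch of what this looks like in practice, the snippet below builds an IAM policy document that allows only s3:GetObject on a single bucket, plus a small review helper that flags wildcard actions. The bucket name and both function names are hypothetical; the document structure (Version, Statement, Effect, Action, Resource) is the standard IAM JSON policy layout.

```python
import json

def read_only_bucket_policy(bucket_name):
    """Build an IAM policy that allows only s3:GetObject on one bucket's objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket_name}/*"],
            }
        ],
    }

def wildcard_actions(policy):
    """List actions in Allow statements that contain a wildcard, for review."""
    found = []
    for statement in policy["Statement"]:
        actions = statement["Action"]
        if isinstance(actions, str):  # IAM accepts a single string or a list
            actions = [actions]
        if statement["Effect"] == "Allow":
            found.extend(action for action in actions if "*" in action)
    return found

policy = read_only_bucket_policy("example-reports-bucket")  # hypothetical bucket
print(json.dumps(policy, indent=2))
print(wildcard_actions(policy))
```

A check like wildcard_actions is only a starting point for a review; it does not replace tools such as IAM Access Analyzer.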


Q20. What is Multi-Factor Authentication?

Answer:

Adds second authentication factor.

Example:
Password + Mobile OTP


6. Containers & Kubernetes

Q21. What is Docker?

Answer:

Containerization platform.

Benefits:

  • Lightweight
  • Portable
  • Fast startup

Q22. Difference Between VM and Container?

VM                    | Container
Runs its own guest OS | Shares the host OS kernel
Heavy                 | Lightweight
Slow to start         | Fast to start

Q23. What is Kubernetes?

Answer:

Container orchestration platform.

Functions:

  • Scheduling
  • Auto-healing
  • Scaling
  • Service Discovery

Q24. Kubernetes Components?

Control Plane:

  • API Server
  • Scheduler
  • Controller Manager
  • etcd

Worker Node:

  • kubelet
  • kube-proxy
  • Pods

Q25. What is a Pod?

Answer:

Smallest deployable unit in Kubernetes.

Contains:

  • One or more containers

Q26. What is EKS?

Answer:

Amazon Elastic Kubernetes Service.

Managed Kubernetes platform.

Benefits:

  • Control plane managed by AWS
  • Integrated IAM
  • High Availability

7. Infrastructure as Code

Q27. What is Terraform?

Answer:

Infrastructure as Code tool by HashiCorp.

Benefits:

  • Automation
  • Version Control
  • Reusable Infrastructure

Q28. What are Terraform State Files?

Answer:

Track deployed resources.

File:
terraform.tfstate


Q29. What is Terraform Remote State?

Answer:

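Stores state in a shared remote backend (for example, an S3 bucket) instead of on a local disk.

Benefits:

  • Team Collaboration
  • State locking (via DynamoDB, or S3-native lock files in newer Terraform versions)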

Q30. Terraform vs CloudFormation?

Terraform   | CloudFormation
Multi-cloud | AWS only
HCL         | JSON/YAML

8. CI/CD

Q31. What is CI/CD?

CI

Continuous Integration

CD

Continuous Delivery/Deployment

Benefits:

  • Faster releases
  • Reduced errors

Q32. Popular CI/CD Tools?

  • Jenkins
  • GitHub Actions
  • GitLab CI
  • Azure DevOps
  • AWS CodePipeline

Q33. Explain CI/CD Pipeline.

Steps:

  1. Code Commit
  2. Build
  3. Test
  4. Security Scan
  5. Deploy
  6. Monitor
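
The key behaviour of a pipeline is that stages run in order and a failure stops everything after it. A minimal sketch, with a hypothetical run_pipeline function and placeholder stage callables standing in for real build, test, and scan tools:

```python
def run_pipeline(stages):
    """Run stages in order and stop at the first failure.

    `stages` is a list of (name, callable) pairs; each callable returns True
    on success. Returns (succeeded, names_of_stages_that_ran).
    """
    ran = []
    for name, step in stages:
        ran.append(name)
        if not step():
            return False, ran
    return True, ran

# Placeholder steps; a real pipeline would call build tools, test runners, scanners.
stages = [
    ("build", lambda: True),
    ("test", lambda: True),
    ("security_scan", lambda: False),  # simulated finding blocks the release
    ("deploy", lambda: True),
]

ok, ran = run_pipeline(stages)
print(ok, ran)
```

Here the simulated scan failure means the deploy stage never runs, which is the gate a real pipeline is meant to enforce.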

9. Monitoring

Q34. How do you monitor cloud resources?

AWS:

  • CloudWatch
  • CloudTrail
  • Config
  • X-Ray

Q35. Difference Between CloudWatch and CloudTrail?

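CloudWatch

Performance monitoring: metrics, logs, alarms, and dashboards for resources and applications.

CloudTrail

Audit logging: records API calls and account activity (who did what, when, and from where).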

Q36. What Metrics Do You Monitor?

  • CPU
  • Memory
  • Disk
  • Network
  • Error Rate
  • Latency

10. High Availability & DR

Q37. What is High Availability?

Answer:

Application remains operational during failures.

Methods:

  • Multi-AZ
  • Load Balancing
  • Auto Scaling

Q38. Explain RTO and RPO.

RTO

Recovery Time Objective

Maximum downtime allowed.

RPO

Recovery Point Objective

Maximum acceptable data loss.

Q39. Disaster Recovery Strategies?

Backup & Restore

Pilot Light

Warm Standby

Multi-Site Active-Active

11. Cost Optimization

Q40. How do you reduce cloud costs?

Methods:

  • Reserved Instances
  • Savings Plans
  • Spot Instances
  • Auto Scaling
  • S3 Lifecycle Policies
  • Right-Sizing

Q41. What are Spot Instances?

Answer:

Unused AWS capacity at discounted prices. AWS can reclaim the instances with a two-minute warning, so workloads must tolerate interruption.

Savings:
Up to 90%

Used for:

  • Batch jobs
  • Data processing

12. Automation & Scripting

Q42. Which scripting languages do you use?

  • Python
  • Bash
  • PowerShell

Q43. Why is Python popular for Cloud Engineering?

Uses:

  • Automation
  • Lambda Functions
  • Infrastructure Management
  • API Integration

Libraries:

  • boto3
  • requests
  • pandas

13. Troubleshooting Scenarios

Q44. EC2 is unreachable. What will you check?

Steps:

  1. Security Group
  2. NACL
  3. Route Table
  4. Internet Gateway
  5. Instance Status
  6. SSH Key
  7. OS Firewall

Q45. Website is slow. How do you troubleshoot?

Check:

  • CPU
  • Memory
  • Network
  • Database
  • Load Balancer
  • Logs
  • Auto Scaling

Q46. S3 Access Denied Error?

Verify:

  • IAM Policy
  • Bucket Policy
  • SCP
  • KMS Permissions

14. System Design Questions

Q47. Design a Highly Available Web Application.

Architecture:

Users → Route 53 → CloudFront → ALB → Auto Scaling EC2 (across multiple AZs) → RDS Multi-AZ

S3 holds static assets and is served through CloudFront.

Q48. Design a 3-Tier Architecture.

Presentation Layer

ALB + EC2

Application Layer

Private EC2/EKS

Database Layer

RDS

Q49. Design a Secure AWS Environment.

Security Controls:

  • IAM Roles
  • MFA
  • KMS Encryption
  • Private Subnets
  • VPC Endpoints
  • Security Groups
  • CloudTrail
  • GuardDuty

Top 20 Questions Most Frequently Asked for Senior Cloud Engineer (10+ Years Experience)

  1. Explain VPC architecture in detail.
  2. Difference between Security Groups and NACLs.
  3. Design a multi-region disaster recovery solution.
  4. Explain Terraform state management.
  5. How would you secure AWS workloads?
  6. Explain Kubernetes architecture.
  7. Difference between ECS and EKS.
  8. Explain CI/CD pipeline design.
  9. How would you migrate an on-prem application to AWS?
  10. Explain Auto Scaling strategies.
  11. How would you reduce AWS costs by 30%?
  12. Explain S3 consistency model.
  13. Explain Route53 routing policies.
  14. Troubleshoot a failed deployment.
  15. Explain IAM cross-account access.
  16. Design a secure enterprise landing zone.
  17. Explain container security.
  18. Explain monitoring strategy for production workloads.
  19. Design a scalable microservices platform.
  20. Explain Well-Architected Framework pillars.

The 5 Questions That Almost Always Appear

  1. “Tell me about a cloud migration project you led.”
  2. “How would you secure a production AWS environment?”
  3. “Explain a Kubernetes issue you resolved.”
  4. “How do you optimize cloud costs?”
  5. “Design a highly available architecture for 1 million users.”

For a Cloud Engineer with 10–15 years of experience, mastery of AWS networking, IAM, Terraform, Kubernetes (EKS), CI/CD, monitoring, security, automation, and troubleshooting scenarios is typically the highest-priority interview focus.

1. General Cloud Computing Fundamentals

Q1: Explain the difference between IaaS, PaaS, and SaaS with examples. Answer:

  • IaaS (Infrastructure as a Service): Provides virtualized computing resources (servers, storage, networking). You manage OS, middleware, apps. Example: AWS EC2, Azure VMs, GCP Compute Engine.
  • PaaS (Platform as a Service): Provides a platform for developing, running, and managing applications. Abstracts infrastructure. Example: AWS Elastic Beanstalk, Azure App Service, Google App Engine.
  • SaaS (Software as a Service): Fully managed applications delivered over the internet. Example: Gmail, Salesforce, Microsoft 365.

Q2: What are the main cloud deployment models? Answer: Public (AWS, Azure), Private (on-prem cloud), Hybrid (combination), Multi-cloud (multiple public providers). Hybrid is common for legacy systems + cloud scalability and compliance.

Q3: Horizontal vs Vertical scaling? When to use each? Answer:

  • Horizontal: Add more instances (e.g., Auto Scaling Groups in AWS). Better for stateless apps, fault tolerance.
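  • Vertical: Increase resources (CPU/RAM) on an existing instance. Simpler, but has an upper limit and usually needs downtime to resize.

Use horizontal scaling for stateless web apps; use vertical scaling for workloads that are hard to distribute, such as databases with licensing constraints.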

Q4: What is Infrastructure as Code (IaC)? Name tools and benefits. Answer: Managing infrastructure through code (declarative/imperative). Tools: Terraform (multi-cloud), AWS CloudFormation, Azure ARM/Bicep, Ansible. Benefits: Version control, repeatability, reduced errors, faster deployments.

2. AWS-Specific Questions (Most Common Provider)

Q5: Explain VPC and key components for a 3-tier app. Answer: Virtual Private Cloud is an isolated network. For a 3-tier app:

  • Public subnet: Load Balancer (ALB/NLB).
  • Private subnets: App tier (EC2/ASG) and DB tier (RDS).
  • NAT Gateway (or NAT Instance) for outbound internet from private subnets.
  • Security Groups (stateful) + NACLs (stateless).
  • Internet Gateway for inbound and outbound internet traffic from public subnets.
  • VPC Endpoints for private access to AWS services (S3, DynamoDB).

Q6: How do you design a highly available and scalable web app on AWS? Answer: Multi-AZ deployment, Auto Scaling Groups + ALB, RDS Multi-AZ or Aurora, S3 + CloudFront for static content, Route 53 for DNS, WAF for security. Use EC2 Spot/Reserved Instances for cost. Monitor with CloudWatch + X-Ray.

Q7: What is S3? Explain storage classes and lifecycle policies. Answer: Object storage. Classes: Standard (frequent access), Intelligent-Tiering, Standard-IA, One Zone-IA, Glacier Instant Retrieval, Glacier Flexible Retrieval (cold), Glacier Deep Archive. Use lifecycle policies to transition objects to cheaper classes, or expire them, automatically for cost optimization.
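
A lifecycle policy is easier to explain with a concrete rule. The dictionary below follows the shape of the S3 lifecycle configuration API (the structure that boto3's put_bucket_lifecycle_configuration takes as LifecycleConfiguration); the prefix, rule ID, and day counts are illustrative assumptions, and storage_class_at is a hypothetical helper that only simulates the rule locally, assuming objects start in STANDARD.

```python
# Illustrative rule: the prefix, ID, and day counts are assumptions.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-then-expire-logs",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        }
    ]
}

def storage_class_at(rule, age_days):
    """Return the class an object would be in at a given age, or None once expired."""
    if age_days >= rule["Expiration"]["Days"]:
        return None
    current = "STANDARD"
    for transition in sorted(rule["Transitions"], key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current

rule = lifecycle_configuration["Rules"][0]
for age in (10, 45, 200, 400):
    print(age, storage_class_at(rule, age))
```

Walking through a rule like this in an interview shows both the cost logic (cheaper classes as data cools) and the retention logic (expiry after a year).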

Q8: How do you secure an S3 bucket? Answer: Block public access, use IAM policies/bucket policies (least privilege), KMS encryption, versioning, MFA Delete, access logs, Macie for sensitive data discovery.

Q9: Explain IAM and how you implement least privilege. Answer: Identity and Access Management. Use IAM roles (for EC2, Lambda) instead of users. Policies: JSON with Effect, Action, Resource. Start with AWS managed policies, then custom. Use IAM Access Analyzer. Rotate credentials, use SSO (IAM Identity Center).

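Q10: Troubleshooting: EC2 in private subnet can’t reach internet. Answer: Check the private subnet's route table (0.0.0.0/0 → NAT Gateway), that the NAT Gateway sits in a public subnet whose route table sends 0.0.0.0/0 to the Internet Gateway, Security Group outbound rules, and NACLs on both subnets (including ephemeral ports for return traffic). Alternatives: VPC Endpoints for AWS services, or a NAT Instance.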

Other key AWS topics: Lambda (serverless), EKS (Kubernetes), RDS/Aurora, CloudFront, API Gateway, ECS/Fargate, CloudTrail, Config, Security Hub, Well-Architected Framework (six pillars: Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, Sustainability).

3. Azure-Specific Questions

Q11: What is Azure Resource Manager (ARM) and why use it? Answer: Deployment and management service. Allows templating (JSON or Bicep) for consistent, idempotent deployments. Supports RBAC, tags, policy.

Q12: Difference between Azure VMs and App Services? Answer: VMs = IaaS (full control, manage OS). App Services = PaaS (managed hosting for web/apps, auto-scale, easier CI/CD).

Q13: Explain Azure Virtual Network (VNet) and security features. Answer: Isolated network. Subnets, NSGs (Network Security Groups), Application Security Groups, Azure Firewall, Private Endpoints, Service Endpoints. Use VNet peering for connectivity.

Key Azure services: AKS, Azure Functions, Cosmos DB, Blob Storage, Key Vault, Microsoft Entra ID (formerly Azure AD), Logic Apps, Data Factory.

4. GCP-Specific Questions

Q14: What is a VPC in GCP? Key components. Answer: Global resource with regional subnets. Uses firewall rules (not NACLs). Routes, Cloud NAT, Shared VPC, VPC Peering.

Key GCP services: Compute Engine, GKE, Cloud Storage, BigQuery, Cloud Load Balancing (global/regional).

Q15: Explain GKE and when to use it. Answer: Managed Kubernetes. Use for containerized workloads needing orchestration. Features: Autopilot (fully managed), auto-scaling, binary authorization.

Other GCP: Cloud Run (serverless containers), Firestore, Pub/Sub, Operations Suite (monitoring).

5. IaC, DevOps & Containers

Q16: How do you structure Terraform for multi-environment (dev/staging/prod)? Answer: Monorepo with modules (networking, compute, database). Use workspaces or separate state files per env. Variables files (.tfvars), remote backend (S3 + DynamoDB lock). Terragrunt for DRY. Detect state drift with terraform plan -refresh-only (the standalone terraform refresh command is deprecated).

Q17: Design a CI/CD pipeline for a containerized app. Answer: Source (GitHub) → Build/Test (GitHub Actions/Jenkins) → Scan (Trivy, SAST) → Push to ECR/ACR → Deploy (Terraform/Helm to EKS/AKS/GKE or Blue-Green/Canary with ArgoCD/Flagger). Use GitOps.

Q18: Kubernetes basics: Deployment vs StatefulSet vs DaemonSet. Answer: Deployment (stateless, replicas), StatefulSet (stateful, stable identity/storage), DaemonSet (one pod per node, e.g., logging agents).

6. Security, Compliance & Cost

Q19: How do you implement least privilege and manage secrets? Answer: Short-lived credentials, IAM roles, Secrets Manager/Key Vault, just-in-time access. Tools: HashiCorp Vault. Rotate regularly. Use service accounts.

Q20: How do you handle a misconfigured S3 bucket or data breach? Answer: Immediate: Revoke access, enable encryption. Investigate (CloudTrail), notify (if PII), remediate with policy. Prevent with Config rules, GuardDuty, automated remediation (Lambda).

Q21: Cost optimization strategies. Answer: Rightsizing, Reserved/Savings Plans, Spot Instances, Auto Scaling, Storage tiering, tagging + Cost Explorer/Budgets, Serverless where possible.

7. Scenario-Based & Troubleshooting (High Priority)

Q22: Migrate on-prem app to cloud with minimal downtime. Answer: Assessment (AWS Migration Hub/Azure Migrate). Lift-and-shift first (VM Import/Export or block-level server replication), then refactor. Use DMS with ongoing replication for databases, and cut over with Route 53 weighted routing or Azure Traffic Manager.

Q23: App has uneven load balancing or scaling issues. Answer: Check health checks, sticky sessions, target groups. Ensure stateless design. Monitor metrics. For EKS: HPA + Cluster Autoscaler.
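
For the HPA part of this answer, it helps to know the scaling rule itself: the Horizontal Pod Autoscaler computes the desired replica count as ceil(currentReplicas × currentMetric / targetMetric). The sketch below applies that formula with min/max bounds; it is a simplification that leaves out the tolerance band and stabilization windows the real controller uses, and the function name and default bounds are assumptions.

```python
import math

def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10):
    """ceil(current_replicas * current_metric / target_metric), clamped to bounds."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    desired = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, desired))

# 4 pods averaging 90% CPU against a 60% target
print(desired_replicas(4, 90, 60))
```

With four pods at 90% average CPU and a 60% target, the formula asks for six pods. If the cluster has no room for them, the Cluster Autoscaler is what adds nodes.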

Q24: Production outage due to change. How do you handle rollback? Answer: Use IaC (Terraform rollback), Blue-Green deployments, feature flags, backups/snapshots. Post-incident: RCA, blameless review, improved testing/gates.

8. Behavioral Questions

Q25: Tell me about a complex cloud project/challenge. Answer: Use STAR (Situation, Task, Action, Result). Example: Migrated legacy monolith to microservices on EKS, reduced costs 40%, improved scalability.

Emphasize collaboration, learning new tech quickly, documentation, and business impact.

Preparation Tips (Highest Priority)

  • Hands-on: Build projects (e.g., 3-tier app, multi-env Terraform repo). Use free tiers.
  • Certifications: AWS Solutions Architect Associate, Azure AZ-400/AZ-305, Google Associate Cloud Engineer or Professional Cloud Architect.
  • Practice: LeetCode-style scenarios, draw diagrams (Lucidchart), explain aloud.
  • 2026 Trends: AI/ML integration (SageMaker, Vertex AI), FinOps, zero-trust security, GitOps, sustainability.
  • Review Well-Architected Frameworks for each cloud.

This covers most of the commonly asked questions. Tailor answers to the job description (e.g., heavy AWS vs multi-cloud). Good luck! Practice scenarios with real diagrams.

This is a comprehensive guide to Cloud Engineer interview questions, categorized by difficulty and domain. These are the highest priority questions—those most likely to appear in FAANG, startups, and enterprise cloud roles.

Part 1: Foundational Concepts (Every candidate must know)

Q1: What happens when you type https://www.example.com in a browser? (Explain with cloud in mind)

Answer:

  1. DNS Resolution → Route 53 (AWS) / Cloud DNS (GCP) / Azure DNS resolves www.example.com to an IP (often a CDN edge or a cloud load balancer).
  2. TCP/TLS Handshake → the browser connects on port 443; the cloud firewall (Security Group/NSG) must allow it.
  3. Caching (CloudFront/Cloud CDN) serves static assets from the edge and forwards cache misses to the origin.
  4. WAF inspects the request for threats.
  5. Load Balancer (ALB/GLB/Azure Front Door) receives the request → routes it to a healthy backend.
  6. Reverse Proxy (Nginx on EC2/VM) or Serverless (Lambda/Cloud Function) generates the response.
  7. Response travels back along the same path.

Q2: Explain the difference between IaaS, PaaS, and SaaS with examples.

Model | Example                                                  | Your Control               | Cloud Manages
IaaS  | AWS EC2, GCP Compute Engine, Azure VM                    | OS, middleware, apps       | Virtualization, storage, network
PaaS  | AWS Elastic Beanstalk, GCP App Engine, Azure App Service | Code & data                | Runtime, OS, servers
SaaS  | Gmail, Salesforce, Microsoft 365                         | Only user data & settings  | Everything else

Q3: What is the Shared Responsibility Model?

Answer: Cloud provider secures the cloud (physical hardware, network, hypervisor). Customer secures what’s in the cloud (OS, firewall rules, IAM, data encryption, application code).
Example: AWS secures the data center; you secure your S3 bucket permissions.

Q4: Explain High Availability (HA), Fault Tolerance (FT), and Disaster Recovery (DR).

  • HA: Minimizes downtime (e.g., 99.9% → 8.76 hrs/year downtime). Auto-scaling group + load balancer across 2+ AZs.
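  • FT: The system keeps running without interruption when a component fails (typically targeting 99.999% or better, about 5 minutes/year of downtime). Active-Active clusters, synchronous replication.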
  • DR: Restore after disaster. RTO (Recovery Time Objective) & RPO (Recovery Point Objective). Strategies: Backup & Restore, Pilot Light, Warm Standby, Multi-site.
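
Availability percentages are easier to compare once converted into allowed downtime. A short sketch of the arithmetic, assuming a 365-day year; the function name is made up for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760; ignores leap years

def downtime_hours_per_year(availability_percent):
    """Maximum yearly downtime implied by an availability target."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)

for target in (99.0, 99.9, 99.99, 99.999):
    hours = downtime_hours_per_year(target)
    print(f"{target}% -> {hours:.2f} h/year ({hours * 60:.1f} min)")
```

99.9% works out to 8.76 hours a year, while 99.999% leaves only about 5.3 minutes, which is why the two targets call for very different architectures.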

Q5: What is a region, availability zone (AZ), and edge location?

  • Region: Geographic area (us-east-1).
  • Availability Zone: 1+ data centers, isolated but connected via low-latency links.
  • Edge Location: CDN endpoint (CloudFront) for caching.

Part 2: Cloud-Specific Deep Dive (AWS/Azure/GCP)

Q6: Explain AWS VPC vs Azure VNet vs GCP VPC.

Answer (unified):
All provide logically isolated networks.

  • AWS VPC: Subnets (public/private), Internet Gateway, NAT Gateway, Security Groups (stateful), NACLs (stateless).
  • Azure VNet: Subnets, NSGs, Azure Firewall, VNet Peering, Route Tables.
  • GCP VPC: Global (subnets per region), firewall rules (stateful), Cloud NAT.
    Key difference: GCP VPC is global by default; AWS/Azure are regional.

Q7: How do you securely store secrets (DB passwords, API keys)?

Answer: Use a dedicated secrets manager:

  • AWS: Secrets Manager + KMS encryption + IAM policies.
  • Azure: Key Vault with RBAC.
  • GCP: Secret Manager + CMEK.
    Never hardcode secrets; inject via environment variables or use IAM roles for EC2/K8s service accounts.

Q8: Design a cost-optimized 3-tier web app on AWS.

Answer:

  • Web tier: Spot instances + Auto Scaling (min 2) + ALB.
  • App tier: Reserved Instances (1-year) + ASG across 2 AZs.
  • DB tier: RDS with Multi-AZ (only for prod) or Aurora Serverless for dev.
  • Storage: S3 + lifecycle policies (move to Glacier after 30 days).
  • CDN: CloudFront with origin shield.
  • Cost tools: AWS Budgets, Compute Optimizer, Trusted Advisor.

Q9: What is Infrastructure as Code (IaC)? Compare Terraform vs CloudFormation vs ARM/Bicep.

Answer: IaC manages infrastructure via declarative config files.

Tool           | Cloud       | Language      | State Mgmt       | Drift Detection
Terraform      | Multi-cloud | HCL           | Yes (state file) | Yes
CloudFormation | AWS only    | JSON/YAML     | Managed by AWS   | Yes (drift detect)
ARM/Bicep      | Azure only  | JSON or Bicep | Managed by Azure | Partial

Best practice: Use Terraform for multi-cloud; CloudFormation for AWS-only shops.

Q10: Explain AWS IAM roles vs policies vs groups.

  • User: Individual person/application.
  • Group: Collection of users (same permissions).
  • Role: Assumable by AWS service or federated user (no long-term creds).
  • Policy: JSON document (Allow/Deny) defining permissions.
    Principle of least privilege + use IAM Roles for EC2 (never store keys on EC2).

Q11: How do you troubleshoot a “connection timeout” vs “connection refused” in cloud?

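  • Timeout: Packets are silently dropped somewhere on the network path (no matching Security Group allow rule, NACL deny, route table missing IGW/NAT, corporate firewall).
  • Refused: The host is reachable but actively rejects the connection, usually because nothing is listening on that port (application crashed, wrong port, service bound only to localhost).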
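
The two failure modes surface as different exceptions on the client, which makes them easy to tell apart with a quick probe. A minimal sketch using only the standard socket module; probe_tcp is a hypothetical helper and the three-second default timeout is an arbitrary choice.

```python
import socket

def probe_tcp(host, port, timeout_seconds=3.0):
    """Classify a TCP connection attempt as 'open', 'refused', 'timeout', or 'error'."""
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return "open"
    except ConnectionRefusedError:
        return "refused"   # host answered, but nothing is listening on the port
    except socket.timeout:
        return "timeout"   # no answer: packets are probably dropped on the path
    except OSError:
        return "error"     # DNS failure, unreachable network, and so on

if __name__ == "__main__":
    print(probe_tcp("127.0.0.1", 443, timeout_seconds=1.0))
```

A "refused" result points at the instance or the application; a "timeout" points at security groups, NACLs, routing, or a firewall in between.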

Q12: Explain S3 storage classes & when to use each.

  1. S3 Standard: Frequent access, low latency (analytics, websites).
  2. S3 Intelligent-Tiering: Unknown/changeable patterns (automatically moves).
  3. S3 Standard-IA: Infrequent access, but rapid when needed (backups).
  4. S3 One Zone-IA: Non-critical, recreate-able data.
  5. S3 Glacier Instant Retrieval: Long-term, need millisecond retrieval.
  6. S3 Glacier Flexible: Archived data, retrieval minutes to hours.
  7. S3 Glacier Deep Archive: Compliance, 12+ hour retrieval.

Part 3: Kubernetes & Containers (Critical for modern roles)

Q13: What is Kubernetes? Explain Pod, Service, Ingress, ConfigMap, Secret.

  • Pod: Smallest deployable unit (1+ containers sharing network/storage).
  • Service: Stable IP/DNS to access pods (ClusterIP, NodePort, LoadBalancer).
  • Ingress: HTTP/S routing (host/path based) to services.
  • ConfigMap: Non-sensitive config (env vars, config files).
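  • Secret: Sensitive data, stored base64-encoded (mount as volume or env). Base64 is an encoding, not encryption, so enable encryption at rest and restrict access with RBAC.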
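
A common interview follow-up is whether a Secret is actually protected by its encoding. The short sketch below shows that base64 is trivially reversible; the function names and the sample value are illustrative only.

```python
import base64

def encode_secret_value(plaintext):
    """Encode a value the way it appears under `data:` in a Secret manifest."""
    return base64.b64encode(plaintext.encode("utf-8")).decode("ascii")

def decode_secret_value(encoded):
    """Anyone who can read the manifest can reverse the encoding."""
    return base64.b64decode(encoded).decode("utf-8")

encoded = encode_secret_value("s3cr3t-password")  # illustrative value
print(encoded, "->", decode_secret_value(encoded))
```

Because decoding needs no key, protection has to come from RBAC, encryption at rest, and keeping Secret manifests out of source control.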

Q14: Explain the difference between a deployment, statefulset, and daemonset.

  • Deployment: Stateless apps (web servers). Rolling updates, replicas.
  • StatefulSet: Stateful apps (DBs, Kafka). Stable network identity, ordered deployment.
  • DaemonSet: Runs exactly one pod per node (log collector, monitoring agent).

Q15: How do you secure a Kubernetes cluster?

  1. RBAC (Role-Based Access Control) – least privilege.
  2. Network Policies (zero-trust intra-cluster).
  3. Pod Security Standards (restrict privileged containers).
  4. Secrets encryption at rest (KMS).
  5. Image scanning (Trivy, Snyk).
  6. Update Kubernetes version regularly.
  7. Use OPA/Gatekeeper for policy as code.

Q16: Helm vs Kustomize – which and when?

  • Helm: Package manager (charts). Good for complex, parameterized apps, sharing publicly.
  • Kustomize: Native kubectl, no server-side component. Good for overlaying configs (dev/staging/prod) without templating complexity.
    Best practice: Helm for third-party apps (Prometheus, NGINX Ingress); Kustomize for internal apps with env-specific patches.

Part 4: Networking & Security (High priority for senior roles)

Q17: Explain TLS termination vs TLS passthrough.

  • Termination: Load balancer decrypts TLS, then forwards HTTP to backend (simplifies cert mgmt, but LB sees plaintext).
  • Passthrough: LB forwards encrypted traffic to backend (higher security, backend needs certs).
    Use termination for WAF/header inspection; passthrough for compliance.

Q18: What is a Web Application Firewall (WAF)? Give examples.

Answer: Filters HTTP/S traffic to block SQL injection, XSS, bad bots.

  • AWS: WAF on CloudFront/ALB.
  • Azure: WAF on Application Gateway/Front Door.
  • GCP: Cloud Armor.

Q19: Explain VPC peering vs Transit Gateway vs VPN vs Direct Connect.

  • VPC Peering: Connect two VPCs (no transitive routing).
  • Transit Gateway: Hub-and-spoke for many VPCs/on-prem (AWS).
  • VPN: Encrypted tunnel over internet (slower, cheaper).
  • Direct Connect: Dedicated private link to cloud (fast, expensive, low latency).

Q20: What is a DDoS attack and how do you mitigate in cloud?

Answer: Distributed Denial of Service. Mitigation:

  1. AWS Shield (Standard free, Advanced paid).
  2. Azure DDoS Protection (free infrastructure-level protection by default; paid Network Protection and IP Protection tiers).
  3. GCP Cloud Armor.
  4. Use WAF rate limiting, Auto Scaling (absorb traffic), CloudFront (geographic distribution), and load balancers.
  5. Scatter public IPs across regions.

Part 5: Monitoring, Logging & Troubleshooting

Q21: Explain the three pillars of observability.

  1. Metrics (Prometheus, CloudWatch, Azure Monitor) – numeric time-series data.
  2. Logs (ELK, Loki, CloudWatch Logs) – discrete events.
  3. Traces (Jaeger, X-Ray) – request lifecycle across services.

Q22: Your website is slow. How do you troubleshoot in cloud?

  1. Check load balancer metrics (high latency, 5xx errors).
  2. Review CloudWatch/Cloud Monitoring (formerly Stackdriver) metrics (CPU, memory, RDS connections).
  3. Check CDN (CloudFront) – cache hit ratio.
  4. Look at DB slow query logs (RDS Performance Insights).
  5. Use tracing (X-Ray) to find bottleneck service.
  6. Check scaling – maybe Auto Scaling hasn’t kicked in.

Q23: What is a dead letter queue (DLQ) and why use it?

Answer: In SQS/SNS/EventBridge – a queue that captures messages that failed processing after max retries. Prevents data loss, allows later analysis & replay.
Example: Lambda processes SQS messages; on failure → send to DLQ → trigger alarm.
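
The retry-then-park behaviour can be simulated in memory. This is a sketch of the pattern, not of the SQS API: drain_with_dlq and the sample handler are hypothetical, and max_receives plays the role that maxReceiveCount plays in an SQS redrive policy.

```python
from collections import deque

def drain_with_dlq(messages, handler, max_receives=3):
    """Process messages; move any that fail `max_receives` times to a DLQ list."""
    queue = deque((message, 0) for message in messages)
    processed, dead_letters = [], []
    while queue:
        message, receives = queue.popleft()
        receives += 1
        try:
            handler(message)
            processed.append(message)
        except Exception:
            if receives >= max_receives:
                dead_letters.append(message)       # keep for analysis and replay
            else:
                queue.append((message, receives))  # becomes visible again for retry
    return processed, dead_letters

def handler(message):
    """Sample consumer that cannot parse one specific message."""
    if message == "malformed":
        raise ValueError("cannot parse message")

processed, dead_letters = drain_with_dlq(["order-1", "malformed", "order-2"], handler)
print(processed, dead_letters)
```

The bad message is retried up to the limit and then set aside, so it neither blocks the healthy messages nor disappears.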

Part 6: Scenario-Based & Design Questions

Q24: Design a serverless API that handles 10K requests/sec.

Answer:

  • API Gateway (with caching, throttling, WAF).
  • Lambda (provisioned concurrency) or Lambda + SQS for async processing.
  • DynamoDB (global table, on-demand) or Aurora Serverless.
  • CloudFront for static assets.
  • Step Functions for orchestration.
  • Cost: Lambda reserved concurrency, DynamoDB auto-scaling.

Q25: How do you migrate 10 TB of on-prem data to cloud?

Options:

  • Online: AWS DataSync, AzCopy (Azure), or rsync over VPN/Direct Connect (slow for 10 TB on links much below 1 Gbps).
  • Offline: AWS Snowball Edge, Azure Data Box, GCP Transfer Appliance (ship physical device).
  • Incremental: After initial offline sync, use DataSync for delta.
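
Choosing between online and offline transfer comes down to simple arithmetic on link speed. A rough sketch, assuming decimal terabytes (10^12 bytes) and a utilization factor for protocol overhead and shared links; the function name and the 0.8 utilization figure are assumptions.

```python
def transfer_hours(size_tb, link_gbps, utilization=1.0):
    """Hours to move `size_tb` decimal terabytes over a link at the given utilization."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    bits = size_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * utilization)
    return seconds / 3600

for gbps in (0.1, 1, 10):
    hours = transfer_hours(10, gbps, utilization=0.8)
    print(f"{gbps} Gbps: {hours:.1f} h")
```

At a fully used 1 Gbps link, 10 TB takes about 22 hours; at 100 Mbps it takes more than nine days, which is roughly where shipping an appliance starts to look attractive.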

Q26: Your cloud bill doubled overnight. What do you do?

  1. Check Cost Explorer / AWS CUR – filter by service, region, tag.
  2. Look for unattached resources (EBS volumes, idle load balancers).
  3. Check data transfer (NAT gateway costs, cross-region replication).
  4. Review orphaned snapshots/AMIs.
  5. Review S3 charges (unexpected Glacier retrievals, request spikes, missing lifecycle policies).
  6. Set budget alerts for next time.
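
The first step, finding which service moved, is a simple diff of two days of spend. The sketch below uses made-up daily totals of the kind a billing export might contain; biggest_cost_increases is a hypothetical helper, not part of any AWS SDK.

```python
def biggest_cost_increases(previous, current, top=3):
    """Rank services by day-over-day cost increase, largest first."""
    services = set(previous) | set(current)
    deltas = {s: current.get(s, 0.0) - previous.get(s, 0.0) for s in services}
    increases = [(s, d) for s, d in deltas.items() if d > 0]
    return sorted(increases, key=lambda item: item[1], reverse=True)[:top]

# Made-up daily totals in USD.
yesterday = {"EC2": 410.0, "S3": 55.0, "NAT Gateway": 30.0, "RDS": 120.0}
today = {"EC2": 415.0, "S3": 58.0, "NAT Gateway": 490.0, "RDS": 120.0,
         "Data Transfer": 95.0}

for service, delta in biggest_cost_increases(yesterday, today):
    print(f"{service}: +${delta:.2f}")
```

In this invented data the NAT Gateway line dominates the increase, which would send the investigation toward data transfer patterns rather than compute.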

Part 7: DevOps & CI/CD (For cloud engineers)

Q27: Explain Blue/Green deployment vs Canary.

  • Blue/Green: Two identical environments. Switch router (load balancer) from blue (old) to green (new) – instant rollback.
  • Canary: Gradually route % traffic to new version (5% → 25% → 100%). Better for risk reduction, requires monitoring.
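
One way to picture a canary split is hash-based bucketing, where each user lands in a stable bucket from 0 to 99 and the rollout percentage decides which buckets see the new version. This is an illustrative technique, not something the answer above prescribes; production canaries are more often done with weighted routing at the load balancer or service mesh. The function name and user IDs are assumptions.

```python
import hashlib

def routes_to_canary(user_id, canary_percent):
    """Deterministically place a user in the canary group for a rollout percentage."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in the range 0..99
    return bucket < canary_percent

users = [f"user-{n}" for n in range(10_000)]
for percent in (5, 25, 100):
    share = sum(routes_to_canary(u, percent) for u in users) / len(users)
    print(f"{percent}% rollout -> {share:.1%} of users on the new version")
```

Because the bucket depends only on the user ID, a user who saw the new version at 5% keeps seeing it at 25%, which avoids users flipping between versions as the rollout widens.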

Q28: What is GitOps?

Answer: Git as single source of truth. Tools like ArgoCD or Flux reconcile live cluster state with Git manifests. Every change = pull request → automated sync. Benefits: audit trail, rollback by reverting commit, developer self-service.

Q29: How do you implement a CI/CD pipeline for a cloud app?

Example (AWS):

  1. Developer pushes to GitHub or CodeCommit.
  2. CodeBuild runs tests, builds Docker image → pushes to ECR.
  3. CodeDeploy or ArgoCD updates ECS/K8s.
  4. CloudWatch monitors.
    Alternative: GitHub Actions + Terraform Cloud.

Part 8: Advanced / Leadership Questions

Q30: What is eventual consistency? Give a cloud example.

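Answer: In distributed systems, after an update, reads may return old data for a short time until all replicas converge. Example: DynamoDB reads are eventually consistent by default (strongly consistent reads are optional), and DynamoDB global tables by default replicate between regions asynchronously. Note that S3 has provided strong read-after-write consistency for all PUT and DELETE operations since December 2020; S3 cross-region replication, however, is still asynchronous.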

Q31: Explain Chaos Engineering.

Answer: Experimentally inject failures (e.g., kill pods, add network latency) to test system resilience. Tools: AWS FIS, Gremlin, Chaos Mesh. Start with a small blast radius (one instance), then expand.

Q32: How would you reduce cloud costs by 30% without sacrificing performance?

  1. Rightsize EC2 (CloudWatch + Compute Optimizer).
  2. Purchase Savings Plans for steady workloads.
  3. Use Spot Instances for batch/non-prod.
  4. S3 Intelligent-Tiering + delete old versions.
  5. Remove unattached Elastic IPs/EBS volumes.
  6. Use Graviton processors (AWS) – often around 20% cheaper than comparable x86 instances.
  7. Enable auto-scaling for DB read replicas.
  8. Monitor NAT gateway – consider VPC endpoints.

Quick Reference: How to Answer Any Cloud Question

Question Type   | Framework
Design          | Start with requirements (traffic, data, latency). Sketch LB → App → DB. Add HA, DR, security, cost.
Troubleshooting | “I’d check metrics → logs → traces → recent changes → rollback if needed.”
Comparison      | Table format (Pro/Con) or decision tree.
Security        | Mention IAM, encryption (at rest/in transit), least privilege, audit logging.
Best Practice   | Reference the Well-Architected Framework (AWS/Azure/GCP).

Final Pro Tip:

For senior roles, be ready to whiteboard a small architecture and defend your choices (cost vs availability vs security). Always ask clarifying questions before answering.
