# Kubernetes for MLOps

Master container orchestration for production ML workloads.
## Quick Start

- Read the conceptual guides in this folder
- Practice with the labs in `../../../module-01/k8s/`

- Learn: `docs/module-01/k8s/` → Theory and concepts
- Do: `module-01/k8s/` → Hands-on practice

## Learning Objectives
By the end of this section, you will be able to:
- Understand why Kubernetes is essential for production MLOps workloads
- Set up a local Kubernetes development environment
- Deploy and manage containerized applications using Kubernetes
- Configure networking, storage, and load balancing
- Manage application configuration and secrets
- Use Helm for package management
- Deploy ML models as scalable microservices
## Prerequisites
- Completion of Docker fundamentals
- Basic Linux command line familiarity
- Understanding of microservices architecture
- YAML syntax basics
## Study Path

### 1. Overview
Read: Overview - Why Kubernetes for MLOps

### 2. Key Concepts
- Core Objects: Pods, Namespaces, Labels & Selectors
- Workloads: Deployments, ReplicaSets, StatefulSets, DaemonSets, Jobs, CronJobs
- Storage: PersistentVolumes, PersistentVolumeClaims, StorageClasses
- Configuration: ConfigMaps, Secrets
- Network: Services, Ingress, NetworkPolicies

### 3. Architecture
Read: Architecture Overview

### 4. Helm
Read: Helm Package Manager

### 5. Monitoring
Read: Monitoring & Observability
## Kubernetes Versions

This module uses Kubernetes v1.32 "Penelope" (current stable release as of 2025).

### Tool Versions Used

| Tool | Version | Purpose |
|---|---|---|
| kubectl | v1.32.x | Kubernetes CLI |
| minikube | v1.37.0+ | Local K8s cluster |
| kind | v0.24.0+ | Docker-based K8s |
| helm | v3.16.x | Package manager |
## Why This Module Matters

### Docker is Great, But...

After completing Docker training, you can:

- Build and run containers locally
- Use Docker Compose for multi-container apps
- Share images via registries

### Why You Need Kubernetes

Production raises questions Docker alone does not answer:

- What happens when a container fails?
- How do you scale to handle 10x traffic?
- How do you deploy without downtime?
- How do you manage secrets securely?
- How do you run across multiple nodes/servers?

### How Kubernetes Solves These

- Auto-restarts failed containers
- Scales applications automatically
- Performs rolling updates with zero downtime
- Provides built-in secrets management
- Orchestrates workloads across multiple nodes
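The self-healing, scaling, and zero-downtime behaviors above are all declared in a single Deployment manifest. A minimal sketch — the image, names, and port are placeholders, not part of this module's labs:

```yaml
# deployment.yaml -- illustrative; image and names are placeholders
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app
spec:
  replicas: 3                    # Kubernetes keeps 3 Pods running, replacing any that fail
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0          # never drop below the desired replica count during a rollout
      maxSurge: 1                # bring up one new Pod at a time
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
        - name: demo-app
          image: registry.example.com/demo-app:1.0.0   # pinned tag, not :latest
          ports:
            - containerPort: 8080
```

Apply it with `kubectl apply -f deployment.yaml`; if a Pod crashes, its ReplicaSet replaces it without manual intervention.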
### MLOps-Specific Benefits
- Model Serving: Deploy ML models as scalable microservices
- Batch Jobs: Run training jobs as Kubernetes Jobs/CronJobs
- Resource Management: Allocate GPU/TPU resources for ML workloads
- A/B Testing: Run multiple model versions simultaneously
- Hybrid Cloud: Run on-prem, AWS EKS, GCP GKE, Azure AKS
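As a sketch of the batch-job and resource-management points, a one-off training run can be expressed as a Kubernetes Job. The image, command, and GPU count below are placeholders, and the `nvidia.com/gpu` resource assumes the NVIDIA device plugin is installed on the cluster:

```yaml
# train-job.yaml -- illustrative; image and command are placeholders
apiVersion: batch/v1
kind: Job
metadata:
  name: model-training
spec:
  backoffLimit: 2                # retry a failed training Pod up to 2 times
  template:
    spec:
      restartPolicy: Never       # let the Job controller, not the kubelet, handle retries
      containers:
        - name: trainer
          image: registry.example.com/trainer:1.0.0
          command: ["python", "train.py"]
          resources:
            limits:
              nvidia.com/gpu: 1  # schedules the Pod onto a GPU node
```

For recurring retraining, the same Pod template can be wrapped in a CronJob with a `schedule` field.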
## Quick Reference

### Essential kubectl Commands

```bash
# Cluster info
kubectl cluster-info
kubectl version

# Pod operations
kubectl get pods
kubectl describe pod <pod-name>
kubectl logs <pod-name>
kubectl exec -it <pod-name> -- /bin/bash

# Deployment operations
kubectl get deployments
kubectl apply -f deployment.yaml
kubectl rollout status deployment/<name>
kubectl scale deployment/<name> --replicas=3

# Service operations
kubectl get services
kubectl port-forward <pod-name> 8080:80

# Debugging
kubectl logs <pod-name> --previous
kubectl describe pod <pod-name>
kubectl get events
```

## Best Practices Summary
- Always use declarative YAML - Don't use imperative commands
- Set resource requests/limits - Prevent resource starvation
- Use liveness and readiness probes - Enable self-healing
- Namespace separation - Isolate dev/staging/prod environments
- Secrets management - Never commit secrets to git
- Health checks - Always define startup, readiness, liveness probes
- Rollback strategy - Keep deployment history
- Monitor everything - Metrics, logs, and traces
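Several of these practices — resource requests/limits plus startup, readiness, and liveness probes — live together in the container spec. A hedged fragment with placeholder endpoints, ports, and values to tune per workload:

```yaml
# Pod spec fragment -- endpoints and values are illustrative
containers:
  - name: app
    image: registry.example.com/app:1.0.0
    resources:
      requests:                  # what the scheduler reserves for the Pod
        cpu: "250m"
        memory: "256Mi"
      limits:                    # hard caps enforced at runtime
        cpu: "500m"
        memory: "512Mi"
    startupProbe:                # holds off the other probes until slow startup completes
      httpGet: { path: /healthz, port: 8080 }
      failureThreshold: 30
      periodSeconds: 2
    livenessProbe:               # container is restarted if this fails
      httpGet: { path: /healthz, port: 8080 }
      periodSeconds: 10
    readinessProbe:              # Pod is removed from Service endpoints if this fails
      httpGet: { path: /ready, port: 8080 }
      periodSeconds: 5
```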
## Common Pitfalls
| Pitfall | Why It's Bad | Solution |
|---|---|---|
| Running as root | Security risk | Use security contexts |
| No resource limits | Noisy neighbors | Set requests/limits |
| :latest tag | Unpredictable updates | Pin specific versions |
| Hardcoded config | Not portable | Use ConfigMaps/Secrets |
| Monolithic pods | Poor scaling | One container per pod |
| Ignoring probes | Hung containers never restart; unready Pods still get traffic | Add health checks |
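The "running as root" and ":latest tag" rows can both be addressed in the Pod spec. A sketch — the UID and image are placeholders:

```yaml
# Pod spec fragment -- illustrative securityContext
securityContext:
  runAsNonRoot: true             # kubelet refuses to start containers that run as UID 0
  runAsUser: 10001               # arbitrary non-root UID
containers:
  - name: app
    image: registry.example.com/app:1.0.0   # pinned version, never :latest
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
```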
## Additional Resources

### Official Documentation

- Kubernetes documentation: https://kubernetes.io/docs/
- Helm documentation: https://helm.sh/docs/

### Books

- Mastering Kubernetes (4th Edition) - our primary reference
- Kubernetes: Up & Running

### Communities

- Kubernetes Slack: https://slack.k8s.io/
- CNCF community: https://community.cncf.io/
## Next Steps

After completing this section:

- Practice with real-world scenarios in `../../../module-01/k8s/`
- Deploy an ML model as a Kubernetes service
- Explore Kubernetes monitoring and observability
- Learn about GitOps with ArgoCD/Flux

Practice Labs: `../../../module-01/k8s/`

Return to: Module 1 | Study Guide