Kubernetes for MLOps

Master container orchestration for production ML workloads

Quick Start

  1. Read the conceptual guides in this folder
  2. Practice with labs in ../../../module-01/k8s/

Learn:  docs/module-01/k8s/    →  Theory and concepts
Do:     module-01/k8s/         →  Hands-on practice

Learning Objectives

By the end of this section, you will be able to:

  • Understand why Kubernetes is essential for production MLOps workloads
  • Set up a local Kubernetes development environment
  • Deploy and manage containerized applications using Kubernetes
  • Configure networking, storage, and load balancing
  • Manage application configuration and secrets
  • Use Helm for package management
  • Deploy ML models as scalable microservices

Prerequisites

  • Completion of Docker fundamentals
  • Basic Linux command line familiarity
  • Understanding of microservices architecture
  • YAML syntax basics

Study Path

1. Overview

Read: Overview - Why Kubernetes for MLOps

2. Key Concepts

The concepts are grouped into five areas:

  • Core Objects
  • Workloads
  • Storage
  • Configuration
  • Network

3. Architecture

Read: Architecture Overview

4. Helm

Read: Helm Package Manager

5. Monitoring

Read: Monitoring & Observability

Kubernetes Versions

This module uses Kubernetes v1.32 "Penelope" (current stable release as of 2025).

Tool Versions Used

Tool        Version     Purpose
kubectl     v1.32.x     Kubernetes CLI
minikube    v1.37.0+    Local K8s cluster
kind        v0.24.0+    Docker-based K8s cluster
helm        v3.16.x     Package manager

Why This Module Matters

Docker is Great, But...

After completing Docker training, you can:

  • Build and run containers locally
  • Use Docker Compose for multi-container apps
  • Share images via registries

Why You Need Kubernetes

Production Realities:

  • What happens when a container fails?
  • How do you scale to handle 10x traffic?
  • How do you deploy without downtime?
  • How do you manage secrets securely?
  • How do you run across multiple nodes/servers?

Kubernetes Solves These By:

  • Auto-restart failed containers
  • Scale applications automatically
  • Rolling updates with zero downtime
  • Built-in secrets management
  • Multi-node orchestration
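
These guarantees map directly onto fields of a Deployment manifest. A minimal sketch, with the name, image, port, and paths being illustrative placeholders rather than part of this module's labs:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-api                # hypothetical name, for illustration only
spec:
  replicas: 3                 # Kubernetes keeps 3 pods running, replacing failures
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0       # zero-downtime rollout: new pods come up before old ones go
      maxSurge: 1
  selector:
    matchLabels:
      app: ml-api
  template:
    metadata:
      labels:
        app: ml-api
    spec:
      containers:
        - name: api
          image: registry.example.com/ml-api:1.4.2   # pinned tag, not :latest
          livenessProbe:                             # failing probe → container restart
            httpGet:
              path: /healthz
              port: 8000
          envFrom:
            - secretRef:
                name: ml-api-secrets                 # built-in secrets management
```

Applied with `kubectl apply -f deployment.yaml`; automatic scaling can then be layered on with `kubectl autoscale deployment/ml-api --min=3 --max=10 --cpu-percent=80`.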

MLOps-Specific Benefits

  • Model Serving: Deploy ML models as scalable microservices
  • Batch Jobs: Run training jobs as Kubernetes Jobs/CronJobs
  • Resource Management: Allocate GPU/TPU resources for ML workloads
  • A/B Testing: Run multiple model versions simultaneously
  • Hybrid Cloud: Run on-prem, AWS EKS, GCP GKE, Azure AKS
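
Two of these benefits can be sketched in one manifest: a CronJob that launches a nightly training run and requests a GPU. The name, schedule, and image are assumptions for illustration, and GPU scheduling requires the cluster to have the NVIDIA device plugin installed:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-train              # hypothetical name
spec:
  schedule: "0 2 * * *"            # every day at 02:00
  jobTemplate:
    spec:
      backoffLimit: 2              # retry a failed training run at most twice
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: trainer
              image: registry.example.com/trainer:0.9.0   # illustrative image
              resources:
                limits:
                  nvidia.com/gpu: 1   # schedule onto a GPU node (device plugin required)
```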

Quick Reference

Essential kubectl Commands

bash
# Cluster info
kubectl cluster-info
kubectl version

# Pod operations
kubectl get pods
kubectl describe pod <pod-name>
kubectl logs <pod-name>
kubectl exec -it <pod-name> -- /bin/bash

# Deployment operations
kubectl get deployments
kubectl apply -f deployment.yaml
kubectl rollout status deployment/<name>
kubectl scale deployment/<name> --replicas=3

# Service operations
kubectl get services
kubectl port-forward <pod-name> 8080:80

# Debugging
kubectl logs <pod-name> --previous
kubectl describe pod <pod-name>
kubectl get events

Best Practices Summary

  1. Prefer declarative YAML - Apply versioned manifests instead of ad-hoc imperative commands
  2. Set resource requests/limits - Prevent resource starvation
  3. Use liveness and readiness probes - Enable self-healing
  4. Namespace separation - Isolate dev/staging/prod environments
  5. Secrets management - Never commit secrets to git
  6. Health checks - Always define startup, readiness, liveness probes
  7. Rollback strategy - Keep deployment history
  8. Monitor everything - Metrics, logs, and traces
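
Practices 2, 3, and 6 all live in the container spec of a workload. A hedged fragment, where the image tag, port, and probe paths are placeholders:

```yaml
# Fragment of a Deployment's pod template — container-level settings only
containers:
  - name: api
    image: ml-api:1.4.2          # pinned version, not :latest
    resources:
      requests:                  # what the scheduler reserves for this pod
        cpu: 250m
        memory: 256Mi
      limits:                    # hard ceiling; prevents resource starvation of neighbors
        cpu: "1"
        memory: 512Mi
    startupProbe:                # gives slow model loads time before other probes kick in
      httpGet: { path: /healthz, port: 8000 }
      failureThreshold: 30
      periodSeconds: 5
    readinessProbe:              # gates traffic until the model is loaded
      httpGet: { path: /ready, port: 8000 }
    livenessProbe:               # restarts the container if it wedges
      httpGet: { path: /healthz, port: 8000 }
```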

Common Pitfalls

Pitfall             Why It's Bad                    Solution
Running as root     Security risk                   Use security contexts
No resource limits  Noisy neighbors                 Set requests/limits
:latest tag         Unpredictable updates           Pin specific versions
Hardcoded config    Not portable                    Use ConfigMaps/Secrets
Monolithic pods     Poor scaling                    One container per pod
Ignoring probes     Broken pods keep serving        Add health checks
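
The first and fourth pitfalls are addressed with a pod-level `securityContext` and a ConfigMap reference. A sketch, with all names illustrative:

```yaml
# Pod template fragment: run as non-root and load config from a ConfigMap
spec:
  securityContext:
    runAsNonRoot: true            # reject containers that would run as UID 0
    runAsUser: 10001
  containers:
    - name: api
      image: ml-api:1.4.2
      envFrom:
        - configMapRef:
            name: ml-api-config   # hypothetical ConfigMap holding non-secret settings
```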

Additional Resources

  • Official Documentation
  • Books
  • Communities

Next Steps

After completing this section:

  1. Practice with real-world scenarios in ../../../module-01/k8s/
  2. Deploy an ML model as a Kubernetes service
  3. Explore Kubernetes monitoring and observability
  4. Learn about GitOps with ArgoCD/Flux

Practice Labs: ../../../module-01/k8s/

Return to: Module 1 | Study Guide

Released under the MIT License.