Chapter 20
Kubernetes Fundamentals for GenAI Deployment
Learning Objective
Learn how Kubernetes runs and scales containerized GenAI services.
What it means
Kubernetes is a container orchestration platform. It manages deployment, scaling, networking, self-healing, and rolling updates for containers. In production, it helps run GenAI APIs, agent services, retrieval services, caches, and supporting tools.
Why it matters
GenAI workloads often receive bursty, unpredictable traffic, and individual requests can be slow and resource-intensive. Kubernetes can scale pods, restart failed containers, route traffic, manage configuration, and support zero-downtime deployments.
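Zero-downtime deployments come from the Deployment's rolling-update strategy. The manifest below is a minimal sketch, assuming a `genai-api` Deployment with three replicas; the surge values, the `revisionHistoryLimit`, and the `1.1` image tag are illustrative assumptions. With `maxUnavailable: 0`, Kubernetes starts a new pod and waits for it to become ready before terminating an old one.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: genai-api
spec:
  replicas: 3
  # Keep a few old ReplicaSets so a bad release can be rolled back.
  revisionHistoryLimit: 5
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1         # at most one extra pod above 'replicas' during a rollout
      maxUnavailable: 0   # never drop below the desired number of ready pods
  selector:
    matchLabels:
      app: genai-api
  template:
    metadata:
      labels:
        app: genai-api
    spec:
      containers:
        - name: genai-api
          image: healthcare-genai-api:1.1   # hypothetical new version being rolled out
          ports:
            - containerPort: 8000
```

Applying a changed manifest with `kubectl apply -f` starts a rollout, and `kubectl rollout status deployment/genai-api` reports its progress. Without a readiness probe, a pod counts as ready as soon as its containers start, so a truly zero-downtime rollout also needs probes.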
Healthcare Example
During peak document-processing hours, the clinical summary API may need more replicas. Kubernetes can scale out the API pods, manually or with a HorizontalPodAutoscaler, while the Service in front of them keeps the endpoint stable for callers.
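Automatic scaling like this is usually configured with a HorizontalPodAutoscaler. The manifest below is a sketch, assuming the `genai-api` Deployment used in this chapter, a metrics-server installed in the cluster, and CPU requests set on the container (CPU utilization is measured relative to those requests). The name `genai-api-hpa` and the 3/10/70 values are illustrative.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: genai-api-hpa          # hypothetical name
spec:
  scaleTargetRef:              # the workload whose replica count is adjusted
    apiVersion: apps/v1
    kind: Deployment
    name: genai-api
  minReplicas: 3               # baseline capacity outside peak hours
  maxReplicas: 10              # cost ceiling during peaks
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # add pods when average CPU exceeds 70% of requests
```

CPU is only a proxy. For GPU-bound or queue-driven GenAI services, signals such as request queue depth or in-flight requests often track load better, but they require a custom or external metrics adapter.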
Key Concepts
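Cluster: The complete Kubernetes environment (the control plane plus the nodes it manages)
Node: A physical or virtual machine that runs workloads
Pod: The smallest deployable unit; it contains one or more containers that share networking and storage
Deployment: Manages a set of identical pod replicas and performs rolling updates
Service: A stable network endpoint (virtual IP and DNS name) that load-balances traffic across matching pods
ConfigMap/Secret: Non-sensitive configuration (ConfigMap) and sensitive values such as API keys (Secret)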
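ConfigMaps and Secrets keep configuration out of the container image. The manifest below is a minimal sketch; the object names `genai-api-config` and `genai-api-secrets` and the keys `MODEL_NAME`, `MAX_TOKENS`, and `LLM_API_KEY` are hypothetical, and the Secret value is a placeholder.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: genai-api-config        # hypothetical name
data:
  MODEL_NAME: "clinical-summarizer"   # hypothetical, non-sensitive settings
  MAX_TOKENS: "1024"
---
apiVersion: v1
kind: Secret
metadata:
  name: genai-api-secrets       # hypothetical name
type: Opaque
stringData:                     # plain strings; the API server stores them base64-encoded
  LLM_API_KEY: "replace-me"     # placeholder only; never commit real values
# To expose both objects as environment variables in the genai-api
# Deployment, add this to its container spec:
#   envFrom:
#     - configMapRef:
#         name: genai-api-config
#     - secretRef:
#         name: genai-api-secrets
```

Secret data is base64-encoded, not encrypted, by default. In production, real values are typically injected from an external secret manager or protected with encryption at rest rather than stored in version-controlled YAML.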
Code: Kubernetes Deployment & Service
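# Deployment: runs three identical replicas of the GenAI API container.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: genai-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: genai-api        # must match the pod template labels below
  template:
    metadata:
      labels:
        app: genai-api
    spec:
      containers:
        - name: genai-api
          image: healthcare-genai-api:1.0
          ports:
            - containerPort: 8000   # port the API listens on inside the container
---
# Service: stable endpoint (ClusterIP by default) that load-balances
# across all pods labeled app=genai-api.
apiVersion: v1
kind: Service
metadata:
  name: genai-api-service
spec:
  selector:
    app: genai-api
  ports:
    - port: 80          # port exposed by the Service
      targetPort: 8000  # forwarded to containerPort on each pod
Common Mistakes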
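- No resource requests or limits.
- No health probes (readiness and liveness).
- Secret values committed in plain YAML manifests (Kubernetes Secrets are only base64-encoded, not encrypted, by default).
- No autoscaling strategy.
- No rollback plan.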
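The first three mistakes can be addressed directly in the Deployment. The manifest below is a hardened sketch of `genai-api`, assuming the application exposes hypothetical `/ready` and `/health` endpoints on port 8000 and that a Secret named `genai-api-secrets` with key `LLM_API_KEY` already exists (both hypothetical). The resource figures and probe timings are placeholders to be sized through load testing.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: genai-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: genai-api
  template:
    metadata:
      labels:
        app: genai-api
    spec:
      containers:
        - name: genai-api
          image: healthcare-genai-api:1.0
          ports:
            - containerPort: 8000
          # Requests drive scheduling (and HPA utilization); limits cap usage.
          resources:
            requests:
              cpu: "500m"
              memory: "1Gi"
            limits:
              cpu: "2"
              memory: "4Gi"
          # Readiness: the Service routes traffic only after the model has loaded.
          readinessProbe:
            httpGet:
              path: /ready        # hypothetical endpoint
              port: 8000
            initialDelaySeconds: 10
            periodSeconds: 5
          # Liveness: the kubelet restarts the container if it stops responding.
          livenessProbe:
            httpGet:
              path: /health       # hypothetical endpoint
              port: 8000
            initialDelaySeconds: 30
            periodSeconds: 10
          # Read the API key from a Secret instead of hardcoding it.
          env:
            - name: LLM_API_KEY
              valueFrom:
                secretKeyRef:
                  name: genai-api-secrets   # hypothetical Secret
                  key: LLM_API_KEY
```

For models that take minutes to load, a `startupProbe` is often a better fit than a long `initialDelaySeconds`. As a basic rollback plan, `kubectl rollout undo deployment/genai-api` returns the Deployment to its previous revision.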
Interview Q&A
Q: What is a pod?
A: A pod is the smallest deployable unit in Kubernetes. It usually runs one application container, sometimes alongside helper (sidecar) containers that share its network and storage.
Q: Why Kubernetes for GenAI?
A: It provides scaling, self-healing, rolling updates, service discovery, and operational consistency for AI services.
Architect Takeaway
Kubernetes manages the operational life of containers: scaling, healing, networking, and deployment safety.