GoofyCubes

Chapter 20

Kubernetes Fundamentals for GenAI Deployment

Learning Objective

Learn how Kubernetes runs and scales containerized GenAI services.

What it means

Kubernetes is a container orchestration platform. It manages deployment, scaling, networking, self-healing, and rolling updates for containers. In production, it helps run GenAI APIs, agent services, retrieval services, caches, and supporting tools.

Why it matters

GenAI workloads often receive unpredictable traffic. Kubernetes can add or remove pods as load changes, restart failed containers, route traffic only to healthy pods, manage configuration, and roll out new versions without downtime when readiness checks are configured.

Healthcare Example

During peak document-processing hours, the clinical summary API may need more replicas. Kubernetes can add API pods to absorb the load while clients keep calling the same Service endpoint.
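
The scaling behavior in this example can be sketched with a HorizontalPodAutoscaler. This is a minimal sketch, assuming the API runs as a Deployment named genai-api, that the cluster has a metrics source such as metrics-server installed, and that the containers declare CPU requests. The name genai-api-hpa, the replica bounds, and the 70% CPU target are illustrative assumptions, not measured values.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: genai-api-hpa          # hypothetical name
spec:
  scaleTargetRef:              # the workload whose replica count is adjusted
    apiVersion: apps/v1
    kind: Deployment
    name: genai-api
  minReplicas: 3               # assumed baseline for off-peak hours
  maxReplicas: 10              # assumed ceiling for peak hours
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # assumed target, as a percentage of CPU requests
```

CPU utilization is only one possible signal. Services that are bound by GPU or by queue depth may need custom or external metrics instead.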

Key Concepts

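  • Cluster: the complete Kubernetes environment, made up of a control plane and the nodes it manages.
  • Node: a physical or virtual machine that runs workloads.
  • Pod: the smallest deployable unit, containing one or more containers.
  • Deployment: manages pod replicas and rolling updates.
  • Service: a stable network endpoint for a set of pods.
  • ConfigMap/Secret: non-sensitive configuration and sensitive values, respectively.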
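
To make the ConfigMap and Secret concepts concrete, here is a minimal sketch of both objects and of how a container could load them as environment variables. All names and keys (genai-api-config, genai-api-secrets, MODEL_NAME, LOG_LEVEL, LLM_API_KEY) are hypothetical. The Secret value is a placeholder: in practice it would be injected by a secret manager or the deployment pipeline rather than committed to a repository.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: genai-api-config       # hypothetical name
data:
  MODEL_NAME: "clinical-summarizer"   # assumed, non-sensitive settings
  LOG_LEVEL: "info"
---
apiVersion: v1
kind: Secret
metadata:
  name: genai-api-secrets      # hypothetical name
type: Opaque
stringData:                    # plain-text input; stored base64-encoded by the API server
  LLM_API_KEY: "replace-me"    # placeholder only, never commit a real key
# ---
# Fragment for a container entry in a pod spec, loading every key
# from both objects as environment variables:
#
#   envFrom:
#     - configMapRef:
#         name: genai-api-config
#     - secretRef:
#         name: genai-api-secrets
```

Keeping the two objects separate lets configuration change without touching credentials, and lets access to the Secret be restricted independently.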

Code: Kubernetes Deployment & Service

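apiVersion: apps/v1
kind: Deployment
metadata:
  name: genai-api
spec:
  replicas: 3                      # run three identical pods
  selector:
    matchLabels:
      app: genai-api               # must match the pod template labels below
  template:
    metadata:
      labels:
        app: genai-api
    spec:
      containers:
        - name: genai-api
          image: healthcare-genai-api:1.0
          ports:
            - containerPort: 8000  # port the API listens on inside the container
---
apiVersion: v1
kind: Service                      # no type given, so it defaults to ClusterIP (in-cluster only)
metadata:
  name: genai-api-service
spec:
  selector:
    app: genai-api                 # routes traffic to pods carrying this label
  ports:
    - port: 80                     # port exposed by the Service
      targetPort: 8000             # forwarded to the container port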

Common Mistakes

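  • No resource requests or limits, so one pod can starve others on the same node.
  • No health probes, so traffic can reach pods that are not ready and hung containers are not restarted.
  • Secrets stored in plain YAML; Secret values are only base64-encoded, not encrypted, so committed manifests expose them.
  • No autoscaling strategy, so replica counts do not follow traffic.
  • No rollback plan for a failed release.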
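
Several of these mistakes can be addressed in the Deployment manifest itself. The following is a sketch, not a tuned configuration: the /healthz path, the probe timings, the resource figures, and the revisionHistoryLimit value are assumptions chosen for illustration and would need to be set from the real service's behavior.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: genai-api
spec:
  replicas: 3
  revisionHistoryLimit: 5          # keep old ReplicaSets so a rollback is possible
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0            # never drop below the desired replica count
      maxSurge: 1                  # add one new pod at a time
  selector:
    matchLabels:
      app: genai-api
  template:
    metadata:
      labels:
        app: genai-api
    spec:
      containers:
        - name: genai-api
          image: healthcare-genai-api:1.0
          ports:
            - containerPort: 8000
          resources:
            requests:              # what the scheduler reserves (assumed values)
              cpu: "500m"
              memory: "1Gi"
            limits:                # hard ceiling for the container (assumed values)
              cpu: "1"
              memory: "2Gi"
          readinessProbe:          # gates traffic until the pod can serve
            httpGet:
              path: /healthz       # hypothetical health endpoint
              port: 8000
            initialDelaySeconds: 10
            periodSeconds: 5
          livenessProbe:           # restarts the container if it stops responding
            httpGet:
              path: /healthz       # hypothetical health endpoint
              port: 8000
            initialDelaySeconds: 30
            periodSeconds: 10
```

With revision history retained, a bad release can be reverted with `kubectl rollout undo deployment/genai-api`. Services that load large models at startup may need longer initial delays or a startup probe.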

Interview Q&A

Q: What is a pod?

A: A pod is the smallest deployable unit in Kubernetes. It usually runs one application container, and all containers in a pod share the same network namespace and can share storage volumes.

Q: Why Kubernetes for GenAI?

A: It provides scaling, self-healing, rolling updates, service discovery, and operational consistency for AI services.

Architect Takeaway

Kubernetes manages the operational life of containers: scaling, healing, networking, and deployment safety.