So you need Redis running on Kubernetes, and you want it to be production-ready - meaning it should survive pod restarts, handle failovers gracefully, and keep your data safe. You've come to the right place!
In this tutorial, we'll walk through setting up a Redis Cluster on Kubernetes step by step. I'll explain everything in plain English, show you real YAML you can use, and help you avoid the common pitfalls that trip people up. By the end, you'll have a Redis Cluster that's ready for production workloads.
What is Redis Cluster? (The Simple Explanation)
Before we dive into Kubernetes, let's make sure we're on the same page about what Redis Cluster actually is. Think of it like a team of Redis servers working together.
Masters and Replicas
In a Redis Cluster, you have:
- Master nodes: These handle read and write operations. They're the "workers" that do the actual work.
- Replica nodes: These are backups of masters. They copy data from masters and can take over if a master fails.
For a production setup, you typically want at least 3 masters (for high availability) and 1 replica per master (so 3 masters + 3 replicas = 6 nodes total). This way, if one master dies, its replica can step in immediately.
Hash Slots (The Data Distribution Magic)
Redis Cluster splits your data into 16,384 "slots." Each master is responsible for a range of these slots. When you store a key like user:123, Redis calculates which slot it belongs to (using a hash function) and routes it to the correct master.
This means your data is automatically distributed across multiple nodes, which gives you both performance (parallel processing) and resilience (if one node fails, you only lose access to that portion of data temporarily).
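Once the cluster is up (Step 7), you can see this mapping for yourself. Here's a quick sketch using the built-in CLUSTER KEYSLOT command; the key name user:123 is just an example:

# Ask Redis which of the 16,384 slots a key hashes to
redis-cli -c cluster keyslot user:123
# Returns an integer between 0 and 16383; the master that owns that slot stores the key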
Failover (When Things Go Wrong)
If a master node dies, Redis Cluster automatically promotes its replica to become the new master. This happens within seconds, and your application can keep working (though you might see a brief hiccup). This is called "automatic failover."
Why Kubernetes? (And Why StatefulSets?)
Kubernetes is perfect for running Redis Cluster because it handles all the hard stuff: scheduling pods, managing storage, handling network connectivity, and restarting failed containers. But you need to use the right Kubernetes resources.
StatefulSet vs. Deployment
You might be wondering: "Why not just use a regular Deployment?" Here's the key difference:
- Deployment: Pods are interchangeable. If pod-0 dies, Kubernetes might create a new pod with a different name. This is fine for stateless apps, but Redis needs stable identities.
- StatefulSet: Pods have stable, predictable names (redis-0, redis-1, redis-2) and stable network identities. When a pod restarts, it keeps the same name and can reconnect to the cluster using its identity.
StatefulSets also give you:
- Ordered deployment: Pods start one at a time (redis-0, then redis-1, then redis-2), which is important for cluster initialization.
- Stable storage: Each pod gets its own PersistentVolume that follows it around, even if the pod moves to a different node.
- Ordered termination: Pods shut down in reverse order, which helps with graceful shutdowns.
Step 1: Create a Namespace
Let's start simple. Create a namespace to keep our Redis resources organized:
apiVersion: v1
kind: Namespace
metadata:
  name: redis-cluster
  labels:
    name: redis-cluster

Apply it:
kubectl apply -f namespace.yaml

Step 2: Set Up Persistent Storage
Redis needs to store data somewhere that survives pod restarts. That's where PersistentVolumes come in. Think of them as external hard drives that Kubernetes attaches to your pods.
Understanding StorageClasses
A StorageClass tells Kubernetes what kind of storage to provision. Different cloud providers have different options:
- AWS (EKS): gp3 or gp2 for general-purpose SSD storage
- GCP (GKE): standard or premium-rwo for persistent disks
- Azure (AKS): managed-premium or managed-standard
- Local/minikube: standard or hostpath
Check what StorageClasses are available in your cluster:
kubectl get storageclass

For production, you'll want SSD-based storage (like gp3 on AWS) for better performance. For development, the default storage class is usually fine.
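If you're not sure what a class actually provisions, describe it. The class name gp3 below is just an example - use whatever your cluster reports:

# The default class is marked "(default)" in the NAME column
kubectl get storageclass
# Inspect a class's provisioner, parameters, and volume binding mode
kubectl describe storageclass gp3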
Important: Make sure your StorageClass supports the ReadWriteOnce (RWO) access mode. Each Redis pod needs its own volume, so they can't share storage. Some storage classes only support ReadWriteMany (RWX), which won't work for Redis.

Step 3: Create a ConfigMap for Redis Configuration
A ConfigMap lets you store Redis configuration in Kubernetes instead of baking it into the container image. This makes it easy to change settings without rebuilding images.
Here's a production-ready Redis configuration:
apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-cluster-config
  namespace: redis-cluster
data:
  redis.conf: |
    # Network
    bind 0.0.0.0
    protected-mode yes
    port 6379

    # Cluster mode
    cluster-enabled yes
    cluster-config-file /data/nodes.conf
    cluster-node-timeout 5000
    cluster-announce-ip ${POD_IP}
    cluster-announce-port 6379
    cluster-announce-bus-port 16379

    # Persistence
    appendonly yes
    appendfsync everysec
    save ""

    # Memory
    maxmemory 2gb
    maxmemory-policy allkeys-lru

    # Logging
    loglevel notice

    # Security (we'll add AUTH later)
    # requirepass your-strong-password-here

Let me explain the important parts:
- cluster-enabled yes: This turns on cluster mode. Without this, Redis runs in standalone mode.
- cluster-config-file: Where Redis stores cluster topology info. This file is managed by Redis automatically.
- cluster-announce-ip: This tells other nodes how to reach this pod. We use ${POD_IP}, which we'll substitute from an environment variable at startup (Redis doesn't expand environment variables in its config file by itself).
- appendonly yes: Enables AOF (Append Only File) persistence, which logs every write operation. This is safer than RDB snapshots for production.
- maxmemory: Sets a memory limit. Adjust this based on your pod's memory requests/limits.
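Once the pods from Step 5 are running, you can sanity-check that Redis actually loaded these settings. A quick sketch using CONFIG GET and INFO:

# Confirm the effective settings inside a running pod
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli config get maxmemory
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli config get appendonly
# redis_mode:cluster confirms cluster mode is on
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli info server | grep redis_mode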
Tip: Set maxmemory to about 75% of your pod's memory limit. If your pod has 4GB RAM, set maxmemory to 3GB. This leaves room for Redis overhead and prevents OOM kills.

Step 4: Create a Headless Service
A "headless" service (one without a cluster IP) gives each pod a stable DNS name. This is crucial for Redis Cluster because pods need to discover each other.
apiVersion: v1
kind: Service
metadata:
  name: redis-cluster
  namespace: redis-cluster
  labels:
    app: redis-cluster
spec:
  type: ClusterIP
  clusterIP: None  # This makes it headless
  ports:
  - port: 6379
    name: redis
    targetPort: 6379
  - port: 16379
    name: cluster
    targetPort: 16379
  selector:
    app: redis-cluster

With this service, each pod gets a DNS name like:
- redis-cluster-0.redis-cluster.redis-cluster.svc.cluster.local
- redis-cluster-1.redis-cluster.redis-cluster.svc.cluster.local
- redis-cluster-2.redis-cluster.redis-cluster.svc.cluster.local
Port 6379 is for regular Redis operations, and port 16379 is for cluster bus communication (gossip protocol between nodes).
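You can verify the per-pod DNS records from a throwaway pod. This is a sketch; the pod name dns-test and the busybox image tag are arbitrary:

# Resolve one pod's stable DNS name from inside the cluster
kubectl run -it --rm dns-test --image=busybox:1.36 -n redis-cluster --restart=Never -- \
  nslookup redis-cluster-0.redis-cluster.redis-cluster.svc.cluster.local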
Step 5: Create the StatefulSet
Now for the main event - the StatefulSet. This is where we define our Redis pods. We'll create 6 pods total: 3 masters and 3 replicas.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis-cluster
  namespace: redis-cluster
spec:
  serviceName: redis-cluster
  replicas: 6
  selector:
    matchLabels:
      app: redis-cluster
  template:
    metadata:
      labels:
        app: redis-cluster
    spec:
      containers:
      - name: redis
        image: redis:7.2-alpine
        ports:
        - containerPort: 6379
          name: redis
        - containerPort: 16379
          name: cluster
        env:
        - name: POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        command:
        - /bin/sh
        - -c
        - |
          # Substitute the pod IP into the config (the ConfigMap mount is
          # read-only, and Redis doesn't expand ${POD_IP} on its own)
          sed "s/\${POD_IP}/${POD_IP}/g" /etc/redis/redis.conf > /data/redis.conf
          # Start Redis
          exec redis-server /data/redis.conf
        volumeMounts:
        - name: data
          mountPath: /data
        - name: config
          mountPath: /etc/redis
        resources:
          requests:
            memory: "2Gi"
            cpu: "500m"
          limits:
            memory: "4Gi"
            cpu: "2000m"
        livenessProbe:
          exec:
            command:
            - redis-cli
            - ping
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
        readinessProbe:
          exec:
            command:
            - redis-cli
            - ping
          initialDelaySeconds: 10
          periodSeconds: 5
          timeoutSeconds: 3
      volumes:
      - name: config
        configMap:
          name: redis-cluster-config  # The ConfigMap from Step 3
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: gp3  # Change this to match your cluster
      resources:
        requests:
          storage: 20Gi

Let me break down the important parts:
Environment Variables
POD_IP is injected by Kubernetes and tells Redis what IP address to advertise to other cluster members. This is critical for cluster discovery.
Command Override
We override the default Redis command to:
- Substitute the ${POD_IP} placeholder in the config from the ConfigMap and write the result to /data (the ConfigMap mount is read-only)
- Start Redis with our custom config
Volume Claims
volumeClaimTemplates creates a PersistentVolumeClaim for each pod automatically. Each pod gets its own 20GB volume that persists even if the pod is deleted and recreated.
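You can see these claims after the StatefulSet is applied; each PVC is named <template>-<pod>, so the data template here produces data-redis-cluster-0 and so on:

# One PVC per pod, created automatically from the template
kubectl get pvc -n redis-cluster
# Deleting a pod does NOT delete its PVC - the replacement pod reattaches it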
Resource Limits
We set memory and CPU requests/limits to prevent Redis from consuming all cluster resources. Adjust these based on your workload:
- Memory: Redis is memory-intensive. 2-4GB per pod is typical for production.
- CPU: Redis is generally CPU-light unless you're doing heavy operations. 500m-2000m is usually sufficient.
Health Probes
Liveness and readiness probes help Kubernetes know when a pod is healthy:
- Liveness probe: If this fails, Kubernetes restarts the pod.
- Readiness probe: If this fails, Kubernetes removes the pod from service (stops sending traffic to it).
Apply all the resources:
kubectl apply -f configmap.yaml
kubectl apply -f service.yaml
kubectl apply -f statefulset.yaml

Watch the pods come up:
kubectl get pods -n redis-cluster -w

You should see 6 pods starting up one by one (that's the StatefulSet's ordered deployment in action).
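If you're scripting this (or just impatient), kubectl wait blocks until every pod reports Ready:

# Block until all six pods pass their readiness probes
kubectl wait --for=condition=ready pod -l app=redis-cluster -n redis-cluster --timeout=300s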
Step 6: Initialize the Cluster
Once all pods are running, you need to tell Redis to form a cluster. The pods are running, but they're not connected to each other yet. We'll use redis-cli to initialize the cluster.
First, let's create a simple script to do this. We'll run it from one of the pods:
# Get into one of the Redis pods
kubectl exec -it redis-cluster-0 -n redis-cluster -- sh
# Inside the pod, run this command to initialize the cluster
redis-cli --cluster create \
redis-cluster-0.redis-cluster.redis-cluster.svc.cluster.local:6379 \
redis-cluster-1.redis-cluster.redis-cluster.svc.cluster.local:6379 \
redis-cluster-2.redis-cluster.redis-cluster.svc.cluster.local:6379 \
redis-cluster-3.redis-cluster.redis-cluster.svc.cluster.local:6379 \
redis-cluster-4.redis-cluster.redis-cluster.svc.cluster.local:6379 \
redis-cluster-5.redis-cluster.redis-cluster.svc.cluster.local:6379 \
--cluster-replicas 1

The --cluster-replicas 1 flag tells Redis to create 1 replica for each master. So with 6 nodes, you get 3 masters and 3 replicas.
Redis will ask you to confirm the slot distribution. Type yes and press Enter.
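While you're still inside the pod, redis-cli can audit the result for you; --cluster check confirms that all 16,384 slots are covered and every master has its replica:

# Verify slot coverage and master/replica assignment
redis-cli --cluster check redis-cluster-0.redis-cluster.redis-cluster.svc.cluster.local:6379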
If you'd rather automate initialization (for CI/CD or GitOps setups), you can run the same command from a Kubernetes Job:

apiVersion: batch/v1
kind: Job
metadata:
  name: redis-cluster-init
  namespace: redis-cluster
spec:
  template:
    spec:
      restartPolicy: OnFailure
      containers:
      - name: init
        image: redis:7.2-alpine
        command:
        - /bin/sh
        - -c
        - |
          sleep 10
          redis-cli --cluster create \
            redis-cluster-0.redis-cluster.redis-cluster.svc.cluster.local:6379 \
            redis-cluster-1.redis-cluster.redis-cluster.svc.cluster.local:6379 \
            redis-cluster-2.redis-cluster.redis-cluster.svc.cluster.local:6379 \
            redis-cluster-3.redis-cluster.redis-cluster.svc.cluster.local:6379 \
            redis-cluster-4.redis-cluster.redis-cluster.svc.cluster.local:6379 \
            redis-cluster-5.redis-cluster.redis-cluster.svc.cluster.local:6379 \
            --cluster-replicas 1 \
            --cluster-yes

The --cluster-yes flag auto-confirms the slot distribution, so you don't need to type "yes" manually.
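A quick way to confirm the Job succeeded (the Job name matches the manifest above):

# Check Job status and read the init output
kubectl get jobs -n redis-cluster
kubectl logs job/redis-cluster-init -n redis-cluster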
Step 7: Verify the Cluster
Let's check that everything is working:
# Get cluster info
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli cluster info
# Check cluster nodes
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli cluster nodes
# Test writing and reading data
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli -c set mykey "Hello Redis Cluster"
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli -c get mykey

The -c flag tells redis-cli to run in cluster mode, which handles automatic redirection if your key is on a different node.
The cluster nodes output should show 3 masters (each handling a range of slots) and 3 replicas backing them up (the output still uses the legacy label "slave" for replicas).
Step 8: Understanding Failover and Pod Rescheduling
One of the coolest things about running Redis Cluster on Kubernetes is how gracefully it handles failures. Let's see what happens when things go wrong.
What Happens When a Pod Dies?
When a master pod fails (crashes, OOM kill, node failure, etc.), here's the sequence:
1. Kubernetes detects the failure: The liveness probe fails, or Kubernetes notices the pod is gone.
2. Kubernetes reschedules the pod: A new pod with the same name (e.g., redis-cluster-0) is created on a healthy node.
3. Redis Cluster detects the missing master: Other nodes notice the master is gone (via the cluster bus/gossip protocol).
4. Automatic failover: The replica of the failed master is promoted to master automatically.
5. New pod rejoins: When the new pod starts, it reads the cluster state from nodes.conf and rejoins as a replica of the new master.
This all happens automatically! Your application might see a brief connection error, but it should reconnect and keep working.
Want to test this? Delete a pod with kubectl delete pod redis-cluster-0 -n redis-cluster and watch the cluster recover using redis-cli cluster nodes.

Persistent Storage Makes This Work
The key to seamless recovery is the PersistentVolume. When a pod is recreated:
- The new pod gets the same PersistentVolume (because StatefulSets use stable volume claims)
- The nodes.conf file is still there with cluster topology info
- Redis reads this file and knows how to rejoin the cluster
Without persistent storage, each pod restart would lose cluster state and you'd have to reinitialize the cluster every time.
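Here's a minimal failover drill that ties this together. It assumes redis-cluster-0 is currently a master (check with cluster nodes first):

# 1. Confirm which pods are masters
kubectl exec -it redis-cluster-1 -n redis-cluster -- redis-cli cluster nodes
# 2. Kill a master pod and let Kubernetes recreate it
kubectl delete pod redis-cluster-0 -n redis-cluster
# 3. Watch the old replica get promoted, then the new pod rejoin as a replica
kubectl exec -it redis-cluster-1 -n redis-cluster -- redis-cli cluster nodes
kubectl exec -it redis-cluster-1 -n redis-cluster -- redis-cli cluster info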
Step 9: Adding Security (AUTH and ACLs)
Running Redis without authentication in production is like leaving your house unlocked. Let's fix that.
Create a Secret for the Password
First, create a Kubernetes Secret to store the Redis password:
# Generate a strong password (do this on your local machine)
openssl rand -base64 32
# Create the secret
kubectl create secret generic redis-auth \
--from-literal=password='your-generated-password-here' \
  -n redis-cluster

Update the ConfigMap
Now update your Redis config to require authentication. Modify the ConfigMap:
apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-cluster-config
  namespace: redis-cluster
data:
  redis.conf: |
    # ... (previous config) ...

    # Security
    requirepass ${REDIS_PASSWORD}
    masterauth ${REDIS_PASSWORD}

requirepass sets the password clients need to connect. masterauth sets the password replicas use to authenticate with masters.
Update the StatefulSet
Add the password as an environment variable in your StatefulSet:
env:
- name: POD_IP
  valueFrom:
    fieldRef:
      fieldPath: status.podIP
- name: REDIS_PASSWORD
  valueFrom:
    secretKeyRef:
      name: redis-auth
      key: password

And update the command to substitute the password:
command:
- /bin/sh
- -c
- |
  # Substitute environment variables in config
  envsubst < /etc/redis/redis.conf > /data/redis.conf
  # Start Redis
  exec redis-server /data/redis.conf

You'll need to install envsubst in your container (it's part of the gettext package), or use a custom Redis image that includes it.
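After rolling out the updated ConfigMap and StatefulSet, it's worth confirming that AUTH is actually enforced. A quick sketch (the password comes from the REDIS_PASSWORD env var we injected from the Secret):

# Unauthenticated commands should now be refused
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli ping
# (error) NOAUTH Authentication required.
# Authenticated commands work (REDIS_PASSWORD is set in the pod's environment)
kubectl exec -it redis-cluster-0 -n redis-cluster -- sh -c 'redis-cli -a "$REDIS_PASSWORD" ping'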
If you'd rather not install envsubst, you can use an init script that writes the password directly to the config file (or extend the sed approach we used for POD_IP), or use a custom entrypoint script that handles this.

Using Redis ACLs (Access Control Lists)
For even better security, you can use Redis ACLs to create different users with different permissions:
# In your redis.conf
user default off
user app-user on >app-password ~* &* +@all
user read-only on >read-password ~* -@all +@read +@keyspace

This creates:
- default user: Disabled (can't connect)
- app-user: Full access with password "app-password"
- read-only: Read-only access with password "read-password"
Then connect using: redis-cli -u redis://app-user:app-password@redis-cluster-0.redis-cluster:6379
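To confirm which ACL user a connection is actually using, ACL WHOAMI is handy (the user and password here match the example config above):

# Run from inside a pod; prints the authenticated user's name
redis-cli -u redis://app-user:app-password@redis-cluster-0.redis-cluster:6379 acl whoami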
Network Policies (Optional but Recommended)
Network policies restrict which pods can talk to Redis. This adds defense in depth:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: redis-cluster-netpol
  namespace: redis-cluster
spec:
  podSelector:
    matchLabels:
      app: redis-cluster
  policyTypes:
  - Ingress
  ingress:
  # Redis pods must still reach each other on the client port and the cluster bus
  - from:
    - podSelector:
        matchLabels:
          app: redis-cluster
    ports:
    - protocol: TCP
      port: 6379
    - protocol: TCP
      port: 16379
  # Application traffic only needs the client port
  - from:
    - namespaceSelector:
        matchLabels:
          name: production  # Only allow from production namespace
    - podSelector:
        matchLabels:
          app: my-app  # Or specific app pods
    ports:
    - protocol: TCP
      port: 6379

Note the first rule: without it, the policy would also block the gossip traffic between Redis pods and break the cluster. Verify the policy is in place with kubectl get networkpolicies.

Step 10: Setting Up Monitoring
You can't fix what you can't see. Let's set up monitoring so you know what's happening in your Redis Cluster.
Install Redis Exporter
Redis Exporter is a sidecar container that exposes Redis metrics in Prometheus format. Update your StatefulSet to include it:
containers:
- name: redis
  # ... (existing redis container) ...
- name: redis-exporter
  image: oliver006/redis_exporter:latest  # pin a specific version for production
  ports:
  - containerPort: 9121
    name: metrics
  env:
  - name: REDIS_ADDR
    value: "redis://localhost:6379"
  - name: REDIS_PASSWORD
    valueFrom:
      secretKeyRef:
        name: redis-auth
        key: password
  resources:
    requests:
      memory: "64Mi"
      cpu: "50m"
    limits:
      memory: "128Mi"
      cpu: "100m"

Create a Service for Metrics
Expose the metrics endpoint:
apiVersion: v1
kind: Service
metadata:
  name: redis-cluster-metrics
  namespace: redis-cluster
  labels:
    app: redis-cluster
spec:
  type: ClusterIP
  ports:
  - port: 9121
    name: metrics
    targetPort: 9121
  selector:
    app: redis-cluster

Configure Prometheus Scraping
If you're using Prometheus Operator, create a ServiceMonitor:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: redis-cluster
  namespace: redis-cluster
spec:
  selector:
    matchLabels:
      app: redis-cluster
  endpoints:
  - port: metrics
    interval: 30s
    path: /metrics

Key Metrics to Watch
Here are the important metrics you should alert on:
- redis_up: Is Redis responding? (should be 1)
- redis_connected_clients: Number of connected clients
- redis_used_memory: Memory usage (alert if approaching maxmemory)
- redis_keyspace_hits / redis_keyspace_misses: Cache hit ratio
- redis_rejected_connections_total: Connection rejections (indicates overload)
- redis_cluster_state: Cluster state (should be "ok")
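You don't need Prometheus running to spot-check these. Port-forward one pod's exporter and curl it (9121 is the exporter port from the sidecar above; exact metric names can vary slightly between exporter versions, hence the loose grep):

# Forward the exporter port to your machine...
kubectl port-forward redis-cluster-0 9121:9121 -n redis-cluster &
# ...then pull the metrics and look for the ones above
curl -s localhost:9121/metrics | grep -E 'redis_up|redis_memory|redis_cluster'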
Grafana Dashboard
Import a Redis dashboard into Grafana. There are several good ones on Grafana Labs:
- Dashboard ID 11835: "Redis Dashboard for Prometheus"
- Dashboard ID 763: "Redis"
These dashboards show you memory usage, command rates, latency, and cluster health at a glance.
Common Gotchas and Tips
Here are the things that trip people up (and how to avoid them):
1. DNS Resolution Issues
Problem: Pods can't find each other using DNS names.
Solution: Make sure you're using the full FQDN (Fully Qualified Domain Name) like redis-cluster-0.redis-cluster.redis-cluster.svc.cluster.local, or at least redis-cluster-0.redis-cluster. Short names like redis-cluster-0 might not work depending on your DNS setup.
2. Storage Class Mismatch
Problem: PVCs are stuck in "Pending" state.
Solution: Check that your StorageClass exists and supports RWO access mode. List storage classes: kubectl get storageclass
3. Cluster Initialization Timing
Problem: Trying to initialize the cluster before all pods are ready.
Solution: Wait for all pods to be in "Running" state with "1/1" ready before running cluster init. Use: kubectl wait --for=condition=ready pod -l app=redis-cluster -n redis-cluster --timeout=300s
4. Memory Limits Too Low
Problem: Pods getting OOM killed.
Solution: Increase memory limits and adjust maxmemory in config. Remember: maxmemory should be ~75% of your pod's memory limit.
5. Forgetting to Use Cluster Mode in Clients
Problem: Application can't connect or gets MOVED errors.
Solution: Make sure your Redis client library supports cluster mode and is configured to use it. Most modern clients do (like redis-py-cluster for Python, ioredis for Node.js; note that plain redis-py 4.1+ has cluster support built in).
For example, a cluster-aware connection string can be as simple as redis://redis-cluster-0.redis-cluster:6379.

Connecting Your Application
Once your Redis Cluster is running, here's how to connect from your application pods:
Connection String Format
Use the headless service DNS name. Most cluster-aware clients can discover all nodes from just one address:
# Python (redis-py-cluster; newer redis-py >= 4.1 has RedisCluster built in)
from rediscluster import RedisCluster

startup_nodes = [{"host": "redis-cluster-0.redis-cluster", "port": "6379"}]
rc = RedisCluster(startup_nodes=startup_nodes, password="your-password", decode_responses=True)

// Node.js (ioredis)
const Redis = require('ioredis');

const cluster = new Redis.Cluster([
  { host: 'redis-cluster-0.redis-cluster', port: 6379 }
], {
  redisOptions: { password: 'your-password' }
});

Wrapping Up
Congratulations! You now have a production-grade Redis Cluster running on Kubernetes. Let's recap what we built:
- ✅ 6-node Redis Cluster (3 masters + 3 replicas) with automatic failover
- ✅ Persistent storage that survives pod restarts
- ✅ Stable network identities using StatefulSets and headless services
- ✅ Security with authentication and optional network policies
- ✅ Monitoring with Prometheus and Grafana
This setup will handle:
- Pod failures and automatic recovery
- Node failures (Kubernetes reschedules pods)
- Data persistence across restarts
- High availability (cluster continues working if up to 1 master fails)
If you run into issues or want help optimizing your Redis setup, feel free to reach out. Happy clustering! 🚀