So you need Redis running on Kubernetes, and you want it to be production-ready - meaning it should survive pod restarts, handle failovers gracefully, and keep your data safe. You've come to the right place!
In this tutorial, we'll walk through setting up a Redis Cluster on Kubernetes step by step. I'll explain everything in plain English, show you real YAML you can use, and help you avoid the common pitfalls that trip people up. By the end, you'll have a Redis Cluster that's ready for production workloads.
What is Redis Cluster? (The Simple Explanation)
Before we dive into Kubernetes, let's make sure we're on the same page about what Redis Cluster actually is. Think of it like a team of Redis servers working together.
Masters and Replicas
In a Redis Cluster, you have:
- Master nodes: These handle read and write operations. They're the "workers" that do the actual work.
- Replica nodes: These are backups of masters. They copy data from masters and can take over if a master fails.
For a production setup, you typically want at least 3 masters (for high availability) and 1 replica per master (so 3 masters + 3 replicas = 6 nodes total). This way, if one master dies, its replica can step in immediately.
Hash Slots (The Data Distribution Magic)
Redis Cluster splits your data into 16,384 "slots." Each master is responsible for a range of these slots. When you store a key like user:123, Redis calculates which slot it belongs to (using a hash function) and routes it to the correct master.
This means your data is automatically distributed across multiple nodes, which gives you both performance (parallel processing) and resilience (if one node fails, you only lose access to that portion of data temporarily).
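Once the cluster is up (Step 7), you can see this mapping for yourself. Here's a quick sketch using the built-in CLUSTER KEYSLOT command; the key name user:123 is just an example:

# Ask Redis which of the 16,384 slots a key hashes to
redis-cli -c cluster keyslot user:123
# Returns an integer between 0 and 16383; the master that owns that slot stores the key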
Failover (When Things Go Wrong)
If a master node dies, Redis Cluster automatically promotes its replica to become the new master. This happens within seconds, and your application can keep working (though you might see a brief hiccup). This is called "automatic failover."
Why Kubernetes? (And Why StatefulSets?)
Kubernetes is perfect for running Redis Cluster because it handles all the hard stuff: scheduling pods, managing storage, handling network connectivity, and restarting failed containers. But you need to use the right Kubernetes resources.
StatefulSet vs. Deployment
You might be wondering: "Why not just use a regular Deployment?" Here's the key difference:
- Deployment: Pods are interchangeable. If pod-0 dies, Kubernetes might create a new pod with a different name. This is fine for stateless apps, but Redis needs stable identities.
- StatefulSet: Pods have stable, predictable names (redis-0, redis-1, redis-2) and stable network identities. When a pod restarts, it keeps the same name and can reconnect to the cluster using its identity.
StatefulSets also give you:
- Ordered deployment: Pods start one at a time (redis-0, then redis-1, then redis-2), which is important for cluster initialization.
- Stable storage: Each pod gets its own PersistentVolume that follows it around, even if the pod moves to a different node.
- Ordered termination: Pods shut down in reverse order, which helps with graceful shutdowns.
Step 1: Create a Namespace
Let's start simple. Create a namespace to keep our Redis resources organized:
apiVersion: v1
kind: Namespace
metadata:
  name: redis-cluster
  labels:
    name: redis-cluster

Apply it:
kubectl apply -f namespace.yaml

Step 2: Set Up Persistent Storage
Redis needs to store data somewhere that survives pod restarts. That's where PersistentVolumes come in. Think of them as external hard drives that Kubernetes attaches to your pods.
Understanding StorageClasses
A StorageClass tells Kubernetes what kind of storage to provision. Different cloud providers have different options:
- AWS (EKS): gp3 or gp2 for general-purpose SSD storage
- GCP (GKE): standard or premium-rwo for persistent disks
- Azure (AKS): managed-premium or managed-standard
- Local/minikube: standard or hostpath
Check what StorageClasses are available in your cluster:
kubectl get storageclass

For production, you'll want SSD-based storage (like gp3 on AWS) for better performance. For development, the default storage class is usually fine.
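If you're not sure what a class actually provisions, describe it. The class name gp3 below is just an example - use whatever your cluster reports:

# The default class is marked "(default)" in the NAME column
kubectl get storageclass
# Inspect a class's provisioner, parameters, and volume binding mode
kubectl describe storageclass gp3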
Important: Make sure your StorageClass supports the ReadWriteOnce (RWO) access mode. Each Redis pod needs its own volume, so they can't share storage. Some storage classes only support ReadWriteMany (RWX), which won't work for Redis.

Step 3: Create a ConfigMap for Redis Configuration
A ConfigMap lets you store Redis configuration in Kubernetes instead of baking it into the container image. This makes it easy to change settings without rebuilding images.
Here's a production-ready Redis configuration:
apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-cluster-config
  namespace: redis-cluster
data:
  redis.conf: |
    # Network
    bind 0.0.0.0
    protected-mode yes
    port 6379

    # Cluster mode
    cluster-enabled yes
    cluster-config-file /data/nodes.conf
    cluster-node-timeout 5000
    cluster-announce-ip ${POD_IP}
    cluster-announce-port 6379
    cluster-announce-bus-port 16379

    # Persistence
    appendonly yes
    appendfsync everysec
    save ""

    # Memory
    maxmemory 2gb
    maxmemory-policy allkeys-lru

    # Logging
    loglevel notice

    # Security (we'll add AUTH later)
    # requirepass your-strong-password-here

Let me explain the important parts:
- cluster-enabled yes: This turns on cluster mode. Without this, Redis runs in standalone mode.
- cluster-config-file: Where Redis stores cluster topology info. This file is managed by Redis automatically.
- cluster-announce-ip: This tells other nodes how to reach this pod. We use ${POD_IP}, which we'll substitute from an environment variable at startup (Redis doesn't expand environment variables in its config file by itself).
- appendonly yes: Enables AOF (Append Only File) persistence, which logs every write operation. This is safer than RDB snapshots for production.
- maxmemory: Sets a memory limit. Adjust this based on your pod's memory requests/limits.
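Once the pods from Step 5 are running, you can sanity-check that Redis actually loaded these settings. A quick sketch using CONFIG GET and INFO:

# Confirm the effective settings inside a running pod
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli config get maxmemory
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli config get appendonly
# redis_mode:cluster confirms cluster mode is on
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli info server | grep redis_mode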
Tip: Set maxmemory to about 75% of your pod's memory limit. If your pod has 4GB RAM, set maxmemory to 3GB. This leaves room for Redis overhead and prevents OOM kills.

Step 4: Create a Headless Service
A "headless" service (one without a cluster IP) gives each pod a stable DNS name. This is crucial for Redis Cluster because pods need to discover each other.
apiVersion: v1
kind: Service
metadata:
  name: redis-cluster
  namespace: redis-cluster
  labels:
    app: redis-cluster
spec:
  type: ClusterIP
  clusterIP: None  # This makes it headless
  ports:
  - port: 6379
    name: redis
    targetPort: 6379
  - port: 16379
    name: cluster
    targetPort: 16379
  selector:
    app: redis-cluster

With this service, each pod gets a DNS name like:
- redis-cluster-0.redis-cluster.redis-cluster.svc.cluster.local
- redis-cluster-1.redis-cluster.redis-cluster.svc.cluster.local
- redis-cluster-2.redis-cluster.redis-cluster.svc.cluster.local
Port 6379 is for regular Redis operations, and port 16379 is for cluster bus communication (gossip protocol between nodes).
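You can verify the per-pod DNS records from a throwaway pod. This is a sketch; the pod name dns-test and the busybox image tag are arbitrary:

# Resolve one pod's stable DNS name from inside the cluster
kubectl run -it --rm dns-test --image=busybox:1.36 -n redis-cluster --restart=Never -- \
  nslookup redis-cluster-0.redis-cluster.redis-cluster.svc.cluster.local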
Step 5: Create the StatefulSet
Now for the main event - the StatefulSet. This is where we define our Redis pods. We'll create 6 pods total: 3 masters and 3 replicas.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis-cluster
  namespace: redis-cluster
spec:
  serviceName: redis-cluster
  replicas: 6
  selector:
    matchLabels:
      app: redis-cluster
  template:
    metadata:
      labels:
        app: redis-cluster
    spec:
      containers:
      - name: redis
        image: redis:7.2-alpine
        ports:
        - containerPort: 6379
          name: redis
        - containerPort: 16379
          name: cluster
        env:
        - name: POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        command:
        - /bin/sh
        - -c
        - |
          # Substitute the pod IP into the config (the ConfigMap mount is
          # read-only, and Redis doesn't expand ${POD_IP} on its own)
          sed "s/\${POD_IP}/${POD_IP}/g" /etc/redis/redis.conf > /data/redis.conf
          # Start Redis
          exec redis-server /data/redis.conf
        volumeMounts:
        - name: data
          mountPath: /data
        - name: config
          mountPath: /etc/redis
        resources:
          requests:
            memory: "2Gi"
            cpu: "500m"
          limits:
            memory: "4Gi"
            cpu: "2000m"
        livenessProbe:
          exec:
            command:
            - redis-cli
            - ping
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
        readinessProbe:
          exec:
            command:
            - redis-cli
            - ping
          initialDelaySeconds: 10
          periodSeconds: 5
          timeoutSeconds: 3
      volumes:
      - name: config
        configMap:
          name: redis-cluster-config  # The ConfigMap from Step 3
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: gp3  # Change this to match your cluster
      resources:
        requests:
          storage: 20Gi

Let me break down the important parts:
Environment Variables
POD_IP is injected by Kubernetes and tells Redis what IP address to advertise to other cluster members. This is critical for cluster discovery.
Command Override
We override the default Redis command to:
- Substitute the ${POD_IP} placeholder in the config from the ConfigMap and write the result to /data (the ConfigMap mount is read-only)
- Start Redis with our custom config
Volume Claims
volumeClaimTemplates creates a PersistentVolumeClaim for each pod automatically. Each pod gets its own 20GB volume that persists even if the pod is deleted and recreated.
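You can see these claims after the StatefulSet is applied; each PVC is named <template>-<pod>, so the data template here produces data-redis-cluster-0 and so on:

# One PVC per pod, created automatically from the template
kubectl get pvc -n redis-cluster
# Deleting a pod does NOT delete its PVC - the replacement pod reattaches it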
Resource Limits
We set memory and CPU requests/limits to prevent Redis from consuming all cluster resources. Adjust these based on your workload:
- Memory: Redis is memory-intensive. 2-4GB per pod is typical for production.
- CPU: Redis is generally CPU-light unless you're doing heavy operations. 500m-2000m is usually sufficient.
Health Probes
Liveness and readiness probes help Kubernetes know when a pod is healthy:
- Liveness probe: If this fails, Kubernetes restarts the pod.
- Readiness probe: If this fails, Kubernetes removes the pod from service (stops sending traffic to it).
Apply all the resources:
kubectl apply -f configmap.yaml
kubectl apply -f service.yaml
kubectl apply -f statefulset.yaml

Watch the pods come up:
kubectl get pods -n redis-cluster -w

You should see 6 pods starting up one by one (that's the StatefulSet's ordered deployment in action).
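If you're scripting this (or just impatient), kubectl wait blocks until every pod reports Ready:

# Block until all six pods pass their readiness probes
kubectl wait --for=condition=ready pod -l app=redis-cluster -n redis-cluster --timeout=300s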
Step 6: Initialize the Cluster
Once all pods are running, you need to tell Redis to form a cluster. The pods are running, but they're not connected to each other yet. We'll use redis-cli to initialize the cluster.
First, let's create a simple script to do this. We'll run it from one of the pods:
# Get into one of the Redis pods
kubectl exec -it redis-cluster-0 -n redis-cluster -- sh
# Inside the pod, run this command to initialize the cluster
redis-cli --cluster create \
redis-cluster-0.redis-cluster.redis-cluster.svc.cluster.local:6379 \
redis-cluster-1.redis-cluster.redis-cluster.svc.cluster.local:6379 \
redis-cluster-2.redis-cluster.redis-cluster.svc.cluster.local:6379 \
redis-cluster-3.redis-cluster.redis-cluster.svc.cluster.local:6379 \
redis-cluster-4.redis-cluster.redis-cluster.svc.cluster.local:6379 \
redis-cluster-5.redis-cluster.redis-cluster.svc.cluster.local:6379 \
--cluster-replicas 1

The --cluster-replicas 1 flag tells Redis to create 1 replica for each master. So with 6 nodes, you get 3 masters and 3 replicas.
Redis will ask you to confirm the slot distribution. Type yes and press Enter.
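While you're still inside the pod, redis-cli can audit the result for you; --cluster check confirms that all 16,384 slots are covered and every master has its replica:

# Verify slot coverage and master/replica assignment
redis-cli --cluster check redis-cluster-0.redis-cluster.redis-cluster.svc.cluster.local:6379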
If you'd rather automate initialization (for CI/CD or GitOps setups), you can run the same command from a Kubernetes Job:

apiVersion: batch/v1
kind: Job
metadata:
  name: redis-cluster-init
  namespace: redis-cluster
spec:
  template:
    spec:
      restartPolicy: OnFailure
      containers:
      - name: init
        image: redis:7.2-alpine
        command:
        - /bin/sh
        - -c
        - |
          sleep 10
          redis-cli --cluster create \
            redis-cluster-0.redis-cluster.redis-cluster.svc.cluster.local:6379 \
            redis-cluster-1.redis-cluster.redis-cluster.svc.cluster.local:6379 \
            redis-cluster-2.redis-cluster.redis-cluster.svc.cluster.local:6379 \
            redis-cluster-3.redis-cluster.redis-cluster.svc.cluster.local:6379 \
            redis-cluster-4.redis-cluster.redis-cluster.svc.cluster.local:6379 \
            redis-cluster-5.redis-cluster.redis-cluster.svc.cluster.local:6379 \
            --cluster-replicas 1 \
            --cluster-yes

The --cluster-yes flag auto-confirms the slot distribution, so you don't need to type "yes" manually.
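A quick way to confirm the Job succeeded (the Job name matches the manifest above):

# Check Job status and read the init output
kubectl get jobs -n redis-cluster
kubectl logs job/redis-cluster-init -n redis-cluster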
Step 7: Verify the Cluster
Let's check that everything is working:
# Get cluster info
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli cluster info
# Check cluster nodes
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli cluster nodes
# Test writing and reading data
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli -c set mykey "Hello Redis Cluster"
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli -c get mykey

The -c flag tells redis-cli to run in cluster mode, which handles automatic redirection if your key is on a different node.
The cluster nodes output should show 3 masters (each handling a range of slots) and 3 replicas backing them up (the output still uses the legacy label "slave" for replicas).
Step 8: Understanding Failover and Pod Rescheduling
One of the coolest things about running Redis Cluster on Kubernetes is how gracefully it handles failures. Let's see what happens when things go wrong.
What Happens When a Pod Dies?
When a master pod fails (crashes, OOM kill, node failure, etc.), here's the sequence:
1. Kubernetes detects the failure: The liveness probe fails, or Kubernetes notices the pod is gone.
2. Kubernetes reschedules the pod: A new pod with the same name (e.g., redis-cluster-0) is created on a healthy node.
3. Redis Cluster detects the missing master: Other nodes notice the master is gone (via the cluster bus/gossip protocol).
4. Automatic failover: The replica of the failed master is promoted to master automatically.
5. New pod rejoins: When the new pod starts, it reads the cluster state from nodes.conf and rejoins as a replica of the new master.
This all happens automatically! Your application might see a brief connection error, but it should reconnect and keep working.
Want to test this? Delete a pod with kubectl delete pod redis-cluster-0 -n redis-cluster and watch the cluster recover using redis-cli cluster nodes.

Persistent Storage Makes This Work
The key to seamless recovery is the PersistentVolume. When a pod is recreated:
- The new pod gets the same PersistentVolume (because StatefulSets use stable volume claims)
- The nodes.conf file is still there with cluster topology info
- Redis reads this file and knows how to rejoin the cluster
Without persistent storage, each pod restart would lose cluster state and you'd have to reinitialize the cluster every time.
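Here's a minimal failover drill that ties this together. It assumes redis-cluster-0 is currently a master (check with cluster nodes first):

# 1. Confirm which pods are masters
kubectl exec -it redis-cluster-1 -n redis-cluster -- redis-cli cluster nodes
# 2. Kill a master pod and let Kubernetes recreate it
kubectl delete pod redis-cluster-0 -n redis-cluster
# 3. Watch the old replica get promoted, then the new pod rejoin as a replica
kubectl exec -it redis-cluster-1 -n redis-cluster -- redis-cli cluster nodes
kubectl exec -it redis-cluster-1 -n redis-cluster -- redis-cli cluster info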
Step 9: Adding Security (AUTH and ACLs)
Running Redis without authentication in production is like leaving your house unlocked. Let's fix that.
Create a Secret for the Password
First, create a Kubernetes Secret to store the Redis password:
# Generate a strong password (do this on your local machine)
openssl rand -base64 32
# Create the secret
kubectl create secret generic redis-auth \
--from-literal=password='your-generated-password-here' \
  -n redis-cluster

Update the ConfigMap
Now update your Redis config to require authentication. Modify the ConfigMap:
apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-cluster-config
  namespace: redis-cluster
data:
  redis.conf: |
    # ... (previous config) ...

    # Security
    requirepass ${REDIS_PASSWORD}
    masterauth ${REDIS_PASSWORD}

requirepass sets the password clients need to connect. masterauth sets the password replicas use to authenticate with masters.
Update the StatefulSet
Add the password as an environment variable in your StatefulSet:
env:
- name: POD_IP
  valueFrom:
    fieldRef:
      fieldPath: status.podIP
- name: REDIS_PASSWORD
  valueFrom:
    secretKeyRef:
      name: redis-auth
      key: password

And update the command to substitute the password:
command:
- /bin/sh
- -c
- |
  # Substitute environment variables in config
  envsubst < /etc/redis/redis.conf > /data/redis.conf
  # Start Redis
  exec redis-server /data/redis.conf

You'll need to install envsubst in your container (it's part of the gettext package), or use a custom Redis image that includes it.
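After rolling out the updated ConfigMap and StatefulSet, it's worth confirming that AUTH is actually enforced. A quick sketch (the password comes from the REDIS_PASSWORD env var we injected from the Secret):

# Unauthenticated commands should now be refused
kubectl exec -it redis-cluster-0 -n redis-cluster -- redis-cli ping
# (error) NOAUTH Authentication required.
# Authenticated commands work (REDIS_PASSWORD is set in the pod's environment)
kubectl exec -it redis-cluster-0 -n redis-cluster -- sh -c 'redis-cli -a "$REDIS_PASSWORD" ping'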
If you'd rather not install envsubst, you can use an init script that writes the password directly to the config file (or extend the sed approach we used for POD_IP), or use a custom entrypoint script that handles this.

Using Redis ACLs (Access Control Lists)
For even better security, you can use Redis ACLs to create different users with different permissions:
# In your redis.conf
user default off
user app-user on >app-password ~* &* +@all
user read-only on >read-password ~* -@all +@read +@keyspace

This creates:
- default user: Disabled (can't connect)
- app-user: Full access with password "app-password"
- read-only: Read-only access with password "read-password"
Then connect using: redis-cli -u redis://app-user:app-password@redis-cluster-0.redis-cluster:6379
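To confirm which ACL user a connection is actually using, ACL WHOAMI is handy (the user and password here match the example config above):

# Run from inside a pod; prints the authenticated user's name
redis-cli -u redis://app-user:app-password@redis-cluster-0.redis-cluster:6379 acl whoami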
Network Policies (Optional but Recommended)
Network policies restrict which pods can talk to Redis. This adds defense in depth:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: redis-cluster-netpol
  namespace: redis-cluster
spec:
  podSelector:
    matchLabels:
      app: redis-cluster
  policyTypes:
  - Ingress
  ingress:
  # Redis pods must still reach each other on the client port and the cluster bus
  - from:
    - podSelector:
        matchLabels:
          app: redis-cluster
    ports:
    - protocol: TCP
      port: 6379
    - protocol: TCP
      port: 16379
  # Application traffic only needs the client port
  - from:
    - namespaceSelector:
        matchLabels:
          name: production  # Only allow from production namespace
    - podSelector:
        matchLabels:
          app: my-app  # Or specific app pods
    ports:
    - protocol: TCP
      port: 6379

Note the first rule: without it, the policy would also block the gossip traffic between Redis pods and break the cluster. Verify the policy is in place with kubectl get networkpolicies.

Step 10: Setting Up Monitoring
You can't fix what you can't see. Let's set up monitoring so you know what's happening in your Redis Cluster.
Install Redis Exporter
Redis Exporter is a sidecar container that exposes Redis metrics in Prometheus format. Update your StatefulSet to include it:
containers:
- name: redis
  # ... (existing redis container) ...
- name: redis-exporter
  image: oliver006/redis_exporter:latest  # pin a specific version for production
  ports:
  - containerPort: 9121
    name: metrics
  env:
  - name: REDIS_ADDR
    value: "redis://localhost:6379"
  - name: REDIS_PASSWORD
    valueFrom:
      secretKeyRef:
        name: redis-auth
        key: password
  resources:
    requests:
      memory: "64Mi"
      cpu: "50m"
    limits:
      memory: "128Mi"
      cpu: "100m"

Create a Service for Metrics
Expose the metrics endpoint:
apiVersion: v1
kind: Service
metadata:
  name: redis-cluster-metrics
  namespace: redis-cluster
  labels:
    app: redis-cluster
spec:
  type: ClusterIP
  ports:
  - port: 9121
    name: metrics
    targetPort: 9121
  selector:
    app: redis-cluster

Configure Prometheus Scraping
If you're using Prometheus Operator, create a ServiceMonitor:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: redis-cluster
  namespace: redis-cluster
spec:
  selector:
    matchLabels:
      app: redis-cluster
  endpoints:
  - port: metrics
    interval: 30s
    path: /metrics

Key Metrics to Watch
Here are the important metrics you should alert on:
- redis_up: Is Redis responding? (should be 1)
- redis_connected_clients: Number of connected clients
- redis_used_memory: Memory usage (alert if approaching maxmemory)
- redis_keyspace_hits / redis_keyspace_misses: Cache hit ratio
- redis_rejected_connections_total: Connection rejections (indicates overload)
- redis_cluster_state: Cluster state (should be "ok")
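You don't need Prometheus running to spot-check these. Port-forward one pod's exporter and curl it (9121 is the exporter port from the sidecar above; exact metric names can vary slightly between exporter versions, hence the loose grep):

# Forward the exporter port to your machine...
kubectl port-forward redis-cluster-0 9121:9121 -n redis-cluster &
# ...then pull the metrics and look for the ones above
curl -s localhost:9121/metrics | grep -E 'redis_up|redis_memory|redis_cluster'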
Grafana Dashboard
Import a Redis dashboard into Grafana. There are several good ones on Grafana Labs:
- Dashboard ID 11835: "Redis Dashboard for Prometheus"
- Dashboard ID 763: "Redis"
These dashboards show you memory usage, command rates, latency, and cluster health at a glance.
Common Gotchas and Tips
Here are the things that trip people up (and how to avoid them):
1. DNS Resolution Issues
Problem: Pods can't find each other using DNS names.
Solution: Make sure you're using the full FQDN (Fully Qualified Domain Name) like redis-cluster-0.redis-cluster.redis-cluster.svc.cluster.local, or at least redis-cluster-0.redis-cluster. Short names like redis-cluster-0 might not work depending on your DNS setup.
2. Storage Class Mismatch
Problem: PVCs are stuck in "Pending" state.
Solution: Check that your StorageClass exists and supports RWO access mode. List storage classes: kubectl get storageclass
3. Cluster Initialization Timing
Problem: Trying to initialize the cluster before all pods are ready.
Solution: Wait for all pods to be in "Running" state with "1/1" ready before running cluster init. Use: kubectl wait --for=condition=ready pod -l app=redis-cluster -n redis-cluster --timeout=300s
4. Memory Limits Too Low
Problem: Pods getting OOM killed.
Solution: Increase memory limits and adjust maxmemory in config. Remember: maxmemory should be ~75% of your pod's memory limit.
5. Forgetting to Use Cluster Mode in Clients
Problem: Application can't connect or gets MOVED errors.
Solution: Make sure your Redis client library supports cluster mode and is configured to use it. Most modern clients do (like redis-py-cluster for Python, ioredis for Node.js; note that plain redis-py 4.1+ has cluster support built in).
For example, a cluster-aware connection string can be as simple as redis://redis-cluster-0.redis-cluster:6379.

Connecting Your Application
Once your Redis Cluster is running, here's how to connect from your application pods:
Connection String Format
Use the headless service DNS name. Most cluster-aware clients can discover all nodes from just one address:
# Python (redis-py-cluster; newer redis-py >= 4.1 has RedisCluster built in)
from rediscluster import RedisCluster

startup_nodes = [{"host": "redis-cluster-0.redis-cluster", "port": "6379"}]
rc = RedisCluster(startup_nodes=startup_nodes, password="your-password", decode_responses=True)

// Node.js (ioredis)
const Redis = require('ioredis');

const cluster = new Redis.Cluster([
  { host: 'redis-cluster-0.redis-cluster', port: 6379 }
], {
  redisOptions: { password: 'your-password' }
});

Wrapping Up
Congratulations! You now have a production-grade Redis Cluster running on Kubernetes. Let's recap what we built:
- ✅ 6-node Redis Cluster (3 masters + 3 replicas) with automatic failover
- ✅ Persistent storage that survives pod restarts
- ✅ Stable network identities using StatefulSets and headless services
- ✅ Security with authentication and optional network policies
- ✅ Monitoring with Prometheus and Grafana
This setup will handle:
- Pod failures and automatic recovery
- Node failures (Kubernetes reschedules pods)
- Data persistence across restarts
- High availability (cluster continues working if up to 1 master fails)
If you run into issues or want help optimizing your Redis setup, feel free to reach out. Happy clustering! 🚀