Failover with Redis Sentinel

September 3, 2015 by Donatas Abraitis

At Vinted, we use Redis, a data structure server, for many things, including Resque, the news feed, the application, and so on. We cannot restart or upgrade Redis instances without downtime, and high availability is critical for us. Therefore, we decided to try high-availability solutions such as Redis Sentinel and Redis Cluster.

The first thing we did was test Redis Cluster. However, due to a lack of client-side software, we decided not to go with this solution. Redis Cluster itself is stable, but its client side is very basic and lacks advanced functionality, such as pipelining, which we use.

Once we had finished testing Redis Cluster, we moved on to Redis Sentinel. Redis Sentinel monitors the master and its slaves and promotes a slave to master when a quorum of Sentinels agrees that the master is down. In our case, we tested it with 3 nodes (quorum=2). We will not go into detail about configuring Redis Sentinel, as the configuration is very simple.
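To give a feel for how small such a configuration is, here is a minimal sketch in Ruby that renders the Sentinel directives for one monitored master. The directive names are standard Sentinel configuration directives; the master name, address, port and timeout values are illustrative assumptions, not our production settings.

```ruby
# Render a minimal Sentinel configuration for one monitored master.
# Every value passed in here (name, address, timeouts) is an assumption
# made up for this example.
def sentinel_config(name:, host:, port:, quorum:,
                    down_after_ms: 5000, failover_timeout_ms: 60_000)
  [
    # Monitor the master; `quorum` Sentinels must agree that it is down.
    "sentinel monitor #{name} #{host} #{port} #{quorum}",
    # How long the master must be unreachable before it is considered down.
    "sentinel down-after-milliseconds #{name} #{down_after_ms}",
    # Upper bound on how long a failover attempt may take.
    "sentinel failover-timeout #{name} #{failover_timeout_ms}",
    # Reconfigure one slave at a time after a failover.
    "sentinel parallel-syncs #{name} 1"
  ].join("\n") + "\n"
end

puts sentinel_config(name: "shard-a", host: "10.0.0.1", port: 6379, quorum: 2)
```

With 3 Sentinels and quorum=2, one Sentinel can be lost and the remaining two can still agree that the master is down.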

We run multiple mini clusters, each one formed by one master and two slaves. Because each instance listens on a different port number, we can run as many instances as we need on one server.

If we need to launch another cluster, we simply add the role redis-shards-<country> and Chef will automatically spawn what is needed.

The most interesting thing about Sentinel is that it writes its state into the configuration file. As a result, this file must not be overwritten, so Chef generates these files only if they do not already exist.
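The "create only if missing" behaviour can be sketched in plain Ruby as follows. Chef's file and template resources offer a create_if_missing action for this purpose; whether our cookbook uses exactly that action is not stated here, and the helper name and paths below are hypothetical.

```ruby
require "fileutils"
require "tmpdir"

# Write the initial Sentinel config only when the file is absent, so the
# state Sentinel has persisted into it is never clobbered.
# Returns true if the file was created, false if it was left alone.
# (Hypothetical helper, not taken from our cookbook.)
def create_if_missing(path, initial_content)
  return false if File.exist?(path)
  FileUtils.mkdir_p(File.dirname(path))
  File.write(path, initial_content)
  true
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "sentinel.conf")
  create_if_missing(path, "sentinel monitor shard-a 10.0.0.1 6379 2\n") # creates the file
  create_if_missing(path, "sentinel monitor shard-a 10.0.0.1 6379 2\n") # leaves it untouched
end
```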

Technical details

Failover

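Every time Sentinel starts a failover, it calls sentinelStartFailover(). Sentinels exchange hello messages using Pub/Sub and update the last_pub_time variable. The time elapsed since last_pub_time at the moment the failover starts therefore shows how long the master had been silent before Sentinel reacted.

So, let’s dig deeper into this. Here is the SystemTap snippet used to probe user space: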

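# Fires on entry to sentinelStartFailover() in the running Sentinel binary.
probe process("/usr/local/bin/redis-server").function("sentinelStartFailover")
{
        # Milliseconds since the last hello message was received from the master.
        elapsed = gettimeofday_ms() - $master->last_pub_time;
        # Print as seconds with millisecond precision, e.g. 0.835s.
        printf("%d.%03ds\n", (elapsed / 1000), (elapsed % 1000));
}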

Manual failover using redis-cli took 0.835s, while failover with the configured timeout took 5.843s.

Measuring how quickly a manual failover converges was crucial for us, as we care about latency. Failing fast is just as important, so these timers need to be tuned. The measurements help decide whether manual failovers are enough for maintenance, or whether it is preferable to rely on the configured timeouts.

Migration process

Stop all Sentinel instances to avoid electing a new master;
Make sure every Redis instance is a master;
The Sentinel master node replicates from the origin;
The Sentinel slaves replicate from the Sentinel master;
After everything is in sync, stop syncing the master from the origin and start the Sentinel instances.
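The steps above can be sketched as an ordered list of redis-cli commands. This is only an illustration under stated assumptions: the hosts and ports are placeholders, the helper names are hypothetical, and stopping or starting Sentinel depends on the local service manager, so those steps appear as comments.

```ruby
# Hypothetical node descriptor; hosts and ports used below are placeholders.
Node = Struct.new(:host, :port)

# Build a redis-cli invocation for one node.
def cli(node, *args)
  "redis-cli -h #{node.host} -p #{node.port} #{args.join(' ')}"
end

# Return the migration steps in order. Lines starting with "#" are manual
# steps or checks rather than commands.
def migration_plan(origin:, new_master:, new_slaves:)
  nodes = [new_master, *new_slaves]
  plan = ["# stop all Sentinel instances"]
  # Make sure every new instance starts out as a master.
  nodes.each { |n| plan << cli(n, "slaveof", "no", "one") }
  # The future master replicates from the origin instance.
  plan << cli(new_master, "slaveof", origin.host, origin.port)
  # The future slaves replicate from the future master.
  new_slaves.each { |s| plan << cli(s, "slaveof", new_master.host, new_master.port) }
  plan << "# wait until 'info replication' reports master_link_status:up everywhere"
  nodes.each { |n| plan << cli(n, "info", "replication") }
  # Detach the new master from the origin, then hand over to Sentinel.
  plan << cli(new_master, "slaveof", "no", "one")
  plan << "# start all Sentinel instances"
  plan
end

puts migration_plan(
  origin: Node.new("origin.example", 6379),
  new_master: Node.new("10.0.0.1", 7000),
  new_slaves: [Node.new("10.0.0.2", 7000), Node.new("10.0.0.3", 7000)]
)
```

Printing the plan rather than executing it makes it easy to review the order before touching a live instance.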

Monitoring

We monitor Redis instances using Redistop.rb.

We don’t use the built-in monitoring tool (redis-cli -p <port> monitor), because it is more intrusive (~12%) than our own. In addition, our own tool allows us to monitor how many requests we have per second per instance, sort by latency, sort by count, and see the most used keys and commands.

~$ ruby redistop.rb -R
Probing...Type CTRL+C to stop probing.

PID   REQ/S
2345
1025
785
757
519
462
204

Total:  6116 req/s

~$ ruby redistop.rb -F
Probing...Type CTRL+C to stop probing.

PID   COUNT LATENCY     CMD
925   <0.000023>  zrangebyscore
324   <0.000032>  zrangebyscore
293   <0.000033>  get
255   <0.000014>  get
252   <0.000017>  hget
249   <0.000015>  get
248   <0.000018>  hget
230   <0.000039>  hget
225   <0.000053>  zrangebyscore
179   <0.000018>  hget

~$ ruby redistop.rb -K
Probing...Type CTRL+C to stop probing.

COUNT KEY
get
hget
 zrangebyscore
 fr:ab_test_ids
 pl:ab_test_ids
 de_babies:ab_test_ids
 cz:ab_test_ids
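The kind of aggregation behind a view such as -F (count and latency per command) can be sketched in a few lines of Ruby. This is not the actual redistop.rb code: the Sample struct, the sample values and the function name are assumptions made up for illustration.

```ruby
# One observed request; latency is in seconds. Hypothetical data shape.
Sample = Struct.new(:pid, :cmd, :latency)

# Group samples per (pid, command), count them, average the latency,
# and return the busiest rows first.
def top_commands(samples, limit: 10)
  samples
    .group_by { |s| [s.pid, s.cmd] }
    .map do |(pid, cmd), group|
      { pid: pid, cmd: cmd, count: group.size,
        avg_latency: group.sum(&:latency) / group.size }
    end
    .sort_by { |row| -row[:count] }
    .first(limit)
end

samples = [
  Sample.new(101, "get", 0.00002),
  Sample.new(101, "get", 0.00004),
  Sample.new(102, "hget", 0.00001)
]

top_commands(samples).each do |row|
  printf("%-6d %-6d <%.6f>  %s\n", row[:pid], row[:count], row[:avg_latency], row[:cmd])
end
```

Sorting by :avg_latency instead of :count would give the "sort by latency" view mentioned above.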

Lessons Learned

Redis Cluster is a very cool service, but due to the immaturity of the client side we decided to postpone using it.
Redis Sentinel failover works as expected. Manual failover completes in under a second.
Migration from a standalone instance to Redis Sentinel is very simple.
Monitoring Redis instances became very easy for us as we can inspect the most interesting things.
