Monitoring growing memcached

The engineers at Vinted began the year working on improving our caching layer. We managed to significantly improve the performance of our Rails application and substantially reduce the number of database queries. But the additional load on the caching layer brought some new challenges.

At Vinted we primarily use memcached for caching. Naturally, increasing the number of things we cache also increased the load on our caching infrastructure. So much, in fact, that the memcached eviction count began to grow significantly, and the connection count increased by 30%. Despite our best efforts to “cache as much as possible”, the performance gains were small.

Growth Factor

Previously, we had 30 memcached instances, each with 1G of memory, adding up to approximately 30G of memcached cache. We increased memory from 1G to 4G per instance and so expected 4x growth (120G). But the real numbers were a bit different: instead of 4G, each instance effectively held only about 3.5G, with roughly 500M lost to the way slabs are sized and items fit into their chunks.
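
One rough way to see how much memory the slab allocator has actually claimed versus the configured limit is to compare limit_maxbytes from the stats command with total_malloced from stats slabs. A minimal sketch; the hostname is a placeholder and the port is memcached's default:

$ echo "stats" | nc memcached-01.example.net 11211 | grep limit_maxbytes
$ echo "stats slabs" | nc memcached-01.example.net 11211 | grep total_malloced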

First, we used memcached-tool to figure out how item sizes mapped onto slab classes. We found out that some chunks were just too big for the items stored in them: with the default slab configuration, an item of 1857 bytes ends up in the 2320-byte chunk class (which covers items of 1857-2320 bytes), so the remaining 463 bytes of that chunk are wasted. We decided to lower the growth factor to 1.07 to better match our item sizes. After this change, we managed to claim back around 8-9G of memory for memcached. Here is the output from memcached-tool before changes:

...
  #  Item_Size  Max_age   Pages   Count   Full?  Evicted Evict_Time OOM
 12     1.2K        11s       1      53      no        0        0    0
 13     1.4K        11s       1     707      no        0        0    0
 14     1.8K        11s       2     800      no        0        0    0
...

and the output after changes:

...
  #  Item_Size  Max_age   Pages   Count   Full?  Evicted Evict_Time OOM
 36     1.3K        23s       1       7      no        0        0    0
 37     1.4K        13s       1      15      no        0        0    0
 38     1.5K        11s       2     907      no        0        0    0
 39     1.6K        12s       1     225      no        0        0    0
 40     1.7K        19s       1      13      no        0        0    0
...

You may notice that items are now spread across slab classes much more evenly than before, so far less memory is wasted per chunk.
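
For reference, the growth factor is set with memcached's -f flag at start-up, and the resulting slab layout can be inspected with memcached-tool. A minimal sketch, with an illustrative memory size and placeholder hostname:

# start memcached as a daemon with 4G of memory and a 1.07 chunk growth factor
$ memcached -d -u memcache -m 4096 -f 1.07

# inspect the resulting slab classes (this produces the output shown above)
$ memcached-tool memcached-01.example.net:11211 display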

Connection Count

We handle memcached sharding and connection balancing with twemproxy, a.k.a. nutcracker. Because nutcracker keeps persistent connections to memcached, it is not easy to see the actual connection count per instance, or whether load is spread evenly across backends. So instead we used systemtap to figure out how many queries are sent to which backend:

probe process("/usr/local/bin/nutcracker").function("server_connect")
{
        _sname = user_string($server->name->data);
        count[_sname]++;
}

probe timer.s(1)
{
        foreach(mc in count- limit 64) {
                printf("%s\t%7.5d\n", mc, count[mc]);
                total += count[mc];
        }
        printf("Total connections: %d\n", total);
        delete count;
        delete total;
}

With this script we can monitor how many queries are sent to each memcached instance as well as the total count of queries per second.
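
To run the script, the nutcracker binary needs debug symbols available so that systemtap can resolve server_connect and its arguments. Assuming it is saved as nutcracker_connects.stp (a filename we picked for this sketch):

$ sudo stap -v nutcracker_connects.stp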

Errors

To see the error counts from twemproxy's stats port we use:

$ telnet 127.0.0.1 22222 | grep -E "server_timedout|server_err"

But this output only gives us raw counters, without:

  • timestamps (when a given backend failed)
  • what the actual error was
  • how many bytes were sent and received.

After digging into twemproxy’s source we saw that we could hook into server_close to catch what we need. Using systemtap, it is possible to extract these details:

probe process("/usr/local/bin/nutcracker").function("server_close")
{
        server_name = user_string(@cast($conn->owner, "server")->name->data);
        strerr = errno_str($conn->err);
        if($conn->err)
                printf("%s\t%s(%d)\t%s (r:%d s:%d)\n",
                        ctime(gettimeofday_s()),
                        server_name,
                        $conn,
                        strerr,
                        $conn->recv_bytes,
                        $conn->send_bytes);
}

We saw many errors like:

Mon Apr 13 14:49:40 2015        example-memcached.vinted.net:1101(27478416)   EINVAL (r:31126323 s:3213800)

So, with this in place, we can use sysdig to catch what is actually being sent to that backend. Collect:

$ sysdig -z -w memcached.dig.gz fd.cip=<twemproxy_ip> and fd.sport=1101

Analyze:

$ sysdig -r memcached.dig.gz evt.datetime contains "14:49:40"
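
To see the request payloads as well, and not just the events, sysdig's echo_fds chisel together with -A (print buffers as ASCII) can dump the data read from and written to each file descriptor; for example:

$ sysdig -r memcached.dig.gz -A -c echo_fds evt.datetime contains "14:49:40"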

After analyzing the results we figured out that EINVAL shows up when the item size exceeds 1MB, memcached's default maximum item size. In this case, the offending command was:

set 6f425180black_base.bank_codes 0 86700 2987402

It turned out that we were caching bank codes that were 2987402 bytes in size, so almost 3MB.
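
The limit itself can be checked on the memcached side: item_size_max in stats settings shows the configured maximum, and it can be raised at start-up with -I if larger items are really needed. A sketch with a placeholder hostname and an illustrative 4m value:

# check the configured maximum item size (1048576 bytes by default)
$ echo "stats settings" | nc memcached-01.example.net 11211 | grep item_size_max

# or start memcached with a larger item size limit
$ memcached -d -u memcache -m 4096 -I 4m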

After improving our caching and applying these tweaks, we were able to shave around 100ms off each request. There are many profiling and monitoring tools out there; what is important is to understand what you really need and how to use each one appropriately.