Redis development guidelines

Redis instances

GitLab uses Redis for the following distinct purposes:

Caching (mostly via Rails.cache).
As a job processing queue with Sidekiq.
To manage the shared application state.
To store CI trace chunks.
As a Pub/Sub queue backend for ActionCable.
Rate limiting state storage.
Sessions.

In most environments (including the GDK), all of these point to the same Redis instance.

On GitLab.com, we use separate Redis instances. See the Redis SRE guide for more details on our setup.

Every application process is configured to use the same Redis servers, so they can be used for inter-process communication in cases where PostgreSQL is less appropriate. For example, transient state or data that is written much more often than it is read.

If Geo is enabled, each Geo node gets its own, independent Redis database.

We have development documentation on adding a new Redis instance.

Key naming

Redis is a flat namespace with no hierarchy, which means we must pay attention to key names to avoid collisions. Typically we use colon-separated elements to provide a semblance of structure at application level. An example might be projects:1:somekey.

Although we split our Redis usage by purpose into distinct categories, and those may map to separate Redis servers in a Highly Available configuration like GitLab.com, the default Omnibus and GDK setups share a single Redis server. This means that keys should always be globally unique across all categories.

It is usually better to use immutable identifiers - project ID rather than full path, for instance - in Redis key names. If full path is used, the key stops being consulted if the project is renamed. If the contents of the key are invalidated by a name change, it is better to include a hook that expires the entry, instead of relying on the key changing.

Multi-key commands

GitLab supports Redis Cluster only for the Redis rate-limiting type, introduced in epic 823.

This imposes an additional constraint on naming: where GitLab is performing operations that require several keys to be held on the same Redis server - for instance, diffing two sets held in Redis - the keys should ensure that by enclosing the changeable parts in curly braces. For example:

project:{1}:set_a
project:{1}:set_b
project:{2}:set_c

set_a and set_b are guaranteed to be held on the same Redis server, while set_c is not.

Currently, we validate this in the development and test environments with the RedisClusterValidator, which is enabled for the cache and shared_state Redis instances..

Developers are highly encouraged to use hash-tags where appropriate to facilitate future adoption of Redis Cluster in more Redis types. For example, the Namespace model uses hash-tags for its config cache keys.

Redis in structured logging

For GitLab Team Members: There are basic and advanced videos that show how you can work with the Redis structured logging fields on GitLab.com.

Our structured logging for web requests and Sidekiq jobs contains fields for the duration, call count, bytes written, and bytes read per Redis instance, along with a total for all Redis instances. For a particular request, this might look like:

Field	Value
`json.queue_duration_s`	0.01
`json.redis_cache_calls`	1
`json.redis_cache_duration_s`	0
`json.redis_cache_read_bytes`	109
`json.redis_cache_write_bytes`	49
`json.redis_calls`	2
`json.redis_duration_s`	0.001
`json.redis_read_bytes`	111
`json.redis_shared_state_calls`	1
`json.redis_shared_state_duration_s`	0
`json.redis_shared_state_read_bytes`	2
`json.redis_shared_state_write_bytes`	206
`json.redis_write_bytes`	255

As all of these fields are indexed, it is then straightforward to investigate Redis usage in production. For instance, to find the requests that read the most data from the cache, we can just sort by redis_cache_read_bytes in descending order.

The slow log

NOTE: There is a video showing how to see the slow log (GitLab internal) on GitLab.com

On GitLab.com, entries from the Redis slow log are available in the pubsub-redis-inf-gprd* index with the redis.slowlog tag. This shows commands that have taken a long time and may be a performance concern.

The fluent-plugin-redis-slowlog project is responsible for taking the slowlog entries from Redis and passing to Fluentd (and ultimately Elasticsearch).

Analyzing the entire keyspace

The Redis Keyspace Analyzer project contains tools for dumping the full key list and memory usage of a Redis instance, and then analyzing those lists while eliminating potentially sensitive data from the results. It can be used to find the most frequent key patterns, or those that use the most memory.

Currently this is not run automatically for the GitLab.com Redis instances, but is run manually on an as-needed basis.

N+1 calls problem

Introduced in spec/support/helpers/redis_commands/recorder.rb via f696f670

RedisCommands::Recorder is a tool for detecting Redis N+1 calls problem from tests.

Redis is often used for caching purposes. Usually, cache calls are lightweight and cannot generate enough load to affect the Redis instance. However, it is still possible to trigger expensive cache recalculations without knowing that. Use this tool to analyze Redis calls, and define expected limits for them.

Create a test

It is implemented as a ActiveSupport::Notifications instrumenter.

You can create a test that verifies that a testable code only makes a single Redis call:

it 'avoids N+1 Redis calls' do
  control = RedisCommands::Recorder.new { visit_page }

  expect(control.count).to eq(1)
end

or a test that verifies the number of specific Redis calls:

it 'avoids N+1 sadd Redis calls' do
  control = RedisCommands::Recorder.new { visit_page }

  expect(control.by_command(:sadd).count).to eq(1)
end

You can also provide a pattern to capture only specific Redis calls:

it 'avoids N+1 Redis calls to forks_count key' do
  control = RedisCommands::Recorder.new(pattern: 'forks_count') { visit_page }

  expect(control.count).to eq(1)
end

You also can use special matchers exceed_redis_calls_limit and exceed_redis_command_calls_limit to define an upper limit for a number of Redis calls:

it 'avoids N+1 Redis calls' do
  control = RedisCommands::Recorder.new { visit_page }

  expect(control).not_to exceed_redis_calls_limit(1)
end

it 'avoids N+1 sadd Redis calls' do
  control = RedisCommands::Recorder.new { visit_page }

  expect(control).not_to exceed_redis_command_calls_limit(:sadd, 1)
end

These tests can help to identify N+1 problems related to Redis calls, and make sure that the fix for them works as expected.

Utility classes

We have some extra classes to help with specific use cases. These are mostly for fine-grained control of Redis usage, so they wouldn't be used in combination with the Rails.cache wrapper: we'd either use Rails.cache or these classes and literal Redis commands.

We prefer using Rails.cache so we can reap the benefits of future optimizations done to Rails. Ruby objects are marshalled when written to Redis, so we must pay attention to store neither huge objects, nor untrusted user input.

Typically we would only use these classes when at least one of the following is true:

We want to manipulate data on a non-cache Redis instance.
Rails.cache does not support the operations we want to perform.

`Gitlab::Redis::{Cache,SharedState,Queues}`

These classes wrap the Redis instances (using Gitlab::Redis::Wrapper) to make it convenient to work with them directly. The typical use is to call .with on the class, which takes a block that yields the Redis connection. For example:

# Get the value of `key` from the shared state (persistent) Redis
Gitlab::Redis::SharedState.with { |redis| redis.get(key) }

# Check if `value` is a member of the set `key`
Gitlab::Redis::Cache.with { |redis| redis.sismember(key, value) }

`Gitlab::Redis::Boolean`

In Redis, every value is a string. Gitlab::Redis::Boolean makes sure that booleans are encoded and decoded consistently.

`Gitlab::Redis::HLL`

The Redis PFCOUNT, PFADD, and PFMERGE commands operate on HyperLogLogs, a data structure that allows estimating the number of unique elements with low memory usage. For more information, see HyperLogLogs in Redis.

Gitlab::Redis::HLL provides a convenient interface for adding and counting values in HyperLogLogs.

`Gitlab::SetCache`

For cases where we need to efficiently check the whether an item is in a group of items, we can use a Redis set. Gitlab::SetCache provides an #include? method that uses the SISMEMBER command, as well as #read to fetch all entries in the set.

This is used by the RepositorySetCache to provide a convenient way to use sets to cache repository data like branch names.

Background migration

Redis-based migrations involve using the SCAN command to scan the entire Redis instance for certain key patterns. For large Redis instances, the migration might exceed the time limit for regular or post-deployment migrations. RedisMigrationWorker performs long-running Redis migrations as a background migration.

To perform a background migration by creating a class:

module Gitlab
  module BackgroundMigration
    module Redis
      class BackfillCertainKey
        def perform(keys)
        # implement logic to clean up or backfill keys
        end

        def scan_match_pattern
        # define the match pattern for the `SCAN` command
        end

        def redis
        # define the exact Redis instance
        end
      end
    end
  end
end

To trigger the worker through a post-deployment migration:

class ExampleBackfill < Gitlab::Database::Migration[2.1]
  disable_ddl_transaction!

  MIGRATION='BackfillCertainKey'

  def up
    queue_redis_migration_job(MIGRATION)
  end
end