Production Architecture

Our core infrastructure is currently hosted on several cloud providers, each serving a different function. This document does not cover servers that are not integral to the public-facing operation of GitLab.com.

Diagram of the Architecture

Source, GitLab internal use only

Infrastructure "Services" and Their SLx's

In order to reach our availability and latency goals for GitLab.com, we started by setting a target internal SLA for the service as a whole, from the user's perspective. From that target, we can work backwards through the architecture to determine what the Service Level Objectives should be for the infrastructure "services" that support it.

Since we are relying on hardware that itself only offers an SLA of 99.9% availability, we face an "SLA inversion": a service can never be more available than the components it depends on. For example, in the current situation, each time an NFS server goes down, the result is an outage of GitLab.com. Since we are only guaranteed 99.9% uptime per NFS server, the maximum SLA for GitLab.com as a whole will be <= (99.9%)^N, where N is the number of NFS servers. To overcome this, the service offered by the NFS servers either needs to be redesigned in some way (e.g. by using Gitaly), or the application that depends on it needs a way to keep running when the NFS service is unavailable (i.e. graceful degradation). Similar considerations apply to the cache, background job processing, database availability, and so on.
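The compound-availability math above can be sketched in a few lines of Python. The 99.9% figure comes from the hardware SLA mentioned above; the server counts are purely illustrative:

```python
def composite_sla(component_sla: float, n_components: int) -> float:
    """Maximum possible availability of a service that goes down
    whenever any one of n independent components is down."""
    return component_sla ** n_components

# With ten servers at 99.9% each, the best case for the service as a
# whole is roughly 99.0% -- almost ten times the downtime of any
# single server.
for n in (1, 5, 10, 20):
    print(f"{n:>2} servers -> max SLA {composite_sla(0.999, n):.4%}")
```

This is exactly why redundancy alone does not help when every component is a single point of failure: adding servers makes the compound SLA worse, not better, until the dependency itself is made fault-tolerant.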

To tackle this challenge, we consider the following elements of the infrastructure to be "services" that should be able to meet their own internal SLAs:

Internal Networking Scheme

A visualization of the whole address space can be found here (GitLab internal use only).


Virtual Network Name: GitLabProd

Resource Group: GitLabProd

IP space:

Subnet Name         Subnet Range   Tier Domain
ExternalLBProd                     Load balancers
InternalLBProd                     Load balancers
DBProd                             Databases
RedisProd                          Databases
ElasticSearchProd                  Databases
ConsulProd                         Support Services
VaultProd                          Support Services
DeployProd                         Support Services
LogProd                            Logging
APIProd                            Services
GitProd                            Services
SidekiqProd                        Services
WebProd                            Services
RegistryProd                       Services
StorageProd                        Storage


Virtual Network Name: GitLabCanary

Resource Group: GitLabCanary

IP space:

Subnet Name         Subnet Range   Tier Domain
ExternalLBCanary                   Load balancers
InternalLBCanary                   Load balancers
APICanary                          Services
GitCanary                          Services
SidekiqCanary                      Services
WebCanary                          Services
RegistryCanary                     Services


Virtual Network Name: GitLabStaging

Resource Group: GitLabStaging

IP space:

Subnet Name            Subnet Range   Tier Domain
ExternalLBStaging                     Load balancers
InternalLBStaging                     Load balancers
DBStaging                             Databases
RedisStaging                          Databases
ElasticSearchStaging                  Databases
ConsulStaging                         Support Services
VaultStaging                          Support Services
DeployStaging                         Support Services
LogStaging                            Logging
APIStaging                            Services
GitStaging                            Services
SidekiqStaging                        Services
WebStaging                            Services
RegistryStaging                       Services
StorageStaging                        Storage


The main portion of GitLab.com is hosted on Microsoft Azure. We have the following servers there.

Note that these numbers can fluctuate to adapt to the platform needs.

We also use availability sets to ensure that a minimum number of servers in each group is available at any given time: Azure will not reboot all instances in the same availability set at the same time for any planned maintenance.
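Azure spreads the instances in an availability set across update domains (five by default), and planned maintenance reboots only one update domain at a time. A rough sketch of the resulting guarantee, with illustrative instance counts:

```python
import math

def worst_case_down(instances: int, update_domains: int = 5) -> int:
    """Most instances that planned maintenance can take down at once,
    assuming Azure reboots one update domain at a time and instances
    are spread evenly across the update domains."""
    return math.ceil(instances / update_domains)

# e.g. 12 web nodes across 5 update domains: at most 3 reboot
# together, so at least 9 stay up during any planned maintenance.
print(worst_case_down(12))
```

Unplanned hardware failures are covered by the separate fault-domain placement, which works the same way but for racks and power rather than reboots.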

All our servers run the latest Ubuntu LTS unless there is a specific need to do otherwise. Every server is configured with a fully fledged set of firewall rules for increased security.

Load Balancers

We utilize Azure load balancers in front of our HAProxy nodes. This lets us lean on the Azure infrastructure for high availability while still taking advantage of the flexibility and power of HAProxy.

Additionally, we utilize an Azure load balancer to manage PostgreSQL failovers.
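For the load balancer to route writes correctly, its health probe has to distinguish the PostgreSQL primary from the replicas. As a hypothetical illustration (not our actual probe), the decision reduces to marking healthy the one node whose `pg_is_in_recovery()` is false:

```python
def find_primary(in_recovery: dict[str, bool]) -> str:
    """Given each node's pg_is_in_recovery() result, return the node
    the load balancer should mark healthy. Exactly one primary is
    expected; anything else indicates a failover in progress."""
    primaries = [node for node, recovering in in_recovery.items()
                 if not recovering]
    if len(primaries) != 1:
        raise RuntimeError(f"expected exactly one primary, found {primaries}")
    return primaries[0]

print(find_primary({"db1": False, "db2": True, "db3": True}))
```

During a failover the probe briefly sees zero (or, in a split-brain, more than one) non-recovering node, which is exactly when the load balancer should stop routing writes rather than guess.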

Service nodes

Different services have different resource utilization patterns, so we use a variety of instance types across our service nodes, kept consistent within each group. We recently isolated traffic by type onto dedicated pools of nodes. We hope you noticed the performance improvement.
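Isolating traffic by type amounts to routing each class of request to its own backend pool. A hypothetical sketch of that routing decision (the pool names mirror the subnet names above; the real rules live in our HAProxy configuration, and the path patterns here are illustrative):

```python
def backend_pool(path: str) -> str:
    """Pick a dedicated node pool based on the type of traffic, so
    that e.g. heavy API usage cannot starve web page rendering."""
    if path.startswith("/api/"):
        return "api"
    if path.endswith(".git") or "/info/refs" in path:
        return "git"
    if path.startswith("/v2/"):  # Docker registry HTTP API prefix
        return "registry"
    return "web"

print(backend_pool("/api/v4/projects"))
```

Because each pool is sized for its own workload, a spike in one traffic type degrades only its own pool instead of the whole fleet.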

Digital Ocean

Digital Ocean houses several servers that do not need to interact directly with our main infrastructure. They serve a variety of purposes, and not all of them are listed here.

The primary things on Digital Ocean at this time are:


Amazon Web Services

We host our DNS with Route 53, and we have several EC2 instances for various purposes. The servers you will interact with most are listed below.

Google Cloud

We are currently investigating Google Cloud.


Monitoring

To see how GitLab.com is doing, and for more information on monitoring in general, visit the monitoring handbook.

Technology at GitLab

We use a lot of cool (but boring) technologies here at GitLab. Below is a non-exhaustive list of tech we use here.