Concourse CI/CD for GDS Service
Reliability Engineering operates a continuous integration / continuous deployment (CI/CD) service based on [Concourse][concourse] for GDS tenants, and is responsible for the support and reliability of the service.
The service is known informally as the “shared concourse”, “multi-tenant concourse” or “big concourse”, and can be accessed at the following URL with a valid GitHub login.
Concourse is made up of two major components:
- web components (API, authentication, user interface, task scheduling)
- worker components (job container execution runtime)
All components are deployed as EC2 instances in AWS.
For more detail on how these components work, refer to the Concourse internals documentation.
Concourse uses teams to provide some level of isolation between pipeline executions.
In our deployment we give each team its own dedicated set of worker components.
For details of the design decisions and history of the project, please see the reliability-engineering ADR documentation.
Deployment is described in Terraform, with changes continuously deployed on merge to master.
There are two Terraform projects:
Each deployment has a deployment pipeline in its main team that allows the Concourse to continuously deploy itself. These can be found at the following locations:
Both the staging and production deployments are based on the Terraform modules that can be found in the tech-ops/reliability-engineering repository.
Concourse only allows access from the GDS internal IP network, which means you may need to be on the VPN to access the UI.
The Automate team offer in-hours support. Any issues raised out of hours will be recorded and handled during working hours.
Support process and tasks
- Keep interruptible documentation up to date
- Support users on the
- Maintain a secure and reliable service level
The most important events for our users are:
- their deployment pipelines are available
- their jobs are executing in a timely manner
We also consider the following important, but not as critical, for our users:
- access to the monitoring dashboards
- the ability to have changes to the scale of the team’s worker pool reviewed and merged promptly
Note that these lists may not be exhaustive and are expected to change as our system develops.
Service Level Indicators and Objectives (SLIs and SLOs)
We need to define clear SLIs/SLOs for the Concourse service; it is currently best effort.
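As a starting point for that discussion, here is a minimal sketch of how an SLI for "jobs are executing in a timely manner" might be computed and compared against an objective. Everything in it is an assumption made for illustration: the function names, the 60 second threshold, the 95% objective and the sample queue times are hypothetical and do not reflect agreed targets or real measurements from the service.

```python
def timely_job_sli(queue_times_seconds, threshold_seconds):
    """Return the fraction of jobs that started within the threshold.

    queue_times_seconds: hypothetical measurements of how long each job
    waited between being triggered and starting on a worker.
    """
    if not queue_times_seconds:
        # Assumption: with no jobs observed, treat the indicator as met.
        return 1.0
    good = sum(1 for t in queue_times_seconds if t <= threshold_seconds)
    return good / len(queue_times_seconds)


def meets_slo(sli, objective):
    """True when the measured indicator is at or above the objective."""
    return sli >= objective


if __name__ == "__main__":
    # Illustrative data only: five jobs, one of which waited far too long.
    queue_times = [12.0, 45.0, 30.0, 400.0, 20.0]
    sli = timely_job_sli(queue_times, threshold_seconds=60.0)
    print(f"SLI: {sli:.2f}")
    print(f"Meets 95% objective: {meets_slo(sli, 0.95)}")
```

Expressing the indicator as a ratio of good events to total events keeps it comparable across teams with very different job volumes, which may matter when each team has its own worker pool.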