Dashboard - Devops (draft)

This document collects information on devops responsibilities for the Dashboard. There is a strong overlap with the release process, and it would make sense to integrate this document into a larger set of release documentation.

Links

AWS dashboard documentation

CI pipeline reporting

Discord > CORE-CHAIN > pipeline-notifications (https://discord.com/channels/509062983046135829/920004743135379527)

Devops channel

Discord > OPERATIONS (INTERNAL) > devops (https://discord.com/channels/509062983046135829/979506033590403123)

AWS Build Pipelines

EzOps is responsible for creating and maintaining the CI build pipelines, including the reporting of pipeline status. The pipelines currently consist of:

  • Development pipeline

  • Staging pipeline

  • Production pipeline

The status of pipeline builds is reported to the Discord pipeline-notifications channel (see Links).
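
How the pipelines report to Discord is not described in this document. As a rough illustration only, the sketch below assumes status messages are posted through a Discord webhook, which accepts a JSON body with a "content" field. The function names, the status strings, the User-Agent value, and the webhook URL are all hypothetical; only the three pipeline names come from the list above.

```python
import json
import urllib.request

# Pipeline names from the list above; everything else here is an assumption.
PIPELINES = ("development", "staging", "production")


def build_status_message(pipeline: str, status: str, detail: str = "") -> dict:
    """Build a Discord webhook payload for one pipeline status update."""
    if pipeline not in PIPELINES:
        raise ValueError(f"unknown pipeline: {pipeline}")
    text = f"[{pipeline}] build {status}"
    if detail:
        text += f": {detail}"
    # Discord webhooks accept a JSON object with a "content" field.
    return {"content": text}


def post_status(webhook_url: str, payload: dict) -> int:
    """POST the payload to a webhook URL and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Hypothetical identifier; set explicitly rather than relying
            # on the default urllib User-Agent.
            "User-Agent": "pipeline-status-sketch/0.1",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

A pipeline step could call `post_status(url, build_status_message("staging", "failed", "unit tests"))`, with the webhook URL supplied from a secret store rather than hard-coded.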

Next steps:

  • Aleph One to detail the latest release process and the related repositories and branches

  • Make sure the AWS dashboard documentation covers all pipelines. If not, open a Jira ticket with details

AWS Services

EzOps is responsible for setup and monitoring of services required by the Dashboard.

Next steps:

  • Aleph One team to review the AWS dashboard infrastructure documentation

  • For any services that are not documented, open a Jira task detailing the items that need additional documentation

  • Open Jira tickets for any additional (Discord or AWS UI) monitoring or reporting on services that would be helpful

AWS Servers

EzOps is responsible for setup and monitoring of backend and frontend development, testing, and production servers.

Next steps:

  • Aleph One team to review the AWS dashboard infrastructure documentation

  • For any infrastructure that is not documented, open a Jira task detailing the items that need additional documentation

  • Open Jira tickets for any additional (Discord or AWS UI) monitoring or reporting that would be helpful

Release

Based on a recent discussion, we need to better define the responsibilities of EzOps during releases. These might include:

  • Monitoring pipelines for failure

  • Fixing issues found during deployment

  • Executing rollbacks in case of failure

  • Other?

Next steps:

  • I recommend the Aleph One team create a release checklist and clearly define the areas where devops is needed. This needs to distinguish between release management tasks and devops tasks. For example, this checklist is used by release management for chain releases:

  • We can then review the checklist with EzOps and make sure they are okay with their assignments.

  • As part of this checklist, we should set up “release windows” where the EzOps team is available to support production releases.

  • Decide whether we want an additional “staging” server (and pipeline) that is separate from the current staging Testnet server. If so, open a Jira story to track the setup of this server.
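
To make the split between release management tasks and devops tasks concrete, a checklist could be modeled roughly as follows. This is only a sketch: the class names, the free-text release window field, and the "Announce the release" item are hypothetical, and the three devops items are the candidate responsibilities listed above, which have not yet been agreed with EzOps.

```python
from dataclasses import dataclass, field
from enum import Enum


class Owner(Enum):
    RELEASE_MANAGEMENT = "release management"
    DEVOPS = "devops"


@dataclass
class ChecklistItem:
    description: str
    owner: Owner
    done: bool = False


@dataclass
class ReleaseChecklist:
    # Hypothetical free-text window during which EzOps is available.
    release_window: str
    items: list[ChecklistItem] = field(default_factory=list)

    def for_owner(self, owner: Owner) -> list[ChecklistItem]:
        """Return the items assigned to one owner."""
        return [item for item in self.items if item.owner is owner]

    def outstanding(self) -> list[ChecklistItem]:
        """Return the items that are not yet done."""
        return [item for item in self.items if not item.done]


def example_checklist() -> ReleaseChecklist:
    """Build an illustrative checklist; the contents are placeholders."""
    return ReleaseChecklist(
        release_window="to be agreed with EzOps",
        items=[
            # Hypothetical release management task.
            ChecklistItem("Announce the release", Owner.RELEASE_MANAGEMENT),
            # Candidate devops tasks from the Release section.
            ChecklistItem("Monitor pipelines for failure", Owner.DEVOPS),
            ChecklistItem("Fix issues found during deployment", Owner.DEVOPS),
            ChecklistItem("Execute rollback in case of failure", Owner.DEVOPS),
        ],
    )
```

Filtering with `for_owner(Owner.DEVOPS)` gives the list to review with EzOps, while `outstanding()` shows what remains open during a release window.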