Dashboard - Devops (draft)
This document collects information on devops responsibilities for the Dashboard. It overlaps heavily with the release process, so it would make sense to integrate it into a larger set of release documentation.
Links
AWS dashboard documentation | https://fioprotocol.atlassian.net/wiki/spaces/FO/pages/88539399
CI pipeline reporting | Discord > CORE-CHAIN > pipeline-notifications (https://discord.com/channels/509062983046135829/920004743135379527)
Devops channel | Discord > OPERATIONS (INTERNAL) > devops (https://discord.com/channels/509062983046135829/979506033590403123)
AWS Build Pipelines
EzOps is responsible for creating and maintaining the CI build pipelines, including the reporting of pipeline status. There are currently three pipelines:
Development pipeline
Staging pipeline
Production pipeline
The status of pipeline builds is reported out to Discord.
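As a rough illustration of the reporting described above, the sketch below builds the kind of message a pipeline could post to a Discord channel. It is not the actual EzOps implementation: the function name, the status values, and the message wording are all assumptions. The only external detail relied on is that a Discord webhook accepts a JSON body with a `content` field.

```python
import json

# Pipeline names taken from the list above; the lowercase identifiers are an assumption.
PIPELINES = ("development", "staging", "production")

# Hypothetical status values; the real pipelines may report different ones.
STATUSES = ("SUCCEEDED", "FAILED")


def build_status_message(pipeline: str, status: str, commit: str) -> dict:
    """Return a Discord webhook payload describing one pipeline run."""
    if pipeline not in PIPELINES:
        raise ValueError(f"unknown pipeline: {pipeline}")
    if status not in STATUSES:
        raise ValueError(f"unknown status: {status}")
    # Shorten the commit hash to 7 characters to keep the message compact.
    text = f"{pipeline} pipeline: {status} (commit {commit[:7]})"
    return {"content": text}


if __name__ == "__main__":
    payload = build_status_message("staging", "FAILED", "0123456789abcdef")
    # In a real setup this JSON would be POSTed to the channel's webhook URL.
    print(json.dumps(payload))
```

Keeping the message construction separate from the HTTP call makes the format easy to review and test without touching Discord.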
Next steps:
Aleph One to document the latest release process, including the related repositories and branches.
Confirm that the AWS dashboard documentation covers all pipelines. If it does not, open a Jira ticket detailing the gaps.
AWS Services
EzOps is responsible for setting up and monitoring the services required by the Dashboard.
Next steps:
Aleph One team to review the AWS dashboard infrastructure documentation.
For any services that are not documented, open a Jira task detailing what additional documentation is needed.
Open Jira tickets for any additional monitoring or reporting on services (in Discord or the AWS UI) that would be helpful.
AWS Servers
EzOps is responsible for setting up and monitoring the backend and frontend servers for development, testing, and production.
Next steps:
Aleph One team to review the AWS dashboard infrastructure documentation.
For any infrastructure that is not documented, open a Jira task detailing what additional documentation is needed.
Open Jira tickets for any additional monitoring or reporting (in Discord or the AWS UI) that would be helpful.
Release
Based on a recent discussion, we need to better define the responsibilities of EzOps during releases. These might include:
Monitoring pipelines for failure
Fixing issues found during deployment
Executing rollbacks in case of failure
Other?
Next steps:
I recommend that the Aleph One team create a release checklist and clearly define the areas where devops is needed. The checklist needs to distinguish between release management tasks and devops tasks. For example, this checklist is used by release management for chain releases: https://fioprotocol.atlassian.net/wiki/spaces/FD/pages/9601824
We can then review the checklist with EzOps and make sure they are okay with their assignments.
As part of this checklist, we should set up “release windows” during which the EzOps team is available to support production releases.
Decide whether to add a “staging” server (and pipeline) that is separate from the current staging Testnet server. If so, open a Jira story to track the setup of this server.
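One way to keep release management tasks and devops tasks distinct in the proposed checklist is to record an owner on every item. The sketch below is only an illustration of that idea: the class names, the example tasks, and above all the owner assignments are hypothetical, since the actual split still has to be agreed with EzOps.

```python
from dataclasses import dataclass
from enum import Enum


class Owner(Enum):
    RELEASE_MANAGEMENT = "release management"
    DEVOPS = "devops"


@dataclass
class ChecklistItem:
    task: str
    owner: Owner
    done: bool = False


def tasks_for(checklist, owner):
    """Return the task names assigned to the given owner, in checklist order."""
    return [item.task for item in checklist if item.owner is owner]


# Hypothetical assignments, based on the candidate responsibilities listed above.
CHECKLIST = [
    ChecklistItem("Confirm release window with EzOps", Owner.RELEASE_MANAGEMENT),
    ChecklistItem("Monitor pipelines for failure", Owner.DEVOPS),
    ChecklistItem("Fix issues found during deployment", Owner.DEVOPS),
    ChecklistItem("Execute rollback in case of failure", Owner.DEVOPS),
]

if __name__ == "__main__":
    for task in tasks_for(CHECKLIST, Owner.DEVOPS):
        print(task)
```

Filtering by owner gives EzOps a short list of exactly the items they are being asked to sign off on during the review.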