# Modules for Grafana alerts and dashboards
<!-- TOC -->

## Alerting

See the [Grafana alerting documentation](https://itdoc.schwarz/x/X11nf) and, for a deeper look, the [official documentation](https://grafana.com/docs/grafana/latest/alerting/).

The Terraform modules are separated by resource type; check the README in each module directory for specific examples.
Below is an example of alerts that use a **Prometheus/Thanos** datasource and send notifications to **Google Chat**.

## Authentication

Set the Grafana credentials as Terraform variables. All modules use them to authenticate with the Grafana API:

```bash
export TF_VAR_grafana_url="https://grafana.example.com"
export TF_VAR_grafana_username="admin"
export TF_VAR_grafana_password="super-secret"
```

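Terraform maps each `TF_VAR_<name>` environment variable to an input variable called `<name>`, so the root module needs matching declarations. The following is a minimal sketch, assuming the root module configures the `grafana/grafana` provider itself with basic authentication; the module READMEs remain the authority on where the provider is actually configured.

```hcl
terraform {
  required_providers {
    grafana = {
      source = "grafana/grafana"
    }
  }
}

# Populated from TF_VAR_grafana_url, TF_VAR_grafana_username
# and TF_VAR_grafana_password.
variable "grafana_url" {
  type = string
}

variable "grafana_username" {
  type = string
}

variable "grafana_password" {
  type      = string
  sensitive = true # keeps the value out of plan/apply output
}

# Assumption: the provider is configured in the root module.
# For basic auth, the provider expects "username:password".
provider "grafana" {
  url  = var.grafana_url
  auth = "${var.grafana_username}:${var.grafana_password}"
}
```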
---

## Directory Structure

Organize alerts, templates, and Terraform code as follows:

```
.
├── alerts/
│   ├── common-infra/
│   │   ├── loki/
│   │   │   └── alert-loki.yaml
│   │   └── thanos/
│   │       └── alert-thanos.yaml
│   ├── oncall/
│   │   └── alert-oncall.yaml
│   └── heartbeats/
│       └── alert-heartbeat.yaml
├── templates/
│   └── myteam/
│       └── gchat-message.tmpl
└── main.tf
```

- **Alerts**: YAML files defining rule groups (`apiVersion: 1, groups: [...]`).
- **Templates**: Notification templates for Google Chat contact points.
- **Terraform code**: References modules and binds everything together.

---

## Defining Secrets

Datasource URLs and credentials should be stored in Terraform variables, not hardcoded.

**Example: Environment Variables**

```bash
export TF_VAR_thanos_coin_prd_url="https://thanos.example.com"
export TF_VAR_thanos_coin_prd_user="reader"
export TF_VAR_thanos_coin_prd_pass="password"

export TF_VAR_loki_coin_prd_url="https://loki.example.com"
export TF_VAR_loki_coin_prd_user="reader"
export TF_VAR_loki_coin_prd_pass="password"

export TF_VAR_opsgenie_api_key="xxxxxx"
```

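Each exported `TF_VAR_*` value needs a matching `variable` block in the root module. The following sketch shows those declarations, assuming they live in a file such as `variables.tf` (the file name is illustrative). Marking the secrets as `sensitive` is optional and only hides them from plan output; if a module does not accept sensitive inputs, the flag can be dropped.

```hcl
# Thanos datasource (from TF_VAR_thanos_coin_prd_*)
variable "thanos_coin_prd_url" {
  type = string
}

variable "thanos_coin_prd_user" {
  type = string
}

variable "thanos_coin_prd_pass" {
  type      = string
  sensitive = true
}

# Loki datasource (from TF_VAR_loki_coin_prd_*)
variable "loki_coin_prd_url" {
  type = string
}

variable "loki_coin_prd_user" {
  type = string
}

variable "loki_coin_prd_pass" {
  type      = string
  sensitive = true
}

# OpsGenie contact point (from TF_VAR_opsgenie_api_key)
variable "opsgenie_api_key" {
  type      = string
  sensitive = true
}
```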
---

## Module Usage

### Datasources

Define multiple datasources (Prometheus, Loki, etc.) with unique keys for URL/username/password:

```hcl
module "datasource" {
  source = "git::https://commerce-platform.git.onstackit.cloud/commerce-platform-public/terraform-modules//grafana/datasource?ref=main"

  datasources = {
    Thanos-Common-Infra-PRD = {
      type       = "prometheus"
      url_key    = "thanos_coin_prd"
      user_key   = "thanos_coin_prd"
      pass_key   = "thanos_coin_prd"
      is_default = true
    }
    Loki-Common-Infra-PRD = {
      type     = "loki"
      url_key  = "loki_coin_prd"
      user_key = "loki_coin_prd"
      pass_key = "loki_coin_prd"
    }
  }

  datasource_urls = {
    thanos_coin_prd = var.thanos_coin_prd_url
    loki_coin_prd   = var.loki_coin_prd_url
  }

  datasource_users = {
    thanos_coin_prd = var.thanos_coin_prd_user
    loki_coin_prd   = var.loki_coin_prd_user
  }

  datasource_passwords = {
    thanos_coin_prd = var.thanos_coin_prd_pass
    loki_coin_prd   = var.loki_coin_prd_pass
  }
}
```

### Contact Points

**Google Chat**

Each Google Chat space is configured as a contact point:

```hcl
module "gchat-contact-point-coin" {
  source             = "git::https://commerce-platform.git.onstackit.cloud/commerce-platform-public/terraform-modules//grafana/contact-point-gchat?ref=main"
  gchat_url          = var.gchat_url_coin
  contact_point_name = "gchat-coin"
  templates_dir      = "templates/coin"
  template_prefix    = "coin-"
  disable_provenance = true
}
```

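The `gchat_url_coin` variable used in the Google Chat example is not among the exports listed under Defining Secrets. The following sketch declares it, assuming it holds the incoming-webhook URL of the Google Chat space and is supplied the same way as the other secrets, via `TF_VAR_gchat_url_coin`.

```hcl
# Assumption: holds the Google Chat webhook URL.
# Set it with: export TF_VAR_gchat_url_coin="..."
variable "gchat_url_coin" {
  type      = string
  sensitive = true # webhook URLs embed an access token
}
```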
**OpsGenie**

OpsGenie contact points use API keys:

```hcl
module "opsgenie-contact-point" {
  source             = "git::https://commerce-platform.git.onstackit.cloud/commerce-platform-public/terraform-modules//grafana/contact-point-opsgenie?ref=main"
  contact_point_name = "opsgenie-dev"
  opsgenie_api_key   = var.opsgenie_api_key
}
```

### Alert Folders

Organize alerts in Grafana folders for logical separation:

```hcl
module "alert-folder" {
  source       = "git::https://commerce-platform.git.onstackit.cloud/commerce-platform-public/terraform-modules//grafana/alert-folder?ref=main"
  alert-folder = "Common-Infra-Alerts"
}
```

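The notification policy example in the next section also references a folder named "Common-Infra-OnCall-Alerts". The following sketch shows how that folder could be created with a second instance of the same module; the module label `alert-folder-oncall` is hypothetical.

```hcl
# Hypothetical second folder for on-call alerts,
# using the same input as the example above.
module "alert-folder-oncall" {
  source       = "git::https://commerce-platform.git.onstackit.cloud/commerce-platform-public/terraform-modules//grafana/alert-folder?ref=main"
  alert-folder = "Common-Infra-OnCall-Alerts"
}
```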
### Notification Policies

Map folders to contact points (e.g., send "Common-Infra-Alerts" to Google Chat):

```hcl
module "notification-policy" {
  source                    = "git::https://commerce-platform.git.onstackit.cloud/commerce-platform-public/terraform-modules//grafana/notification-policy?ref=main"
  default_contact_point_uid = module.gchat-contact-point-coin.contact_point_name
  group_by                  = ["alertname"]

  folder_policies = {
    "Common-Infra-Alerts"        = module.gchat-contact-point-coin.contact_point_name
    "Common-Infra-OnCall-Alerts" = module.opsgenie-contact-point.contact_name
  }
}
```

### Alert Definitions

Alert rules are defined in YAML and applied via the module:

```hcl
module "alerting-coin" {
  source             = "git::https://commerce-platform.git.onstackit.cloud/commerce-platform-public/terraform-modules//grafana/alerts?ref=main"
  alerts_dir         = "alerts/common-infra/thanos"
  datasource_uid     = module.datasource.datasource_uids["Thanos-Common-Infra-PRD"]
  folder_uid         = module.alert-folder.folder_uid
  receiver           = module.gchat-contact-point-coin.contact_point_name
  disable_provenance = true
}
```

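The same module can presumably be instantiated once per rules directory. The following sketch covers the Loki rules from the directory layout, assuming the `datasource_uids` output is keyed by the names in the `datasources` map, as in the Thanos example. The module label `alerting-coin-loki` is hypothetical.

```hcl
# Hypothetical second instance: Loki rules evaluated against the Loki datasource.
module "alerting-coin-loki" {
  source             = "git::https://commerce-platform.git.onstackit.cloud/commerce-platform-public/terraform-modules//grafana/alerts?ref=main"
  alerts_dir         = "alerts/common-infra/loki"
  datasource_uid     = module.datasource.datasource_uids["Loki-Common-Infra-PRD"]
  folder_uid         = module.alert-folder.folder_uid
  receiver           = module.gchat-contact-point-coin.contact_point_name
  disable_provenance = true
}
```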
---

## Alert YAML Format

Alerts are defined in Grafana's YAML format. The easiest way to get an example from scratch is to define the alert in the Grafana UI and then export it using the "Export rules" button.
However, make sure to remove the fields that are not needed because the Terraform modules provide them automatically:

- `datasourceUid` on datasource queries (set by the `alerts` module; expression steps keep `datasourceUid: __expr__`, as in the example below)
- `notification_settings` (set by the `notification-policy` module)

Each file must have `apiVersion: 1` and define groups:

```yaml
apiVersion: 1
groups:
  - name: infra-alerts
    interval: 1m
    rules:
      - uid: pod-restart-alert
        title: "Pod Restart Count High"
        condition: C
        data:
          - refId: A
            relativeTimeRange:
              from: 600
              to: 0
            model:
              expr: increase(kube_pod_container_status_restarts_total{}[5m]) > 3
              instant: true
              refId: A
          - refId: C
            datasourceUid: __expr__
            model:
              conditions:
                - evaluator:
                    params: [0]
                    type: gt
                  operator:
                    type: and
                  query:
                    params: [C]
                  reducer:
                    type: last
                  type: query
              expression: A
              type: threshold
        noDataState: OK
        execErrState: Error
        for: 5m
        annotations:
          description: "Pod is restarting too often"
```

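Before applying, it can help to confirm that every rule file parses and to see which groups it defines. The following sketch uses Terraform's built-in `fileset`, `file`, and `yamldecode` functions in the root module. The names `rule_files`, `rule_groups`, and `alert_rule_groups` are illustrative, and the sketch makes no claim about how the `alerts` module itself loads the files.

```hcl
locals {
  alerts_dir = "${path.module}/alerts/common-infra/thanos"

  # All YAML rule files in the directory.
  rule_files = fileset(local.alerts_dir, "*.yaml")

  # File name => list of group names.
  # yamldecode fails the plan if a file is not valid YAML.
  rule_groups = {
    for f in local.rule_files :
    f => [for g in yamldecode(file("${local.alerts_dir}/${f}")).groups : g.name]
  }
}

# Shown by `terraform plan`, so the parsed groups can be reviewed before applying.
output "alert_rule_groups" {
  value = local.rule_groups
}
```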
---

## Updating Alerts or Contact Points

- Add new YAML files under `alerts/` for additional rules.
- Add new modules in `main.tf` for new datasources or contact points.
- Run `terraform apply` to sync changes to Grafana.

You can find examples of alerts and templates in the [examples](./examples/) folder.