terraform-modules/grafana/README.md
2025-08-26 18:38:36 +02:00


# Modules for Grafana alerts and dashboards
<!-- TOC -->
## Alerting
For background on Grafana alerting, see the documentation [here](https://itdoc.schwarz/x/X11nf) and, for a deeper look, the [official documentation](https://grafana.com/docs/grafana/latest/alerting/).
The Terraform modules are separated per resource type; check the README in each module directory for specific examples.
Below is an example of alerts using the "**Prometheus/Thanos**" datasource and sending notifications to "**Google Chat**".
## Authentication
Set Grafana credentials as Terraform variables:
```bash
export TF_VAR_grafana_url="https://grafana.example.com"
export TF_VAR_grafana_username="admin"
export TF_VAR_grafana_password="super-secret"
```
These credentials are used by all modules to authenticate with the Grafana API.
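
As a minimal sketch, the root module could declare these variables and hand them to the Grafana provider as shown here. This assumes the provider is configured in your own `main.tf` (the modules may already do this for you) and that basic auth in the form `username:password` is used; the variable names simply mirror the `TF_VAR_` exports above.

```hcl
# Assumption: declared in the root module; names mirror the TF_VAR_* exports.
variable "grafana_url" {
  type        = string
  description = "Base URL of the Grafana instance"
}

variable "grafana_username" {
  type = string
}

variable "grafana_password" {
  type      = string
  sensitive = true # keeps the value out of plan/apply output
}

terraform {
  required_providers {
    grafana = {
      source = "grafana/grafana"
    }
  }
}

# Basic auth is passed to the provider as "username:password".
provider "grafana" {
  url  = var.grafana_url
  auth = "${var.grafana_username}:${var.grafana_password}"
}
```

Marking the password as `sensitive` only hides it from CLI output; it is still stored in the Terraform state, so the state backend should be protected accordingly.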
---
## Directory Structure
Organize alerts, templates, and Terraform code as follows:
```
.
├── alerts/
│   ├── common-infra/
│   │   ├── loki/
│   │   │   └── alert-loki.yaml
│   │   └── thanos/
│   │       └── alert-thanos.yaml
│   ├── oncall/
│   │   └── alert-oncall.yaml
│   └── heartbeats/
│       └── alert-heartbeat.yaml
├── templates/
│   └── myteam/
│       └── gchat-message.tmpl
└── main.tf
```
- **Alerts**: YAML files defining rule groups (`apiVersion: 1, groups: [...]`).
- **Templates**: Notification templates for Google Chat contact points.
- **Terraform code**: References modules and binds everything together.
---
## Defining Secrets
Datasource URLs and credentials should be stored in Terraform variables, not hardcoded.
**Example: Environment Variables**
```bash
export TF_VAR_thanos_coin_prd_url="https://thanos.example.com"
export TF_VAR_thanos_coin_prd_user="reader"
export TF_VAR_thanos_coin_prd_pass="password"
export TF_VAR_loki_coin_prd_url="https://loki.example.com"
export TF_VAR_loki_coin_prd_user="reader"
export TF_VAR_loki_coin_prd_pass="password"
export TF_VAR_opsgenie_api_key="xxxxxx"
```
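
For Terraform to pick up these environment variables, matching variable declarations are needed in the root module. The following is a sketch under that assumption; the names are derived from the `TF_VAR_` exports above, and the `sensitive` flags are a suggestion rather than something the modules require.

```hcl
# Assumption: one declaration per TF_VAR_* export, placed in the root module.
variable "thanos_coin_prd_url" {
  type = string
}

variable "thanos_coin_prd_user" {
  type = string
}

variable "thanos_coin_prd_pass" {
  type      = string
  sensitive = true
}

variable "loki_coin_prd_url" {
  type = string
}

variable "loki_coin_prd_user" {
  type = string
}

variable "loki_coin_prd_pass" {
  type      = string
  sensitive = true
}

variable "opsgenie_api_key" {
  type      = string
  sensitive = true
}
```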
---
## Module Usage
### Datasources
Define multiple datasources (Prometheus, Loki, etc.) with unique keys for URL/username/password:
```hcl
module "datasource" {
  source = "git::https://commerce-platform.git.onstackit.cloud/commerce-platform-public/terraform-modules//grafana/datasource?ref=main"

  datasources = {
    Thanos-Common-Infra-PRD = {
      type                = "prometheus"
      url_key             = "thanos_coin_prd"
      basic_auth_user_key = "thanos_coin_prd"
      pass_key            = "thanos_coin_prd"
      is_default          = true
    }
    Loki-Common-Infra-PRD = {
      type                = "loki"
      url_key             = "loki_coin_prd"
      basic_auth_user_key = "loki_coin_prd"
      pass_key            = "loki_coin_prd"
    }
  }

  datasource_urls = {
    thanos_coin_prd = var.thanos_coin_prd_url
    loki_coin_prd   = var.loki_coin_prd_url
  }

  datasource_users = {
    thanos_coin_prd = var.thanos_coin_prd_user
    loki_coin_prd   = var.loki_coin_prd_user
  }

  datasource_passwords = {
    thanos_coin_prd = var.thanos_coin_prd_pass
    loki_coin_prd   = var.loki_coin_prd_pass
  }
}
```
### Contact Points
**Google Chat**
Each Google Chat space is configured as a contact point:
```hcl
module "gchat-contact-point-coin" {
  source = "git::https://commerce-platform.git.onstackit.cloud/commerce-platform-public/terraform-modules//grafana/contact-point-gchat?ref=main"

  gchat_url          = var.gchat_url_coin
  contact_point_name = "gchat-coin"
  templates_dir      = "templates/coin"
  template_prefix    = "coin-"
  disable_provenance = true
}
```
**OpsGenie**
OpsGenie contact points use API keys:
```hcl
module "opsgenie-contact-point" {
  source = "git::https://commerce-platform.git.onstackit.cloud/commerce-platform-public/terraform-modules//grafana/contact-point-opsgenie?ref=main"

  contact_point_name = "opsgenie-dev"
  opsgenie_api_key   = var.opsgenie_api_key
}
```
### Alert Folders
Organize alerts in Grafana folders for logical separation:
```hcl
module "alert-folder" {
  source = "git::https://commerce-platform.git.onstackit.cloud/commerce-platform-public/terraform-modules//grafana/alert-folder?ref=main"

  alert-folder = "Common-Infra-Alerts"
}
```
### Notification Policies
Map folders to contact points (e.g., send “Common-Infra-Alerts” to Google Chat):
```hcl
module "notification-policy" {
  source = "git::https://commerce-platform.git.onstackit.cloud/commerce-platform-public/terraform-modules//grafana/notification-policy?ref=main"

  default_contact_point_uid = module.gchat-contact-point-coin.contact_point_name
  group_by                  = ["alertname"]

  folder_policies = {
    "Common-Infra-Alerts"        = module.gchat-contact-point-coin.contact_point_name
    "Common-Infra-OnCall-Alerts" = module.opsgenie-contact-point.contact_name
  }
}
```
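
To illustrate what a folder-to-contact-point mapping looks like at the provider level, here is a rough sketch using the Grafana provider's `grafana_notification_policy` resource. This is not the module's actual implementation, which is not shown here; it assumes routing is done by matching the `grafana_folder` label that Grafana attaches to alerts, and it hard-codes the contact point names from the examples above.

```hcl
# Illustrative only: a hand-written equivalent of the folder mapping above.
# Assumption: routing matches on the grafana_folder label.
resource "grafana_notification_policy" "example" {
  contact_point = "gchat-coin" # default receiver for unmatched alerts
  group_by      = ["alertname"]

  policy {
    matcher {
      label = "grafana_folder"
      match = "="
      value = "Common-Infra-Alerts"
    }
    contact_point = "gchat-coin"
  }

  policy {
    matcher {
      label = "grafana_folder"
      match = "="
      value = "Common-Infra-OnCall-Alerts"
    }
    contact_point = "opsgenie-dev"
  }
}
```

Note that this resource manages the whole notification policy tree of a Grafana organization, so it should be defined only once per organization.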
### Alert Definitions
Alert rules are defined in YAML and applied via the module:
```hcl
module "alerting-coin" {
  source = "git::https://commerce-platform.git.onstackit.cloud/commerce-platform-public/terraform-modules//grafana/alerts?ref=main"

  alerts_dir         = "alerts/common-infra/thanos"
  datasource_uid     = module.datasource.datasource_uids["Thanos-Common-Infra-PRD"]
  folder_uid         = module.alert-folder.folder_uid
  receiver           = module.gchat-contact-point-coin.contact_point_name
  disable_provenance = true
}
```
---
## Alert YAML Format
Alerts are defined in Grafana's YAML format. The easiest way to create an example from scratch is to define the alert in the Grafana UI and then export it using the "Export rules" button.
Before committing the export, remove the fields that the Terraform module logic provides automatically:
- `datasourceUid` on query entries (set by the `alerts` module); the `__expr__` value on expression entries stays, as in the example below
- `notification_settings` (set by the `notification-policy` module)

Each file must have `apiVersion: 1` and define groups:
```yaml
apiVersion: 1
groups:
  - name: infra-alerts
    interval: 1m
    rules:
      - uid: pod-restart-alert
        title: "Pod Restart Count High"
        condition: C
        data:
          - refId: A
            relativeTimeRange:
              from: 600
              to: 0
            model:
              expr: increase(kube_pod_container_status_restarts_total{}[5m]) > 3
              instant: true
              refId: A
          - refId: C
            datasourceUid: __expr__
            model:
              conditions:
                - evaluator:
                    params: [0]
                    type: gt
                  operator:
                    type: and
                  query:
                    params: [C]
                  reducer:
                    type: last
                  type: query
              expression: A
              type: threshold
        noDataState: OK
        execErrState: Error
        for: 5m
        annotations:
          description: "Pod is restarting too often"
```
---
## Updating Alerts or Contact Points
- Add new YAML files under `alerts/` for additional rules.
- Add new modules in `main.tf` for new datasources or contact points.
- Run `terraform apply` to sync changes to Grafana.
You can find examples of alerts and templates in the [examples](./examples/) folder.
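
As a sketch of the update workflow, adding rules for a second datasource could mean one more module block in `main.tf`. The module name `alerting-coin-loki` is hypothetical; the inputs reuse those shown in the Alert Definitions section, pointed at the `alerts/common-infra/loki` directory and the Loki datasource defined earlier.

```hcl
# Hypothetical addition to main.tf: a second alerts module for the Loki rules.
module "alerting-coin-loki" {
  source = "git::https://commerce-platform.git.onstackit.cloud/commerce-platform-public/terraform-modules//grafana/alerts?ref=main"

  alerts_dir         = "alerts/common-infra/loki"
  datasource_uid     = module.datasource.datasource_uids["Loki-Common-Infra-PRD"]
  folder_uid         = module.alert-folder.folder_uid
  receiver           = module.gchat-contact-point-coin.contact_point_name
  disable_provenance = true
}
```

Because a new module block was added, `terraform init` has to be run once before `terraform apply` so that the module source is downloaded.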