Modules for Grafana alerts and dashboards
Alerting
Please check the documentation about Grafana alerting here, and the official Grafana documentation for a deeper look.
The Terraform modules are split by resource type; see the README in each module directory for specific examples. Below is an example of alerts that use a "Prometheus/Thanos" datasource and send notifications to "Google Chat".
Authentication
Set Grafana credentials as Terraform variables:
export TF_VAR_grafana_url="https://grafana.example.com"
export TF_VAR_grafana_username="admin"
export TF_VAR_grafana_password="super-secret"
These credentials are used by all modules to authenticate with the Grafana API.
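Terraform only picks up these TF_VAR_* values for variables that the root module declares. A minimal sketch of the declarations and the provider setup, assuming the modules rely on a default (non-aliased) grafana provider configured in main.tf; check each module's README to confirm how it expects the provider to be supplied:

```hcl
terraform {
  required_providers {
    grafana = {
      source = "grafana/grafana"
    }
  }
}

# Populated from TF_VAR_grafana_url, TF_VAR_grafana_username and TF_VAR_grafana_password.
variable "grafana_url" {
  type = string
}

variable "grafana_username" {
  type = string
}

variable "grafana_password" {
  type      = string
  sensitive = true # redacted from plan and apply output
}

# Assumption: the modules inherit this default provider configuration.
# The auth argument takes basic-auth credentials in "username:password" form.
provider "grafana" {
  url  = var.grafana_url
  auth = "${var.grafana_username}:${var.grafana_password}"
}
```

The provider's auth argument also accepts an API token instead of username:password, if your Grafana instance uses service accounts.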
Directory Structure
Organize alerts, templates, and Terraform code as follows:
.
├── alerts/
│   ├── common-infra/
│   │   ├── loki/
│   │   │   └── alert-loki.yaml
│   │   └── thanos/
│   │       └── alert-thanos.yaml
│   ├── oncall/
│   │   └── alert-oncall.yaml
│   └── heartbeats/
│       └── alert-heartbeat.yaml
├── templates/
│   └── myteam/
│       └── gchat-message.tmpl
└── main.tf
- Alerts: YAML files defining rule groups (apiVersion: 1, groups: [...]).
- Templates: Notification templates for Google Chat contact points.
- Terraform code: References the modules and binds everything together.
Defining Secrets
Datasource URLs and credentials should be stored in Terraform variables, not hardcoded.
Example: Environment Variables
export TF_VAR_thanos_coin_prd_url="https://thanos.example.com"
export TF_VAR_thanos_coin_prd_user="reader"
export TF_VAR_thanos_coin_prd_pass="password"
export TF_VAR_loki_coin_prd_url="https://loki.example.com"
export TF_VAR_loki_coin_prd_user="reader"
export TF_VAR_loki_coin_prd_pass="password"
export TF_VAR_opsgenie_api_key="xxxxxx"
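Each exported value needs a matching variable declaration in the root module. A sketch of those declarations; marking the passwords and the OpsGenie key as sensitive is a suggested choice, not a documented module requirement. Terraform restricts where sensitive values may be used (for example, they cannot drive for_each), so check the module READMEs if a plan fails after adding it:

```hcl
# Each variable is populated from the matching TF_VAR_<name> environment variable.
variable "thanos_coin_prd_url" {
  type = string
}

variable "thanos_coin_prd_user" {
  type = string
}

variable "thanos_coin_prd_pass" {
  type      = string
  sensitive = true # redacted from plan and apply output
}

variable "loki_coin_prd_url" {
  type = string
}

variable "loki_coin_prd_user" {
  type = string
}

variable "loki_coin_prd_pass" {
  type      = string
  sensitive = true
}

variable "opsgenie_api_key" {
  type      = string
  sensitive = true
}
```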
Module Usage
Datasources
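Define one or more datasources (Prometheus, Loki, etc.). Each entry in datasources refers by key to its URL, username, and password in the datasource_urls, datasource_users, and datasource_passwords maps: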
module "datasource" {
  source = "git::https://commerce-platform.git.onstackit.cloud/commerce-platform-public/terraform-modules//grafana/datasource?ref=main"
  datasources = {
    Thanos-Common-Infra-PRD = {
      type                = "prometheus"
      url_key             = "thanos_coin_prd"
      basic_auth_user_key = "thanos_coin_prd"
      pass_key            = "thanos_coin_prd"
      is_default          = true
    }
    Loki-Common-Infra-PRD = {
      type                = "loki"
      url_key             = "loki_coin_prd"
      basic_auth_user_key = "loki_coin_prd"
      pass_key            = "loki_coin_prd"
    }
  }
  datasource_urls = {
    thanos_coin_prd = var.thanos_coin_prd_url
    loki_coin_prd   = var.loki_coin_prd_url
  }
  datasource_users = {
    thanos_coin_prd = var.thanos_coin_prd_user
    loki_coin_prd   = var.loki_coin_prd_user
  }
  datasource_passwords = {
    thanos_coin_prd = var.thanos_coin_prd_pass
    loki_coin_prd   = var.loki_coin_prd_pass
  }
}
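The keys of the datasources map (Thanos-Common-Infra-PRD, Loki-Common-Infra-PRD) are the names other modules use to look up a datasource UID, as in module.datasource.datasource_uids["Thanos-Common-Infra-PRD"] in the alert definitions. To inspect the resulting UIDs, a root-level output like this one can be added; the output name is a hypothetical helper, not part of the modules:

```hcl
# Hypothetical helper: exposes the UID map produced by the datasource module.
# datasource_uids is the module output already used by the alerts module example.
output "datasource_uids" {
  description = "Grafana datasource UIDs, keyed by datasource name"
  value       = module.datasource.datasource_uids
}
```

After terraform apply, terraform output datasource_uids prints the map.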
Contact Points
Google Chat
Each Google Chat space is configured as a contact point:
module "gchat-contact-point-coin" {
  source             = "git::https://commerce-platform.git.onstackit.cloud/commerce-platform-public/terraform-modules//grafana/contact-point-gchat?ref=main"
  gchat_url          = var.gchat_url_coin
  contact_point_name = "gchat-coin"
  templates_dir      = "templates/coin"
  template_prefix    = "coin-"
  disable_provenance = true
}
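The gchat_url_coin variable does not appear in the environment-variable example, so it needs its own declaration and a TF_VAR_gchat_url_coin export. A sketch that treats the Google Chat webhook URL as a secret, since it carries the space's key and token as query parameters; this assumes the module does not re-export the URL in a non-sensitive output:

```hcl
# Populated from TF_VAR_gchat_url_coin.
# The webhook URL embeds key and token query parameters, so keep it out of logs.
variable "gchat_url_coin" {
  type      = string
  sensitive = true
}
```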
OpsGenie
OpsGenie contact points use API keys:
module "opsgenie-contact-point" {
  source             = "git::https://commerce-platform.git.onstackit.cloud/commerce-platform-public/terraform-modules//grafana/contact-point-opsgenie?ref=main"
  contact_point_name = "opsgenie-dev"
  opsgenie_api_key   = var.opsgenie_api_key
}
Alert Folders
Organize alerts in Grafana folders for logical separation:
module "alert-folder" {
  source       = "git::https://commerce-platform.git.onstackit.cloud/commerce-platform-public/terraform-modules//grafana/alert-folder?ref=main"
  alert-folder = "Common-Infra-Alerts"
}
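The notification policy in the next section also routes a second folder, Common-Infra-OnCall-Alerts, to OpsGenie. A sketch of a second instance of the same module for that folder, assuming it accepts the same alert-folder input; the instance name alert-folder-oncall is hypothetical:

```hcl
# Hypothetical second folder for on-call alerts that the notification policy sends to OpsGenie.
module "alert-folder-oncall" {
  source       = "git::https://commerce-platform.git.onstackit.cloud/commerce-platform-public/terraform-modules//grafana/alert-folder?ref=main"
  alert-folder = "Common-Infra-OnCall-Alerts"
}
```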
Notification Policies
Map folders to contact points (e.g., route alerts in the "Common-Infra-Alerts" folder to Google Chat and alerts in "Common-Infra-OnCall-Alerts" to OpsGenie):
module "notification-policy" {
  source                    = "git::https://commerce-platform.git.onstackit.cloud/commerce-platform-public/terraform-modules//grafana/notification-policy?ref=main"
  default_contact_point_uid = module.gchat-contact-point-coin.contact_point_name
  group_by                  = ["alertname"]
  folder_policies = {
    "Common-Infra-Alerts"        = module.gchat-contact-point-coin.contact_point_name
    "Common-Infra-OnCall-Alerts" = module.opsgenie-contact-point.contact_name
  }
}
Alert Definitions
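Alert rules are defined in YAML files and provisioned by the alerts module. Each module instance loads the files in its alerts_dir and binds them to one datasource, folder, and receiver: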
module "alerting-coin" {
  source             = "git::https://commerce-platform.git.onstackit.cloud/commerce-platform-public/terraform-modules//grafana/alerts?ref=main"
  alerts_dir         = "alerts/common-infra/thanos"
  datasource_uid     = module.datasource.datasource_uids["Thanos-Common-Infra-PRD"]
  folder_uid         = module.alert-folder.folder_uid
  receiver           = module.gchat-contact-point-coin.contact_point_name
  disable_provenance = true
}
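Because each instance binds a single datasource, the rules in alerts/common-infra/loki from the directory layout need a separate instance pointing at the Loki datasource. A sketch, assuming the module accepts the same inputs as above; the instance name alerting-coin-loki is hypothetical:

```hcl
# Hypothetical instance for the Loki rules in alerts/common-infra/loki.
# Same folder and receiver as the Thanos rules; only the directory and datasource differ.
module "alerting-coin-loki" {
  source             = "git::https://commerce-platform.git.onstackit.cloud/commerce-platform-public/terraform-modules//grafana/alerts?ref=main"
  alerts_dir         = "alerts/common-infra/loki"
  datasource_uid     = module.datasource.datasource_uids["Loki-Common-Infra-PRD"]
  folder_uid         = module.alert-folder.folder_uid
  receiver           = module.gchat-contact-point-coin.contact_point_name
  disable_provenance = true
}
```

Since both instances write into the same Grafana folder, it is safest to keep rule group names in the Loki files distinct from those in the Thanos files; Grafana identifies a rule group by folder and name.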
Alert YAML Format
Alerts are defined in Grafana's alert rule YAML format. The easiest way to get a starting example is to create the alert in the Grafana UI and export it with the "Export rules" button. Then remove the fields that are not needed because the Terraform modules provide them automatically:
- datasourceUid (set by the alerts module; the example below keeps it only on the __expr__ expression node)
- notification_settings (set by the notification policy module)
Each file must have apiVersion: 1 and define groups:
apiVersion: 1
groups:
  - name: infra-alerts
    interval: 1m
    rules:
      - uid: pod-restart-alert
        title: "Pod Restart Count High"
        condition: C
        data:
          - refId: A
            relativeTimeRange:
              from: 600
              to: 0
            model:
              expr: increase(kube_pod_container_status_restarts_total{}[5m]) > 3
              instant: true
              refId: A
          - refId: C
            datasourceUid: __expr__
            model:
              conditions:
                - evaluator:
                    params: [0]
                    type: gt
                  operator:
                    type: and
                  query:
                    params: [C]
                  reducer:
                    type: last
                  type: query
              expression: A
              type: threshold
        noDataState: OK
        execErrState: Error
        for: 5m
        annotations:
          description: "Pod is restarting too often"
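The apiVersion and groups requirements can be checked during terraform plan, before anything reaches Grafana. A sketch using a Terraform check block (Terraform 1.5 or later), assuming rule files use the .yaml extension as in the directory layout; the local and check names are hypothetical, and a failing check block is reported as a warning rather than an error:

```hcl
locals {
  # All rule files under alerts/, as paths relative to the root module.
  alert_files = fileset(path.module, "alerts/**/*.yaml")

  # Parsed content of each file; a file that fails to parse becomes an empty object.
  alert_docs = {
    for f in local.alert_files :
    f => try(yamldecode(file("${path.module}/${f}")), {})
  }

  # Files missing apiVersion: 1 or a non-empty groups list.
  invalid_alert_files = [
    for f, doc in local.alert_docs : f
    if !try(doc.apiVersion == 1 && length(doc.groups) > 0, false)
  ]
}

check "alert_rule_files" {
  assert {
    condition     = length(local.invalid_alert_files) == 0
    error_message = "Rule files without apiVersion: 1 and a non-empty groups list: ${join(", ", local.invalid_alert_files)}"
  }
}
```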
Updating Alerts or Contact Points
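- Add new YAML files under alerts/ for additional rules (a directory not yet referenced by any alerts module's alerts_dir also needs its own alerts module instance).
- Add new modules in main.tf for new datasources or contact points.
- Run terraform apply to sync the changes to Grafana.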
You can find examples for alerts and templates in the examples folder.