# Modules for Grafana alerts and dashboards

## Alerting

See the documentation on Grafana alerting [here](https://itdoc.schwarz/x/X11nf) and the [official documentation](https://grafana.com/docs/grafana/latest/alerting/) for a deeper look.

The Terraform modules are separated per resource type; check the README in each module directory for specific examples. Below is an example of alerts using the "**Prometheus/Thanos**" datasource and sending notifications to "**Google Chat**".

## Authentication

Set Grafana credentials as Terraform variables:

```bash
export TF_VAR_grafana_url="https://grafana.example.com"
export TF_VAR_grafana_username="admin"
export TF_VAR_grafana_password="super-secret"
```

These credentials are used by all modules to authenticate with the Grafana API.

---

## Directory Structure

Organize alerts, templates, and Terraform code as follows:

```
.
├── alerts/
│   ├── common-infra/
│   │   ├── loki/
│   │   │   └── alert-loki.yaml
│   │   └── thanos/
│   │       └── alert-thanos.yaml
│   ├── oncall/
│   │   └── alert-oncall.yaml
│   └── heartbeats/
│       └── alert-heartbeat.yaml
├── templates/
│   └── myteam/
│       └── gchat-message.tmpl
└── main.tf
```

- **Alerts**: YAML files defining rule groups (`apiVersion: 1, groups: [...]`).
- **Templates**: Notification templates for Google Chat contact points.
- **Terraform code**: References modules and binds everything together.

---

## Defining Secrets

Datasource URLs and credentials should be stored in Terraform variables, not hardcoded.

**Example: Environment Variables**

```bash
export TF_VAR_thanos_coin_prd_url="https://thanos.example.com"
export TF_VAR_thanos_coin_prd_user="reader"
export TF_VAR_thanos_coin_prd_pass="password"

export TF_VAR_loki_coin_prd_url="https://loki.example.com"
export TF_VAR_loki_coin_prd_user="reader"
export TF_VAR_loki_coin_prd_pass="password"

export TF_VAR_opsgenie_api_key="xxxxxx"
```

---

## Module Usage

### Datasources

Define multiple datasources (Prometheus, Loki, etc.) with unique keys for URL/username/password:

```hcl
module "datasource" {
  source = "git::https://commerce-platform.git.onstackit.cloud/commerce-platform-public/terraform-modules//grafana/datasource?ref=main"

  datasources = {
    Thanos-Common-Infra-PRD = {
      type                = "prometheus"
      url_key             = "thanos_coin_prd"
      basic_auth_user_key = "thanos_coin_prd"
      pass_key            = "thanos_coin_prd"
      is_default          = true
    }
    Loki-Common-Infra-PRD = {
      type                = "loki"
      url_key             = "loki_coin_prd"
      basic_auth_user_key = "loki_coin_prd"
      pass_key            = "loki_coin_prd"
    }
  }

  datasource_urls = {
    thanos_coin_prd = var.thanos_coin_prd_url
    loki_coin_prd   = var.loki_coin_prd_url
  }

  datasource_users = {
    thanos_coin_prd = var.thanos_coin_prd_user
    loki_coin_prd   = var.loki_coin_prd_user
  }

  datasource_passwords = {
    thanos_coin_prd = var.thanos_coin_prd_pass
    loki_coin_prd   = var.loki_coin_prd_pass
  }
}
```

### Contact Points

**Google Chat**

Each Google Chat space is configured as a contact point:

```hcl
module "gchat-contact-point-coin" {
  source = "git::https://commerce-platform.git.onstackit.cloud/commerce-platform-public/terraform-modules//grafana/contact-point-gchat?ref=main"

  gchat_url          = var.gchat_url_coin
  contact_point_name = "gchat-coin"
  templates_dir      = "templates/coin"
  template_prefix    = "coin-"
  disable_provenance = true
}
```

**OpsGenie**

OpsGenie contact points use API keys:

```hcl
module "opsgenie-contact-point" {
  source = "git::https://commerce-platform.git.onstackit.cloud/commerce-platform-public/terraform-modules//grafana/contact-point-opsgenie?ref=main"

  contact_point_name = "opsgenie-dev"
  opsgenie_api_key   = var.opsgenie_api_key
}
```

### Alert Folders

Organize alerts in Grafana folders for logical separation:

```hcl
module "alert-folder" {
  source = "git::https://commerce-platform.git.onstackit.cloud/commerce-platform-public/terraform-modules//grafana/alert-folder?ref=main"

  alert-folder = "Common-Infra-Alerts"
}
```

### Notification Policies

Map folders to contact points (e.g., send "Common-Infra-Alerts" to Google Chat):

```hcl
module "notification-policy" {
  source = "git::https://commerce-platform.git.onstackit.cloud/commerce-platform-public/terraform-modules//grafana/notification-policy?ref=main"

  default_contact_point_uid = module.gchat-contact-point-coin.contact_point_name
  group_by                  = ["alertname"]

  folder_policies = {
    "Common-Infra-Alerts"        = module.gchat-contact-point-coin.contact_point_name
    "Common-Infra-OnCall-Alerts" = module.opsgenie-contact-point.contact_name
  }
}
```

### Alert Definitions

Alert rules are defined in YAML and applied via the module:

```hcl
module "alerting-coin" {
  source = "git::https://commerce-platform.git.onstackit.cloud/commerce-platform-public/terraform-modules//grafana/alerts?ref=main"

  alerts_dir         = "alerts/common-infra/thanos"
  datasource_uid     = module.datasource.datasource_uids["Thanos-Common-Infra-PRD"]
  folder_uid         = module.alert-folder.folder_uid
  receiver           = module.gchat-contact-point-coin.contact_point_name
  disable_provenance = true
}
```

---

## Alert YAML Format

Alerts are defined in Grafana's YAML format. The easiest way to get a starting example is to define the alert in the Grafana UI and then export it using the "Export rules" button. After exporting, remove the fields that the Terraform module logic provides automatically:

- `datasourceUid` (set by the `alerts` module)
- `notification_settings` (set by the `notification-policy` module)

Each file must have `apiVersion: 1` and define groups:

```yaml
apiVersion: 1
groups:
  - name: infra-alerts
    interval: 1m
    rules:
      - uid: pod-restart-alert
        title: "Pod Restart Count High"
        condition: C
        data:
          - refId: A
            relativeTimeRange:
              from: 600
              to: 0
            model:
              expr: increase(kube_pod_container_status_restarts_total{}[5m]) > 3
              instant: true
              refId: A
          - refId: C
            datasourceUid: __expr__
            model:
              conditions:
                - evaluator:
                    params: [0]
                    type: gt
                  operator:
                    type: and
                  query:
                    params: [C]
                  reducer:
                    type: last
                  type: query
              expression: A
              type: threshold
        noDataState: OK
        execErrState: Error
        for: 5m
        annotations:
          description: "Pod is restarting too often"
```

---

## Updating Alerts or Contact Points

- Add new YAML files under `alerts/` for additional rules.
- Add new modules in `main.tf` for new datasources or contact points.
- Run `terraform apply` to sync changes to Grafana.

You can find examples of alerts and templates in the [examples](./examples/) folder.
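
Terraform only picks up a `TF_VAR_*` environment variable if the root module declares a variable with the matching name, so the exports in the Authentication and Defining Secrets sections need corresponding declarations. A minimal sketch of what these might look like, assuming they live in a `variables.tf` file next to `main.tf` (the file name, the descriptions, and the choice of which values to mark `sensitive` are assumptions, not taken from the modules):

```hcl
# variables.tf (hypothetical file name)
# Root-module declarations matching the TF_VAR_* exports used in this README.

variable "grafana_url" {
  type        = string
  description = "Base URL of the Grafana instance"
}

variable "grafana_username" {
  type = string
}

variable "grafana_password" {
  type      = string
  sensitive = true # redacted in plan/apply output
}

variable "thanos_coin_prd_url" {
  type = string
}

variable "thanos_coin_prd_user" {
  type = string
}

variable "thanos_coin_prd_pass" {
  type      = string
  sensitive = true
}

variable "opsgenie_api_key" {
  type      = string
  sensitive = true
}

# Referenced by the Google Chat contact point module as var.gchat_url_coin.
# Marked sensitive on the assumption that the webhook URL embeds a token.
variable "gchat_url_coin" {
  type      = string
  sensitive = true
}
```

The Loki variables (`loki_coin_prd_url`, `loki_coin_prd_user`, `loki_coin_prd_pass`) would follow the same pattern as the Thanos ones.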